In this paper we introduce a system to track pedestrians using a combined input from RGB and thermal cameras. Two major contributions are presented here. The first is a novel probabilistic model of the scene background in which each pixel is represented as a multi-modal distribution with a changing number of modalities for both color and thermal input. We demonstrate how to eliminate the influence of shadows with this type of fusion. Second, based on our background model, we introduce a pedestrian tracker designed as a particle filter. We further develop a number of informed reversible transformations to sample the model probability space in order to maximize our model posterior probability. The novelty of our tracking approach also comes from the way we formulate observation likelihoods to account for the 3D locations of the bodies with respect to the camera and for occlusions by other tracked human bodies as well as static objects. The results of tracking on color and thermal sequences demonstrate that our algorithm is robust to illumination noise and performs well in outdoor environments.
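As an illustration of a per-pixel background model whose number of modes changes over time, the following is a minimal sketch, not the probabilistic model of the paper. It assumes each pixel observation is a fused vector such as [R, G, B, T] and uses a simple match-or-create rule with weight decay. The class names `Mode` and `PixelModel` and the parameters `max_dist`, `learn_rate`, and `min_weight` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Mode:
    mean: List[float]   # fused feature vector, e.g. [R, G, B, T] (assumed layout)
    weight: float       # how often this mode has recently explained the pixel


@dataclass
class PixelModel:
    max_dist: float = 20.0     # hypothetical per-channel match radius
    learn_rate: float = 0.05   # hypothetical update rate
    min_weight: float = 0.01   # modes below this weight are dropped
    modes: List[Mode] = field(default_factory=list)

    def update(self, x: Sequence[float]) -> bool:
        """Update the model with observation x.

        Returns True if x matched an existing mode (background-like),
        False if a new mode had to be created (foreground-like).
        """
        # Find the closest mode within max_dist (largest per-channel difference).
        best, best_d = None, self.max_dist
        for m in self.modes:
            d = max(abs(a - b) for a, b in zip(m.mean, x))
            if d <= best_d:
                best, best_d = m, d

        # All existing modes decay, so stale modes eventually disappear.
        for m in self.modes:
            m.weight *= 1.0 - self.learn_rate

        if best is None:
            # No mode explains x: the number of modes grows.
            self.modes.append(Mode(list(x), self.learn_rate))
            matched = False
        else:
            # Reinforce the matched mode and pull its mean toward x.
            best.weight += self.learn_rate
            best.mean = [(1.0 - self.learn_rate) * a + self.learn_rate * b
                         for a, b in zip(best.mean, x)]
            matched = True

        # Prune weak modes: the number of modes shrinks.
        self.modes = [m for m in self.modes if m.weight >= self.min_weight]
        return matched


pixel = PixelModel()
print(pixel.update([100, 100, 100, 30]))  # first observation creates a mode
print(pixel.update([101, 100, 99, 30]))   # close to the first mode
print(len(pixel.modes))
```

Because color and thermal channels sit in one vector here, a match requires agreement in both. Whether the paper fuses the two inputs this way is not stated in the abstract.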
Knowing the visual attention field of a monitored subject is of great value for many applications, including surveillance and marketing. This paper proposes first tracking people's bodies and then estimating the visual attention field of each person using head pose information. The proposed head pose technique aims at estimating the yaw angle only. The method is shown to operate on monocular color camera sequences and is further refined with data from a thermal sensor. In typical monocular tracking sequences the resolution of the head is very low and parts of the head are occluded, with the face often invisible to the camera. We propose a method of combining a skin color detector with the direction of motion in a probabilistic way. We show how the head profile obtained from the thermal sequence can be used to further improve the result.
This chapter presents a novel system for pedestrian surveillance, including tasks such as detection, tracking, classification and possibly activity analysis. The system we propose first builds a background model as a multimodal distribution of colors and temperatures. It then constructs a particle filter scheme that makes a number of informed reversible transformations to sample the model probability space in order to maximize the posterior probability of the scene model. Observation likelihoods of moving objects account for their 3D locations with respect to the camera and for occlusions by other tracked objects as well as static obstacles. After capturing the coordinates and dimensions of moving objects, we apply a classifier based on periodic gait analysis. To differentiate humans from other moving objects such as cars, we detect a symmetrical double helical pattern in human gait. Such a pattern can then be analyzed using Frieze Group theory. The results of tracking in color and thermal sequences demonstrate that our algorithm is robust to illumination change and performs well in outdoor environments.
In this thesis we present a system for automatic human tracking and activity recognition from video sequences. The problem of automatically analyzing visual information to derive descriptors of high-level human activities has intrigued the computer vision community for decades and is considered to be largely unsolved. Part of this interest stems from the vast range of applications in which such a solution may be useful. We attempt to find efficient formulations of these tasks as applied to extracting customer behavior information in a retail marketing context. Based on these formulations, we present a system that visually tracks customers in a retail store and performs a number of activity analysis tasks based on the output from the tracker.
In tracking, we introduce new techniques for pedestrian detection and initialization of the body model, and a formulation of temporal tracking as a global trans-dimensional optimization problem. Initial human detection is addressed by a novel method for head detection, which incorporates knowledge of the camera projection model. The initialization of the human body model is addressed by newly developed shape and appearance descriptors. Temporal tracking of customer trajectories is performed by a human body tracking system designed as a Bayesian jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality ambiguities as people leave and enter the scene.
Following the tracking, we developed a two-stage group activity formulation based upon ideas from swarming research. For modelling purposes, all moving actors in the scene are viewed here as simplistic agents in the swarm. This allows us to define a set of inter-agent interactions, which combine to yield a distance metric used in further swarm clustering. In the first stage, the shoppers that belong to the same group are identified by deterministically clustering bodies to detect short-term events; in the second stage, events are post-processed to form clusters of group activities with fuzzy memberships.
Quantitative analysis of the tracking subsystem shows an improvement over state-of-the-art methods when used under similar conditions. Finally, based on the output from the tracker, the activity recognition procedure achieves over 80% correct shopper group detection, as validated against human-generated ground truth.
Keywords: Human Tracking, Human Activity Modeling and Recognition, Swarming, Background Subtraction, Camera Calibration
We present a generalized extensible framework for automated recognition of swarming activities in video sequences. The trajectory of each individual is produced by the visual tracking sub-system and is further analyzed to detect certain types of high-level grouping behavior. We utilize recent findings in swarming behavior analysis to formulate the problem in terms of a specific distance function, which we subsequently apply as part of a two-stage agglomerative clustering method to create a set of swarming events followed by a set of swarming activities. In this paper we present results for one particular type of swarming: shopper grouping. As part of this work, the events detected in a relatively short time interval are further integrated into activities, the manifestation of prolonged high-level swarming behavior. The results demonstrate the ability of our method to detect such activities in congested surveillance videos. In particular, in three hours of indoor retail store video, our method correctly identified over 85% of valid "shopper groups" with a very low level of false positives, validated against human-coded ground truth.
The paper presents a fusion tracker and pedestrian classifier for color and thermal cameras. The tracker builds a background model as a multi-modal distribution of colors and temperatures. It is constructed as a particle filter that makes a number of informed reversible transformations to sample the model probability space in order to maximize the posterior probability of the scene model. Observation likelihoods of moving objects account for their 3D locations with respect to the camera and for occlusions by other tracked objects as well as static obstacles. After capturing the coordinates and dimensions of moving objects, we apply a pedestrian classifier based on periodic gait analysis. To separate humans from other moving objects, such as cars, we detect in human gait a symmetrical double helical pattern that can then be analyzed using Frieze Group theory. The results of tracking on color and thermal sequences demonstrate that our algorithm is robust to illumination noise and performs well in outdoor environments.
In this paper we introduce a system to track pedestrians using a combined input from RGB and thermal cameras. Two major contributions are presented here. The first is a novel model of the scene background in which each pixel is represented as a multi-modal distribution with a changing number of modalities for both color and thermal input. We demonstrate how to eliminate the influence of shadows with this type of fusion. Second, based on our background model, we introduce a pedestrian tracker designed as a particle filter. We further develop a number of informed reversible transformations to sample the model probability space in order to maximize our model posterior probability. The novelty of our tracking approach also comes from the way we formulate observation likelihoods to account for the 3D locations of the bodies with respect to the camera and for occlusions by other tracked human bodies as well as static objects. The results of tracking on color and thermal sequences demonstrate that our algorithm is robust to illumination noise and performs well in outdoor environments.
We addressed the problem of automatically differentiating photographs of real scenes from photographs of paintings. We found that photographs differ from paintings in their color, edge, and texture properties. Based on these features, we trained and tested a classifier on a database of 6000 paintings and 6000 photographs. Using single features results in 70-80% correct discrimination performance, whereas a classifier using multiple features exceeds 90% correct discrimination.
This report describes a visual approach to road detection. A field of dominant trend lines is established over the image, by convolution with Gabor filters; the trend lines are then grouped into "visual segments", providing evidence for macroscopic linear features. Sets of these macroscopic features are evaluated against a list of heuristic criteria, to determine their likelihood of representing a road in the image. This procedure was implemented as software, for use by the Indy Robot Racing Team, a participant in the DARPA Grand Challenge 2005. It is capable of road detection at rates appropriate for driving at moderate speeds on dirt roads (ca. 5 Hz on a 3.4 GHz processor).
In this paper we present a system that tracks customers in a store and performs a number of activity analysis tasks based on the output from the tracker. We obtain the trajectories by employing a human body tracking system designed as a Bayesian jump-diffusion filter. The customer travel trajectories on the floor map are extracted and post-processed to remove noise. The shoppers that belong to the same group are identified by clustering their trajectories. The clustering is based on a distance metric that incorporates both time and location information. Our system also identifies shopper groups based on a proximity metric presented in this paper. Further, store employees are detected as a separate group, based on a 2D color histogram analysis. Finally, dwelling customers, i.e., customers who stop to browse for products, are detected by analyzing the behavior of the recorded trajectories.
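The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the metric or clustering procedure of the paper: trajectories are assumed to be dictionaries mapping a frame index to a floor position, the distance is the mean floor distance over shared frames (infinite when the temporal overlap is too short), and groups are formed by single-link merging under a threshold. The names `trajectory_distance` and `group_shoppers` and the parameters `min_overlap` and `max_dist` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: frame index -> (x, y) position on the floor map.
Trajectory = Dict[int, Tuple[float, float]]


def trajectory_distance(a: Trajectory, b: Trajectory, min_overlap: int = 3) -> float:
    """Mean floor distance over frames in which both people are present.

    Time enters through the shared frames: trajectories that barely
    overlap in time are treated as infinitely far apart.
    """
    common = set(a) & set(b)
    if len(common) < min_overlap:
        return math.inf
    return sum(math.dist(a[t], b[t]) for t in common) / len(common)


def group_shoppers(tracks: List[Trajectory], max_dist: float = 1.5) -> List[List[int]]:
    """Single-link grouping: merge any two tracks closer than max_dist."""
    parent = list(range(len(tracks)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if trajectory_distance(tracks[i], tracks[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(tracks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two people walking side by side, one far away, one present at a different time.
a = {t: (float(t), 0.0) for t in range(10)}
b = {t: (float(t), 1.0) for t in range(10)}
c = {t: (float(t), 10.0) for t in range(10)}
d = {t: (float(t), 0.5) for t in range(100, 110)}
print(group_shoppers([a, b, c, d]))
```

In this toy input, only the first two tracks are merged. The fourth track follows nearly the same path but never co-occurs with the others, so the time component keeps it separate.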
We present the first stages of a system that tracks customers in a store with the goal of activity analysis. The ultimate goal is to provide a tool for making various marketing decisions. In this paper, we focus on the low-level processing methods for determining the position of the customers in the store. We present a method to extract low-level head coordinates to be further used for tracking customers in crowded situations. The algorithm relies on knowledge of the image vanishing points, which are used to compute a vanishing point projection histogram as well as to extract camera calibration parameters. Vanishing points and the scale factor can be computed with the help of a simple interactive interface that we also present in this paper.
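For readers unfamiliar with vanishing points, one standard way to obtain one is to intersect the images of two lines that are parallel in the scene. The sketch below uses homogeneous coordinates and assumes two such segments have been marked by hand, for example through an interactive interface. It does not reproduce the paper's interface, projection histogram, or calibration procedure, and the function names are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def cross(u: Vec3, v: Vec3) -> Vec3:
    """3D cross product."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p: Point, q: Point) -> Vec3:
    """Homogeneous line through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a: Tuple[Point, Point],
                    seg_b: Tuple[Point, Point],
                    eps: float = 1e-9) -> Optional[Point]:
    """Intersection of the two lines supporting seg_a and seg_b.

    Returns None when the lines are also parallel in the image, in which
    case the vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two image segments, e.g. the edges of an aisle, that converge at (1, 1).
print(vanishing_point(((0.0, 0.0), (0.5, 0.5)), ((0.0, 2.0), (0.5, 1.5))))
```

With noisy hand-marked segments, more than two lines would normally be combined, for instance by least squares. That extension is omitted here.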
This paper describes a pattern recognition approach to determine readability of text labels in augmented reality systems. In many augmented reality applications, one of the ways in which information is presented to the user is to place a text label over the area of interest. However, if this information is placed over very busy and textured backgrounds, this can affect the readability of the text. The goal of this work was to identify methods of quantitatively describing conditions under which such text would be readable or unreadable. We used texture properties and other visual features to determine if a text placed on a particular background would be readable or not. Based on these features, a supervised classifier was built that was trained using data collected from human subjects’ judgement of text readability. Using a rather small training set of about 400 human evaluations over 50 heterogeneous textures the system is able to achieve a correct classification rate of over 85%.
This paper outlines the theoretical background and presents a new approach to human body tracking with a monocular static camera. A novel view-based representation is introduced at the feature extraction stage. We show that ambiguities in correspondence, such as those that occur as a result of occlusion, can be resolved by using this approach. In particular, we store color information for each object in a vector of views, where the number of elements is determined online, using unsupervised clustering followed by cluster validity assessment. Based on this representation, a tracking system was developed. The preliminary results presented show the discriminative potential of the proposed system.
This paper concerns the application of pattern classification techniques to the domain of augmented reality. In many augmented reality applications, one of the ways in which information is presented to the user is to place a text label over the area of interest. However, if this information is placed over very busy and textured backgrounds, this can affect the readability of the text. The goal of this work was to identify methods of quantitatively describing conditions under which such text would be readable or unreadable. We used texture properties and other visual features to predict whether text placed on a particular background would be readable or not. Based on these features, a supervised classifier was built and trained using data collected from human subjects' judgement of text readability. Using a rather small training set of about 400 human evaluations over 50 heterogeneous textures, the system is able to achieve a correct classification rate of over 85%.
We compare the properties of intensity and color edges in photographs of real scenes and paintings. We demonstrate that paintings contain significantly more color-only edges, whereas the amount of intensity-only edges does not differ significantly between the two classes. In addition, color edge strength is significantly higher for paintings. The differences between paintings and photographs are more accentuated when high-resolution, lossless-compressed images are used. These distinguishing features can be used for the automatic differentiation between the two classes of images.
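To make the distinction between intensity-only and color-only edges concrete, the following is a minimal sketch under simplifying assumptions. It is not the edge detector or color space used in the paper. Intensity is taken as the mean of R, G, and B, the chromatic part of a pixel is its RGB value with that mean subtracted, and only horizontally adjacent pixels are compared. The function name `edge_types` and the thresholds `t_int` and `t_col` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)


def edge_types(image: List[List[Pixel]],
               t_int: float = 20.0,
               t_col: float = 20.0) -> Counter:
    """Count horizontal neighbor pairs by the kind of edge between them."""
    counts: Counter = Counter()
    for row in image:
        for p, q in zip(row, row[1:]):
            ip, iq = sum(p) / 3.0, sum(q) / 3.0
            d_int = abs(ip - iq)                       # change in intensity
            d_col = math.dist([c - ip for c in p],     # change in chromatic part
                              [c - iq for c in q])
            if d_int > t_int and d_col > t_col:
                counts["both"] += 1
            elif d_col > t_col:
                counts["color_only"] += 1
            elif d_int > t_int:
                counts["intensity_only"] += 1
    return counts


image = [
    # Orange next to blue at equal mean intensity: a color-only edge.
    [(200, 100, 0), (0, 100, 200), (0, 100, 200)],
    # Dark gray next to light gray: an intensity-only edge.
    [(50, 50, 50), (150, 150, 150), (150, 150, 150)],
]
print(edge_types(image))
```

Comparing the resulting counts between two image collections is one simple way to probe the kind of difference the abstract reports.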
Automatic classification of an image as a photograph of a real scene or as a painting is potentially useful for image retrieval and website filtering applications. The main contribution of this paper is the proposition of several features derived from the color, edge, and gray-scale-texture information of the image that effectively discriminate paintings from photographs. For example, we found that paintings contain significantly more pure-color edges, and that certain gray-scale-texture measurements (mean and variance of Gabor filters) are larger for photographs. Using a large set of images (12000) collected from different web sites, the proposed features exhibit very promising classification performance (over 90%). A comparative analysis of the automatic classification results and psychophysical data is reported, suggesting that the proposed automatic classifier estimates the perceptual photorealism of a given picture.
We addressed the problem of automatically differentiating photographs of real scenes from photographs of paintings. We found that photographs differ from paintings in their color, edge, and texture properties. Based on these features, we trained a classifier to separate a database of 12,000 images downloaded from the web into photographs and paintings. Single features result in 70-83% performance, whereas with a neural net classifier correct rates were around 92%.
We present a computational method for automatically distinguishing photographs from paintings. Our approach is based on two types of features: image color and image shape characteristics. The use of color in this classification task is based on the observation that while intensity edges tend to coincide with color edges in photographs, there are significantly more color edges than intensity edges in paintings. The second difference between photographs and paintings is at the level of image shape detail, as reflected in the intensity edge structure and texture properties. Intuitively, the bio-mechanical characteristics of the fine hand movements involved in producing small-scale shape details result in paintings having quite different statistical characteristics from photographs of natural scenes. These characteristics were quantified using wavelet transforms, and the classification proper was obtained by training a neural network. The results indicate that whereas each criterion in isolation is a rather weak classifier (65-70% correct), a conjunction of several weak criteria yields good classification performance.