Papers by Esther Koller-Meier
British Machine Vision Conference, 2008
We propose a novel multi-class object detector that optimizes the detection costs while retaining a desired detection rate. The detector uses a cascade that unites the handling of similar object classes while separating off classes at appropriate levels of the cascade. No prior knowledge about the relationship between classes is needed as the classifier structure is automatically determined
This paper describes a self-learning prototype system for the real-time detection of unusual motion patterns. The proposed surveillance system uses a three-step approach consisting of a tracking, a learning and a recognition part. In the first step, an arbitrary, changing number of objects are tracked with an extension of the Condensation algorithm. From the history of the tracked object states,
Lecture Notes in Computer Science, 2005
This paper presents a new system for recognition, tracking and pose estimation of people in video sequences. It is based on the wavelet transform of the upper body part and uses Support Vector Machines (SVM) for classification. Recognition is carried out hierarchically by first recognizing people and then individual characters. The characteristic features that best discriminate one person from another are learned automatically. Tracking is solved via a particle filter that utilizes the SVM output and a first-order kinematic model to obtain a robust scheme that successfully handles occlusion, different poses and camera zooms. For pose estimation, a collection of SVM classifiers is evaluated to detect specific, learned poses.

This paper presents a multi-view tracker, meant to operate in smart rooms that are equipped with multiple cameras. The cameras are assumed to be calibrated. In particular, we demonstrate a virtual classroom application, where the system automatically selects the camera with the 'best' view on the face of a person moving in the room. Real-time object tracking, which is needed to achieve this, is implemented by means of color-based particle filtering. The use of multiple model histograms for the target (human head) results in robust tracking, even when the view on the target changes considerably, for example from the front to the back. Information is shared between the cameras, which adds robustness to the system. Once one camera has lost the target, it can be reinitialized with the help of the epipolar constraints suggested by the others. Experiments in our research environment corroborate the effectiveness of the approach.

We present a system for the robust real-time tracking of human faces. The system utilizes multiple cameras and is built with low-cost standard equipment. A 3D tracking module that uses the information from the multiple cameras is the core of the presented approach. Endowed with a virtual zooming utility, the system provides a close-up view of a face regardless of the person's position and orientation. This best matching front view is found by comparison of color histograms using the Bhattacharyya coefficient. The tracking initialization and learning of the target histograms are done automatically from training data. Results on image sequences of multiple persons demonstrate the versatility of the approach. Telepresence, teleteaching or face recognition systems are examples of possible applications. The system is scalable in terms of the number of computers and cameras, but one computer/laptop with three low-cost FireWire cameras is already sufficient.
Robust real-time tracking of non-rigid objects is a challenging task. Particle filtering has proven very successful for non-linear and non-Gaussian estimation problems. However, for the tracking of non-rigid objects, the selection of reliable image features is also essential.
Lecture Notes in Computer Science, 2002
Color can provide an efficient visual feature for tracking non-rigid objects in real time. However, the color of an object can vary over time depending on the illumination, the visual angle and the camera parameters. To handle these appearance changes, a color-based target model must be adapted during temporally stable image observations. This paper presents the integration of color distributions into particle filtering and shows how these distributions can be adapted over time. A particle filter tracks several hypotheses simultaneously and weights them according to their similarity to the target model. As a similarity measure between two color distributions, the popular Bhattacharyya coefficient is applied. In order to update the target model to slowly varying image conditions, frames where the object is occluded or too noisy must be discarded.
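The similarity measure named in this abstract can be illustrated with a small sketch. The code below computes the Bhattacharyya coefficient between two color histograms and turns it into normalized particle weights. The function names, the Gaussian weighting on the distance 1 - rho, and the `sigma` value are assumptions made for illustration; the abstract does not specify them.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Return sum_i sqrt(p_i * q_i) for two histograms with equal bin counts.

    The value is 1.0 for identical distributions and 0.0 for
    distributions with no overlapping bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def particle_weights(target, candidates, sigma=0.2):
    """Weight each candidate histogram by its similarity to the target.

    Assumption: a Gaussian on the squared distance d^2 = 1 - rho,
    followed by normalization so the weights sum to 1.
    """
    raw = []
    for cand in candidates:
        rho = bhattacharyya_coefficient(target, cand)
        d2 = max(0.0, 1.0 - rho)  # guard against rounding above 1
        raw.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(raw)
    return [w / total for w in raw]
```

In this sketch, a particle whose histogram matches the target model receives the largest weight, which mirrors the weighting of hypotheses described above.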

Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., 2004
Solving the tracking of an articulated structure in a reasonable time is a complex task, mainly due to the high dimensionality of the problem. A new optimization method, called Stochastic Meta-Descent (SMD), based on gradient descent with adaptive and parameter-specific step sizes, was introduced recently [1] to solve this challenging problem. While the local optimization works very well, reaching the global optimum is not guaranteed. We therefore propose a novel algorithm which combines the SMD optimization with a particle filter to form 'smart particles'. After propagating the particles, SMD is performed and the resulting new particle set is included such that the original Bayesian distribution is not altered. The resulting 'smart particle filter' (SPF) tracks high-dimensional articulated structures with far fewer samples than previous methods. Additionally, it can handle multiple hypotheses, clutter and occlusion, with which pure optimization approaches have problems. The performance of the SMD particle filter is illustrated in challenging 3D hand tracking sequences, demonstrating better robustness and accuracy than a single SMD optimization or an annealed particle filter.

Proceedings of the British Machine Vision Conference 2008, 2008
We propose a novel multi-class object detector that optimizes the detection costs while retaining a desired detection rate. The detector uses a cascade that unites the handling of similar object classes while separating off classes at appropriate levels of the cascade. No prior knowledge about the relationship between classes is needed as the classifier structure is automatically determined during the training phase. The detection nodes in the cascade use Haar wavelet features and Gentle AdaBoost; however, the approach is not dependent on the specific features used and can easily be extended to other cases. Experiments are presented for several numbers of object classes and the approach is compared to other classifying schemes. The results demonstrate a large efficiency gain that is particularly prominent for a greater number of classes. The complexity of the training also scales well with the number of classes.
2004 Conference on Computer Vision and Pattern Recognition Workshop, 2004
Recently, an optimization approach for fast visual tracking of articulated structures based on Stochastic Meta-Descent (SMD) has been presented. SMD is a gradient descent with local step size adaptation that combines rapid convergence with excellent scalability. Stochastic sampling helps to avoid local minima in the optimization process. We have extended the SMD algorithm with new features for fast and accurate tracking by adapting the different step sizes between as well as within video frames and by introducing a robust likelihood function which incorporates both depths and surface orientations. A realistic deformable hand model reinforces the accuracy of our tracker. The advantages of the resulting tracker over state-of-the-art methods are corroborated through experiments.

18th International Conference on Pattern Recognition (ICPR'06), 2006
At present, the object categorisation literature is still dominated by the use of individual class detectors. Detecting multiple classes then implies the subsequent application of multiple such detectors, but such an approach is not scalable towards high numbers of classes. This paper presents an alternative strategy, where multiple classes are detected in a combined way. This includes a decision tree approach, where ternary rather than binary nodes are used, and where nodes share features. This yields an efficient scheme, which scales much better. The paper proposes a strategy where the object samples are first distinguished from the background. Then, in a second stage, the actual object class membership of each sample is determined. The focus of the paper lies entirely on the first stage, i.e. the distinction from background. The tree approach for this step is compared against two alternative strategies, one of them being the popular cascade approach. While classification accuracy tends to be better or comparable, the speed of the proposed method is systematically better. This advantage becomes more pronounced as the number of object classes increases.
Video-Based Surveillance Systems, 2002

We present an algorithm for multi-person tracking-by-detection in a particle filtering framework. To address the unreliability of current state-of-the-art object detectors, our algorithm tightly couples object detection, classification, and tracking components. Instead of relying only on the final, sparse output from a detector, we additionally employ its continuous intermediate output to impart our approach with more flexibility to handle difficult situations. The resulting algorithm robustly tracks a variable number of dynamically moving persons in complex scenes with occlusions. The approach does not rely on background modeling and is based only on 2D information from a single camera, not requiring any camera or ground plane calibration. We evaluate the algorithm on the PETS'09 tracking dataset and discuss the importance of the different algorithm components to robustly handle difficult situations.

The main challenge of tracking articulated structures like hands is their large number of degrees of freedom (DOFs). A realistic 3D model of the human hand has at least 26 DOFs. The arsenal of tracking approaches that can track such structures fast and reliably is still very small. This paper proposes a tracker based on 'Stochastic Meta-Descent' (SMD) for optimizations in such high-dimensional state spaces. This new algorithm is based on a gradient descent approach with adaptive and parameter-specific step sizes. The SMD tracker facilitates the integration of constraints and, combined with a stochastic sampling technique, can get out of spurious local minima. Furthermore, the integration of a deformable hand model based on linear blend skinning and anthropometrical measurements reinforces the robustness of our tracker. Experiments show the efficiency of the SMD algorithm in comparison with common optimization methods.
2008 IEEE Workshop on Motion and Video Computing, 2008
This paper describes a novel tracking performance evaluation metric based on the successful detection of events, rather than low-level image processing criteria. A general event metric is defined to measure whether the agents and actions in the scene given by the ground truth were correctly tracked by comparing two event lists using dynamic programming. This metric is suitable to evaluate and compare different tracking approaches where the underlying algorithm may be completely different.
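The idea of comparing two event lists with dynamic programming can be sketched as follows. This is only an illustration under stated assumptions: events are modeled as hypothetical `(agent, action)` tuples, the comparison is a standard edit distance with unit costs, and the normalized score is an invented convenience. The metric actually defined in the paper is not given in the abstract and may differ.

```python
def event_edit_distance(truth, detected):
    """Edit distance between two event lists (unit insert/delete/substitute costs).

    dp[i][j] holds the minimal cost of aligning the first i ground-truth
    events with the first j detected events.
    """
    n, m = len(truth), len(detected)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # all remaining ground-truth events were missed
    for j in range(m + 1):
        dp[0][j] = j  # all remaining detected events are spurious
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if truth[i - 1] == detected[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # missed event
                dp[i][j - 1] + 1,             # spurious event
                dp[i - 1][j - 1] + mismatch,  # match or wrong event
            )
    return dp[n][m]


def event_score(truth, detected):
    """Hypothetical score in [0, 1]; 1.0 means the lists are identical."""
    longest = max(len(truth), len(detected))
    if longest == 0:
        return 1.0
    return 1.0 - event_edit_distance(truth, detected) / longest


# Example with made-up events: the tracker missed the "sit" event.
ground_truth = [("p1", "enter"), ("p1", "sit"), ("p1", "leave")]
tracker_output = [("p1", "enter"), ("p1", "leave")]
distance = event_edit_distance(ground_truth, tracker_output)
```

Because the comparison operates only on event lists, a sketch like this is independent of how the underlying tracker produces its output, which is the property the abstract emphasizes.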
2009 Workshop on Applications of Computer Vision (WACV), 2009
This paper presents a system to manipulate 3D objects or navigate through 3D models by detecting the gestures and the movements of the hands of a user in front of a camera mounted on top of a screen. In particular, the paper introduces an improved skin color segmentation algorithm, which combines an online and an offline model, and a Haarlet-based hand gesture recognition system, where the Haarlets are trained based on Average Neighborhood Margin Maximization (ANMM). The result is a real-time marker-less interaction system which is applied to two applications, one for manipulating 3D objects, and the other for navigating through a 3D model.

Lecture Notes in Computer Science, 2007
We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a low-dimensional embedding of the pose manifolds using Locally Linear Embedding (LLE), as well as the statistical relationship between body poses and their image appearance. In addition, the dynamics in these pose manifolds are modelled. Sparse kernel regressors capture the nonlinearities of these mappings efficiently. Body poses are inferred by a recursive Bayesian sampling algorithm with an activity-switching mechanism based on learned transfer functions. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that alternate between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.