In the tracking-by-detection paradigm for multi-target tracking, target association is modeled as an optimization problem that is usually solved through a network flow formulation. In this paper, we propose a combinatorial optimization formulation and use bipartite graph matching to associate targets in consecutive frames. Usually, the target of interest is represented by a bounding box, and the whole box is tracked as a single entity. However, the human body undergoes complex articulation and occlusion, which severely degrade tracking performance. To partially tackle the problem of occlusion, we argue that tracking a relatively rigid body part can lead to better tracking performance than whole-body tracking. Based on this assumption, we generate target hypotheses only for the spatial locations of people's heads in every frame. After the heads are localized, a constant velocity motion model is used for the temporal evolution of the targe...
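The frame-to-frame bipartite matching step can be sketched as a minimum-cost assignment between predicted head positions (one per existing target) and the heads detected in the new frame. This is a minimal sketch, assuming 2D head centers and plain Euclidean distance as the matching cost (the abstract does not state the cost); the solver is the standard potentials-based Hungarian (Kuhn-Munkres) algorithm, and `hungarian` and `associate` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns assignment[i] = column matched to row i, using the
    potentials-based O(n^2 * m) Kuhn-Munkres algorithm.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the alternating path to augment the matching
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def associate(predicted: Sequence[Point], detected: Sequence[Point]) -> List[Tuple[int, int]]:
    """Return (track_index, detection_index) pairs minimizing total distance."""
    if not predicted or not detected:
        return []
    cost = [[math.dist(p, d) for d in detected] for p in predicted]
    if len(predicted) <= len(detected):
        return list(enumerate(hungarian(cost)))
    # More tracks than detections: solve the transposed problem instead.
    transposed = [list(col) for col in zip(*cost)]
    return sorted((i, j) for j, i in enumerate(hungarian(transposed)))
```

Gating (rejecting pairs whose distance exceeds a threshold) is left out for brevity; a complete tracker would also decide what to do with unmatched targets and unmatched detections.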
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
In this paper we show that multiple object tracking (MOT) can be formulated in a framework where detection and data association are performed simultaneously. Our method allows us to overcome the limitations of data-association-based MOT approaches, whose performance depends on the object detection results provided as input. At the core of our method lies structured learning, which learns a model for each target and infers the best locations of all targets simultaneously in a video clip. Inference in our structured learning is done through a new Target Identity-aware Network Flow (TINF), where each node in the network encodes the probability of each target identity belonging to that node. The proposed Lagrangian relaxation optimization finds a high-quality solution to the network. During optimization, a soft spatial constraint is enforced between the nodes of the graph, which helps reduce the ambiguity caused by nearby targets with similar appearance in crowded scenarios. We show that automatically detecting and tracking targets in a single framework can help resolve the ambiguities due to frequent occlusion and heavy articulation of targets. Our experiments involve challenging yet distinct datasets and show that our method can achieve results better than the state of the art.
Data association is an essential component of human detection and tracking systems. Most existing methods, such as bipartite matching and GMCP, incorporate only the limited temporal locality of the sequence into the data association problem. The GMCP tracker is considered an important, complete representation of the tracking problem: all pairwise relationships between detections in the temporal span of a video are considered, which makes the input to data association a complete k-partite graph. In this graph, a person's track forms a clique (a subgraph in which all nodes are connected to each other). A cost is assigned to each clique, and the clique with the best score is selected as a track; however, this solution is sub-optimal. The GMCP tracker does not jointly optimize all tracks; it finds tracks one at a time, which causes detection and tracking difficulties in cluttered backgrounds and crowded scenes.
Tracking-by-detection methods are used to track multiple targets with unified handling of complex scenarios, where current detection responses are linked to previous trajectories. By adding dummy nodes for each trajectory to the standard Hungarian algorithm, trajectories are allowed to disappear temporarily, and data association is solved implicitly in a global manner even though it is formulated between two consecutive frames. If a trajectory fails to find a matching detection, it is linked to its corresponding dummy node until a matching detection reappears. Source nodes are also incorporated to account for new targets. Dummy nodes tend to accumulate in false or disappeared trajectories and appear only occasionally in real trajectories, which helps handle inevitable detection failures: missed detections, false detections, and occlusions, where an object is partially or fully invisible because of the limited camera view.
The extended hybrid Hungarian algorithm achieves better accuracy than GMCP and the hybrid Hungarian algorithm. Experiments show that the proposed method significantly improves tracking and detection on videos of different lengths, particularly short videos.
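The dummy-node and source-node idea can be sketched by augmenting the frame-to-frame cost matrix so that "trajectory disappears" and "new target appears" become ordinary assignment choices. This is a minimal sketch under stated assumptions: `miss_cost` and `birth_cost` are hypothetical penalties, the zero block that lets leftover source rows absorb leftover dummy columns is a common construction rather than a detail given in the abstract, and an exhaustive search stands in for the Hungarian algorithm to keep the example short (any optimal assignment solver returns the same matching).

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

INF = math.inf


def augmented_cost(track_det: Sequence[Sequence[float]], num_dets: int,
                   miss_cost: float, birth_cost: float) -> List[List[float]]:
    """Square (T + D) matrix with dummy (miss) and source (birth) nodes.

    Rows: T trajectories, then D source rows (one per detection).
    Columns: D detections, then T dummy columns (one per trajectory).
    Use INF in track_det for gated-out (impossible) links.
    """
    T, D = len(track_det), num_dets
    n = T + D
    C = [[INF] * n for _ in range(n)]
    for i in range(T):
        for j in range(D):
            C[i][j] = track_det[i][j]   # trajectory i continues with detection j
        C[i][D + i] = miss_cost         # trajectory i links to its own dummy node
    for j in range(D):
        C[T + j][j] = birth_cost        # source node: detection j starts a new target
        for i in range(T):
            C[T + j][D + i] = 0.0       # leftover source rows absorb leftover dummies
    return C


def optimal_assignment(C: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Exhaustive search; stands in for the Hungarian algorithm on tiny inputs."""
    n = len(C)
    best, best_perm = INF, tuple(range(n))
    for perm in itertools.permutations(range(n)):
        total = sum(C[r][perm[r]] for r in range(n))
        if total < best:
            best, best_perm = total, perm
    return best_perm


def associate_with_dummies(track_det, num_dets, miss_cost, birth_cost):
    """Return (links, missed_tracks, new_targets) for one pair of frames."""
    T = len(track_det)
    perm = optimal_assignment(augmented_cost(track_det, num_dets, miss_cost, birth_cost))
    links: Dict[int, int] = {}
    missed: List[int] = []
    for i in range(T):
        if perm[i] < num_dets:
            links[i] = perm[i]
        else:
            missed.append(i)
    births = [j for j in range(num_dets) if perm[T + j] == j]
    return links, missed, births
```

Exhaustive search is factorial in T + D, so it only serves to show what the augmented matrix encodes; in practice the same matrix would be handed to a Hungarian solver. A link is kept only when its cost is lower than declaring a miss plus a birth.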
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
In this work, we propose a tracker that differs from most existing multi-target trackers in two major ways. First, our tracker does not rely on a pre-trained object detector to obtain the initial object hypotheses. Second, our tracker's final output is the fine contours of the targets rather than traditional bounding boxes. Therefore, our tracker simultaneously solves three main problems: detection, data association, and segmentation. This is especially important because the outputs of these three problems are highly correlated, and the solution of one can greatly help improve the others. The proposed algorithm consists of two main components: structured learning and Lagrange dual decomposition. Our structured learning based tracker learns a model for each target and infers the best locations of all targets simultaneously in a video clip. Inference in our structured learning is achieved through a new Target Identity-aware Network Flow (TINF), where each node in the network encodes the probability of each target identity belonging to that node. The probabilities are obtained by training target-specific models using a global structured learning technique. This is followed by the proposed Lagrangian relaxation optimization, which finds a high-quality solution to the network. This forms the first component of our tracker. The second component is Lagrange dual decomposition, which combines the structured learning tracker with a segmentation algorithm. For segmentation, a multi-label Conditional Random Field (CRF) is applied to a superpixel-based spatio-temporal graph in a segment of video in order to assign background or target labels to every superpixel. We show how the multi-label CRF is combined with the structured learning tracker through our dual decomposition formulation. This leads to more accurate segmentation results and also helps resolve typical difficulties in multiple target tracking, such as occlusions, ID switches, and track drifting.
The experiments on diverse and challenging sequences show that our method achieves superior results compared to competitive approaches for detection, multiple target tracking as well as segmentation.
ArXiv, 2020
For multi-target tracking, target representation plays a crucial role in performance. State-of-the-art approaches rely on deep-learning-based visual representations that give strong performance at the cost of high computational complexity. In this paper, we propose a simple yet effective target representation for human tracking. Our inspiration comes from the fact that the human body undergoes severe deformation and inter- and intra-object occlusion over time. So, instead of tracking the whole body, we track a relatively rigid body part to follow a person over an extended period of time. Hence, we follow the tracking-by-detection paradigm and generate target hypotheses only for the spatial locations of heads in every frame. After the heads are localized, a Kalman filter with a constant velocity motion model is instantiated for each target to follow the temporal evolution of the targets in the scene. For associating the targets in ...
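A per-target Kalman filter with a constant velocity motion model can be sketched as follows. This is a minimal sketch, assuming image-plane head centers as measurements and independent, diagonal noise on x and y; under that assumption the usual 4-state filter (x, y, vx, vy) splits exactly into two 2-state filters, which keeps the arithmetic scalar. The noise values `q` and `r`, the initial variances, and the names `CV1D` and `HeadTrack` are hypothetical choices, not values from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CV1D:
    """Constant-velocity Kalman filter along one image axis.

    State is (position p, velocity v); covariance is [[a, b], [b, d]].
    """
    p: float
    v: float = 0.0
    a: float = 1.0    # var(p)
    b: float = 0.0    # cov(p, v)
    d: float = 100.0  # var(v): initial velocity is unknown

    def predict(self, dt: float, q: float) -> None:
        # x <- F x with F = [[1, dt], [0, 1]]
        self.p += self.v * dt
        # P <- F P F^T + Q, with white-noise-acceleration
        # Q = q * [[dt^4/4, dt^3/2], [dt^3/2, dt^2]]
        a = self.a + 2 * dt * self.b + dt * dt * self.d + q * dt ** 4 / 4
        b = self.b + dt * self.d + q * dt ** 3 / 2
        d = self.d + q * dt ** 2
        self.a, self.b, self.d = a, b, d

    def update(self, z: float, r: float) -> None:
        s = self.a + r                    # innovation variance, H = [1, 0]
        k0, k1 = self.a / s, self.b / s   # Kalman gain
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        a, b, d = self.a, self.b, self.d
        self.a, self.b, self.d = (1 - k0) * a, (1 - k0) * b, d - k1 * b


class HeadTrack:
    """One head target: independent x and y constant-velocity filters."""

    def __init__(self, x: float, y: float, q: float = 0.01, r: float = 1.0):
        self.fx, self.fy = CV1D(x), CV1D(y)
        self.q, self.r = q, r  # hypothetical process / measurement noise

    def predict(self, dt: float = 1.0) -> Tuple[float, float]:
        self.fx.predict(dt, self.q)
        self.fy.predict(dt, self.q)
        return self.fx.p, self.fy.p

    def update(self, x: float, y: float) -> None:
        self.fx.update(x, self.r)
        self.fy.update(y, self.r)

    @property
    def velocity(self) -> Tuple[float, float]:
        return self.fx.v, self.fy.v
```

In a tracking loop, `predict()` supplies the position used for association in the next frame, and `update()` is called with the matched head detection; a target with no match would simply skip `update()` for that frame.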
International Journal of Computational Vision and Robotics, 2018
In this paper, we mainly describe how to optimally formulate network flows for multi-object tracking. Network flows can be used to construct object trajectories (between frames) to achieve multi-object tracking. The most important issue in building such a network is the design of its nodes and edges. In this work, we propose a method that fuses an object detector with object trackers to design the nodes and edges efficiently. The object trackers provide robust classifiers or object features learned through training, which helps in designing the edges. This approach is significant when the detector fails because objects are occluded: if an object is not detected, the object tracker is substituted for the object detector. In this way, we employ the object tracker and the object detector to formulate a sophisticated network depending on the condition. The proposed approach makes it possible to eliminate clutter and thus overcome heavy occlusion. We evaluated the performance of the proposed method through several experiments on real-world video sequences. The experimental results demonstrated good performance of the proposed approach compared to state-of-the-art methods.
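The substitution rule (use the detector when it fires, fall back to the tracker when it does not) can be sketched as a node-construction step for one frame, followed by edge construction between consecutive frames. This is a minimal sketch, not the paper's network design: the gating radius `gate`, the step limit `max_step`, the extra `tracker_penalty` on tracker-derived nodes, and the names `Node`, `build_frame_nodes`, and `build_edges` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    frame: int
    pos: Point
    source: str  # "detector" or "tracker"


def build_frame_nodes(frame: int, detections: Sequence[Point],
                      tracker_preds: Sequence[Point], gate: float) -> List[Node]:
    """Detector nodes, plus tracker nodes wherever the detector found nothing nearby."""
    nodes = [Node(frame, d, "detector") for d in detections]
    for pred in tracker_preds:
        if not any(math.dist(pred, d) <= gate for d in detections):
            # Likely a missed (e.g. occluded) object: substitute the tracker output.
            nodes.append(Node(frame, pred, "tracker"))
    return nodes


def build_edges(prev: Sequence[Node], nxt: Sequence[Node], max_step: float,
                tracker_penalty: float = 1.0) -> List[Tuple[int, int, float]]:
    """Edges (i, j, cost) between consecutive-frame nodes within max_step."""
    edges = []
    for i, a in enumerate(prev):
        for j, b in enumerate(nxt):
            dist = math.dist(a.pos, b.pos)
            if dist <= max_step:
                penalty = tracker_penalty if b.source == "tracker" else 0.0
                edges.append((i, j, dist + penalty))
    return edges
```

The penalty makes a flow solver prefer detector-backed nodes when both explanations are available, while still letting a trajectory pass through a tracker node during occlusion.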
2012
We present a novel framework for multiple object tracking in which the problems of object detection and data association are expressed by a single objective function. The framework follows the Lagrange dual decomposition strategy, taking advantage of the often complementary nature of the two subproblems. Our coupling formulation avoids the problem of error propagation from which traditional "detection-tracking approaches" to multiple object tracking suffer. We also eschew common heuristics such as "non-maximum suppression" of hypotheses by modeling the joint image likelihood as opposed to applying independent likelihood assumptions. Our coupling algorithm is guaranteed to converge and can handle partial or even complete occlusions. Furthermore, our method does not have severe scalability issues and can process hundreds of frames at the same time. Our experiments involve challenging, notably distinct datasets and demonstrate that our method can achieve results comparable to those of state-of-the-art approaches, even without a heavily trained object detector.
Lecture Notes in Computer Science, 2015
Multi-target tracking of pedestrians is a challenging task due to uncertainty about targets, caused mainly by similarity between pedestrians, occlusion over relatively long periods, and cluttered backgrounds. A common scheme for tackling multi-target tracking is to divide it into two sub-problems: data association and trajectory estimation. A reasonable approach is based on joint optimization of a discrete model for data association and a continuous model for trajectory estimation in a Markov Random Field framework. Nonetheless, common solutions to the data association problem are based only on location information, while the visual information in the images is ignored. Visual features can be useful for associating detections with true targets more reliably, because the targets usually have discriminative features. In this work, we propose a combination of position and visual feature information in a discrete data association model. Moreover, we propose the use of group Lasso regularization to improve the identification of particular pedestrians, given that the discriminative regions are associated with particular visual blocks in the image. We find promising results for our approach in terms of precision and robustness when compared with a state-of-the-art method on standard datasets for multi-target pedestrian tracking.
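Two pieces of this idea can be sketched in isolation. The first function is the proximal operator of the group Lasso penalty (group soft-thresholding), which shrinks each group of weights as a unit and zeroes whole groups, so non-discriminative visual blocks drop out entirely. The second is a hypothetical association cost that mixes position distance with per-block appearance distances; the weighting `alpha`, the per-block scalar `block_weights` (for instance, the norm of each block's learned weights), and the function names are assumptions for illustration, not the paper's model.

```python
import math
from typing import List, Sequence


def group_soft_threshold(w: Sequence[float], groups: Sequence[Sequence[int]],
                         lam: float) -> List[float]:
    """Proximal operator of lam * sum_g ||w_g||_2 for non-overlapping groups.

    Each group (e.g. the weights of one visual block) is shrunk toward zero
    as a whole, and set exactly to zero when its norm is at most lam.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


def association_cost(det_xy, track_xy, det_blocks, track_blocks,
                     block_weights, alpha=0.5):
    """Hypothetical cost mixing position with weighted per-block appearance distances."""
    position = math.dist(det_xy, track_xy)
    appearance = sum(wb * math.dist(fd, ft)
                     for wb, fd, ft in zip(block_weights, det_blocks, track_blocks))
    return alpha * position + (1.0 - alpha) * appearance
```

Inside a proximal-gradient solver, `group_soft_threshold` would be applied after each gradient step on the data term; blocks whose weights reach zero then contribute nothing to `association_cost`.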
Lecture Notes in Computer Science, 2004
This paper describes a real-time system for multi-target tracking and classification in image sequences from a single stationary camera. Several targets can be tracked simultaneously in spite of splits and merges among the foreground objects and the presence of clutter in the segmentation results. We show tracking of up to 17 targets simultaneously. The algorithm combines Kalman filter-based motion and shape tracking with an efficient pattern matching algorithm. The latter facilitates the use of a dynamic programming strategy to efficiently solve the data association problem in the presence of multiple splits and merges. The system is fully automatic and requires no manual input of any kind to initialize tracking. Initialization is done using attributed graphs, and the algorithm gives stable, noise-free track initialization. The image-based tracking results are used as inputs to a Bayesian network-based classifier that classifies the targets into different categories. After classification, a simple 3D model for each class is used along with camera calibration to obtain 3D tracking results for the targets. We present results on a large number of real-world image sequences and show that the 3D tracking results are accurate when compared with readings from the vehicle's speedometer. The complete tracking system, including segmentation of moving targets, runs at about 25 Hz for 352×288 color images on a 2.8 GHz Pentium 4 desktop.
Pattern Recognition, 2015
The multi-target tracking problem is challenging when there are occlusions, detection failures, and severe interference between detections. In this paper, we propose a novel detection-based tracking method that links detections into tracklets and further forms long trajectories. Unlike many previous hierarchical frameworks, which split data association into two separate optimization problems (linking detections locally and linking tracklets globally), we introduce a unified algorithm that can automatically relearn the trajectory models from local and global information to find the jointly optimal assignment. In each temporal window, the trajectory models are initialized with local information to link easy-to-connect detections into a set of tracklets. The trajectory models are then updated with the reliable tracklets and reused to link separated tracklets into long trajectories. We iteratively update the trajectory models with information from more frames until the result converges. The iterative process gradually improves the accuracy of the trajectory models, which in turn improves the target ID inference for all detections by the MRF model. Experimental results show that our proposed method achieves state-of-the-art multi-target tracking performance.
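The second stage, reusing a trajectory model to bridge gaps between tracklets, can be sketched with a simple constant-velocity extrapolation. This is a minimal sketch, assuming tracklets are time-ordered lists of `(frame, x, y)` observations; the greedy linking rule is a deliberate simplification standing in for the paper's MRF-based ID inference, and `max_gap`, `threshold`, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Obs = Tuple[int, float, float]  # (frame, x, y)


def end_velocity(tracklet: Sequence[Obs]) -> Tuple[float, float]:
    """Constant-velocity estimate from the last two observations."""
    if len(tracklet) < 2:
        return 0.0, 0.0
    (t0, x0, y0), (t1, x1, y1) = tracklet[-2], tracklet[-1]
    dt = t1 - t0
    return (x1 - x0) / dt, (y1 - y0) / dt


def link_cost(a: Sequence[Obs], b: Sequence[Obs], max_gap: int = 10) -> float:
    """Distance between a's extrapolated end and b's start (inf if not linkable)."""
    t_end, xe, ye = a[-1]
    t_start, xs, ys = b[0]
    gap = t_start - t_end
    if gap <= 0 or gap > max_gap:
        return math.inf
    vx, vy = end_velocity(a)
    return math.dist((xe + vx * gap, ye + vy * gap), (xs, ys))


def greedy_link(tracklets: Sequence[Sequence[Obs]], threshold: float,
                max_gap: int = 10) -> List[Tuple[int, int]]:
    """Accept cheapest links first; each tracklet gets at most one successor and one predecessor."""
    candidates = sorted((link_cost(a, b, max_gap), i, j)
                        for i, a in enumerate(tracklets)
                        for j, b in enumerate(tracklets) if i != j)
    has_next, has_prev, links = set(), set(), []
    for cost, i, j in candidates:
        if cost > threshold:
            break
        if i in has_next or j in has_prev:
            continue
        links.append((i, j))
        has_next.add(i)
        has_prev.add(j)
    return links
```

Because a link requires the second tracklet to start strictly after the first one ends, the accepted links always point forward in time; re-estimating velocities from the merged trajectories and repeating the pass would mimic the iterative refinement described above.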
1998
A combined 2D/3D approach is presented that allows robust tracking of moving bodies in a given environment as observed via a single, uncalibrated video camera. Low-level features are often insufficient for detection, segmentation, and tracking of non-rigid moving objects. Therefore, an improved mechanism is proposed that combines low-level (image processing) and mid-level (recursive trajectory estimation) information obtained during the tracking process. The resulting system can segment and maintain the tracking of moving objects before, during, and after occlusion. At each frame, the system also extracts a stabilized coordinate frame of the moving objects. This stabilized frame can be used as input to motion recognition modules. The approach enables robust tracking without requiring the system to know the shape of the tracked objects beforehand, although some assumptions are made about the characteristics of the objects' shapes and how they evolve over time. Experiments in tracking moving people are described.
International Journal of Electrical and Computer Engineering (IJECE), 2024
Computer Vision – ECCV 2012, 2012
2007 IEEE 11th International Conference on Computer Vision, 2007
Multimedia Tools and Applications, 2014
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
Lecture Notes in Computer Science, 2006
Proceedings of the International Conference on Computer Vision Theory and Applications, 2013
International Journal of Modeling and Optimization, 2019
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
Mathematical and Computer Modelling, 2006
IEEE transactions on intelligent vehicles, 2019