2015, International Conference on Information Fusion
In recent studies of multi-target tracking, high-order association and its corresponding high-order affinity (or similarity) are often preferred over pairwise comparisons to capture high-order discriminative information. A naturally raised challenge is to calculate affinity (or similarity) among more than two target candidates. When target appearance is represented by histograms, such as the popular SIFT and HOG descriptors, pairwise matching measurements such as Histogram Intersection (HI) are often combined to fit the high-order request in an ad hoc way. However, such combinations may be ineffective and inefficient. In this paper, we address the pairwise matching issue by proposing a novel multi-histogram similarity named Multiway Histogram Intersection (MHI). MHI naturally extends HI by summing over the "min" value of all histograms in each bin. MHI applies to any number of histograms, fits the request of multi-target tracking better and requires less time than previously used affinities. To demonstrate its superiority, we integrate MHI into a recently proposed rank-1-tensor-approximation multi-tracking framework and apply it to vehicle tracking in wide aerial video surveillance. The advantage of using MHI is clearly supported by the experimental results against six common approaches on two public benchmark datasets.
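The definition above (take the "min" across all histograms in each bin, then sum over bins) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and the toy histograms are assumptions introduced here.

```python
def multiway_histogram_intersection(histograms):
    """Sum, over bins, of the minimum bin value across all histograms.

    With exactly two histograms this reduces to ordinary Histogram
    Intersection (HI).
    """
    if not histograms:
        raise ValueError("need at least one histogram")
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all histograms must have the same number of bins")
    # zip(*histograms) yields, for each bin, the values of every histogram.
    return sum(min(bin_values) for bin_values in zip(*histograms))


# Toy normalized histograms (hypothetical data).
h1 = [0.5, 0.25, 0.25]
h2 = [0.25, 0.5, 0.25]
h3 = [0.0, 0.5, 0.5]

pairwise = multiway_histogram_intersection([h1, h2])      # plain HI
three_way = multiway_histogram_intersection([h1, h2, h3])  # MHI over 3
```

Because the minimum can only shrink as histograms are added, the three-way score is never larger than any of the pairwise scores it contains.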
International Journal of Computer Vision
The multi-dimensional assignment problem is universal for data association analysis such as data association-based visual multi-object tracking and multi-graph matching. In this paper, multi-dimensional assignment is formulated as a rank-1 tensor approximation problem. A dual L1-normalized context/hyper-context aware tensor power iteration optimization method is proposed. The method is applied to multi-object tracking and multi-graph matching. In the optimization method, tensor power iteration with the dual unit norm enables the capture of information across multiple sample sets. Interactions between sample associations are modeled as contexts or hyper-contexts which are combined with the global affinity into a unified optimization. The optimization is flexible for accommodating various types of contextual models. In multi-object tracking, the global affinity is defined according to the appearance similarity between objects detected in different frames. Interactions between objects are modeled as motion contexts which are encoded into the global association optimization. The tracking method integrates high order motion information and high order appearance variation. The multi-graph matching method carries out matching over graph vertices and structure matching over graph edges simultaneously. The matching consistency across multi-graphs is based on the high-order tensor optimization. Various types of vertex affinities and edge/hyper-edge affinities are flexibly integrated. Experiments on several public datasets, such as the MOT16 challenge benchmark, validate the effectiveness of the proposed methods.
Keywords: Multi-dimensional assignment, Context/hyper-context aware tensor power iteration, Multi-object tracking, Multi-graph matching. Communicated by M. Hebert.
IEEE Access
In this paper, we propose a highly feasible fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input. The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model to achieve high-performance online tracking. The HDA consists of two associations: segment-to-track and track-to-track associations. One affinity, for position and motion, is computed by using the GMPHD filter, and the other affinity, for appearance, is computed by using the responses from single object trackers such as the kernelized correlation filter, SiamRPN, and DaSiamRPN. These two affinities are simply fused by using a score-level fusion method such as min-max normalization, referred to as MAF. In addition, to reduce the number of false positive segments, we adopt mask IoU-based merging (mask merging). The proposed MOTS framework, with its key modules (HDA, MAF, and mask merging), is easily extensible to simultaneously track multiple types of objects with CPU-only execution in parallel processing. In addition, the developed framework only requires simple parameter tuning, unlike many existing MOTS methods that need intensive hyperparameter optimization. In the experiments on the two popular MOTS datasets, the key modules show some improvements. For instance, ID-switch decreases by more than half compared to a baseline method in the training sets. In conclusion, our tracker achieves state-of-the-art MOTS performance in the test sets. Index terms: Multi-object tracking, instance segmentation, tracking by segmentation, online approach, Gaussian mixture probability hypothesis filter, affinity fusion.
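Score-level fusion with min-max normalization, as mentioned above, can be illustrated with a small sketch. The abstract does not state how the two normalized affinities are combined, so the weighted average used here (and the names `min_max_normalize`, `fuse_affinities`, and `weight`) are assumptions for illustration only, not the paper's MAF model.

```python
def min_max_normalize(scores):
    """Rescale a list of scores linearly so they lie in [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so no candidate is preferred.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_affinities(position_scores, appearance_scores, weight=0.5):
    """Fuse two affinity lists after putting both on a common [0, 1] scale.

    ASSUMPTION: a weighted average is used as the fusion rule.
    """
    if len(position_scores) != len(appearance_scores):
        raise ValueError("both affinity lists must have the same length")
    p = min_max_normalize(position_scores)
    a = min_max_normalize(appearance_scores)
    return [weight * x + (1 - weight) * y for x, y in zip(p, a)]


# Hypothetical affinities of one track against three candidate segments.
position = [2.0, 4.0, 6.0]       # e.g. from a motion filter, arbitrary scale
appearance = [10.0, 30.0, 20.0]  # e.g. tracker responses, different scale
fused = fuse_affinities(position, appearance)
```

Normalizing first matters because the two affinities live on different scales; without it, the larger-valued cue would dominate the fused score.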
In this paper, we focus mainly on designing a Multi-Target Object Tracking algorithm that would produce high-quality trajectories while maintaining low computational costs. Using online association, such features enable this algorithm to be used in applications like autonomous driving and autonomous surveillance. We propose CNN-based, instead of hand-crafted, features to lead to higher accuracies. We also present a novel grouping method for 2-D online environments without prior knowledge of camera parameters and an affinity measure based on the groups maintained in previous frames. Comprehensive evaluations of our algorithm (CNNMTT) on a publicly available and widely used dataset (MOT16) reveal that the CNNMTT method achieves high quality tracking results in comparison to the state of the art while being faster and involving much less computational cost.
Lecture Notes in Computer Science, 2020
This research introduces a novel multiple object tracking algorithm called SMAT (Smart Multiple Affinity Metric Tracking) that works as an online tracking-by-detection approach. The use of various characteristics from observation is established as a critical factor for improving tracking performance. By using the position, motion, appearance, and a correction component, our approach achieves an accuracy comparable to state-of-the-art trackers. We use optical flow to track the motion of the objects, and we show that tracking accuracy can be improved by using a neural network to select the key points tracked by the optical flow. The proposed algorithm is evaluated by using the KITTI Tracking Benchmark for the class CAR.
IEEE Transactions on Intelligent Vehicles, 2019
In this paper, we present a multiple-object vehicle tracking system. We introduce a method to combine bounding boxes extracted from multiple CNN-based detections as a light and accurate alternative to confidence-based detection methods such as Non-Maximum Suppression and clustering approaches predicting a single bounding box. An Intersection over Union metric and a threshold value are proposed to determine whether a single detection or a connectivity graph between extracted bounding boxes exists. Affinity measurements are extracted from features representing bounding box geometry, appearance comparison, and changing scene properties. Then, data association is performed by solving the min-cost flow problem of the temporal windows' affinity network. An affinity network of a directed graph associates the objects and determines whether an existing tracklet is maintained or terminated, or whether a new tracklet is initiated. Our model is evaluated and tested on the KITTI Object Tracking-Car Class Benchmark dataset. Overall, the proposed tracker ranks second on the Multiple Object Tracking Accuracy, Mostly-Tracked, and Mostly-Lost metrics, and it yields lower Fragmentation and less than half the ID switches of the method that reaches the highest Multiple Object Tracking Accuracy. Furthermore, the runtime is 6 times faster.
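The Intersection over Union test described above can be sketched as follows. This is an illustrative sketch only: the box format `(x1, y1, x2, y2)`, the helper name `overlap_edges`, and the threshold value 0.5 are assumptions, not values taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_edges(boxes, threshold=0.5):
    """Index pairs whose IoU reaches the threshold.

    An empty result means every box stands alone (single detections);
    otherwise the pairs are the edges of a connectivity graph between boxes.
    """
    return [
        (i, j)
        for i in range(len(boxes))
        for j in range(i + 1, len(boxes))
        if iou(boxes[i], boxes[j]) >= threshold
    ]


# Hypothetical boxes from two detectors plus one unrelated detection.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
edges = overlap_edges(boxes)
```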
ICTACT Journal on Image and Video Processing, 2017
Sensors, 2022
Joint detection and embedding (JDE) methods usually fuse the target motion information and appearance information as the data association matrix, which could fail when the target is briefly lost or blocked in multi-object tracking (MOT). In this paper, we aim to solve this problem by proposing a novel association matrix, the Embedding and GIoU (EG) matrix, which combines the embedding cosine distance and GIoU distance of objects. To improve the performance of data association, we develop a simple, effective, bottom-up fusion tracker for re-identity features, named SimpleTrack, and propose a new tracking strategy which can mitigate the loss of detection targets. To show the effectiveness of the proposed method, experiments are carried out using five different state-of-the-art JDE-based methods. The results show that by simply replacing the original association matrix with our EG matrix, we can achieve significant improvements in IDF1, HOTA and IDsw metrics, and increase the tracking speed of these methods by around 20%. In addition, our SimpleTrack has the best data association capability among the JDE-based methods, e.g., 61.6 HOTA and 76.3 IDF1, on the test set of MOT17 with 23 FPS running speed on a single GTX2080Ti GPU.
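A cost matrix that combines an embedding cosine distance with a GIoU distance can be sketched as below. The abstract does not give the combination rule, so the weighted sum (and the names `giou`, `cosine_distance`, `eg_cost_matrix`, and `weight`) are assumptions for illustration; GIoU itself follows its standard definition, IoU minus the fraction of the smallest enclosing box not covered by the union.

```python
import math


def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def giou(a, b):
    """Generalized IoU in [-1, 1] for axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0 or hull <= 0:
        return 0.0
    return inter / union - (hull - union) / hull


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(u, v)) / norm


def eg_cost_matrix(tracks, detections, weight=0.5):
    """Cost of matching each track to each detection.

    Each item is an (embedding, box) pair.
    ASSUMPTION: the two distances are combined by a weighted sum.
    """
    return [
        [weight * cosine_distance(te, de) + (1 - weight) * (1.0 - giou(tb, db))
         for de, db in detections]
        for te, tb in tracks
    ]


# Hypothetical data: one track, two detections.
tracks = [([1.0, 0.0], (0, 0, 2, 2))]
detections = [([1.0, 0.0], (0, 0, 2, 2)), ([0.0, 1.0], (1, 1, 3, 3))]
cost = eg_cost_matrix(tracks, detections)
```

Unlike plain IoU, GIoU stays informative for boxes that do not overlap at all, which is one reason it is attractive when a target has been briefly lost.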
Pattern Recognition, 2015
The multi-target tracking problem is challenging when there exist occlusions, tracking failures of the detector and severe interferences between detections. In this paper, we propose a novel detection based tracking method that links detections into tracklets and further forms long trajectories. Unlike many previous hierarchical frameworks which split the data association into two separate optimization problems (linking detections locally and linking tracklets globally), we introduce a unified algorithm that can automatically relearn the trajectory models from the local and global information for finding the joint optimal assignment. In each temporal window, the trajectory models are initialized by the local information to link those easy-to-connect detections into a set of tracklets. Then the trajectory models are updated by the reliable tracklets and reused to link separated tracklets into long trajectories. We iteratively update the trajectory models by more information from more frames until the result converges. The iterative process gradually improves the accuracy of the trajectory models, which in turn improves the target ID inferences for all detections by the MRF model. Experiment results revealed that our proposed method achieved state-of-the-art multi-target tracking performance.
Multiple object tracking using space-time adaptive correlation tracking, 2023
Multiple object tracking (MOT) has received considerable attention in applications that track and detect suspicious activities, because it performs the identification and tracking of humans in parallel. MOT maintains the identity and trajectory of each object across frames as objects interact, despite changes in appearance, occlusion, and other difficulties. The recent adoption of deep learning has given a new perspective, but achieving high metrics remains a major issue. To overcome this, this research work presents an integrated architecture of deep convolutional covariance networks (DCCNs) and a space-time adaptive correlation tracking (STACT) algorithm with a similarity map function (SMF). In the proposed work, DCCNs are used for feature extraction in each frame, capturing distinctive information, and STACT is a tracking approach that uses the SMF to locate and track objects. SMFs are updated for any changes in human appearance and motion, and they also deal with occlusion. The proposed model is evaluated on the MOT17 and MOT20 datasets. Performance analysis is carried out by comparison with existing models, and Integrated-DCCN achieves higher metrics.
2007 IEEE International Conference on Image Processing, 2007
There is an ever-increasing number of applications of multi-target tracking, and considerable research has been conducted to solve this problem. Multi-target tracking is an NP-hard problem, and almost all of the present multi-target tracking algorithms are sub-optimal because they find the solution in a reduced hypothesis space. In this paper we introduce a new approach toward finding the optimal single-frame solution for the general multi-target tracking problem. Our proposed method finds the optimal solution using a linear programming optimization method. The proposed method has been successfully applied to synthetic and real data.
2014 Canadian Conference on Computer and Robot Vision, 2014
This paper presents a novel multiple object tracking framework based on multiple visual cues. To build tracks by selecting the best matching score between several detections, a set of probability maps is estimated by a function integrating templates using a sparse representation and color information using locality sensitive histograms. All people detected in two consecutive frames are matched with each other based on similarity scores. This last task is performed by comparing two models (sparse appearance and color models). A score matrix is then obtained for each model. Those scores are combined by Dempster-Shafer's combination rule. To obtain an optimal selection of the best candidate, a data association step is achieved using a greedy search algorithm. We validated our tracking algorithm on challenging publicly available video sequences and we show that we outperform recent state-of-the-art methods.
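A greedy data association over a score matrix, of the kind mentioned above, can be sketched as follows. The abstract does not specify the exact search procedure, so this is one common greedy variant under stated assumptions: rows are existing tracks, columns are new detections, higher scores are better, and `min_score` is a hypothetical cut-off.

```python
def greedy_association(score_matrix, min_score=0.0):
    """Repeatedly take the best remaining (track, detection) pair.

    Returns sorted (row, col) matches. Each row and each column is used
    at most once, and pairs scoring below min_score are never matched.
    """
    candidates = sorted(
        ((score, r, c)
         for r, row in enumerate(score_matrix)
         for c, score in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for score, r, c in candidates:
        if score < min_score:
            break  # all remaining candidates are worse
        if r in used_rows or c in used_cols:
            continue
        used_rows.add(r)
        used_cols.add(c)
        matches.append((r, c))
    return sorted(matches)


# Hypothetical combined similarity scores: 2 tracks x 2 detections.
scores = [[0.9, 0.8],
          [0.85, 0.1]]
all_matches = greedy_association(scores)
confident_matches = greedy_association(scores, min_score=0.5)
```

In this toy example the greedy choice locks in the 0.9 pair first, which forces track 1 onto its weak 0.1 candidate; a globally optimal assignment would have preferred the 0.8 and 0.85 pairs. That trade-off of speed against optimality is inherent to greedy search.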
Lecture Notes in Computer Science, 2004
This paper describes a real-time system for multi-target tracking and classification in image sequences from a single stationary camera. Several targets can be tracked simultaneously in spite of splits and merges amongst the foreground objects and the presence of clutter in the segmentation results. In the results we show tracking of up to 17 targets simultaneously. The algorithm combines Kalman filter-based motion and shape tracking with an efficient pattern matching algorithm. The latter facilitates the use of a dynamic programming strategy to efficiently solve the data association problem in the presence of multiple splits and merges. The system is fully automatic and requires no manual input of any kind for initialization of tracking. The initialization for tracking is done using attributed graphs. The algorithm gives stable and noise-free track initialization. The image-based tracking results are used as inputs to a Bayesian network based classifier to classify the targets into different categories. After classification, a simple 3D model for each class is used along with camera calibration to obtain 3D tracking results for the targets. We present results on a large number of real world image sequences, and accurate 3D tracking results compared with the readings from the speedometer of the vehicle. The complete tracking system, including segmentation of moving targets, works at about 25 Hz for 352×288 resolution color images on a 2.8 GHz Pentium 4 desktop.
Machine vision and applications, 2024
Handling unreliable detections and avoiding identity switches are crucial for the success of multiple object tracking (MOT). Ideally, an MOT algorithm should use true positive detections only, work in real-time and produce no identity switches. To approach the described ideal solution, we present BoostTrack, a simple yet effective tracking-by-detection MOT method that utilizes several lightweight plug and play additions to improve MOT performance. We design a detection-tracklet confidence score and use it to scale the similarity measure and implicitly favour high detection confidence and high tracklet confidence pairs in one-stage association. To reduce the ambiguity arising from using intersection over union (IoU), we propose novel Mahalanobis distance and shape similarity additions to boost the overall similarity measure. To utilize low-detection score bounding boxes in one-stage association, we propose to boost the confidence scores of two groups of detections: the detections we assume to correspond to the existing tracked object, and the detections we assume to correspond to a previously undetected object. The proposed additions are orthogonal to the existing approaches, and we combine them with interpolation and camera motion compensation to achieve results comparable to the standard benchmark solutions while retaining real-time execution speed. When combined with appearance similarity, our method outperforms all standard benchmark solutions on MOT17 and MOT20 datasets. It ranks first among online methods in HOTA metric in the MOT Challenge on MOT17 and MOT20 test sets. We make our code available at https://github.com/vukasin-stanojevic/BoostTrack.
2006 9th International Conference on Information Fusion, 2006
In many useful video tracking situations, targets move through repeated mutual occlusions. As targets undergo occlusions, the feature subsets and combinations of those features that are effective in identifying the target and improving tracking performance may change. We use Combinatorial Fusion Analysis to select and evaluate criteria by which to identify the combination of features that will produce the most accurate tracking.
Electronic Imaging, 2019
An appearance model plays a crucial role in multi-target tracking. In traditional approaches, the two steps of appearance modeling, i.e., visual representation and statistical similarity measure, are modeled separately. Visual representation is achieved either through hand-crafted features or deep features, and statistical similarity is measured through a cross-entropy loss function. A loss function based on cross-entropy (KL-divergence, mutual information) finds closely related probability distributions for the targets. However, if the targets have similar visual representations, it ends up mixing the targets. To tackle this problem, we come up with a synergetic appearance model named Single Shot Appearance Model (SSAM) based on a Siamese neural network. The network is trained with a contrastive loss function for finding the similarity between different targets in a single shot. The input to the network is two target patches and, based on their similarity, a contrastive score is output by the network. The proposed model is evaluated on an accumulative dissimilarity metric on three datasets. Quantitatively, promising results are achieved against three baseline methods.
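For readers unfamiliar with contrastive training of Siamese networks, the widely used form of the contrastive loss can be sketched as follows. The abstract does not state the exact loss used for SSAM, so the formula below (squared distance for matching pairs, squared hinge on a margin for non-matching pairs) and the names `contrastive_loss` and `margin` are assumptions; the two embeddings stand in for the outputs of the two network branches.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_target, margin=1.0):
    """Common contrastive loss for one pair of embeddings.

    Matching pairs are pulled together (loss = d^2); non-matching pairs
    are pushed apart until they are at least `margin` away
    (loss = max(0, margin - d)^2).
    ASSUMPTION: this standard form, not necessarily the one used in SSAM.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_target:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of target patches.
anchor = [0.0, 0.0]
close = [0.5, 0.0]
far = [3.0, 4.0]

loss_same_close = contrastive_loss(anchor, close, same_target=True)
loss_diff_close = contrastive_loss(anchor, close, same_target=False)
loss_diff_far = contrastive_loss(anchor, far, same_target=False)
```

The margin is what keeps visually similar but distinct targets from collapsing onto each other: a non-matching pair keeps contributing loss until its embeddings are separated by at least that distance.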
In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object short-term tracking, and stereo estimation. Despite potential pitfalls of such benchmarks, they have proved to be extremely helpful to advance the state of the art in the respective area. Interestingly, there has been rather limited work on the standardization of quantitative benchmarks for multiple target tracking. One of the few exceptions is the well-known PETS dataset [20], targeted primarily at surveillance applications. Despite being widely used, it is often applied inconsistently, for example involving using different subsets of the available data, different ways of training the models, or differing evaluation scripts. This paper describes our work toward a novel multiple object tracking benchmark aimed to address such issues. We discuss the...
In the tracking-by-detection paradigm for multi-target tracking, target association is modeled as an optimization problem that is usually solved through a network flow formulation. In this paper, we propose a combinatorial optimization formulation and use bipartite graph matching to associate the targets in consecutive frames. Usually, the target of interest is represented by a bounding box, and the whole box is tracked as a single entity. However, in the case of humans, the body goes through complex articulation and occlusion that severely deteriorate the tracking performance. To partially tackle the problem of occlusion, we argue that tracking a rigid body part could lead to better tracking performance compared to whole-body tracking. Based on this assumption, we generated the target hypotheses from only the spatial locations of persons' heads in every frame. After the localization of head location, a constant velocity motion model is used for the temporal evolution of the targe...
2017 IEEE International Conference on Image Processing (ICIP), 2017
▪ We propose a novel Hierarchical Feature Model (HFM) for multi-target tracking.
▪ Traditional approaches use local or global hand-crafted features to model the appearance of a target.
▪ In this work, we investigate deep features for modeling the appearance of the targets.
▪ Deep features are sparse coded for computational efficiency and a Bayesian filter is used to track the targets.
2017
In this paper, we consider multi-object target tracking using video reference datasets. Our objective is to detect the target using a novel AdaBoost and Gentle Boost method in order to track the subjects from reference datasets. Multi-target tracking is still a challenging topic; it is used to find the same object across different camera views and to find the locations and sizes of different objects at different places. Furthermore, extensive performance analysis of the three main parts demonstrates the usefulness of multi-object tracking. We carried out experiments to analyze the discriminative power of nine features (HSV, LBP, HOG extracted on body, torso and legs) used in the appearance model for a multicam dataset. For each feature, the RMSE and PSNR are obtained.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
In this work, we propose a tracker that differs from most existing multi-target trackers in two major ways. Firstly, our tracker does not rely on a pre-trained object detector to get the initial object hypotheses. Secondly, our tracker's final output is the fine contours of the targets rather than traditional bounding boxes. Therefore, our tracker simultaneously solves three main problems: detection, data association and segmentation. This is especially important because the outputs of those three problems are highly correlated and the solution of one can greatly help improve the others. The proposed algorithm consists of two main components: structured learning and Lagrange dual decomposition. Our structured learning based tracker learns a model for each target and infers the best locations of all targets simultaneously in a video clip. The inference of our structured learning is achieved through a new Target Identity-aware Network Flow (TINF), where each node in the network encodes the probability of each target identity belonging to that node. The probabilities are obtained by training target-specific models using a global structured learning technique. This is followed by a proposed Lagrangian relaxation optimization to find a high-quality solution to the network. This forms the first component of our tracker. The second component is Lagrange dual decomposition, which combines the structured learning tracker with a segmentation algorithm. For segmentation, a multi-label Conditional Random Field (CRF) is applied to a superpixel-based spatio-temporal graph in a segment of video, in order to assign background or target labels to every superpixel. We show how the multi-label CRF is combined with the structured learning tracker through our dual decomposition formulation. This leads to more accurate segmentation results and also helps better resolve typical difficulties in multiple target tracking, such as occlusion handling, ID-switch and track drifting.
The experiments on diverse and challenging sequences show that our method achieves superior results compared to competitive approaches for detection, multiple target tracking as well as segmentation.