
Object Tracking for Researchers

This document summarizes CenterTrack, an approach to simultaneous object detection and multi-object tracking. CenterTrack frames tracking as detection conditioned on previous tracks: each object is represented as a point, and a heatmap encodes the center-point locations. It simplifies tracking in three ways: tracking-conditioned detection, point-based matching in place of traditional association algorithms, and training directly on video frames. On the standard benchmarks MOT17, KITTI, and nuScenes it reports state-of-the-art or competitive performance while remaining simple and efficient.

Uploaded by

Rahul Deora

Tracking Objects as Points

Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl


UT Austin & Intel Labs
Early trackers

[Link]
Current frameworks: Tracking-after-detection
Frame t-1

Frame t

Tang et al. 2017: Re-identification features, pose features


Xu et al. 2019: Spatial-temporal trajectories
Simultaneous detection and tracking
Frame t-1

Frame t

Bergmann et al. 2019: Tracking without bells and whistles


[Figure: pipeline overview. Inputs: Frame t, Frame t-1, and Tracks t-1. A deep network outputs Detections t and Offsets t → t-1.]
Advantages
• Simplified tracking-conditioned detection.

Conditioned detection
• Ours: implicit prior heatmap.
• Tracktor [Bergmann et al. 2019]: explicit region proposal.
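
The implicit prior heatmap can be pictured as the previous frame's track centers drawn into a single-channel image that the network receives as an extra input. The following is a minimal sketch of that rendering step, not the authors' implementation: the function name `render_prior_heatmap`, the fixed `sigma`, and the use of an element-wise maximum for overlapping objects are all assumptions made for illustration.

```python
import numpy as np


def render_prior_heatmap(centers, height, width, sigma=2.0):
    """Render prior track centers as a single-channel heatmap.

    centers: iterable of (x, y) pixel coordinates from frame t-1.
    sigma:   Gaussian spread in pixels (hypothetical fixed value).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for cx, cy in centers:
        # Unnormalized Gaussian that peaks at 1.0 on the object center.
        blob = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Overlapping objects keep the stronger response instead of summing.
        heatmap = np.maximum(heatmap, blob.astype(np.float32))
    return heatmap
```

Calling `render_prior_heatmap([(3, 4)], height=8, width=10)` yields an 8 by 10 array whose value at row 4, column 3 is 1.0 and which decays smoothly around that point. An empty list of centers gives an all-zero map, which corresponds to a frame with no prior tracks.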


Point-based matching
• Ours: greedy matching by point distance.
• Prior works: Hungarian algorithm, separate motion model, additional association features.
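
Greedy matching by point distance can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the released code: `greedy_match` and its parameters are hypothetical names, detections are visited in descending score order, each offset is assumed to point from a detection's center in frame t to its estimated position in frame t-1, and `max_dist` is an assumed cutoff beyond which a detection starts a new track.

```python
import numpy as np


def greedy_match(det_centers, det_scores, offsets, prev_centers, max_dist):
    """Greedily link detections in frame t to tracks from frame t-1.

    Returns one entry per detection: the index of the matched previous
    track, or -1 if the detection starts a new track.
    Assumed convention: offsets[i] points from detection i's center in
    frame t to its estimated position in frame t-1.
    """
    det_centers = np.asarray(det_centers, dtype=float).reshape(-1, 2)
    offsets = np.asarray(offsets, dtype=float).reshape(-1, 2)
    prev_centers = np.asarray(prev_centers, dtype=float).reshape(-1, 2)

    # Where each current detection is estimated to have been at t-1.
    projected = det_centers + offsets
    matches = [-1] * len(det_centers)
    taken = np.zeros(len(prev_centers), dtype=bool)

    # Most confident detections choose first.
    for i in np.argsort(-np.asarray(det_scores, dtype=float)):
        if len(prev_centers) == 0:
            break
        dists = np.linalg.norm(prev_centers - projected[i], axis=1)
        dists[taken] = np.inf  # each previous track is used at most once
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            matches[i] = j
            taken[j] = True
    return matches
```

With detections at (10, 10), (50, 50), and (90, 90), scores 0.9, 0.8, and 0.7, offsets (-2, 0), (0, 0), and (0, 0), previous centers (49, 50) and (8, 10), and `max_dist=5`, the function returns `[1, 0, -1]`: the first two detections link to existing tracks and the third starts a new one. No assignment solver, motion model, or appearance feature is involved, which is the simplification the slide contrasts with prior works.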


Advantages
• Simplified tracking-conditioned detection.
• Simplified matching.
• Simplified training on videos.


Frame t-1
Frame t
Results
Results - KITTI
Extend to monocular 3D tracking
Results - monocular 3D tracking on nuScenes
Ablation studies
[Figure: bar charts for MOT17 (30 FPS), KITTI (10 FPS), and nuScenes (2 FPS) comparing four variants: detection only, w/o offset, w/o heatmap, and Ours. Vertical axes span 63 to 67 (MOT17), 84 to 89 (KITTI), and 0 to 30 (nuScenes). Successive builds highlight the comparisons without vs. with heatmap and without vs. with offset.]
Ablation studies - motion models
Trained on image data only
Code is available!

[Link]
