3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
Tags: Paper and LLMs, Autonomous Vehicles
- Pricing Type: Free
GitHub Link
The GitHub link is https://github.com/dsx0511/3dmotformer
Introduction
The repository “3DMOTFormer” is the official implementation of the ICCV 2023 paper “3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking.” The paper addresses accurate and consistent 3D object tracking for autonomous vehicles.
It introduces a learned framework called 3DMOTFormer that leverages the transformer architecture.
The framework uses an Edge-Augmented Graph Transformer to handle frame-by-frame reasoning on track-detection graphs and performs data association through edge classification.
To reduce the distribution mismatch between training and inference, the paper proposes an online training strategy that combines an autoregressive and recurrent forward pass with sequential batch optimization.
Using CenterPoint detections, the approach reports state-of-the-art AMOTA of 71.2% on the nuScenes validation split and 68.2% on the test split. The repository provides installation instructions and data preparation steps for reproducing the results.
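To make the idea of data association through edge classification more concrete, the following is a minimal sketch and not the repository's code. It builds a bipartite graph between existing tracks and new detections, then turns per-edge scores into one-to-one matches. The function names, the class and distance gating, the 2.0 m threshold, and the greedy matching rule are all assumptions for illustration. In 3DMOTFormer the edge scores come from the learned graph transformer; here a distance-based placeholder stands in for it.

```python
import math


def build_bipartite_edges(tracks, detections, max_dist=2.0):
    """Connect each track to every same-class detection within max_dist.

    The class gating and the distance threshold are illustrative assumptions.
    Returns a list of (track_index, detection_index, distance) tuples.
    """
    edges = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            if track["label"] != det["label"]:
                continue
            dist = math.dist(track["xy"], det["xy"])
            if dist <= max_dist:
                edges.append((ti, di, dist))
    return edges


def placeholder_edge_scores(edges, max_dist=2.0):
    """Hypothetical stand-in for a learned edge classifier: closer is better."""
    return [1.0 - dist / max_dist for _, _, dist in edges]


def associate(edges, scores, threshold=0.5):
    """Greedy one-to-one matching: accept edges from highest score down."""
    order = sorted(range(len(edges)), key=lambda i: scores[i], reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for i in order:
        if scores[i] < threshold:
            break  # all remaining edges are classified as "no match"
        ti, di, _ = edges[i]
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


tracks = [
    {"xy": (0.0, 0.0), "label": "car"},
    {"xy": (10.0, 0.0), "label": "car"},
]
detections = [
    {"xy": (0.5, 0.0), "label": "car"},
    {"xy": (10.2, 0.1), "label": "car"},
    {"xy": (0.4, 0.0), "label": "pedestrian"},
]
edges = build_bipartite_edges(tracks, detections)
matches = associate(edges, placeholder_edge_scores(edges))
```

In this toy scene each car track is linked only to the nearby car detection, and the pedestrian detection is left unmatched, so it would start a new track in a full tracker.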
Content
Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning. Based on the substantial progress in object detection in recent years, the tracking-by-detection paradigm has become a popular choice due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking (MOT) works typically rely on non-learned model-based algorithms such as the Kalman filter but require many manually tuned parameters. On the other hand, learning-based approaches face the problem of adapting the training to the online setting, leading to inevitable distribution mismatch between training and inference as well as suboptimal performance. In this work, we propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture. We use an Edge-Augmented Graph Transformer to reason on the track-detection bipartite graph frame-by-frame and conduct data association via edge classification. To reduce the distribution mismatch between training and inference, we propose a novel online training strategy with an autoregressive and recurrent forward pass as well as sequential batch optimization. Using CenterPoint detections, our approach achieves state-of-the-art 71.2% and 68.2% AMOTA on the nuScenes validation and test splits, respectively. In addition, a trained 3DMOTFormer model generalizes well across different object detectors.
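The autoregressive aspect of online tracking can be illustrated with a short sketch: the tracks produced at one frame become the input tracks for the next frame, so the tracker always sees its own earlier output. The code below is an assumption-laden illustration and not the authors' implementation. The association step is a nearest-neighbour placeholder where the learned model would sit, and the names `step`, `run_sequence`, `max_dist`, and `max_age` are hypothetical.

```python
import math


def step(tracks, detections, next_id, max_dist=2.0, max_age=2):
    """One online step: associate, update, age out, and spawn tracks.

    The association is a greedy nearest-neighbour placeholder for a learned
    model; max_dist and max_age are illustrative assumptions.
    """
    tracks = list(tracks)  # do not mutate the caller's list
    pairs = sorted(
        (math.dist(t["xy"], d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    used_tracks, used_dets = set(), set()
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        tracks[ti] = {**tracks[ti], "xy": detections[di], "age": 0}

    survivors = []
    for ti, t in enumerate(tracks):
        if ti not in used_tracks:
            t = {**t, "age": t["age"] + 1}  # unmatched tracks grow older
        if t["age"] <= max_age:
            survivors.append(t)
    for di, d in enumerate(detections):
        if di not in used_dets:  # unmatched detections start new tracks
            survivors.append({"id": next_id, "xy": d, "age": 0})
            next_id += 1
    return survivors, next_id


def run_sequence(frames):
    """Autoregressive loop: each frame's output tracks feed the next frame."""
    tracks, next_id, history = [], 0, []
    for detections in frames:
        tracks, next_id = step(tracks, detections, next_id)
        history.append({t["id"]: t["xy"] for t in tracks if t["age"] == 0})
    return history


frames = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(0.5, 0.0), (10.5, 0.0)],
    [(1.0, 0.0)],
    [(1.5, 0.0), (30.0, 0.0)],
]
history = run_sequence(frames)
```

At inference time a tracker necessarily runs in this way. The online training strategy described above has the model consume its own previous outputs during training as well, which is how it narrows the gap between the data seen in training and in deployment.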










