Zhe Liu 1, 2,
Jinghua Hou 1,
Xiaoqing Ye 3,
Jingdong Wang 3,
Hengshuang Zhao 2,✉,
Xiang Bai 1,✉
1 Huazhong University of Science and Technology,
2 The University of Hong Kong,
3 Baidu Inc.
✉ Corresponding author.
- Unified Heterogeneous Inputs. UniLION integrates multi-view images, LiDAR point clouds, and temporal information into a unified 3D backbone through direct token concatenation, without hand-crafted fusion modules. 💪
- Unified Model. UniLION shares parameters across different input formats. Once trained with multi-modal temporal data, the same UniLION model can be deployed directly across different sensor configurations and temporal settings (e.g., LiDAR-only, temporal LiDAR, or multi-modal fusion) without retraining. 💪
- Unified Representation. UniLION compresses heterogeneous multi-modal and temporal information into a compact BEV feature map that serves as a strong shared representation for autonomous driving. 💪
- Strong Performance. UniLION achieves competitive, state-of-the-art performance across a comprehensive set of autonomous driving tasks, including 3D perception, motion prediction, and planning. 💪
- 2025.12.15: DrivePI paper released. 🔥
- 2025.12.15: GenieDrive (Physics-Aware Driving World Model) paper released. 🔥
- 2025.06.16: Our new work on the Transformer-Mamba architecture, HybridTM, has been accepted by IROS 2025 as an oral presentation. 🎉
- 2024.09.26: LION has been accepted by NeurIPS 2024. 🎉
- 2024.07.25: LION paper released. 🔥
- 2024.07.02: Our new works OPEN and SEED have been accepted by ECCV 2024. 🎉
- nuScenes Validation Set
| Model | Modality | NDS | mAP | AMOTA | mIoU | RayIoU | minADE (Car/Ped.) | L2 | Col. | Config | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UniLION | L | 72.3 | 67.5 | 72.6 | 71.7 | 46.8 | - | - | - | - | - |
| UniLION | LT | 73.0 | 68.9 | 73.3 | 72.4 | 49.6 | 0.58 / 0.39 | 0.60 | 0.27 | - | - |
| UniLION | LC | 74.9 | 72.2 | 76.2 | 72.3 | 50.8 | - | - | - | config | - |
| UniLION | LCT | 75.4 | 73.2 | 76.5 | 73.3 | 51.3 | 0.57 / 0.37 | 0.65 | 0.18 | config | model |
- 3D Object Detection
| Model | Modality | NDS | mAP |
|---|---|---|---|
| UniLION | L | 72.3 | 67.5 |
| UniLION | LT | 73.0 | 68.9 |
| UniLION | LC | 74.9 | 72.2 |
| UniLION | LCT | 75.4 | 73.2 |
- Multi-object Tracking
| Model | Modality | AMOTA | AMOTP | IDS |
|---|---|---|---|---|
| UniLION | L | 72.6 | 0.542 | 510 |
| UniLION | LT | 73.3 | 0.515 | 537 |
| UniLION | LC | 76.2 | 0.499 | 711 |
| UniLION | LCT | 76.5 | 0.477 | 613 |
- BEV Map Segmentation
| Model | Modality | mIoU |
|---|---|---|
| UniLION | L | 71.7 |
| UniLION | LT | 72.4 |
| UniLION | LC | 72.3 |
| UniLION | LCT | 73.3 |
- 3D Occupancy Prediction
| Model | Modality | RayIoU |
|---|---|---|
| UniLION | L | 46.8 |
| UniLION | LT | 49.6 |
| UniLION | LC | 50.8 |
| UniLION | LCT | 51.3 |
- Motion Prediction
| Model | Modality | minADE (Car) | minADE (Ped) | EPA |
|---|---|---|---|---|
| UniLION | LT | 0.58 | 0.39 | 0.647 |
| UniLION | LCT | 0.57 | 0.37 | 0.678 |
- Planning
| Model | Modality | L2 (1s) | L2 (2s) | L2 (3s) | L2 (avg) | Col. (1s) | Col. (2s) | Col. (3s) | Col. (avg) |
|---|---|---|---|---|---|---|---|---|---|
| UniLION | LT | 0.35 | 0.67 | 1.09 | 0.70 | 0.01 | 0.20 | 0.60 | 0.27 |
| UniLION | LCT | 0.33 | 0.62 | 0.99 | 0.65 | 0.01 | 0.12 | 0.42 | 0.18 |
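The "avg" columns in the planning table are simply the mean of the 1s/2s/3s horizons, rounded to two decimals; a quick check with the UniLION (LCT) row:

```python
# Verify the averaged planning metrics from the table above (LCT row).
l2 = [0.33, 0.62, 0.99]   # L2 error (m) at 1s, 2s, 3s
col = [0.01, 0.12, 0.42]  # collision rate (%) at 1s, 2s, 3s

l2_avg = round(sum(l2) / len(l2), 2)
col_avg = round(sum(col) / len(col), 2)
print(l2_avg, col_avg)    # 0.65 0.18
```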
Please refer to INSTALL.md for the installation of UniLION codebase.
First, download the nuScenes dataset and organize it as follows:
├── can_bus
├── maps
├── occ3d
├── samples
├── sweeps
└── v1.0-trainval
The Occ3D dataset can be downloaded from: https://tsinghua-mars-lab.github.io/Occ3D/
Then, run the following data conversion script:
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
python tools/data_converter/nuscenes_converter.py nuscenes \
--root-path ./data/nuscenes \
--canbus ./data/nuscenes \
--out-dir ./data/nuscenes \
--extra-tag nuscenes \
--db-save-path ./data/nuscenes/ \
--version v1.0
For planning anchor generation, please run:
python kmeans_planning.py
Download image pretrained model
We adopt the same image-pretrained model as BEVFusion. You can download it from swint_nuimg_pretrained.pth.
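Conceptually, the planning-anchor step above clusters ego future trajectories into a small set of representative anchors. The following is a hedged sketch under the assumption that each trajectory is flattened to a waypoint vector; the function names and the tiny from-scratch k-means are illustrative, not the actual contents of `kmeans_planning.py`.

```python
# Illustrative k-means over flattened ego trajectories (NOT the script's
# actual code). Assumption: each trajectory = 6 waypoints * (x, y) = 12 dims.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random init
    for _ in range(iters):
        # Assign each trajectory to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned trajectories.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

trajs = np.random.default_rng(1).normal(size=(200, 12))  # toy trajectories
anchors = kmeans(trajs, k=6)   # analogous in spirit to kmeans_planning_6.npy
print(anchors.shape)           # (6, 12)
```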
After completing all the steps above, the final data structure should be organized as follows:
├── can_bus
├── kmeans_motion_6.npy
├── kmeans_planning_4096.npy
├── kmeans_planning_6.npy
├── maps
├── nuscenes_dbinfos_train.pkl
├── nuscenes_gt_database
├── nuscenes_infos_train.pkl
├── nuscenes_infos_val.pkl
├── occ3d
├── samples
├── sweeps
└── v1.0-trainval
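As an optional sanity check before training, you can verify that every entry in the layout above exists. This helper is illustrative (not part of the UniLION codebase) and only assumes the directory tree shown above:

```python
# Illustrative check that ./data/nuscenes matches the expected layout.
from pathlib import Path

EXPECTED = [
    "can_bus", "kmeans_motion_6.npy", "kmeans_planning_4096.npy",
    "kmeans_planning_6.npy", "maps", "nuscenes_dbinfos_train.pkl",
    "nuscenes_gt_database", "nuscenes_infos_train.pkl",
    "nuscenes_infos_val.pkl", "occ3d", "samples", "sweeps", "v1.0-trainval",
]

def missing_entries(root):
    # Return the names from EXPECTED that are absent under `root`.
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

print(missing_entries("./data/nuscenes"))  # [] means the layout is complete
```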
# First stage
tools/dist_train.sh projects/unilion_swin_384_det_map.py 8
# Second stage
tools/dist_train.sh projects/unilion_swin_384_seq_perception.py 8
# Third stage
tools/dist_train.sh projects/unilion_swin_384_seq_e2e.py 8
Note: for better performance, you can additionally train a 3D object detection model to provide the pretrained weights for the first stage.
tools/dist_train.sh projects/unilion_swin_384_det.py 8
You can freely select the tasks to be evaluated in the config.
tools/dist_test.sh projects/<CONFIGS> 8 <CKPT> --eval mAP
Besides, you can use our released UniLION model to evaluate all results:
tools/dist_test.sh projects/unilion_swin_384_seq_e2e.py 8 <CKPT> --eval mAP
- Release the paper.
- Release the code of UniLION.
- Release checkpoints of UniLION.
@article{liu2024lion,
  title={LION: Linear Group RNN for 3D Object Detection in Point Clouds},
  author={Zhe Liu and Jinghua Hou and Xingyu Wang and Xiaoqing Ye and Jingdong Wang and Hengshuang Zhao and Xiang Bai},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
@article{liu2025unilion,
  title={UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs},
  author={Zhe Liu and Jinghua Hou and Xiaoqing Ye and Jingdong Wang and Hengshuang Zhao and Xiang Bai},
  journal={arXiv preprint arXiv:2511.01768},
  year={2025}
}
We thank these great works and open-source repositories: LION, MMDetection, SparseDrive, Mamba, RWKV, Vision-RWKV, and flash-linear-attention.
