Technical report can be found on arXiv:2510.24152
This is the code for reproducing the results of the RoboSense track1-phase2.
Install the required dependencies listed in requirements.txt:
pip install -r requirements.txtSame as phase1.
Official Challenge Repository: The original dataset and challenge details can be found at robosense2025/track1
- Converted questions with history frame file is
data/nuscenes/converted_with_history.json - Ensure corresponding images are stored in the
data/nuscenes/samples/directory - The system expects images organized by camera views (CAM_FRONT, CAM_BACK, etc.)
- Download the Phase-2 data from the official repository and organize according to the structure below
data/
└── nuscenes/
├── converted.json
├── converted_with_history.json
├── robosense_track1_phase2.json
├── samples/
│ ├── CAM_FRONT/
│ │ ├── n008-2018-...jpg
│ │ └── ...
│ ├── CAM_BACK/
│ ├── CAM_FRONT_LEFT/
│ ├── CAM_FRONT_RIGHT/
│ ├── CAM_BACK_LEFT/
│ └── CAM_BACK_RIGHT/
├── v1.0-trainval/
│ ├── sample.json
│ ├── sample_data.json
│ └── ...
└── maps/
├── 36092f0b03a857c6a3403e25b4b7aab3.png
└── ...
This project uses the opensource Qwen2.5-VL-72B-Instruct model accessed through API:
Method 1: Alibaba Cloud Bailian
- Register for an API key at Alibaba Cloud Bailian Console
- Configure your API key in
inference_parallel.sh - Note: Each inference run costs approximately 250 RMB in tokens
Method 2: ModelScope API
- Register for a ModelScope API key
- Configure your API key in
inference_parallel.sh - Note: Free tier has daily request limits
Run the inference pipeline:
bash inference_parallel.shResults will be saved to the outputs/ folder.
If you find this work useful, please cite:
@article{wu2025enhancing,
title={Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning},
author={Wu, Aodi and Luo, Xubo},
journal={arXiv preprint arXiv:2510.24152},
year={2025},
note={Technical Report for RoboSense Challenge at IROS 2025},
url={https://arxiv.org/abs/2510.24152}
}