UCAS-CSU-phase2

Technical Report

Technical report can be found on arXiv:2510.24152

This is the code for reproducing the results of the RoboSense track1-phase2.

Prerequisites

Install the required dependencies listed in requirements.txt:

pip install -r requirements.txt

Same as phase1.

Data Preparation

Official Challenge Repository: The original dataset and challenge details can be found at robosense2025/track1

Converted questions with history frame file is data/nuscenes/converted_with_history.json
Ensure corresponding images are stored in the data/nuscenes/samples/ directory
The system expects images organized by camera views (CAM_FRONT, CAM_BACK, etc.)
Download the Phase-2 data from the official repository and organize according to the structure below

data content

data/
└── nuscenes/
    ├── converted.json
    ├── converted_with_history.json
    ├── robosense_track1_phase2.json
    ├── samples/
    │   ├── CAM_FRONT/
    │   │   ├── n008-2018-...jpg
    │   │   └── ...
    │   ├── CAM_BACK/
    │   ├── CAM_FRONT_LEFT/
    │   ├── CAM_FRONT_RIGHT/
    │   ├── CAM_BACK_LEFT/
    │   └── CAM_BACK_RIGHT/
    ├── v1.0-trainval/
    │   ├── sample.json
    │   ├── sample_data.json
    │   └── ...
    └── maps/
        ├── 36092f0b03a857c6a3403e25b4b7aab3.png
        └── ...

Model Configuration

This project uses the opensource Qwen2.5-VL-72B-Instruct model accessed through API:

API Configuration

Method 1: Alibaba Cloud Bailian

Register for an API key at Alibaba Cloud Bailian Console
Configure your API key in inference_parallel.sh
Note: Each inference run costs approximately 250 RMB in tokens

Method 2: ModelScope API

Register for a ModelScope API key
Configure your API key in inference_parallel.sh
Note: Free tier has daily request limits

Usage

Run the inference pipeline:

bash inference_parallel.sh

Results will be saved to the outputs/ folder.

Citation

If you find this work useful, please cite:

@article{wu2025enhancing,
  title={Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning},
  author={Wu, Aodi and Luo, Xubo},
  journal={arXiv preprint arXiv:2510.24152},
  year={2025},
  note={Technical Report for RoboSense Challenge at IROS 2025},
  url={https://arxiv.org/abs/2510.24152}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
outputs		outputs
prompt		prompt
README.md		README.md
convert_format.py		convert_format.py
corruption_reference.jpg		corruption_reference.jpg
inference_parallel.py		inference_parallel.py
inference_parallel.sh		inference_parallel.sh
requirements.txt		requirements.txt
visual.py		visual.py
visual.sh		visual.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

UCAS-CSU-phase2

Technical Report

Prerequisites

Data Preparation

data content

Model Configuration

API Configuration

Usage

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

UCAS-CSU-phase2

Technical Report

Prerequisites

Data Preparation

data content

Model Configuration

API Configuration

Usage

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages