
UCAS-CSU-phase2

Technical Report


The technical report is available at arXiv:2510.24152.

This repository contains the code for reproducing the results of RoboSense Track 1, Phase 2.

Prerequisites

Install the required dependencies listed in requirements.txt:

pip install -r requirements.txt

The dependencies are the same as in Phase 1.

Data Preparation

Official Challenge Repository: The original dataset and challenge details can be found at robosense2025/track1

  1. Download the Phase-2 data from the official repository and organize it according to the structure below
  2. The converted questions file with history frames is data/nuscenes/converted_with_history.json
  3. Ensure the corresponding images are stored in the data/nuscenes/samples/ directory
  4. The system expects images organized by camera view (CAM_FRONT, CAM_BACK, etc.)

Expected data directory structure:

data/
└── nuscenes/
    ├── converted.json
    ├── converted_with_history.json
    ├── robosense_track1_phase2.json
    ├── samples/
    │   ├── CAM_FRONT/
    │   │   ├── n008-2018-...jpg
    │   │   └── ...
    │   ├── CAM_BACK/
    │   ├── CAM_FRONT_LEFT/
    │   ├── CAM_FRONT_RIGHT/
    │   ├── CAM_BACK_LEFT/
    │   └── CAM_BACK_RIGHT/
    ├── v1.0-trainval/
    │   ├── sample.json
    │   ├── sample_data.json
    │   └── ...
    └── maps/
        ├── 36092f0b03a857c6a3403e25b4b7aab3.png
        └── ...
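Before running inference, it can help to verify that the layout above is in place. The sketch below checks for the key paths from the tree; it is a hypothetical helper (not part of this repository), and the `check_layout` name and the choice of which paths to check are assumptions.

```python
from pathlib import Path

# Camera view folders listed in the directory tree above.
CAMERA_VIEWS = [
    "CAM_FRONT", "CAM_BACK", "CAM_FRONT_LEFT",
    "CAM_FRONT_RIGHT", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
]

def check_layout(data_root):
    """Return the expected paths that are missing under data_root.

    An empty return value means the question file and all six
    camera-view folders are present.
    """
    root = Path(data_root)
    expected = [root / "converted_with_history.json", root / "samples"]
    expected += [root / "samples" / cam for cam in CAMERA_VIEWS]
    return [str(p) for p in expected if not p.exists()]
```

Running `check_layout("data/nuscenes")` after downloading should return an empty list; any returned paths point to missing files or folders.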

Model Configuration

This project uses the open-source Qwen2.5-VL-72B-Instruct model, accessed through an API:

API Configuration

Method 1: Alibaba Cloud Bailian

  1. Register for an API key at Alibaba Cloud Bailian Console
  2. Configure your API key in inference_parallel.sh
  3. Note: Each inference run costs approximately 250 RMB in tokens

Method 2: ModelScope API

  1. Register for a ModelScope API key
  2. Configure your API key in inference_parallel.sh
  3. Note: Free tier has daily request limits
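Both providers expose an OpenAI-compatible chat schema, so a request for a multi-camera question can be assembled as below. This is a sketch only: the actual prompt format lives in inference_parallel.sh and may differ, and the `build_vqa_messages` helper name is an assumption, not code from this repository.

```python
import base64
from pathlib import Path

def build_vqa_messages(question, image_paths):
    """Build an OpenAI-style chat `messages` payload with inline images.

    Each image is embedded as a base64 data URL, followed by the
    question text, all within a single user turn.
    """
    content = []
    for path in image_paths:
        b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The returned list can be passed as the `messages` argument of an OpenAI-compatible chat-completions client pointed at your provider's endpoint, with your API key supplied the same way inference_parallel.sh expects it.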

Usage

Run the inference pipeline:

bash inference_parallel.sh

Results will be saved to the outputs/ folder.
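Since the script runs workers in parallel, the results may land as several files under outputs/. A hypothetical post-processing sketch is shown below; it assumes each worker writes a JSON file containing a list of answer records, which is not guaranteed by this repository.

```python
import json
from pathlib import Path

def merge_outputs(output_dir):
    """Concatenate per-worker JSON result lists into a single list.

    Shards are read in sorted filename order so the merge is
    deterministic across runs.
    """
    merged = []
    for shard in sorted(Path(output_dir).glob("*.json")):
        merged.extend(json.loads(shard.read_text()))
    return merged
```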

Citation

If you find this work useful, please cite:

@article{wu2025enhancing,
  title={Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning},
  author={Wu, Aodi and Luo, Xubo},
  journal={arXiv preprint arXiv:2510.24152},
  year={2025},
  note={Technical Report for RoboSense Challenge at IROS 2025},
  url={https://arxiv.org/abs/2510.24152}
}

About

This is the code for the IROS 2025 RoboSense Challenge, Track 1: LLM for Driving.
