GitHub - FanGShiYuu/CoReVLA: CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine

Updates

2025/12/10: Model weights are updated on Hugging Face: Model

Abstract 🧾

Autonomous Driving (AD) systems have made notable progress, but their performance in long-tail, safety-critical scenarios remains limited. These rare cases contribute a disproportionate number of accidents. Vision-Language Action (VLA) models have strong reasoning abilities and offer a potential solution, but their effectiveness is limited by the lack of high-quality data and inefficient learning in such conditions. To address these challenges, we propose CoReVLA, a continual learning end-to-end autonomous driving framework that improves the performance in long-tail scenarios through a dual-stage process of data Collection and behavior Refinement. First, the model is jointly fine-tuned on a mixture of open-source driving QA datasets, allowing it to acquire a foundational understanding of driving scenarios. Next, CoReVLA is deployed within the Cave Automatic Virtual Environment (CAVE) simulation platform, where driver takeover data is collected from real-time interactions. Each takeover indicates a long-tail scenario that CoReVLA fails to handle reliably. Finally, the model is refined via Direct Preference Optimization (DPO), allowing it to learn directly from human preferences and thereby avoid reward hacking caused by manually designed rewards. Extensive open-loop and closed-loop experiments demonstrate that the proposed CoReVLA model can accurately perceive driving scenarios and make appropriate decisions. On the Bench2Drive benchmark, CoReVLA achieves a Driving Score (DS) of 72.18 and a Success Rate (SR) of 50%, outperforming state-of-the-art methods by 7.96 DS and 15% SR under long-tail, safety-critical scenarios. Furthermore, case studies demonstrate the model’s ability to continually improve its performance in similar failure-prone scenarios by leveraging past takeover experiences.

Highlights ✨

HITL-based data collection in immersive simulation. We collect visually grounded takeover data from the CAVE platform, enabling capture of failure cases in long-tail scenarios.
DPO-based behavior refinement. We apply Direct Preference Optimization to efficiently align the model with human intent using sparse but high-quality takeover data.

TODO List 🔨

Train and Eval Code
DPO Dataset Part 1 (LingoQA)
Model Checkpoint before Refinement
Model Checkpoint after Refinement
DPO Dataset Part 2 (Takeover in CAVE)

Getting Started 🚀

1.CARLA Setup

mkdir carla && cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.15.tar.gz
tar -xvf CARLA_0.9.15.tar.gz
cd Import
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.15.tar.gz
cd .. && bash ImportAssets.sh

Set environment variables:

export CARLA_ROOT=YOUR_CARLA_PATH
echo "$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.15-py3.7-linux-x86_64.egg" >> YOUR_CONDA_PATH/envs/YOUR_CONDA_ENV_NAME/lib/python3.7/site-packages/carla.pth

Write env.sh

export CARLA_ROOT=/path/to/your/carla
export CARLA_SERVER=${CARLA_ROOT}/CarlaUE4.sh
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI:${CARLA_ROOT}/PythonAPI/carla
export PYTHONPATH=$PYTHONPATH:$CARLA_ROOT/PythonAPI/carla/dist/carla-0.9.15-py3.7-linux-x86_64.egg

export WORK_DIR=/path/to/this/repo
export PYTHONPATH=$PYTHONPATH:${WORK_DIR}/scenario_runner:${WORK_DIR}/leaderboard:${WORK_DIR}/B2DVL_Adapter

export SCENARIO_RUNNER_ROOT=${WORK_DIR}/scenario_runner
export LEADERBOARD_ROOT=${WORK_DIR}/leaderboard

export VQA_GEN=1
export STRICT_MODE=1

Activate Environment

source ./env.sh

2.Bench2Drive Setup

Detailed initialization procedures can be found in Here.

git clone https://github.com/Thinklab-SJTU/Bench2Drive-VL

Replace the relevant source code files in the repository (listed in the bench2drive_files folder).

3.Model Setup

Load the base model Qwen2.5-VL-7B-Instruct from the official Hugging Face repository:

from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

Merge LoRA parameters into the Base Model with PEFT library or Llama Factory with this YAML (the latter is recommended)

from transformers import AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load base model
base_model_path = "./XX/"  # where you store Qwen
model = AutoModelForCausalLM.from_pretrained(base_model_path, trust_remote_code=True, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Load and merge LoRA
lora_path = "./XX/" 
model = PeftModel.from_pretrained(model, lora_path)
model = model.merge_and_unload()

# Save the merged model (optional)
model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")

OR you can directly download the model from Huggingface: Here

4.Run Closed-loop Test

Once the model and Bench2Drive are set up, run the following command for closed-loop testing:

bash Dev10-Qwen2.5-all.sh

Data Release 📁

STF Datasets

Base Dataset	Instruction	Size	Released
BDD-100k	Used for multi-task autonomous driving benchmarks: detection, segmentation, tracking, prediction.	100k figures with textual explanation and description	O
HAD HRI	Supports research on driver behavior, takeover intention, and human-automation responsibility modeling.	4319 videos of 20 seconds long with textual explanation and description (3926 in Train; 1123 in Val)	O
LingoQA	Evaluates visual-language models through question answering on images or video content.	Scenario clips with 5 seconds length and language annotations (267.8k QA in Congition; 152.5 QA in Decision)	O

Example of STF dataset format, use this code to process the original data into the corresponding format

{
   "messages": [
     {
       "role": "user",
       "content": "<image>../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/0.jpg<image>../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/1.jpg<image>../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/2.jpg<image>../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/3.jpg<image>../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/4.jpg\nWhat is your current driving action and it's justification?"
     },
     {
       "role": "assistant",
       "content": "I am stopping to let the pedestrian safely cross the road ahead because it is the law and it's the right thing to do to ensure the safety of the pedestrian."
     }
   ],
   "images": [
     "../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/0.jpg",
     "../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/1.jpg",
     "../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/2.jpg",
     "../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/3.jpg",
     "../datasets/images/train/1074b3c991ca71badb563356dbef3d0a/4.jpg"
   ]
 },

DPO Datasets

Base Dataset	Instruction	Size	Released
Part1	Preference data from LingoQA.	9k decision preference pairs with language annotations	O
Part2	Preference data from CAVE.	Takeover data in CAVE	X

Example of DPO dataset format

{
   "conversations": [
     {
       "from": "human",
       "value": "<image>../datasets/images/train/213b094305b36196f786e9c263a6ab89/0.jpg<image>../datasets/images/train/213b094305b36196f786e9c263a6ab89/1.jpg<image>../datasets/images/train/213b094305b36196f786e9c263a6ab89/2.jpg<image>../datasets/images/train/213b094305b36196f786e9c263a6ab89/3.jpg<image>../datasets/images/train/213b094305b36196f786e9c263a6ab89/4.jpg\nWhat are you currently doing and why?"
     }
   ],
   "images": [
     "../datasets/images/train/213b094305b36196f786e9c263a6ab89/0.jpg",
     "../datasets/images/train/213b094305b36196f786e9c263a6ab89/1.jpg",
     "../datasets/images/train/213b094305b36196f786e9c263a6ab89/2.jpg",
     "../datasets/images/train/213b094305b36196f786e9c263a6ab89/3.jpg",
     "../datasets/images/train/213b094305b36196f786e9c263a6ab89/4.jpg"
   ],
   "chosen": {
     "from": "gpt",
     "value": "I will turn to the right lane and accelerate to pass the car in front of me because there is a larger gap on the right that can provide a passing environment."
   },
   "rejected": {
     "from": "gpt",
     "value": "I will keep following the car at a steady speed because I don't know if there are any cars behind me when I turn."
   }
 },

Citation

If this work is helpful for your research, please consider citing:

@misc{fang2025corevladualstageendtoendautonomous,
      title={CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine}, 
      author={Shiyu Fang and Yiming Cui and Haoyang Liang and Chen Lv and Peng Hang and Jian Sun},
      year={2025},
      eprint={2509.15968},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2509.15968}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
assets		assets
bench2drive_files		bench2drive_files
checkpoint		checkpoint
docs		docs
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Updates

Abstract 🧾

Highlights ✨

TODO List 🔨

Getting Started 🚀

1.CARLA Setup

2.Bench2Drive Setup

3.Model Setup

4.Run Closed-loop Test

Data Release 📁

STF Datasets

DPO Datasets

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Updates

Abstract 🧾

Highlights ✨

TODO List 🔨

Getting Started 🚀

1.CARLA Setup

2.Bench2Drive Setup

3.Model Setup

4.Run Closed-loop Test

Data Release 📁

STF Datasets

DPO Datasets

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages