Zehao Wang1, Huaide Jiang1, Shuaiwu Dong1, Yuping Wang1,2, Hang Qiu1, Jiachen Li1*
1University of California, Riverside 2University of Michigan *Corresponding author
Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent.
To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving.
- Long-term preference learning — A contrastive preference encoder learns user embeddings from structured driver profiles and historical driving behavior, capturing stable individual driving habits.
- Short-term instruction alignment — Natural language instructions at runtime steer the policy toward the user's immediate intent (e.g., aggressive vs. conservative maneuvers).
- GRPO-based policy alignment — Group Relative Policy Optimization with style-aware rewards aligns the VLA policy to diverse user preferences without relying on explicit human feedback.
- Personalized Driving Dataset (PDD) — Real human driving demonstrations across diverse CARLA scenarios, collected with a steering wheel setup across multiple drivers and conditions.
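The group-relative update at the core of GRPO can be sketched as follows. This is a minimal NumPy illustration of the advantage computation only (not the repo's training code), and the reward values are made up: a group of rollouts for the same prompt is scored, then each rollout's reward is normalized against the group's own mean and std, so no learned value critic is needed.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its group's statistics.

    GRPO samples a group of rollouts per prompt and uses the group
    mean/std as the baseline instead of a learned critic.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Illustrative style-aware rewards for 4 rollouts of one prompt: rollouts
# closer to the target driving style would score higher (these numbers are
# made up, not the paper's reward definition).
rewards = [0.2, 0.9, 0.5, 0.9]
adv = group_relative_advantages(rewards)
print(adv)  # higher-reward rollouts receive positive advantage
```

Advantages are zero-mean within each group by construction, so the policy gradient pushes probability toward the rollouts that beat their own group's average.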
Given camera observations and navigation goals, DMW fuses the driver's long-term preferences (via a learned user embedding) with real-time natural language instructions to produce adaptive, personalized actions.
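The conditioning idea can be sketched as a toy: a per-driver embedding (long-term habits) and an instruction feature (short-term intent) are combined with scene features before the policy head. Everything here is an illustrative stand-in for the real model: `user_table`, `encode_instruction`, and the dimensions are assumptions, and the actual DMW policy is a VLM-based network, not this NumPy code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned per-driver embedding table (stand-in for the
# contrastive preference encoder's output).
user_table = {"driver_a": rng.normal(size=16), "driver_b": rng.normal(size=16)}

def encode_instruction(text, dim=16):
    # Toy bag-of-words hashing encoder standing in for the VLM's text features.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec / max(len(text.split()), 1)

def condition_policy(scene_feat, user_id, instruction):
    # Long-term preference (user embedding) and short-term intent
    # (instruction) are concatenated with scene features before the
    # action head consumes them.
    z_user = user_table[user_id]
    z_instr = encode_instruction(instruction)
    return np.concatenate([scene_feat, z_user, z_instr])

feat = condition_policy(rng.normal(size=32), "driver_a", "merge conservatively")
print(feat.shape)  # (64,)
```

Keeping the two signals separate lets the runtime instruction override or refine the default style implied by the user embedding.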
PDD collects real human driving demonstrations across diverse scenarios in CARLA using a steering wheel setup. It covers a wide range of interactive scenarios: cut-ins, pedestrians, obstacle avoidance, merging, and more.
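For concreteness, a single per-frame demonstration record could look roughly like the following. Every field name here is a hypothetical illustration of what such a dataset typically stores; it is not the released PDD schema.

```python
# Hypothetical PDD frame record; field names and values are illustrative,
# not the released schema.
frame = {
    "driver_id": "driver_03",          # which human driver produced the demo
    "scenario": "cut_in",              # CARLA scenario tag
    "camera": "rgb_front/000123.png",  # path to the camera observation
    "speed_mps": 8.4,                  # ego speed
    "steer": -0.12,                    # steering wheel input in [-1, 1]
    "throttle": 0.35,                  # pedal inputs in [0, 1]
    "brake": 0.0,
}

# Control inputs come from a physical steering wheel, so they are
# continuous rather than discretized.
assert -1.0 <= frame["steer"] <= 1.0
```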
Download: PDD is coming soon.
Sample drivers from the dataset, recorded at 2× speed:
DMW/
├── grpo/ # GRPO post-training (to be released)
├── checkpoints/ # Checkpoints (to be released)
├── model/ # Model arch
├── team_code/ # CARLA agent
├── leaderboard/ # CARLA leaderboard evaluation
├── scenario_runner/ # CARLA scenario runner
├── pretrained/ # Base VLM checkpoint (InternVL2-1B)
└── data/ # Route configs
- Linux (Ubuntu 20.04+ recommended)
- Conda / Miniconda
- CUDA 12.1 (for PyTorch 2.2.0 + flash-attn)
- CARLA 0.9.15 simulator
```bash
conda env create -f environment.yaml
conda activate dmw
```

This installs Python 3.8 and base system packages. All Python dependencies are installed via pip inside the conda env.
```bash
pip install -r requirements.txt
pip install flash-attn==2.7.0.post2 --no-build-isolation
```

This repo contains a stripped-down TRL fork with only GRPO training support.
```bash
cd grpo
pip install -e .
cd ..
```

The custom TRL fork requires:
- `accelerate >= 1.4.0`
- `datasets >= 3.0.0`
- `transformers >= 4.55.0`

These are already covered by `requirements.txt`.
Download and extract CARLA 0.9.15 to your system (e.g., /home/<user>/carla0915).
Official download: https://github.com/carla-simulator/carla/releases/tag/0.9.15
Edit setup_carla.sh to match your paths, then source it:
```bash
# Edit these paths in setup_carla.sh
export CARLA_ROOT=/home/<user>/carla0915
export WORK_DIR=/home/<user>/Downloads/DMW

# Then source it
source setup_carla.sh
```

This sets the following `PYTHONPATH` entries:

- `$CARLA_ROOT/PythonAPI/carla`
- `$WORK_DIR/scenario_runner_autopilot`
- `$WORK_DIR/leaderboard_autopilot`
- `$WORK_DIR/grpo`
Add `source /path/to/setup_carla.sh` to your `.bashrc` / `.zshrc` to persist across sessions.
The training pipeline uses InternVL2-1B as the base vision-language model.
```bash
# Expected path: pretrained/InternVL2-1B/
huggingface-cli download OpenGVLab/InternVL2-1B --local-dir pretrained/InternVL2-1B
```

Verify the setup:

```bash
conda activate dmw
python -c "import trl; from trl import GRPOTrainer, GRPOConfig; print('TRL OK')"
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available())"
python -c "import transformers; print('Transformers:', transformers.__version__)"
```

carla module not found
- Ensure `setup_carla.sh` is sourced and `$CARLA_ROOT/PythonAPI/carla` is on `PYTHONPATH`.
flash_attn build fails
- Match your CUDA version exactly. Use `nvcc --version` and `python -c "import torch; print(torch.version.cuda)"` to confirm alignment.
transformers version conflict
- TRL requires `>= 4.55.0` while `environment.yaml` pins `4.46.3`. After `conda env create`, upgrade via `pip install "transformers>=4.55.0"`.
DeepSpeed compilation errors
- Ensure `ninja` is installed: `pip install ninja`
- Set `DS_BUILD_OPS=0` to disable custom CUDA kernel compilation during import.
We sincerely thank the researchers and developers of SimLingo for their amazing work.
If you find this work useful, please cite:
@misc{wang2026drivewaypreferencealignment,
title={Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving},
author={Zehao Wang and Huaide Jiang and Shuaiwu Dong and Yuping Wang and Hang Qiu and Jiachen Li},
year={2026},
eprint={2603.25740},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.25740},
}


