This repository contains the official code and data analysis for the paper: Natural Language Instructions for Scene-Responsive Human-in-the-Loop Motion Planning in Autonomous Driving using Vision-Language-Action Models.
The goal of this project is to evaluate how adding doScenes-style instructions affects the performance of Vision Language Models (VLMs) in autonomous driving planning tasks. Specifically, we measure the Average Displacement Error (ADE) across various scenes using OpenEMMA as our baseline VLM.
We compare:
- Baseline Performance: OpenEMMA running on NuScenes data without specific instruction injection.
- doScenes Performance: OpenEMMA running with injected doScenes prompts to guide motion planning.
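
For reference, ADE is the mean Euclidean distance between predicted and ground-truth waypoints over the planning horizon. The snippet below is a minimal sketch of that definition, not the code used in this repository: the function name `average_displacement_error` and the 2D `(x, y)` waypoint representation are assumptions made for illustration, and the horizon and sampling rate used in the paper may differ.

```python
import math
from typing import Sequence, Tuple

# Hypothetical waypoint type: an (x, y) position in metres.
Point = Tuple[float, float]


def average_displacement_error(
    predicted: Sequence[Point], ground_truth: Sequence[Point]
) -> float:
    """Mean L2 distance between matching predicted and ground-truth waypoints."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    if not predicted:
        raise ValueError("trajectories must contain at least one waypoint")
    # Sum the per-timestep Euclidean distances, then average over the horizon.
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Toy trajectories with per-step errors of 0 m, 1 m and 2 m.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
toy_ade = average_displacement_error(pred, truth)  # 1.0
```

A lower ADE means the planned trajectory stays closer to the recorded one, so an instruction that helps should reduce it.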
ade_anlysis.ipynb: The primary analysis notebook. It contains the code to generate the tables found in the paper, comparing the baseline vlm_results.csv against the prompted vlm_doscenes_results.csv.
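
The kind of comparison the notebook performs can be sketched roughly as follows. This is not the notebook's code: the column names `scene` and `ade`, the helper names, and the assumption of one row per scene are all hypothetical, and the in-memory CSV text holds made-up toy values rather than results from the paper. Check the actual headers in the two CSV files before adapting it.

```python
import csv
import io
from statistics import mean


def load_ade_by_scene(csv_file, scene_col="scene", ade_col="ade"):
    """Read a CSV into {scene: ADE}. Column names are assumed, not confirmed."""
    reader = csv.DictReader(csv_file)
    return {row[scene_col]: float(row[ade_col]) for row in reader}


def compare_ade(baseline, prompted):
    """Summarise ADE over the scenes present in both result sets."""
    shared = sorted(baseline.keys() & prompted.keys())
    if not shared:
        raise ValueError("no scenes in common between the two result sets")
    # Negative delta means the prompted run had lower (better) ADE.
    deltas = [prompted[s] - baseline[s] for s in shared]
    return {
        "n_scenes": len(shared),
        "baseline_mean": mean(baseline[s] for s in shared),
        "prompted_mean": mean(prompted[s] for s in shared),
        "mean_delta": mean(deltas),
    }


# Toy stand-ins for vlm_results.csv and vlm_doscenes_results.csv.
baseline_csv = io.StringIO("scene,ade\nscene-a,2.0\nscene-b,4.0\n")
prompted_csv = io.StringIO("scene,ade\nscene-a,1.0\nscene-b,3.0\nscene-c,5.0\n")

summary = compare_ade(
    load_ade_by_scene(baseline_csv), load_ade_by_scene(prompted_csv)
)
```

To run it on the real files, replace the `io.StringIO` objects with `open("data/vlm_results.csv", newline="")` and the matching doScenes path, adjusting the paths and column names to the repository's actual layout.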
The src/ directory is a fork of the OpenEMMA repository, modified to support our experiments.
- src/main.py: Updated to include logic for doScenes prompt injection.
- src/automate_openemma.sh: A shell script that automates running OpenEMMA on the NuScenes dataset to establish baselines.
- src/automate_doscenes.py: A Python script that runs OpenEMMA with doScenes instruction injection.
The data/ directory contains the raw output from the models and the parsed CSV files used for analysis.
- data/VLM_Data/: Contains raw results from the baseline OpenEMMA runs (generated by automate_openemma.sh).
- data/VLM_doScenes_Data/: Contains raw results from the doScenes instruction runs (generated by automate_doscenes.py).
- data/doScenes/: Contains the dataset of instructions and annotations used for the experiments.
  - annotated_doscenes.csv, entire_doscenes.csv, etc.
  - Note: entire_doscenes.csv aggregates data derived from the doScenes repository.
- Parsing Scripts:
  - parse_vlm_data.py: Parses the raw folders in VLM_Data into vlm_results.csv.
  - parse_vlm_doscenes_data.py: Parses the raw folders in VLM_doScenes_Data into vlm_doscenes_results.csv.
- Results:
  - vlm_results.csv: Aggregated baseline ADE results.
  - vlm_doscenes_results.csv: Aggregated ADE results with instruction injection.