RoadBench evaluates MLLMs across six distinct tasks that are fundamental to road network understanding:
- Task 1: BEV Lane Counting - Count lanes from an aerial/satellite perspective with reference line guidance
- Task 2: BEV Lane Designation Recognition - Identify lane purposes from a bird's-eye view with directional annotations
- Task 3: BEV Road Network Correction - Detect and correct road network topology errors
- Task 4: FPV Lane Counting - Determine the number of lanes available for vehicle travel from a first-person view
- Task 5: FPV Lane Designation Recognition - Identify turning directions and lane purposes (straight, left-turn, right-turn, u-turn)
- Task 6: FPV Road Type Classification - Distinguish main roads from service roads
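As a quick reference, the six tasks map onto the `bev_*`/`fpv_*` evaluation scripts listed later in this README (the mapping below is a convenience sketch, not part of the library):

```python
# Task ID -> (task name, evaluation script), script names taken from this README.
TASK_SCRIPTS = {
    1: ("BEV Lane Counting", "bev_lane_counting.py"),
    2: ("BEV Lane Designation Recognition", "bev_lane_designations.py"),
    3: ("BEV Road Network Correction", "bev_roadnet_correction.py"),
    4: ("FPV Lane Counting", "fpv_lane_counting.py"),
    5: ("FPV Lane Designation Recognition", "fpv_lane_designations.py"),
    6: ("FPV Road Type Classification", "fpv_road_type_classification.py"),
}

for task_id, (name, script) in TASK_SCRIPTS.items():
    print(f"Task {task_id}: {name} -> python {script}")
```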
- `roadnetbenchmark/` - Core benchmark library
  - `vlm_client.py` - Universal VLM client supporting OpenAI-compatible APIs
  - `image.py` - Image processing and annotation utilities
  - `metric.py` - Evaluation metrics including F1 scores and geometric distance measures
  - `coord.py` - Coordinate transformation utilities
  - `satellite.py` - Satellite imagery processing
  - `concurrent_jsonl_writer.py` - Efficient concurrent data I/O
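Since `vlm_client.py` targets OpenAI-compatible APIs, the request shape such a client produces can be sketched as follows. This is a standalone illustration of the standard OpenAI chat-completions wire format with an inline image, not the benchmark's actual code; the model name and prompt are placeholders:

```python
import base64
import json

def build_vision_request(image_bytes: bytes, prompt: str, model: str = "gpt-4o") -> dict:
    """Build an OpenAI-compatible chat-completions payload with one inline image.

    The image is sent as a base64 data URL, which is what OpenAI-compatible
    endpoints accept for vision inputs. Model name here is a placeholder.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Example: serialize a request for a lane-counting prompt
payload = build_vision_request(b"\x89PNG...", "How many lanes are visible?")
print(json.dumps(payload)[:60])
```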
Each task includes three types of scripts:
- Evaluation Scripts (`*_*.py`) - Run VLM inference on datasets
- Metrics Scripts (`*_*_metrics.py`) - Calculate performance metrics
- Shell Scripts (`*_*.sh`) - Automated execution with various configurations
RoadBench employs multiple sophisticated metrics tailored to different task types:
- Junction Point Distance - Assesses accuracy of road segment termination points
- Fréchet Distance - Evaluates similarity of road line trajectories
- Buffer F1 Score - Measures overlap between predicted and ground truth road geometries
- Weighted F1 Score - Balanced precision and recall weighted by class frequency
- Weighted Precision/Recall - Class-weighted precision and recall metrics
- RMSE - Root Mean Square Error for lane counting regression tasks
- Hamming Loss - Proportion of incorrect label predictions across all lane designations
- Exact Match Ratio (Accuracy) - Percentage of samples where all lane designations are correctly predicted
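The last three metrics have simple closed forms; a minimal sketch of RMSE, Hamming loss, and exact match ratio as they apply to the lane tasks (the designation labels in the example are illustrative):

```python
import math

def rmse(pred_counts, true_counts):
    """Root Mean Square Error over per-sample lane-count predictions."""
    sq_err = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(sq_err / len(true_counts))

def hamming_loss(pred_labels, true_labels):
    """Fraction of individual lane designations predicted incorrectly."""
    wrong = total = 0
    for pred, true in zip(pred_labels, true_labels):
        wrong += sum(p != t for p, t in zip(pred, true))
        total += len(true)
    return wrong / total

def exact_match_ratio(pred_labels, true_labels):
    """Fraction of samples whose lane designations are all correct."""
    return sum(p == t for p, t in zip(pred_labels, true_labels)) / len(true_labels)

# Two samples, each a per-lane designation sequence
true = [["straight", "left-turn"], ["straight", "right-turn", "u-turn"]]
pred = [["straight", "left-turn"], ["straight", "straight", "u-turn"]]
print(rmse([2, 3], [2, 4]))           # one count off by 1 across 2 samples
print(hamming_loss(pred, true))       # → 0.2 (1 wrong designation out of 5)
print(exact_match_ratio(pred, true))  # → 0.5 (1 of 2 samples fully correct)
```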
- Python 3.12 or higher
- UV package manager (recommended) or pip
- Clone the repository
- Install dependencies:
# Using UV (recommended)
uv sync
# Or using pip
pip install -e .
- Set up environment variables:
# Create .env file with your API keys
cp .env.example .env
# Edit .env with your VLM API credentials
Create a .env file with the following variables:
# OpenAI API
OPENAI_API_KEY=your_openai_api_key
# For other VLM providers, add respective API keys
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
Run the evaluation script for each task:
python bev_lane_counting.py
python bev_lane_designations.py
python bev_roadnet_correction.py
python fpv_lane_counting.py
python fpv_lane_designations.py
python fpv_road_type_classification.py
After running evaluations, compute performance metrics:
# Calculate BEV lane counting metrics
python bev_lane_counting_metrics.py
# Calculate BEV lane designations metrics
python bev_lane_designations_metrics.py
# Calculate BEV road network correction metrics
python bev_roadnet_correction_metrics.py
# Calculate FPV lane counting metrics
python fpv_lane_counting_metrics.py
# Calculate FPV lane designations metrics
python fpv_lane_designations_metrics.py
# Calculate FPV road type classification metrics
python fpv_road_type_classification_metrics.py
The dataset follows this layout:
data/
├── task_1_2/ # BEV Lane Counting & Designations
│ └── dataset/
│ ├── *.png # BEV road images
│ └── labels.jsonl
├── task_3/ # BEV Road Network Correction
├── task_4_5/ # FPV Lane Tasks
└── task_6/ # FPV Road Type Classification
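Each task directory pairs its images with a `labels.jsonl` file holding one JSON record per line; a minimal loader sketch (the fields inside each record vary by task, so none are assumed here):

```python
import json
from pathlib import Path

def load_labels(dataset_dir: str):
    """Yield one parsed record per non-empty line of labels.jsonl."""
    path = Path(dataset_dir) / "labels.jsonl"
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)

# e.g. records = list(load_labels("data/task_1_2/dataset"))
```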