DynamicVL is a comprehensive framework for analyzing long-term urban dynamics through remote sensing imagery. This repository ships the DVL-Suite dataset, task-specific benchmarks, and evaluation scripts that cover multiple-choice and open-ended vision-language tasks as well as pixel-level change detection.
- 2025/08 DynamicVL was accepted to NeurIPS 2025! We will add encoder-decoder-based semantic change detection implementations to this repo. Stay tuned!
# Create the conda environment
conda create -n dvl python=3.10 -y
conda activate dvl
# Install the package
(dvl): pip install -e .
# Optional: manually install PyTorch if the vLLM dependency conflicts with your environment
# Note: pick an older CUDA wheel index if cu128 conflicts with your CUDA driver.
(dvl): pip install -U torch torchvision xformers --index-url https://download.pytorch.org/whl/cu128
# Optional: fix "version `GLIBCXX_3.4.32' not found" errors
(dvl): conda install -c conda-forge gcc=13 gxx=13 -y
(dvl): export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

Download the DVL-Suite dataset and unzip the training and test archives:
mkdir data && cd data
unzip train.zip
unzip test.zip

Expected directory layout:
data/
├── train/                      # DVL-Instruct (Training Set)
│   ├── images/{city}/{region}/{image_id_timestamp}.tif
│   ├── cd_sem_masks/
│   ├── cd_refer_seg_masks/
│   ├── regional_caption/
│   ├── metadata.json
│   ├── basic_change_choice_qa.json
│   ├── basic_change_report_qa.json
│   ├── change_speed_choice_qa.json
│   ├── change_speed_report_qa.json
│   ├── change_referring_seg_qa.json
│   ├── eco_assessment.json
│   ├── dense_temporal_caption.json
│   └── regional_caption.json
└── test/                       # DVL-Bench (Test Set)
    └── [same structure as train/]
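To make sure the archives unpacked as expected, you can run a quick sanity check before loading anything. The snippet below is only a sketch (it is not part of the toolkit) and assumes nothing beyond the layout shown above:

import sys
from pathlib import Path

data_root = Path("data/train")

# Confirm a few of the annotation files listed above were extracted.
for name in ["metadata.json", "basic_change_choice_qa.json", "dense_temporal_caption.json"]:
    status = "found" if (data_root / name).exists() else "MISSING"
    print(f"{name}: {status}")

# Count imagery per city, following images/{city}/{region}/{image_id_timestamp}.tif.
images_dir = data_root / "images"
if not images_dir.is_dir():
    sys.exit("images/ directory not found; check the unzip step above")
for city_dir in sorted(p for p in images_dir.iterdir() if p.is_dir()):
    n_tifs = sum(1 for _ in city_dir.rglob("*.tif"))
    print(f"{city_dir.name}: {n_tifs} .tif images")

Once the layout checks out, both splits can be loaded through the bundled dataset classes: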
from dvl.vqa.dataset import DynamicVLVQA
dataset = DynamicVLVQA(subset="BCA-QA", data_dir="data/train")
for item in dataset:
    # images: List[PIL.Image] across time
    # messages: multi-turn Q&A dicts
    # metadata: contains id, task_type, prompts, options_str, image_list, time_stamps
    print(item)

Run inference with vLLM on a chosen subset:

(dvl): python -m dvl.vqa.run_vllm \
--model_id Qwen/Qwen2.5-VL-3B-Instruct \
--subset BCA-QA

Available subsets:
- BCA-QA - Basic Change Analysis (QA)
- CSE-QA - Change Speed Estimation (QA)
- BCA-Report - Basic Change Analysis (Report)
- CSE-Report - Change Speed Estimation (Report)
- DTC - Dense Temporal Caption
- RCC - Regional Change Caption
- EA - Environmental Assessment
Note: Set --batch_size 1 for llava-hf/llava-onevision-qwen2-7b-ov-hf to avoid GPU OOM.
Output: results/vqa/Qwen--Qwen2.5-VL-3B-Instruct/ stores .jsonl predictions and .json summaries.
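To sweep a single model across every subset, you can loop over the names above and call the same entry point. The sketch below assumes only the flags documented in this README (--model_id, --subset, and optionally --batch_size):

import subprocess

SUBSETS = ["BCA-QA", "CSE-QA", "BCA-Report", "CSE-Report", "DTC", "RCC", "EA"]
MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"

for subset in SUBSETS:
    # Each run writes its .jsonl predictions and .json summary under results/vqa/.
    subprocess.run(
        ["python", "-m", "dvl.vqa.run_vllm", "--model_id", MODEL_ID, "--subset", subset],
        check=True,
    )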
To benchmark Azure OpenAI models, export your endpoint credentials:

export AZURE_OPENAI_BASE="{your-azure-endpoint}"
export AZURE_OPENAI_KEY="{your-api-key}"
export AZURE_OPENAI_API_VERSION="{your-api-version}"
(dvl): python -m dvl.vqa.run_azure_openai \
--model_id gpt-4o \
--subset BCA-QA

Output: results/vqa/gpt-4o/ stores task-specific .jsonl predictions and .json metrics.
GPT-based scoring of the open-ended outputs also requires Azure OpenAI credentials:

export AZURE_OPENAI_BASE="{your-azure-endpoint}"
export AZURE_OPENAI_KEY="{your-api-key}"
export AZURE_OPENAI_API_VERSION="{your-api-version}"
(dvl): python -m dvl.vqa.pretty_print.gpt_eval \
--gpt_model_id gpt-4.1-mini \
--eval_model_id "Qwen/Qwen2.5-VL-3B-Instruct" \
--subset DTC

Supported subsets:
- BCA-Report
- CSE-Report
- DTC
- RCC
Output: results/vqa/Qwen--Qwen2.5-VL-3B-Instruct/ includes GPT-scored .jsonl files (for example DTC.gpt-4.1-mini.jsonl).
Finally, aggregate all results into summary tables:

# Multi-choice QA tasks (BCA-QA, CSE-QA, EA)
(dvl): python -m dvl.vqa.pretty_print.acc_table
# Open-ended generation tasks (Reports & Captions)
(dvl): python -m dvl.vqa.pretty_print.gen_table --gpt_model_id gpt-4.1-mini

Tabulated metrics are printed to console and saved in results/vqa/.
For the change referring segmentation task, load the mask-level split with DynamicVLReferSeg:

from dvl.vqa.dataset import DynamicVLReferSeg
dataset = DynamicVLReferSeg(data_dir="data/train")
for item in dataset:
    # t1_image, t2_image: np.ndarray of shape (1024, 1024, 3)
    # gt_mask: binary change mask
    # messages: instruction-response history
    # cd_info: source/target land-cover classes and indices
    # metadata: contains the unique evaluation id
    print(item)

Organize predicted masks using item["metadata"]["id"] as the filename stem:
{your-pred-dir}/
├── change_referring_seg_qa_0.png
├── change_referring_seg_qa_1.png
└── ...
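One way to produce that layout is to write each predicted mask under the id returned by the loader. The sketch below is not the reference pipeline: predict_change_mask is a hypothetical placeholder for your own model, the dictionary keys mirror the field names listed above, and it assumes the evaluators accept single-channel PNGs with change pixels set to 255 (compare against the provided cd_refer_seg_masks if unsure):

import numpy as np
from pathlib import Path
from PIL import Image
from dvl.vqa.dataset import DynamicVLReferSeg

def predict_change_mask(t1, t2, messages):
    # Hypothetical placeholder: replace with your model. Returns an all-zero (no-change) mask.
    return np.zeros(t1.shape[:2], dtype=np.uint8)

pred_dir = Path("preds/referseg")  # pass this same path to --pred_dir below
pred_dir.mkdir(parents=True, exist_ok=True)

dataset = DynamicVLReferSeg(data_dir="data/test")
for item in dataset:
    mask = predict_change_mask(item["t1_image"], item["t2_image"], item["messages"])
    # Assumed output format: 8-bit single-channel PNG, change pixels = 255.
    Image.fromarray((mask > 0).astype(np.uint8) * 255).save(
        pred_dir / f"{item['metadata']['id']}.png"
    )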
Run the evaluation utilities:
# LISA-style binary IoU metrics
(dvl): python -m dvl.vqa.pretty_print.referseg_iou --pred_dir "{your-pred-dir}"
# MambaCD-style semantic change detection metrics
(dvl): python -m dvl.vqa.pretty_print.referseg_cd --pred_dir "{your-pred-dir}"

Scores are printed to console and stored alongside the submitted prediction masks.
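For orientation, the quantity at the heart of the binary IoU metrics is the intersection-over-union between a predicted and a ground-truth change mask. The toy function below only illustrates that quantity; it is not the evaluation script, and the LISA-style utilities additionally aggregate scores across samples:

import numpy as np

def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # intersection / union over boolean change masks; two empty masks count as a perfect match
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

pred = np.array([[1, 1, 0, 0]] * 4, dtype=bool)
gt = np.array([[1, 0, 0, 0]] * 4, dtype=bool)
print(binary_iou(pred, gt))  # 0.5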
If you find DynamicVL useful, please cite:
@article{xuan2025dynamicvl,
  title={DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding},
  author={Xuan, Weihao and Wang, Junjue and Qi, Heli and Chen, Zihang and Zheng, Zhuo and Zhong, Yanfei and Xia, Junshi and Yokoya, Naoto},
  journal={arXiv preprint arXiv:2505.21076},
  year={2025}
}

DynamicVL is released under the Apache-2.0 License.
DynamicVL builds on NAIP aerial imagery and tools from the open-source multimodal community. We thank everyone who benchmarked cutting-edge MLLMs on our dataset and shared feedback during the public release.