Situat3DChange is a 3D visual-language benchmark designed to assess multimodal large language models (MLLMs) on real-world change understanding tasks, including change description, rearrangement planning, and question answering, all with situation awareness.
- Dataset on Hugging Face: `lrp123/Situat3DChange`
- Baseline model: SCReasoner
- Evaluation tools: for both traditional NLP metrics and GPT-based evaluation
We recommend setting up the environment by following the steps in embodied-generalist, as SCReasoner builds on similar infrastructure.
Clone the repo:

```shell
git clone https://github.com/RuipingL/Situat3DChange.git
cd Situat3DChange
```

- Download Checkpoints
Download `checkpoints.zip` from the Hugging Face dataset page and extract it into:

```
Situat3DChange/SCReasoner/
```
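If you prefer scripting this step, here is a minimal sketch using Python's standard `zipfile` module (the function name is illustrative, not part of the repo):

```python
import zipfile
from pathlib import Path

def extract_checkpoints(zip_path: str, dest: str) -> list:
    """Extract a checkpoint archive into dest and return the extracted member names."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

For example, `extract_checkpoints("checkpoints.zip", "Situat3DChange/SCReasoner/")` after downloading the archive into the current directory.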
- Launch Training
Use the following command to train SCReasoner with SLURM and Submitit:
```shell
python launch.py \
  --mode submitit \
  --config configs/default.yaml \
  --name default \
  --time 48 \
  --num_nodes 1 \
  --partition accelerated \
  --gpu_per_node 4 \
  --mem_per_gpu 100 \
  --port 2050
```
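As a rough illustration of what these flags configure, here is a hypothetical sketch mapping them to SLURM-style resource fields (the function names, units, and the exact translation are assumptions; see `launch.py` for the actual logic):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI flags mirroring the launch command above (defaults are illustrative)."""
    p = argparse.ArgumentParser(description="Launch SCReasoner training")
    p.add_argument("--mode", default="submitit")
    p.add_argument("--config", default="configs/default.yaml")
    p.add_argument("--name", default="default")
    p.add_argument("--time", type=int, default=48)          # walltime, assumed hours
    p.add_argument("--num_nodes", type=int, default=1)
    p.add_argument("--partition", default="accelerated")
    p.add_argument("--gpu_per_node", type=int, default=4)
    p.add_argument("--mem_per_gpu", type=int, default=100)  # assumed GB per GPU
    p.add_argument("--port", type=int, default=2050)        # assumed distributed-comms port
    return p

def slurm_resources(args: argparse.Namespace) -> dict:
    """Translate the parsed flags into SLURM-style resource fields (illustrative)."""
    return {
        "timeout_min": args.time * 60,                   # hours -> minutes
        "nodes": args.num_nodes,
        "gpus_per_node": args.gpu_per_node,
        "mem_gb": args.mem_per_gpu * args.gpu_per_node,  # total memory per node
        "slurm_partition": args.partition,
    }
```

With the values shown above, this would request one node with 4 GPUs, 400 GB of memory, and a 48-hour walltime on the `accelerated` partition.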
- Evaluation

To evaluate question answering, run:

```shell
python eval_qa/eval.py
```

For traditional metrics (BLEU-4, ROUGE, CIDEr, METEOR, BERTScore), run:
```shell
python eval_longform/eval.py
```

For GPT-based evaluation, run:
```shell
python eval_longform/eval_gpt.py
```

Results for SCReasoner, including GPT scores, are stored in:
results/SCReasoner/
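The evaluation scripts compute these metrics with standard packages. Purely as an illustration of what BLEU-4 measures, here is a minimal single-reference, sentence-level sketch with crude smoothing (not the repo's implementation):

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    """Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    """Single-reference, sentence-level BLEU-4 with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = _ngrams(cand, n), _ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped n-gram matches
        total = max(sum(cand_counts.values()), 1)
        # tiny floor keeps log() defined when an n-gram order has no match
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / 4)
```

An identical candidate and reference score 1.0; completely disjoint sentences score near 0.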
If you use this project or dataset, please cite us:
```bibtex
@article{liu2025situat3dchange,
  title={Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model},
  author={Liu, Ruiping and Zheng, Junwei and Chen, Yufan and Wang, Zirui and Peng, Kunyu and Yang, Kailun and Zhang, Jiaming and Pollefeys, Marc and Stiefelhagen, Rainer},
  journal={arXiv preprint arXiv:2510.11509},
  year={2025}
}
```
We thank the LEO project, upon which our project is based.