When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
This is the official implementation for adaptive visual imagination control.
Authors: Shoubin Yu*, Yue Zhang*, Zun Wang, Jaehong Yoon, Huaxiu Yao, Mingyu Ding, Mohit Bansal
Please follow the MindJourney instructions for environment installation and data downloading.
cd visual_spatial_reasoning
conda create -n mindjourney_svc python=3.10 -y
conda activate mindjourney_svc
# Editable install of the SVC module (dependencies defined in pyproject.toml)
pip install -e stable_virtual_camera/
# Optionally reuse shared utilities if needed
pip install -r requirements_svc.txt

Please follow the MapGPT instructions for setting up the Room2Room evaluation environment.
You need to:
(1) Install the Matterport3D simulator: follow the instructions here. We use the latest version instead of v0.1.
(2) Install the MapGPT dependencies and data.
(3) Install Stable Virtual Camera as in the visual spatial reasoning setup above.
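Once the simulator is built, a quick sanity check can save debugging time later. The sketch below assumes the compiled Matterport3D simulator exposes a `MatterSim` Python module (the name used by the upstream simulator's bindings) and that the interpreter should be the Python 3.10 the build was compiled against; adjust both if your setup differs.

```python
# Hedged sanity check for the simulator environment. Assumptions: the
# Matterport3D simulator's Python binding is named "MatterSim", and the
# interpreter should match the Python 3.10 used at compile time.
import importlib.util
import sys

def check_environment(expected=(3, 10)):
    """Return (version_ok, mattersim_found) without raising on a missing build."""
    version_ok = sys.version_info[:2] == expected
    mattersim_found = importlib.util.find_spec("MatterSim") is not None
    return version_ok, mattersim_found

if __name__ == "__main__":
    ok, found = check_environment()
    print(f"Python 3.10: {ok}, MatterSim importable: {found}")
```

If `MatterSim` is not importable, make sure the simulator's build directory is on `PYTHONPATH` before running the navigation scripts.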
We install the environment with Docker and re-compile Matterport3D with Python 3.10; in this case, you will need to install Anaconda inside the Docker container.
Please set up your API keys in api.py for both tasks before running experiments.
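The repository's api.py is not shown here, but a common pattern is to read keys from environment variables rather than committing them to the file. The sketch below is hypothetical: the function name `get_api_key` and the variable name `OPENAI_API_KEY` are illustrative assumptions, not names taken from the repository.

```python
# Hypothetical sketch of an api.py pattern; the repository's actual names
# may differ. Keys come from environment variables, with a loud failure
# when one is missing, so experiments never run with a silent empty key.
import os

def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fetch an API key from the environment, failing loudly if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} before running the experiments.")
    return key
```

Exporting the key in your shell (`export OPENAI_API_KEY=...`) before launching the scripts then works for both tasks.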
cd visual_spatial_reasoning
sh scripts/pipeline_avic.sh

cd navigation
sh scripts/gpt4o.sh

We thank the developers of MindJourney and MapGPT for their public code release.
Please cite our paper if you use our models in your works:
@article{yu2026when,
  author  = {Shoubin Yu and Yue Zhang and Zun Wang and Jaehong Yoon and Huaxiu Yao and Mingyu Ding and Mohit Bansal},
  title   = {When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning},
  journal = {arXiv preprint arXiv:2602.08236},
  year    = {2026},
}
