When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
This is the official implementation for adaptive visual imagination control.
Authors: Shoubin Yu*, Yue Zhang*, Zun Wang, Jaehong Yoon, Huaxiu Yao, Mingyu Ding, Mohit Bansal
Please follow the MindJourney instructions for environment installation and data downloading.
cd visual_spatial_reasoning
conda create -n mindjourney_svc python=3.10 -y
conda activate mindjourney_svc
# Editable install of the SVC module (dependencies defined in pyproject.toml)
pip install -e stable_virtual_camera/
# Optionally reuse shared utilities if needed
pip install -r requirements_svc.txt

Please follow the MapGPT instructions for setting up the Room2Room evaluation environment.
You need to:
(1) Install the Matterport3D simulator: follow the instructions here. We use the latest version instead of v0.1.
(2) Install the MapGPT dependencies and data.
(3) Install Stable Virtual Camera as in the visual spatial reasoning setup above.
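Once the simulator is built, a quick sanity check can save debugging time later. The sketch below assumes the compiled Matterport3D simulator exposes a `MatterSim` Python module (the name used by the upstream simulator's bindings) and that the interpreter should be the Python 3.10 the build was compiled against; adjust both if your setup differs.

```python
# Hedged sanity check for the simulator environment. Assumptions: the
# Matterport3D simulator's Python binding is named "MatterSim", and the
# interpreter should match the Python 3.10 used at compile time.
import importlib.util
import sys

def check_environment(expected=(3, 10)):
    """Return (version_ok, mattersim_found) without raising on a missing build."""
    version_ok = sys.version_info[:2] == expected
    mattersim_found = importlib.util.find_spec("MatterSim") is not None
    return version_ok, mattersim_found

if __name__ == "__main__":
    ok, found = check_environment()
    print(f"Python 3.10: {ok}, MatterSim importable: {found}")
```

If `MatterSim` is not importable, make sure the simulator's build directory is on `PYTHONPATH` before running the navigation scripts.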
We install the environment with Docker and re-compile Matterport3D with Python 3.10; in this case, you will need to install Anaconda inside the Docker container.
Please set up your API keys in api.py for both tasks before running experiments.
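The repository's api.py is not shown here, but a common pattern is to read keys from environment variables rather than committing them to the file. The sketch below is hypothetical: the function name `get_api_key` and the variable name `OPENAI_API_KEY` are illustrative assumptions, not names taken from the repository.

```python
# Hypothetical sketch of an api.py pattern; the repository's actual names
# may differ. Keys come from environment variables, with a loud failure
# when one is missing, so experiments never run with a silent empty key.
import os

def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fetch an API key from the environment, failing loudly if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} before running the experiments.")
    return key
```

Exporting the key in your shell (`export OPENAI_API_KEY=...`) before launching the scripts then works for both tasks.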
cd visual_spatial_reasoning
sh scripts/pipeline_avic.sh

cd navigation
sh scripts/gpt4o.sh

We thank the developers of MindJourney and MapGPT for their public code release.
Please cite our paper if you use our models in your works:
@article{yu2026when,
  author  = {Shoubin Yu and Yue Zhang and Zun Wang and Jaehong Yoon and Huaxiu Yao and Mingyu Ding and Mohit Bansal},
  title   = {When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning},
  journal = {arXiv preprint arXiv:2602.08236},
  year    = {2026},
}
