2nd Workshop on Multimodal Spatial Intelligence

June 3, 2026 · 8:10 AM – 12:30 PM

Room 601

About the Workshop

The Multimodal Spatial Intelligence (MUSI) workshop addresses how multimodal large language models (MLLMs) understand, reason about, and interact with spatial information from the physical world. Spatial intelligence is inherently multimodal: it requires integrating images, videos, and 3D data, which calls for bringing together researchers from computer vision, robotics, graphics, and NLP. While recent MLLMs show promising visual-spatial capabilities, fundamental questions remain about spatial relationships, 3D environment modeling, and real-world spatial reasoning. This workshop explores how MLLMs learn spatial representations across modalities, how they advance world modeling and embodied AI, and the ethical considerations involved. We aim to establish benchmarks and foster cross-disciplinary collaboration to advance spatial reasoning in multimodal AI.

Keywords:

Spatial Reasoning · Multimodal Large Language Model · World Models · Embodied AI · 3D Understanding

Program

08:10 – 08:20 · Welcome & Introduction
08:20 – 08:50 · Keynote Talk 1 · Katerina Fragkiadaki (Carnegie Mellon University)
08:50 – 09:20 · Keynote Talk 2 [Slides] · Angel X. Chang (Simon Fraser University)
09:20 – 09:50 · Keynote Talk 3 · Chuang Gan (UMass Amherst / MIT-IBM Watson AI Lab)
09:50 – 10:05 · ☕ Coffee Break & Social
10:05 – 10:35 · Keynote Talk 4 · Roozbeh Mottaghi (Skild AI / University of Washington)
10:30 – 12:30 · Poster Session · ExHall A · Board IDs 142 – 149
10:35 – 11:05 · Keynote Talk 5 [Slides] · Saining Xie (NYU / AMI Labs)
11:05 – 11:35 · Keynote Talk 6 · Ranjay Krishna (University of Washington / Allen Institute for AI)
11:35 – 12:05 · Keynote Talk 7 · Kristen Grauman (University of Texas at Austin)
12:05 – 12:30 · Closing Remarks

Call for Papers

Topics

We invite submissions on topics including, but not limited to:

  • Spatial Reasoning in Multimodal LLMs
  • World Models for Physical Understanding
  • Embodied Agents and VLA Models
  • 3D Scene Understanding, Generation, and Reconstruction
  • Open-Vocabulary 2D/3D Perception and Reasoning
  • Temporal and Causal Reasoning in Dynamic Environments
  • Multimodal Interaction, Grounding, and Planning
  • Neuro-symbolic Approaches for Spatial Intelligence
  • Benchmarks and Datasets for Spatial Reasoning
  • Trust, Ethics, and Societal Impact of Spatial AI

Important Dates

Submission Deadline · March 13, 2026 (23:59 AoE)
Author Notification · April 3, 2026 (23:59 AoE)
Camera Ready · April 24, 2026 (23:59 AoE)
Workshop Date · June 3, 2026 (morning)

*All deadlines are Anywhere on Earth (AoE). Timelines are subject to change.

Submission Guidelines

  • Eligibility: We welcome both new work and papers previously accepted at other venues.
  • Format: For new work, papers must be submitted in the CVPR 2026 format. Previously accepted papers may be submitted in their original format, but must still be anonymized for the review process.
  • Length: Max 8 pages (excluding references).
  • Review: Double-blind peer review.
  • Presentation: Accepted papers will be presented as posters.

Submissions are closed. The OpenReview portal is no longer accepting new papers.

Publication

The workshop will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.

Invited Speakers

Chuang Gan

UMass Amherst / MIT-IBM Watson AI Lab

Roozbeh Mottaghi

Skild AI / UW

Saining Xie

NYU / AMI Labs

Ranjay Krishna

UW / AI2

Kristen Grauman

UT Austin

Organizers

Juil Koo

KAIST

Songyou Peng

Google DeepMind

Sanja Fidler

NVIDIA

Leonidas Guibas

Stanford / Google DeepMind

Minhyuk Sung

KAIST

Accepted Papers

Congratulations to all accepted authors!

Poster presentations

ExHall A · 10:30 AM – 12:30 PM · Poster board IDs 142 – 149 (2 posters per board)

Reviewer Acknowledgement

We sincerely thank the following reviewers for their time, expertise, and thoughtful feedback during the peer review process.

  • Siyi Chen
  • Daehyeon Choi
  • Jaewoo Jung
  • Minseo Kim
  • Seonho Lee
  • Damiano Marsili
  • Kiet T. Nguyen
  • Chanho Park
  • Zekun Qi
  • Marc Unzueta
  • Austin T. Wang
  • Chun-Hsiao Yeh
  • Baiqiao Yin
  • Jin Yoo
  • Sangwoo Youn

Contact

For any inquiries about the workshop, please reach out via email: