
ChangingGrounding: 3D Visual Grounding in Changing Scenes

Miao Hu<sup>1</sup>, Zhiwei Huang<sup>2</sup>, Tai Wang<sup>4</sup>, Jiangmiao Pang<sup>4</sup>, Dahua Lin<sup>3,4</sup>, Nanning Zheng<sup>1</sup>*, Runsen Xu<sup>3,4</sup>*

<sup>1</sup>Xi'an Jiaotong University, <sup>2</sup>Zhejiang University, <sup>3</sup>The Chinese University of Hong Kong, <sup>4</sup>Shanghai AI Laboratory

*Corresponding Authors

🌐 Homepage | πŸ“‘ Paper | πŸ“– arXiv

πŸ””News

πŸ”₯[2025-10-17]: We released our paper. The code and benchmark will be released after the paper is accepted.

Abstract

Real-world robots must localize objects from natural-language instructions while the scenes around them keep changing. Yet most existing 3D visual grounding (3DVG) methods still assume a reconstructed and up-to-date point cloud, an assumption that forces costly re-scans and hinders deployment. We argue that 3DVG should be formulated as an active, memory-driven problem, and we introduce ChangingGrounding, the first benchmark that explicitly measures how well an agent can exploit past observations, explore only where needed, and still deliver precise 3D boxes in changing scenes. To set a strong reference point, we also propose Mem-ChangingGrounder, a zero-shot method that marries cross-modal retrieval with lightweight multi-view fusion: it identifies the object type implied by the query, retrieves relevant memories to guide its actions, explores the scene efficiently for the target, falls back when previous operations prove invalid, scans the target from multiple views, and projects the fused evidence from those scans into an accurate 3D bounding box. We evaluate several baselines on ChangingGrounding; Mem-ChangingGrounder achieves the highest localization accuracy while greatly reducing exploration cost. We hope this benchmark and method catalyze a shift toward practical, memory-centric 3DVG research for real-world applications.

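The official code is not yet released, so the following is only a minimal Python sketch of the memory-driven loop the abstract describes: infer the object type from the query, retrieve remembered locations, check those first, and fall back to fresh exploration when memory proves stale. Every name in it (`Memory`, `observe_at`, `multi_view_box`, `explore_fallback`) is a hypothetical stand-in, not the actual Mem-ChangingGrounder API.

```python
# Hypothetical sketch only: the ChangingGrounding code is unreleased,
# so these classes and functions are illustrative stand-ins.
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Past observations: (object_label, 3D position) pairs."""
    entries: list = field(default_factory=list)

    def retrieve(self, label: str) -> list:
        """Stand-in for cross-modal retrieval: look up prior sightings by label."""
        return [pos for lbl, pos in self.entries if lbl == label]


def infer_object_type(query: str, vocabulary: list) -> str:
    """Stand-in for query understanding: pick the vocabulary word the query names."""
    for word in vocabulary:
        if word in query.lower():
            return word
    return "unknown"


def observe_at(pos, label) -> bool:
    """Stand-in for perception: report whether the object is still at `pos`."""
    return True


def multi_view_box(pos):
    """Stand-in for multi-view scanning and fusion: return a fixed-size
    axis-aligned box around the remembered position."""
    x, y, z = pos
    return (x - 0.2, y - 0.2, z - 0.2, x + 0.2, y + 0.2, z + 0.2)


def explore_fallback(label):
    """Stand-in for exploring the scene anew when memory is invalid."""
    return None


def ground(query: str, memory: Memory, vocabulary: list):
    """Memory-first grounding loop: try remembered locations before exploring."""
    label = infer_object_type(query, vocabulary)
    for pos in memory.retrieve(label):   # memory-guided exploration first
        if observe_at(pos, label):       # is the object still where we saw it?
            return multi_view_box(pos)   # scan from multiple views and fuse
    return explore_fallback(label)       # memory invalid: explore from scratch


if __name__ == "__main__":
    mem = Memory(entries=[("mug", (1.0, 0.5, 0.8))])
    print(ground("find the red mug on the table", mem, ["mug", "chair"]))
```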

πŸ“„ License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

πŸ”— Citation

If you find our work helpful, please consider citing our paper and starring this repo 🌟:

@misc{hu2025changinggrounding3dvisualgrounding,
      title={ChangingGrounding: 3D Visual Grounding in Changing Scenes}, 
      author={Miao Hu and Zhiwei Huang and Tai Wang and Jiangmiao Pang and Dahua Lin and Nanning Zheng and Runsen Xu},
      year={2025},
      eprint={2510.14965},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.14965}, 
}

Acknowledgment

The ChangingGrounding dataset builds upon 3RScan and ReferIt3D. Our method implementation is adapted from VLM-Grounder. We thank these teams for their open-source contributions.

Contact
