ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

🌟 CVPR 2025 Highlight 🌟

🚨 NEWS 🚨

🔥 IVV Major Update

Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement (IVV)

A new paper and an immersive demo are now available.

Code will be released after the paper submission is completed. Stay tuned!

Tsinghua University
Migu Beijing Research Institute IACAS

Zhengxian Yang*, Shi Pan*, Shengqi Wang*, Haoxiang Wang, Li Lin, Guanjun Li,

Zhengqi Wen†, Borong Lin†, Jianhua Tao†, Tao Yu†

* Equal Contribution, † Corresponding Authors

Overview

We introduce ImViD, a multi-view, multimodal dataset featuring complete, space-oriented data capture across a variety of indoor and outdoor scenarios. It contains synchronized audio and video recorded at 5K resolution and 60 frames per second (FPS), with recordings lasting 1 to 5 minutes.

Download

Download the application form and fill it in:
- Fillable PDF: docs/application_form.pdf
Email the completed form to the contacts listed in the Contact section below.
Upon approval, we will send you the download instructions.

For a quick look, a small sample dataset is available on the Release Page. The sample includes Scene 1 videos (300×5K@60 FPS, H.264 MP4) and COLMAP-style camera metadata files: cameras.txt and images.txt.
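In images.txt, each frame's pose is stored as a quaternion (QW, QX, QY, QZ) and a translation (TX, TY, TZ) in COLMAP's world-to-camera convention, X_cam = R·X_world + t, so a camera's position in world coordinates is C = −Rᵀt. The plain-Python sketch below computes that center from a single pose line. The function names are ours (not part of this repository), and it assumes the standard COLMAP line layout IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME.

```python
def quat_to_rotmat(qw: float, qx: float, qy: float, qz: float) -> list[list[float]]:
    """Row-major 3x3 rotation matrix for a unit quaternion (w, x, y, z)."""
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz), 2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy), 2 * (qy * qz + qw * qx), 1 - 2 * (qx * qx + qy * qy)],
    ]


def camera_center(image_line: str) -> tuple[float, ...]:
    """World-space camera center C = -R^T t from one images.txt pose line.

    Hypothetical helper: assumes "IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME".
    """
    parts = image_line.split()
    qw, qx, qy, qz, tx, ty, tz = map(float, parts[1:8])
    r = quat_to_rotmat(qw, qx, qy, qz)
    t = (tx, ty, tz)
    # (R^T t)_i = sum_j R[j][i] * t[j]; negate it to get the center.
    return tuple(-sum(r[j][i] * t[j] for j in range(3)) for i in range(3))
```

For example, an identity rotation with translation (1, 2, 3) places the camera at (−1, −2, −3) in world coordinates.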

Dataset Summary

Scene	Cameras	Static VPs	Takes	Strategy	Avg. S-T Density (m³/s)	Viewing Space	Duration per Take (m:ss)	Storage (GB)
Opera	39	1152	2	1–180°	–	180°	3:22	226
Laboratory	39	1225	2	2	0.10	360°	1:42	137.3
Classroom	39	1223	2	2	0.10	360°	4:42	497
Meeting	39	1223	1	1–360°	–	360°	3:16	114
Rendition	39	1620	4	2	0.10	360°	2:02	516
Puppy	39	1404	3	2	0.10	360°	1:50	359
Playing	39	1224	2	2	0.10	360°	1:10	220
Total	–	–	16	–	–	–	38:46	2069.3
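The Total row is consistent with counting every take: multiplying each scene's duration by its number of takes and summing gives 38:46, whereas summing the duration column alone gives only 18:04. The short check below reproduces that arithmetic from the table values. It reads each scene's duration as the length of one take, which is our interpretation of the table rather than something it states outright.

```python
# Values copied from the Dataset Summary table:
# (scene, takes, duration of one take as "m:ss", storage in GB)
SCENES = [
    ("Opera", 2, "3:22", 226),
    ("Laboratory", 2, "1:42", 137.3),
    ("Classroom", 2, "4:42", 497),
    ("Meeting", 1, "3:16", 114),
    ("Rendition", 4, "2:02", 516),
    ("Puppy", 3, "1:50", 359),
    ("Playing", 2, "1:10", 220),
]


def to_seconds(mmss: str) -> int:
    """Convert an "m:ss" string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total: int) -> str:
    """Convert seconds back to "m:ss" (minutes may exceed 59)."""
    return f"{total // 60}:{total % 60:02d}"


total_takes = sum(takes for _, takes, _, _ in SCENES)
# Assumption: each scene contributes (number of takes) x (duration of one take).
total_duration = sum(takes * to_seconds(d) for _, takes, d, _ in SCENES)
# Storage is already a per-scene total, so it is summed directly.
total_storage = round(sum(gb for _, _, _, gb in SCENES), 1)

print(total_takes, to_mmss(total_duration), total_storage)  # 16 38:46 2069.3
```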

Preview Clips

Below are low-bitrate preview clips for each scene.

Scene 1: Opera	Scene 2: Laboratory	Scene 3: Classroom
Scene 4: Meeting	Scene 5: Rendition	Scene 6: Puppy
Scene 7: Playing

Using the Dataset

Extracting Frames

To extract individual frames from the sample videos, run:

python scripts/extract_frames.py \
    --input path/to/your_video_folder \
    --output path/to/output_frames_folder

Note: Video alignment accuracy is approximately 10–20 ms. If you need higher precision, please contact us.
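At 60 FPS, consecutive frames are about 16.7 ms apart, so a 10–20 ms alignment error corresponds to roughly 0.6–1.2 frames: frames with the same index in two cameras may be offset by up to about one frame. The helpers below are a hypothetical sketch (not part of scripts/) that express this in frames, assuming a constant 60 FPS and frame 0 at t = 0.

```python
FPS = 60  # ImViD videos are captured at 60 frames per second


def frame_period_ms(fps: float = FPS) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def error_in_frames(error_ms: float, fps: float = FPS) -> float:
    """Express a synchronization error (ms) as a fraction of one frame."""
    return error_ms / frame_period_ms(fps)


def frame_index(timestamp_s: float, fps: float = FPS) -> int:
    """Nearest frame index for a timestamp in seconds (assumes frame 0 at t = 0)."""
    return round(timestamp_s * fps)


for err in (10, 20):
    print(f"{err} ms -> {error_in_frames(err):.2f} frames")
# 10 ms -> 0.60 frames
# 20 ms -> 1.20 frames
```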

Camera Parameters

The cameras.txt and images.txt files follow COLMAP's native text format. You can load them directly into COLMAP and run the point_triangulator tool to obtain a structure-from-motion (SfM) point cloud.
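point_triangulator needs a feature database in addition to the camera files, so in practice features are extracted and matched first. The sketch below loosely follows the known-poses workflow described in COLMAP's FAQ; the folder names (sparse_known, sparse_triangulated) are our own assumptions. Depending on your COLMAP version, the camera and image IDs written by feature_extractor may need to agree with those in cameras.txt and images.txt, so consult the COLMAP documentation if triangulation reports missing images.

```python
import subprocess
from pathlib import Path


def prepare_workspace(workspace: Path) -> None:
    """Create the hypothetical folder layout used below.

    Copy the provided cameras.txt and images.txt into workspace/sparse_known.
    point_triangulator also reads a points3D.txt there, which may be empty.
    """
    (workspace / "sparse_known").mkdir(parents=True, exist_ok=True)
    (workspace / "sparse_triangulated").mkdir(parents=True, exist_ok=True)
    (workspace / "sparse_known" / "points3D.txt").touch()


def colmap_commands(workspace: Path, image_dir: Path) -> list[list[str]]:
    """COLMAP CLI calls that triangulate an SfM point cloud for known poses."""
    db = str(workspace / "database.db")
    return [
        # Detect features in every extracted frame.
        ["colmap", "feature_extractor",
         "--database_path", db, "--image_path", str(image_dir)],
        # Match features between all image pairs.
        ["colmap", "exhaustive_matcher", "--database_path", db],
        # Triangulate 3D points using the poses in sparse_known/images.txt.
        ["colmap", "point_triangulator",
         "--database_path", db, "--image_path", str(image_dir),
         "--input_path", str(workspace / "sparse_known"),
         "--output_path", str(workspace / "sparse_triangulated")],
    ]


def run_all(workspace: Path, image_dir: Path) -> None:
    """Run each step in order; requires the colmap executable on PATH."""
    prepare_workspace(workspace)
    for cmd in colmap_commands(workspace, image_dir):
        subprocess.run(cmd, check=True)
```

Exhaustive matching over all 39 views of a frame is simple but slow for many frames; restricting the image folder to a single time step keeps it manageable.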

Note: Make sure the frame filenames exactly match the image names listed in images.txt. You can either update images.txt or rename the extracted images to correspond to the entries in images.txt.
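One way to verify the filename match before running COLMAP is to list the NAME field of every image in images.txt and check that a file with that relative path exists in the frames folder. The sketch below does this in plain Python; the helper names and example frame paths are hypothetical. It relies on COLMAP's two-lines-per-image layout, where the second (2D points) line may be empty, so blank lines are kept while pairing.

```python
from pathlib import Path


def image_names(images_txt: str) -> list[str]:
    """Extract NAME fields from the text of a COLMAP images.txt file.

    Each image uses two lines: "IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME"
    followed by a (possibly empty) line of 2D points, so only every other
    non-comment line carries a name.
    """
    lines = [ln for ln in images_txt.splitlines() if not ln.startswith("#")]
    names = []
    for ln in lines[0::2]:
        if ln.strip():
            # maxsplit=9 keeps NAME intact even if it contained spaces.
            names.append(ln.split(maxsplit=9)[9])
    return names


def missing_frames(images_txt: str, frame_dir: Path) -> list[str]:
    """Names listed in images.txt with no matching file under frame_dir."""
    return [n for n in image_names(images_txt) if not (frame_dir / n).is_file()]
```

Calling missing_frames(Path("images.txt").read_text(), Path("frames")) returns an empty list when every entry has a matching frame; otherwise, rename those frames or edit images.txt as described above.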

Moving Rig Captured Data

We also provide data captured with the moving rig. This data is challenging for existing calibration methods, which often produce pose errors and floaters in the reconstruction. We are nevertheless releasing it publicly because we believe it can help advance the field. For more details, refer to our paper.

TODO

Release the sample dataset and download instructions.
Release the full dataset.
Open-source the code after the paper submission is completed.

Citation

@InProceedings{Yang_2025_CVPR,
    author    = {Yang, Zhengxian and Pan, Shi and Wang, Shengqi and Wang, Haoxiang and Lin, Li and Li, Guanjun and Wen, Zhengqi and Lin, Borong and Tao, Jianhua and Yu, Tao},
    title     = {ImViD: Immersive Volumetric Videos for Enhanced VR Engagement},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {16554-16564}
}

@misc{yang2026realizingimmersivevolumetricvideo,
      title={Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement}, 
      author={Zhengxian Yang and Shengqi Wang and Shi Pan and Hongshuai Li and Haoxiang Wang and Lin Li and Guanjun Li and Zhengqi Wen and Borong Lin and Jianhua Tao and Tao Yu},
      year={2026},
      eprint={2604.09473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.09473}, 
}

Contact

For access to the full dataset, please contact:

Zhengxian Yang: [email protected]
Shengqi Wang: [email protected]

License

This project is licensed under the CC BY 4.0 license. You are free to share and adapt the material, provided you give appropriate credit, indicate if changes were made, and do not apply legal terms or technological measures that restrict others from using the material.
