Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement (IVV)
A new paper and an immersive demo are now available.
Code will be released after the paper submission is completed. Stay tuned!
Zhengxian Yang*, Shi Pan*, Shengqi Wang*, Haoxiang Wang, Li Lin, Guanjun Li,
Zhengqi Wen†, Borong Lin†, Jianhua Tao†, Tao Yu†
* Equal Contribution, † Corresponding Authors
We introduce ImViD, a multi-view, multi-modal dataset featuring complete space-oriented data capture across diverse indoor and outdoor scenarios. The dataset contains synchronized audiovisual content captured at 5K resolution and 60 frames per second, with durations ranging from 1 to 5 minutes.
- Download and fill out the application form:
  - Fillable PDF: docs/application_form.pdf
- Email the completed form to the contacts listed in the Contact section below.
- Upon approval, we will send you the download instructions.
For a quick look, a small sample dataset is available on the Release Page. The sample includes Scene 1 videos (300×5K@60 FPS, H.264 MP4) and COLMAP-style camera metadata files: cameras.txt and images.txt.
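Since the metadata follows COLMAP's text format, it can also be inspected without COLMAP itself. Below is a minimal sketch that parses `cameras.txt`, assuming the standard layout of `#` comment lines followed by one `CAMERA_ID MODEL WIDTH HEIGHT PARAMS...` line per camera. The `Camera` class and `read_cameras_txt` function are hypothetical helpers, and the sample line uses made-up intrinsics rather than values from ImViD.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    camera_id: int
    model: str           # COLMAP camera model name, e.g. "PINHOLE" or "OPENCV"
    width: int
    height: int
    params: list[float]  # model-specific intrinsics (focal lengths, principal point, distortion)


def read_cameras_txt(text: str) -> dict[int, Camera]:
    """Parse the contents of a COLMAP cameras.txt file into a dict keyed by CAMERA_ID."""
    cameras = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split()
        cam = Camera(
            camera_id=int(fields[0]),
            model=fields[1],
            width=int(fields[2]),
            height=int(fields[3]),
            params=[float(v) for v in fields[4:]],
        )
        cameras[cam.camera_id] = cam
    return cameras


if __name__ == "__main__":
    # Illustrative content only: these intrinsics are made up, not taken from ImViD.
    sample = """# Camera list with one line of data per camera:
#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
1 PINHOLE 5120 2880 4200.0 4200.0 2560.0 1440.0
"""
    for cam in read_cameras_txt(sample).values():
        print(cam.camera_id, cam.model, cam.width, cam.height, cam.params)
```

To read the real file, pass in its contents, e.g. `read_cameras_txt(open("cameras.txt").read())`.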
| Scene | Cameras | Static VPs | Takes | Strategy | Avg. S-T Density (m³/s) | Viewing Space | Duration (per take) | Storage (GB) |
|---|---|---|---|---|---|---|---|---|
| Opera | 39 | 1152 | 2 | 1–180° | – | 180° | 3:22 | 226 |
| Laboratory | 39 | 1225 | 2 | 2 | 0.10 | 360° | 1:42 | 137.3 |
| Classroom | 39 | 1223 | 2 | 2 | 0.10 | 360° | 4:42 | 497 |
| Meeting | 39 | 1223 | 1 | 1–360° | – | 360° | 3:16 | 114 |
| Rendition | 39 | 1620 | 4 | 2 | 0.10 | 360° | 2:02 | 516 |
| Puppy | 39 | 1404 | 3 | 2 | 0.10 | 360° | 1:50 | 359 |
| Playing | 39 | 1224 | 2 | 2 | 0.10 | 360° | 1:10 | 220 |
| Total | – | – | 16 | – | – | – | 38:46 | 2069.3 |
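The Total row is consistent with reading the Duration column as the length of a single take: multiplying each scene's duration by its number of takes and summing gives 38:46, while the Storage column sums directly to 2069.3 GB. The short check below reproduces both totals from the table values. It is only an arithmetic illustration; the per-take interpretation is inferred from the numbers, not stated by the authors.

```python
# Per-scene values copied from the table: (takes, duration "m:ss", storage in GB).
SCENES = {
    "Opera": (2, "3:22", 226.0),
    "Laboratory": (2, "1:42", 137.3),
    "Classroom": (2, "4:42", 497.0),
    "Meeting": (1, "3:16", 114.0),
    "Rendition": (4, "2:02", 516.0),
    "Puppy": (3, "1:50", 359.0),
    "Playing": (2, "1:10", 220.0),
}


def to_seconds(mmss: str) -> int:
    """Convert an "m:ss" string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds: int) -> str:
    """Convert whole seconds back to an "m:ss" string."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


total_takes = sum(takes for takes, _, _ in SCENES.values())
# Assumption: Duration is the length of one take, so each scene contributes takes x duration.
total_seconds = sum(takes * to_seconds(d) for takes, d, _ in SCENES.values())
total_storage = sum(gb for _, _, gb in SCENES.values())

print(total_takes, to_mmss(total_seconds), round(total_storage, 1))  # → 16 38:46 2069.3
```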
Below are low-bitrate preview clips for each scene.
- Scene 1: Opera
- Scene 2: Laboratory
- Scene 3: Classroom
- Scene 4: Meeting
- Scene 5: Rendition
- Scene 6: Puppy
- Scene 7: Playing
To extract individual frames from the sample videos:

```bash
python scripts/extract_frames.py \
  --input path/to/your_video_folder \
  --output path/to/output_frames_folder
```

Note: Video alignment accuracy is approximately 10–20 ms. If you need higher precision, please contact us.
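At 60 FPS one frame lasts 1000/60 ≈ 16.7 ms, so the stated 10–20 ms alignment accuracy corresponds to roughly 0.6–1.2 frames; frames with the same index in different camera streams may therefore be offset by up to about one frame. The calculation below makes this explicit. The helper names are hypothetical and not part of the released scripts.

```python
FPS = 60  # capture rate stated for ImViD


def frame_period_ms(fps: float = FPS) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / fps


def ms_to_frames(offset_ms: float, fps: float = FPS) -> float:
    """Express a time offset as a (fractional) number of frames."""
    return offset_ms * fps / 1000.0


print(f"frame period: {frame_period_ms():.2f} ms")  # → frame period: 16.67 ms
for tolerance_ms in (10, 20):
    # → 10 ms ~ 0.6 frames, then 20 ms ~ 1.2 frames
    print(f"{tolerance_ms} ms ~ {ms_to_frames(tolerance_ms):.1f} frames")
```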
The `cameras.txt` and `images.txt` files follow COLMAP's native text format. You can feed them directly into COLMAP and, after extracting and matching features for the frames, run the `point_triangulator` tool to obtain an SfM point cloud.
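A minimal sketch of that COLMAP workflow, written in Python so the commands can be scripted. It assumes a hypothetical workspace layout (`images/` holding the extracted frames, `sparse_known/` holding the dataset's `cameras.txt` and `images.txt`) and uses the standard `feature_extractor`, `exhaustive_matcher`, and `point_triangulator` commands. COLMAP only loads a text model when `points3D.txt` is present, so the helper creates an empty one if it is missing. Depending on your COLMAP version, the image and camera IDs in `images.txt` may also need to match the IDs COLMAP assigns in `database.db`; the COLMAP FAQ on reconstructing from known camera poses covers this.

```python
from pathlib import Path


def prepare_colmap_triangulation(workspace: Path) -> list[list[str]]:
    """Build COLMAP CLI calls for triangulating points from known camera poses.

    Hypothetical layout under `workspace`:
      images/               extracted frames, named as in images.txt
      sparse_known/         cameras.txt and images.txt from the dataset
      sparse_triangulated/  output model (created here)
    """
    images = workspace / "images"
    known = workspace / "sparse_known"
    out = workspace / "sparse_triangulated"
    database = workspace / "database.db"

    # A text model needs points3D.txt; create an empty one if it does not exist yet.
    (known / "points3D.txt").touch()
    out.mkdir(parents=True, exist_ok=True)

    return [
        ["colmap", "feature_extractor",
         "--database_path", str(database), "--image_path", str(images)],
        ["colmap", "exhaustive_matcher", "--database_path", str(database)],
        ["colmap", "point_triangulator",
         "--database_path", str(database), "--image_path", str(images),
         "--input_path", str(known), "--output_path", str(out)],
    ]
```

Each returned argument list can be passed to `subprocess.run(cmd, check=True)` once the `colmap` executable is on your `PATH`. Exhaustive matching is a reasonable default here because each timestamp has only 39 views.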
Note: Make sure the frame filenames exactly match the image names listed in `images.txt`. You can either update `images.txt` or rename the extracted images to match its entries.
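Before running COLMAP, it can help to verify the names programmatically. The sketch below, assuming the standard `images.txt` layout (a pose line ending in `NAME`, followed by a possibly empty 2D-points line), collects the listed names and compares them with the files under a frame folder. The function names are hypothetical; names containing subfolders are compared as POSIX-style relative paths.

```python
from pathlib import Path


def image_names_from_images_txt(text: str) -> list[str]:
    """Return the NAME field of every pose line in a COLMAP images.txt.

    Each image occupies two lines: a pose line
    (IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME) followed by a
    2D-points line that may be empty.
    """
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    names = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # stray blank line between entries
            continue
        fields = lines[i].split(maxsplit=9)
        names.append(fields[9].strip())
        i += 2  # skip the POINTS2D line belonging to this image
    return names


def check_frame_names(images_txt: Path, frames_dir: Path) -> tuple[set[str], set[str]]:
    """Return (missing, unlisted): names with no file on disk, and files not in images.txt."""
    listed = set(image_names_from_images_txt(images_txt.read_text()))
    on_disk = {
        p.relative_to(frames_dir).as_posix()
        for p in frames_dir.rglob("*")
        if p.is_file()
    }
    return listed - on_disk, on_disk - listed
```

A non-empty `missing` set means some entries in `images.txt` have no matching frame, which is exactly the mismatch the note above warns about.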
We also provide data captured with the moving rig. This data poses significant challenges for existing calibration methods, often leading to calibration errors and floaters. We nevertheless believe it will contribute to the advancement of the field, so we are releasing it publicly. For more details, refer to our paper.
- Release the sample dataset and download instructions.
- Release the full dataset.
- Open-source the code after the paper submission is completed.
```bibtex
@InProceedings{Yang_2025_CVPR,
  author = {Yang, Zhengxian and Pan, Shi and Wang, Shengqi and Wang, Haoxiang and Lin, Li and Li, Guanjun and Wen, Zhengqi and Lin, Borong and Tao, Jianhua and Yu, Tao},
  title = {ImViD: Immersive Volumetric Videos for Enhanced VR Engagement},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month = {June},
  year = {2025},
  pages = {16554-16564}
}

@misc{yang2026realizingimmersivevolumetricvideo,
  title={Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement},
  author={Zhengxian Yang and Shengqi Wang and Shi Pan and Hongshuai Li and Haoxiang Wang and Lin Li and Guanjun Li and Zhengqi Wen and Borong Lin and Jianhua Tao and Tao Yu},
  year={2026},
  eprint={2604.09473},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.09473},
}
```

For access to the full dataset, please contact:
- Zhengxian Yang: [email protected]
- Shengqi Wang: [email protected]
This project is licensed under the CC BY 4.0 license. You are free to share and adapt the material, provided you give appropriate credit, provide a link to the license, indicate if changes were made, and do not apply legal terms or technological measures that restrict others from doing anything the license permits.