Metaverse-AI-Lab-THU/ImViD

ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

🌟 CVPR 2025 Highlight 🌟

🚨 NEWS 🚨

🔥 IVV Major Update

Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement (IVV)

New paper and immersive demo are now available.

Code will be released after the paper submission is completed. Stay tuned!

Project arXiv Bilibili

Zhengxian Yang*, Shi Pan*, Shengqi Wang*, Haoxiang Wang, Li Lin, Guanjun Li,

Zhengqi Wen†, Borong Lin†, Jianhua Tao†, Tao Yu†

* Equal Contribution, † Corresponding Authors

paper arXiv Project YouTube

量子位 VR陀螺 3D视觉之心


Overview

ImViD Teaser

We introduce ImViD, a multi-view, multi-modal dataset featuring complete space-oriented data capture and various indoor/outdoor scenarios. The dataset includes high-resolution, synchronized audiovisual content captured at 5K resolution and 60 frames per second, with durations ranging from 1 to 5 minutes.

Download

  1. Download and fill out the application forms:
  2. Email the completed form to the contacts listed in the Contact section below.
  3. Upon approval, we will send you the download instructions.

For a quick look, a small sample dataset is available on the Release Page. The sample includes Scene 1 videos (300×5K@60 FPS, H.264 MP4) and COLMAP-style camera metadata files: cameras.txt and images.txt.

Dataset Summary

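| Scene | Cameras | Static VPs | Takes | Strategy | Avg. S-T Density (m³/s) | Viewing Space | Duration | Storage (GB) |
|---|---|---|---|---|---|---|---|---|
| Opera | 39 | 1152 | 2 | 1–180° | | 180° | 3:22 | 226 |
| Laboratory | 39 | 1225 | 2 | 2 | 0.10 | 360° | 1:42 | 137.3 |
| Classroom | 39 | 1223 | 2 | 2 | 0.10 | 360° | 4:42 | 497 |
| Meeting | 39 | 1223 | 1 | 1–360° | | 360° | 3:16 | 114 |
| Rendition | 39 | 1620 | 4 | 2 | 0.10 | 360° | 2:02 | 516 |
| Puppy | 39 | 1404 | 3 | 2 | 0.10 | 360° | 1:50 | 359 |
| Playing | 39 | 1224 | 2 | 2 | 0.10 | 360° | 1:10 | 220 |
| Total | | | 16 | | | | 38:46 | 2069.3 |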
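The Total row is easier to read once you notice that the total duration does not equal the sum of the per-scene durations. The sketch below assumes that Duration is reported per take, so the total is the sum of takes × duration; under that assumption the figures in the table are reproduced exactly. The `SCENES` mapping and helper names are illustrative and not part of the repository.

```python
# Values copied from the Dataset Summary table:
# scene -> (takes, duration per take as "m:ss", storage in GB).
SCENES = {
    "Opera": (2, "3:22", 226.0),
    "Laboratory": (2, "1:42", 137.3),
    "Classroom": (2, "4:42", 497.0),
    "Meeting": (1, "3:16", 114.0),
    "Rendition": (4, "2:02", 516.0),
    "Puppy": (3, "1:50", 359.0),
    "Playing": (2, "1:10", 220.0),
}


def to_seconds(mmss: str) -> int:
    """Convert an "m:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(scenes):
    """Return (total takes, total duration "m:ss", total storage in GB).

    Assumption: the Duration column is per take, so each scene
    contributes takes * duration to the total.
    """
    total_takes = sum(takes for takes, _, _ in scenes.values())
    total_seconds = sum(takes * to_seconds(d) for takes, d, _ in scenes.values())
    total_storage = round(sum(gb for _, _, gb in scenes.values()), 1)
    minutes, seconds = divmod(total_seconds, 60)
    return total_takes, f"{minutes}:{seconds:02d}", total_storage


if __name__ == "__main__":
    print(summarize(SCENES))
```

Running the script prints the takes, duration, and storage totals, which can be compared with the Total row.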

Preview Clips

Below are low-bitrate preview clips for each scene.

Scene 1: Opera
Scene 2: Laboratory
Scene 3: Classroom
Scene 4: Meeting
Scene 5: Rendition
Scene 6: Puppy
Scene 7: Playing

Using the Dataset

Extracting Frames

To extract individual frames from the sample video:

python scripts/extract_frames.py \
    --input path/to/your_video_folder \
    --output path/to/output_frames_folder

Note: Video alignment accuracy is approximately 10–20 ms. If you need higher precision, please contact us.
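To put the alignment figure in context, it helps to express it in frames. At 60 FPS one frame lasts about 16.7 ms, so a 10–20 ms offset corresponds to roughly 0.6–1.2 frames between cameras. The helper names below are illustrative only.

```python
def frame_interval_ms(fps: float) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / fps


def offset_in_frames(offset_ms: float, fps: float) -> float:
    """Express a time offset between cameras as a fraction of a frame."""
    return offset_ms / frame_interval_ms(fps)


if __name__ == "__main__":
    fps = 60.0  # capture rate stated in the Overview
    for offset_ms in (10.0, 20.0):
        frames = offset_in_frames(offset_ms, fps)
        print(f"{offset_ms:.0f} ms is about {frames:.2f} frames at {fps:.0f} FPS")
```

In practice this means that frames sharing an index across cameras may differ by up to about one frame in time, which can matter for fast motion.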

Camera Parameters

The cameras.txt and images.txt files follow COLMAP’s native text format. You can feed them directly into COLMAP and run the point_triangulator tool to obtain an SfM point cloud.
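One possible way to drive this from Python is sketched below. It is not the authors' pipeline. It follows COLMAP's usual workflow for triangulating with known poses: extract features, match them, then run point_triangulator against the provided model folder. The directory layout and the function name are assumptions. The sketch further assumes that the model folder also contains a points3D.txt file (it may be empty), that the output folder already exists, and that the image IDs in the COLMAP database agree with those in images.txt.

```python
from pathlib import Path


def colmap_triangulation_commands(image_dir, model_dir, work_dir, colmap="colmap"):
    """Build the COLMAP command lines for triangulating with known poses.

    image_dir: extracted frames, named as listed in images.txt
    model_dir: folder holding cameras.txt, images.txt (and points3D.txt)
    work_dir:  scratch folder for the database and the output model
    The paths are hypothetical; adapt them to your own layout.
    """
    image_dir, model_dir, work_dir = Path(image_dir), Path(model_dir), Path(work_dir)
    database = work_dir / "database.db"
    output = work_dir / "sparse"
    return [
        [colmap, "feature_extractor",
         "--database_path", str(database),
         "--image_path", str(image_dir)],
        [colmap, "exhaustive_matcher",
         "--database_path", str(database)],
        [colmap, "point_triangulator",
         "--database_path", str(database),
         "--image_path", str(image_dir),
         "--input_path", str(model_dir),
         "--output_path", str(output)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in colmap_triangulation_commands("frames", "model", "work"):
        print(" ".join(cmd))
```

Each command can be passed to `subprocess.run(cmd, check=True)` once the paths have been checked.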

Note: Make sure the frame filenames exactly match the image names listed in images.txt. You can either update images.txt or rename the extracted images to correspond to the entries in images.txt.
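A quick way to check this is to compare the names in images.txt with the files on disk. The sketch below relies on the COLMAP text format, in which each image occupies two lines: a header line ending with the image name, followed by a line of 2D points that may be empty. The function names are illustrative, and the frame folder is assumed to be flat, with no subdirectories.

```python
from pathlib import Path


def image_names(images_txt: str) -> list[str]:
    """Return the NAME field of every image entry in a COLMAP images.txt."""
    # Drop comment lines but keep blank ones: an empty POINTS2D line is valid.
    rows = [ln for ln in images_txt.splitlines() if not ln.lstrip().startswith("#")]
    names = []
    for header in rows[0::2]:  # every other line is an image header
        # IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME
        fields = header.split(maxsplit=9)
        if len(fields) == 10:
            names.append(fields[9].strip())
    return names


def frame_names(frame_dir) -> list[str]:
    """List the file names found directly inside frame_dir."""
    return sorted(p.name for p in Path(frame_dir).iterdir() if p.is_file())


def diff_names(expected, actual):
    """Return (missing on disk, present on disk but not listed)."""
    expected_set, actual_set = set(expected), set(actual)
    return sorted(expected_set - actual_set), sorted(actual_set - expected_set)


if __name__ == "__main__":
    # Hypothetical paths; point them at your own files.
    listed = image_names(Path("model/images.txt").read_text())
    missing, unexpected = diff_names(listed, frame_names("frames"))
    print("missing:", missing)
    print("unexpected:", unexpected)
```

If either list is non-empty, rename the frames or edit images.txt until both are empty.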

Moving Rig Captured Data

We also provide data captured with the moving rig. This data poses significant challenges for existing calibration methods, often resulting in errors and floaters. We nevertheless believe it will contribute greatly to the advancement of the field, so we are releasing it publicly as well. For more details, refer to our paper.

TODO

  • Release the sample dataset and download instructions.
  • Release the full dataset.
  • Open-source the code after the paper submission is completed.

Citation

@InProceedings{Yang_2025_CVPR,
    author    = {Yang, Zhengxian and Pan, Shi and Wang, Shengqi and Wang, Haoxiang and Lin, Li and Li, Guanjun and Wen, Zhengqi and Lin, Borong and Tao, Jianhua and Yu, Tao},
    title     = {ImViD: Immersive Volumetric Videos for Enhanced VR Engagement},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {16554-16564}
}

@misc{yang2026realizingimmersivevolumetricvideo,
      title={Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement}, 
      author={Zhengxian Yang and Shengqi Wang and Shi Pan and Hongshuai Li and Haoxiang Wang and Lin Li and Guanjun Li and Zhengqi Wen and Borong Lin and Jianhua Tao and Tao Yu},
      year={2026},
      eprint={2604.09473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.09473}, 
}

Contact

For access to the full dataset, please contact:

License

This project is licensed under the CC BY 4.0 license. You are free to share and adapt the material, provided you give appropriate credit, indicate if changes were made, and do not apply legal terms or technological measures that restrict others from using the material.
