If you appreciate our project, please consider giving us a star ⭐ on GitHub to stay updated with the latest developments.
2025.01.16 We have open-sourced the second version of our fully manually annotated video intelligence evaluation benchmark: VideoVista-Benchmark.
2025.12.11 We have open-sourced the video long chain-of-thought reasoning data for cold-start reinforcement learning, available at Video-CoTs.
2025.11.17 We have partnered with Huawei Cloud to launch the first VideoVista Video Understanding and Reasoning Competition. Everyone is welcome to sign up! For more details, see VideoVista-Competition.
2025.04.23 We release VideoVista-CulturalLingo, the first video evaluation benchmark designed to bridge cultural, linguistic, and domain divides in video comprehension. You can download this benchmark from HuggingFace.
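For quick programmatic access, here is a minimal sketch using the Hugging Face `datasets` library. The repo ID and split name below are assumptions, not confirmed values; take the exact ones from the HuggingFace page linked above:

```python
# Minimal sketch: load VideoVista-CulturalLingo with the `datasets` library.
# NOTE: the repo ID "Uni-MoE/VideoVista-CulturalLingo" and the "test" split
# are assumptions -- substitute the exact values from the HuggingFace page.
from datasets import load_dataset

benchmark = load_dataset("Uni-MoE/VideoVista-CulturalLingo", split="test")
print(benchmark[0])  # inspect one evaluation example
```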
2025.04.13 We have moved the original VideoVista from Uni-MoE to this repository. It contains VideoVista (Evaluation), VideoVista-Train (Instruction Tuning), and VideoVista-Event (Pretraining). Detailed content is here.
If you find this project useful in your research, please consider citing it:
@article{li2024videovista,
  title={VideoVista: A Versatile Benchmark for Video Understanding and Reasoning},
  author={Li, Yunxin and Chen, Xinyu and Hu, Baotian and Wang, Longyue and Shi, Haoyuan and Zhang, Min},
  journal={arXiv preprint arXiv:2406.11303},
  year={2024}
}

@article{chen2025videovista,
  title={VideoVista-CulturalLingo: 360$^\circ$ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension},
  author={Chen, Xinyu and Li, Yunxin and Shi, Haoyuan and Hu, Baotian and Luo, Wenhan and Wang, Yaowei and Zhang, Min},
  journal={arXiv preprint arXiv:2504.17821},
  year={2025}
}