[ICCV 2025] Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

Haoji Zhang^*, Yiqin Wang^*, Yansong Tang^✉, Yong Liu, Jiashi Feng, Xiaojie Jin^✉†

^*Equally contributing first authors, ^✉Correspondence, ^†Project Leader

Work done while interning at ByteDance.

We propose Flash-VStream, an efficient VLM with a novel Flash Memory mechanism that enables real-time understanding and Q&A of extremely long video streams. Our model achieves outstanding accuracy and efficiency on the EgoSchema, MLVU, LVBench, MVBench, and Video-MME benchmarks.

News

[2025/06/26] 🔥 [ICCV 2025] Flash-VStream-Qwen is coming! We release the homepage, paper, code, and model.
[2024/06/15] 🏅 Our team won 1st place in the Long-Term Video Question Answering Challenge of the LOVEU Workshop@CVPR'24. Here is our certification. We used a Hierarchical Memory model based on Flash-VStream-7b.
[2024/06/12] Flash-VStream-LLaVA is coming! We release the homepage, paper, code, and model for Flash-VStream, along with the dataset for the VStream-QA benchmark.

Flash-VStream-Qwen

See Flash-VStream-Qwen/README.md.

Flash-VStream-LLaVA

See Flash-VStream-LLaVA/README.md.

Citation

If you find this project useful in your research, please consider citing:

@article{zhang2025flashvstream,
    title={Flash-VStream: Efficient Real-Time Understanding for Long Video Streams}, 
    author={Haoji Zhang and Yiqin Wang and Yansong Tang and Yong Liu and Jiashi Feng and Xiaojie Jin},
    journal={arXiv preprint arXiv:2506.23825},
    year={2025},
}
@article{zhang2024flashvstream,
    title={Flash-vstream: Memory-based real-time understanding for long video streams},
    author={Zhang, Haoji and Wang, Yiqin and Tang, Yansong and Liu, Yong and Feng, Jiashi and Dai, Jifeng and Jin, Xiaojie},
    journal={arXiv preprint arXiv:2406.08085},
    year={2024}
}

Acknowledgement

We would like to thank the following repos for their great work:

This work is built upon LLaVA.
This work utilizes LLMs from Vicuna.
Some code is borrowed from LLaMA-VID.
Our video-based evaluation follows Video-ChatGPT.

License

This project is licensed under the Apache-2.0 License.
