Improving LLM Video Understanding with 16 Frames Per Second

🚀🚀 Welcome to the repo of F-16!

F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos. It is developed by the Department of Electronic Engineering at Tsinghua University and ByteDance.

🔥 News

2025-07-03: We release the final checkpoint of F-16.
2025-06-18: We release the code of F-16.

⚡️ Future Plans

~~Release the code.~~
~~Release final F-16.~~

🌈 How to Use

How to train a model

Prepare the dataset following scripts/example_sft.json.
Download the LLaVA-OneVision model from Hugging Face.
Modify the parameters in scripts/train_sft.sh.
Run bash scripts/train_sft.sh.
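
For the download step, one option is the `huggingface_hub` Python package. The sketch below is only an illustration: the repository ID `lmms-lab/llava-onevision-qwen2-7b-ov` and the local directory `checkpoints/llava-onevision` are assumptions, since this README does not say which LLaVA-OneVision checkpoint to use or where to store it. Substitute the values that match the parameters you set in scripts/train_sft.sh.

```python
def download_base_model(
    repo_id="lmms-lab/llava-onevision-qwen2-7b-ov",  # assumed checkpoint, not confirmed by this README
    local_dir="checkpoints/llava-onevision",         # hypothetical target directory
):
    """Download a model snapshot from Hugging Face and return its local path."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file of the repository into local_dir.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example call (requires network access and `pip install huggingface_hub`):
# model_path = download_base_model()
```

The returned path is what you would then point the model parameter in scripts/train_sft.sh at.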

How to evaluate a checkpoint

Prepare the dataset following scripts/example_sft.json.
Modify the parameters in scripts/eval.sh.
Run bash scripts/eval.sh.
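
Both training and evaluation expect a dataset laid out like scripts/example_sft.json. The sketch below is one way to check your own file against that example before launching a run. It makes no claim about the actual schema: it assumes only that both files are JSON arrays of objects, and it treats the keys shared by every example entry as required. The function names and the path `my_dataset.json` are hypothetical.

```python
import json


def load_entries(path):
    """Load a JSON file and return its entries, checking it is a list of objects."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"{path}: entry {i} is not a JSON object")
    return data


def find_mismatched_entries(example_path, dataset_path):
    """Return (index, missing_keys) for dataset entries lacking keys used in the example."""
    example = load_entries(example_path)
    if not example:
        raise ValueError(f"{example_path}: example file has no entries")

    # Keys present in every example entry are treated as required (an assumption).
    required = set(example[0])
    for entry in example[1:]:
        required &= set(entry)

    problems = []
    for i, entry in enumerate(load_entries(dataset_path)):
        missing = required - set(entry)
        if missing:
            problems.append((i, sorted(missing)))
    return problems


# Example call with a hypothetical dataset path:
# for index, missing in find_mismatched_entries("scripts/example_sft.json", "my_dataset.json"):
#     print(f"entry {index} is missing keys: {missing}")
```

An empty result means every entry carries the keys the example uses; it does not verify value types or that referenced video files exist.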

👀 Team

Team Tsinghua: Yixuan Li, Changli Tang, Jimin Zhuang, Yudong Yang, Guangzhi Sun, Chao Zhang

Team ByteDance: Wei Li, Zejun Ma

✨ Citation

If you find F-16 useful, please cite the paper:

@inproceedings{li2025improving,
  title={Improving LLM Video Understanding with 16 Frames Per Second},
  author={Li, Yixuan and Tang, Changli and Zhuang, Jimin and Yang, Yudong and Sun, Guangzhi and Li, Wei and Ma, Zejun and Zhang, Chao},
  booktitle={Proc. ICML},
  year={2025},
  address={Vancouver}
}
