TokensGen: Harnessing Condensed Tokens for Long Video Generation

Wenqi Ouyang¹, Zeqi Xiao¹, Danni Yang², Yifan Zhou¹, Shuai Yang³, Lei Yang², Jianlou Si², Xingang Pan¹
¹S-Lab, Nanyang Technological University  ²SenseTime Research  ³Wangxuan Institute of Computer Technology, Peking University

The official repo for "TokensGen: Harnessing Condensed Tokens for Long Video Generation".

(Teaser video: teaser_small.mp4)

🔥 News

  • [2025-12-09] Code and weights have been released.
  • [2025-07-20] Our project page is live.
  • [2025-06-26] Our paper has been accepted to ICCV 2025.

🔧 TODO

  • ✅ Release code and weights (done, see News)

🧐 Methods

Overview of the model. Left: Overall Framework for TokensGen. Right: Trainable Modules.

🌿 Installation with conda

git clone https://github.com/Vicky0522/TokensGen.git
cd TokensGen
# install required packages
conda env create -f environment.yml
# activate the new environment (its name is defined in environment.yml)
conda activate <env-name>
# install the longvgen package in development mode
python setup.py develop
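
As an optional sanity check (assuming the environment above is active and that setup.py registers the package under the name longvgen, as the comment suggests), verify the install and GPU visibility:

python -c "import longvgen; print('longvgen OK')"
python -c "import torch; print(torch.cuda.device_count(), 'CUDA device(s) visible')"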

🚀 Quick Start

Download the weights

Download the CogVideoX-5b base model together with the To2V and T2To checkpoints, and place them under a weights/ folder in the repository root.
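
One plausible layout after downloading (the folder names here are illustrative; what matters is that they match the checkpoint paths referenced in the config/infer/*.yaml files):

weights/
├── CogVideoX-5b/   # base text-to-video model
├── To2V/           # tokens-to-video model (short clips conditioned on condensed tokens)
└── T2To/           # text-to-tokens model (long-range condensed tokens)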

Editing (To2V)

# single-GPU inference
CUDA_VISIBLE_DEVICES=0 python infer_cogvideo_mp_fifo.py --config config/infer/edit.yaml

# multi-GPU inference
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python infer_cogvideo_mp_fifo.py --config config/infer/edit.yaml

Generation (T2To + To2V)

# single-GPU inference
CUDA_VISIBLE_DEVICES=0 python infer_cogvideo_mp_fifo.py --config config/infer/gen.yaml

# multi-GPU inference
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python infer_cogvideo_mp_fifo.py --config config/infer/gen.yaml
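
Long rollouts can take a while; a generic shell pattern (not specific to this repo) is to run inference in the background and follow the log:

CUDA_VISIBLE_DEVICES=0 nohup python infer_cogvideo_mp_fifo.py --config config/infer/gen.yaml > gen.log 2>&1 &
tail -f gen.log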

⚙️ Training

Download the Dataset

Download the MiraData dataset.

Train the To2V and T2To Models

Set video_dir and csv_file in the training YAML files to your own paths. We provide the CSV files that were used to train the T2To model. A minimal sketch of the relevant entries is shown below (only the two field names come from this README; the values are placeholders):
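
# illustrative entries in config/train/cogvideo_5b_vaevip_4x8x12_to2v.yaml —
# the exact structure is defined in the YAML file itself
video_dir: /path/to/MiraData/videos
csv_file: /path/to/MiraData/train.csv

After setting the paths, run the following commands: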

# 1) train the To2V model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train_cogvideo_to2v.py --config config/train/cogvideo_5b_vaevip_4x8x12_to2v.yaml

# 2) after To2V training finishes, run the data processing script to precompute VAE latents for the long videos (latents are computed only for the selected videos)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch calculate_vae_latents.py --config config/dataprocess/cogvideo_5b_vaevip_4x8x12_calculate_vae_latents.yaml

# 3) point video_dir at the computed VAE latents, then train the T2To model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train_cogvideo_t2to.py --config config/train/cogvideo_5b_vaevip_4x8x12_t2to.yaml
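
If accelerate has not been configured on your machine yet, the standard Hugging Face accelerate workflow applies (generic usage, not specific to this repo):

# one-time interactive setup
accelerate config
# or set the process count explicitly, e.g. one process per GPU
accelerate launch --num_processes 8 train_cogvideo_to2v.py --config config/train/cogvideo_5b_vaevip_4x8x12_to2v.yaml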

✏️ Citation

If you find our work helpful for your research, please consider citing:

@inproceedings{ouyang2025tokensgen,
  title={TokensGen: Harnessing Condensed Tokens for Long Video Generation},
  author={Ouyang, Wenqi and Xiao, Zeqi and Yang, Danni and Zhou, Yifan and Yang, Shuai and Yang, Lei and Si, Jianlou and Pan, Xingang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={18197--18206},
  year={2025}
}
