Hidir Yesiltepe1
·
Tuna Han Salih Meral1
·
Adil Kaan Akan2
·
Kaan Oktay2
·
Pinar Yanardag1
1Virginia Tech 2fal
Infinity-RoPE-teaser.1.2.mp4
- [2026 Feb 08] Infinity-RoPE has been accepted at CVPR 2026 🎉!
- [2026 Feb 08] Thanks to @zhuhz22's recommendation, we adapted Causal Forcing checkpoints to Infinity-RoPE!
- [2026 Jan 16] We released the code.
- [2026 Jan 11] LongLive adopted Infinity-RoPE to turn their long video generator into an infinite video generator!
- [2025 Nov 25] We released the paper and the project page.
| Self-Forcing + ∞-RoPE | Causal-Forcing + ∞-RoPE |
|---|---|
| harry_potter_sf.mp4 | harry_potter_cf.mp4 |
We tested this repo on the following setup:
- Nvidia GPU with at least 24 GB memory (RTX 4090, A100, and H100 are tested).
- Linux operating system.
- 64 GB RAM.
Other hardware setups may also work but have not been tested.
Create a Python 3.10 environment, install dependencies, and download models:
```bash
bash setup_env.sh
```

Then run inference:

```bash
bash inference.sh
```
Infinity-RoPE utilizes a specific syntax to control temporal duration and scene transitions. Examples are provided in prompts/infinity_rope_prompts.txt. The core format for an action segment is:
```
"action_description[duration]"
```
| Operator | Name | Function |
|---|---|---|
| `[Ns]` | Duration | Sets the segment length in seconds (e.g., `[10s]`). |
| `\|` | Separator | Chains multiple action prompts together. |
| `#` | Scene Cut | When placed inside brackets (e.g., `[10s#]`), it triggers a hard cut. |
| `;` | Subtitle Toggle | Separates action prompts (left) from subtitle text (right). |
Generates one seamless video of the specified length.
```
"action_1_prompt[30s]"
```
- Total Length: 30s
- Result: A single 30-second continuous shot.
Transitions between different behaviors within a single, continuous camera shot.
```
"action_1_prompt[5s] | action_2_prompt[10s] | action_3_prompt[15s]"
```
- Total Length: 30s (5s + 10s + 15s)
- Result: The subject transitions naturally from action 1 to 2 to 3 without a camera break.
Forces the model to perform a hard jump-cut at the beginning of specific segments.
```
"action_1_prompt[10s] | action_2_prompt[10s#] | action_3_prompt[10s#]"
```
- Total Length: 30s
- Result: Three distinct 10-second scenes. The `#` at the start of actions 2 and 3 initiates the scene cuts.
Combines scene cuts with synchronized text overlays.
```
"action_1[10s] | action_2[10s#] | action_3[10s#] ; subtitle_1 | subtitle_2 | subtitle_3"
```
- Total Length: 30s
- Result: Three distinct 10-second scenes. Each segment displays its corresponding subtitle from the list provided after the `;`.
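As a rough illustration, the full syntax above can be handled by a short parser. This is a hedged sketch in plain Python; the repo's actual prompt handling may differ, and `parse_prompt` is a name of our choosing:

```python
import re

def parse_prompt(prompt: str):
    """Parse the prompt syntax into action segments.

    Illustrative sketch only; not the repo's real parser.
    """
    prompt = prompt.strip().strip('"')
    # ';' separates action prompts (left) from subtitle text (right).
    actions_part, _, subtitles_part = prompt.partition(";")
    subtitles = [s.strip() for s in subtitles_part.split("|")] if subtitles_part else []

    segments = []
    for i, chunk in enumerate(actions_part.split("|")):
        chunk = chunk.strip()
        # '[Ns]' sets the duration; a '#' inside the brackets marks a hard scene cut.
        m = re.search(r"\[(\d+)s(#?)\]$", chunk)
        if not m:
            raise ValueError(f"segment without an [Ns] duration tag: {chunk!r}")
        segments.append({
            "text": chunk[: m.start()].strip(),
            "duration_s": int(m.group(1)),
            "scene_cut": m.group(2) == "#",
            "subtitle": subtitles[i] if i < len(subtitles) else None,
        })
    return segments

segs = parse_prompt('"a1[10s] | a2[10s#] | a3[10s#] ; s1 | s2 | s3"')
total = sum(s["duration_s"] for s in segs)  # 30
```

For example, the full syntax prompt above yields three segments of 10s each, with `scene_cut` set on the second and third and one subtitle attached per segment.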
Note:
- As KV Flush is effectively an index-change operation, we found it quite useful to repeat the characteristics of the environment and people in every action prompt. See the examples in prompts/infinity_rope_prompts.txt.
- Our model works better with long, detailed prompts since it was trained on such prompts. We will integrate prompt extension into the codebase (similar to Wan2.1) in the future. For now, we recommend using third-party LLMs (such as GPT-4o) to extend your prompt before providing it to the model.
- You may want to adjust FPS so it plays smoothly on your device.
- The speed can be improved by enabling `torch.compile`, TAEHV-VAE, or FP8 Linear layers, although the latter two options may sacrifice quality. We recommend using `torch.compile` if possible and enabling TAEHV-VAE if further speedup is needed.
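For instance, the per-segment repetition suggested above (restating environment and character descriptions in every action prompt) can be automated with a small helper. This is a hypothetical sketch; `build_prompt` and its arguments are our names, not part of this repo:

```python
def build_prompt(scene: str, actions: list[tuple[str, str]]) -> str:
    """Prepend a shared scene/character description to every action segment.

    `scene` is repeated verbatim in each segment because KV Flush re-indexes
    the cache and benefits from restated context. `actions` pairs an action
    description with its duration tag (e.g. "10s" or "10s#" for a hard cut).
    Hypothetical helper, not part of the repo.
    """
    parts = [f"{scene} {text}[{tag}]" for text, tag in actions]
    return '"' + " | ".join(parts) + '"'

prompt = build_prompt(
    "A wizard in a candle-lit stone hall,",
    [("raises his wand", "10s"), ("casts a glowing spell", "10s#")],
)
```

This keeps the shared context in sync across segments instead of hand-copying it into each action prompt.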
```bash
huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt --local-dir .
huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
```
Note: Our training algorithm (except for the GAN version) is data-free (no video data is needed). For now, we directly provide the ODE initialization checkpoint and will add more instructions on how to perform ODE initialization in the future (which is identical to the process described in the CausVid repo).
```bash
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
    --rdzv_backend=c10d \
    --rdzv_endpoint $MASTER_ADDR \
    train.py \
    --config_path configs/self_forcing_dmd.yaml \
    --logdir logs/self_forcing_dmd \
    --disable-wandb
```
Our training run uses 600 iterations and completes in under 2 hours on 64 H100 GPUs. With gradient accumulation, it should be possible to reproduce the results in less than 16 hours on 8 H100 GPUs.
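The 16-hour estimate above is back-of-the-envelope GPU-hour arithmetic, assuming near-linear scaling with gradient accumulation:

```python
# Back-of-the-envelope scaling for the training run described above,
# assuming the total GPU-hour budget stays roughly constant when
# trading GPUs for gradient-accumulation steps.
gpu_hours = 64 * 2          # 64 H100s for under 2 hours ~= 128 GPU-hours
hours_on_8 = gpu_hours / 8  # same budget spread over 8 H100s
# hours_on_8 == 16.0, matching the "less than 16 hours" estimate
```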
This codebase is built on top of the open-source implementation of Self-Forcing. We also thank Infinite-Forcing for providing an attention-sink checkpoint, and Causal Forcing for providing a checkpoint with high dynamic degree and imaging quality.
If you find this codebase useful for your research, please kindly cite our paper:
```bibtex
@article{yesiltepe2025infinity,
  title={Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout},
  author={Yesiltepe, Hidir and Meral, Tuna Han Salih and Akan, Adil Kaan and Oktay, Kaan and Yanardag, Pinar},
  journal={arXiv preprint arXiv:2511.20649},
  year={2025}
}
```