by Chen Liu and Tobias Ritschel
International Conference on Computer Vision (ICCV 2025)
Please also check out our (Paper | Project Page)
This repo provides the official implementation of our paper in PyTorch.
We also provide compact pseudocode that shows the core logic of our bi-flow algorithm, without burying you in the less relevant code files.
# 1. Clone the repo
git clone https://github.com/ryushinn/ode-video.git
cd ode-video
# 2. We recommend installing in a fresh virtual environment with Python 3.10, e.g., via conda:
conda create -n ode-video python=3.10
conda activate ode-video
# 3. Install the dependencies
pip install -r requirements.txt

Our test environment is Ubuntu 22.04.4 x64 with an NVIDIA RTX 4090 GPU and CUDA 12.
Our dataloader expects the following folder structure:
data
└── {Dataset}
    ├── {train_split}
    │   └── ...                    # nested folders are allowed
    │       └── {clip_folder}
    │           ├── 000001.jpg     # first frame
    │           ├── 000002.jpg     # second frame
    │           └── ...
    └── {test_split}
        └── ...
            └── {clip_folder}
                ├── 000001.jpg
                ├── 000002.jpg
                └── ...

Every (sub)folder in the train or test split should contain only consecutive frames from the same video clip, named in sorted order.
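Before training, you can sanity-check that a dataset matches this layout. Below is a minimal sketch (not one of the official scripts; the example path is illustrative) that lists every clip folder together with its sorted frames:

```python
# Minimal sketch (not part of the official scripts): walk a split folder and
# report every clip folder together with its sorted frame files.
import os

def list_clips(split_dir):
    clips = {}
    for root, _, files in os.walk(split_dir):
        frames = sorted(f for f in files if f.lower().endswith((".jpg", ".png")))
        if frames:  # a folder that directly contains frames is a clip folder
            clips[root] = frames
    return clips

if __name__ == "__main__":
    for clip, frames in list_clips("data/sky_timelapse/sky_train").items():
        print(f"{clip}: {len(frames)} frames ({frames[0]} .. {frames[-1]})")
```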
For example, you can set up the sky dataset in this format using:
# If daily download limit was reached, please download manually at
# https://drive.google.com/uc?id=1xWLiU-MBGN7MrsFHQm4_yXmfHBsMbJQo
gdown 1xWLiU-MBGN7MrsFHQm4_yXmfHBsMbJQo -O sky_timelapse.zip
unzip sky_timelapse.zip -d data
rm sky_timelapse.zip

For datasets distributed in a format other than image frames, you can use scripts/pt_to_frames.py (e.g., CARLA) or scripts/video_to_frames.py (e.g., minerl and mazes) to convert them to frames.
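If you need to handle another video format yourself, the core of such a conversion can be sketched with OpenCV as below (a rough sketch under our own naming assumptions; the repo's scripts/video_to_frames.py is the authoritative version):

```python
# Hedged sketch of video-to-frames conversion with OpenCV; details may differ
# from the repo's scripts/video_to_frames.py.
import os
import cv2

def video_to_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # six-digit names keep frames in sorted order, as the dataloader expects
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()

video_to_frames("clip.mp4", "data/MyDataset/train/clip_0001")
```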
If a dataset does not come with a default train-test split, you can use scripts/split.py to create one, e.g., for biking and riding.
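For reference, a random clip-level split can be as simple as the sketch below (the 90/10 ratio, seed, and paths are illustrative assumptions; use scripts/split.py for the actual splits):

```python
# Illustrative random clip-level train/test split; scripts/split.py is the
# version used for the paper's datasets.
import os
import random
import shutil

def split_dataset(src_dir, train_dir, test_dir, test_ratio=0.1, seed=0):
    clips = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(clips)
    n_test = max(1, int(len(clips) * test_ratio))
    for i, clip in enumerate(clips):
        dst = test_dir if i < n_test else train_dir
        os.makedirs(dst, exist_ok=True)
        shutil.move(os.path.join(src_dir, clip), os.path.join(dst, clip))

split_dataset("data/biking/clips", "data/biking/train", "data/biking/test")
```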
You can download the pre-trained weights for the six datasets reported in our paper.
# If daily download limit was reached, please download manually
gdown 1SOylrO6udRW_Qd6YRRIXHnv3FmHc3ukL -O checkpoints_ode-video.zip
unzip checkpoints_ode-video.zip
rm checkpoints_ode-video.zip

We use Hugging Face Accelerate to set up gradient accumulation and mixed-precision training.
The default arguments are already specified in the script.
To modify them, run accelerate config.
# USAGE:
# train_videoflow.sh <data_path> <exp_path> <image_size> <num_processes> <accumulation_steps>
# ARGS:
# <data_path> : the folder of your training dataset
# <exp_path> : the folder to save checkpoints and logs
# <image_size> : resize the training images to this size
# <num_processes> : the number of GPUs
# <accumulation_steps> : the number of batches whose gradients are accumulated
#                        before each optimizer step. This does NOT change the
#                        batch size seen by each forward/backward pass, but lets
#                        you emulate a larger batch size in limited GPU memory
#                        by performing one optimizer step after several backward passes.
bash scripts/train_videoflow.sh data/sky_timelapse/sky_train experiments_weights/sky 128 1 4

The above is an example of training condiff, flow, and bi-flow on the sky dataset.
If you run out of GPU memory, use more accumulation steps.
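Conceptually, the accumulation that Accelerate performs boils down to the pattern below; this is a self-contained toy sketch (model and data are placeholders), not the repo's training loop:

```python
# Toy sketch of gradient accumulation with Hugging Face Accelerate; the real
# model, optimizer, and data come from the repo's training script.
import torch
from accelerate import Accelerator

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dataloader = torch.utils.data.DataLoader(torch.randn(32, 8), batch_size=4)

accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    # Gradients from 4 consecutive batches are accumulated before each
    # optimizer step, so the effective batch size is 4x the per-GPU batch
    # size (times the number of processes) at the memory cost of one batch.
    with accelerator.accumulate(model):
        loss = model(batch).pow(2).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```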
Note that our trained ODEs generate next frames, but the first frame has to be given or generated separately. You therefore need to set up the test split of the corresponding dataset in order to sample (generate) videos with the trained weights.
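To make this concrete, next-frame generation with a learned ODE can be sketched as plain Euler integration. The velocity-field signature model(x, s), the step count, and the toy usage below are illustrative assumptions, not the repo's sampler:

```python
# Illustrative Euler rollout of a learned per-frame ODE; see
# scripts/sample_videoflow.sh for the actual sampling pipeline.
import torch

@torch.no_grad()
def rollout(model, first_frame, n_frames, n_steps=10):
    """Advance first_frame (B, C, H, W) by integrating the learned
    velocity field from s=0 to s=1 once per generated frame."""
    frames, x = [first_frame], first_frame
    for _ in range(n_frames - 1):
        for i in range(n_steps):
            s = torch.full((x.shape[0],), i / n_steps, device=x.device)
            x = x + model(x, s) / n_steps  # one Euler step of size 1/n_steps
        frames.append(x)
    return torch.stack(frames, dim=1)  # (B, T, C, H, W)

# Toy usage with a velocity field that slowly dims the frame:
video = rollout(lambda x, s: -0.1 * x, torch.rand(1, 3, 64, 64), n_frames=8)
```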
To sample the trained models, you can use:
# USAGE:
# sample_videoflow.sh <data_path> <model_path> <exp_path> <image_size> <n_samples> <batchsize> <n_frames>
# ARGS:
# <data_path> : the folder of your test dataset
# <model_path> : the folder of your training checkpoints and logs
# <exp_path> : the folder to save sampling results
# <image_size> : the image size at which to sample
# <n_samples> : the number of videos generated, must be a multiple of the batch size
# <batchsize> : the batch size in sampling
# <n_frames> : the number of frames in each sample
bash scripts/sample_videoflow.sh data/sky_timelapse/sky_test experiments_weights/sky experiments_inference/sky 128 64 8 32

The above command generates 64 videos of 32 frames each. The sampling script samples condiff and flow, together with bi-flow under four different levels of inference noise.
If you find this useful or adopt (parts of) our project, please cite our paper:
@inproceedings{liu2025generative,
title={Generative Video Bi-flow},
author={Liu, Chen and Ritschel, Tobias},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages={19363--19372},
year={2025}
}