LiFT: Linearized Feature Trajectories (NeurIPS 2025)

Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening

NeurIPS 2025

University of Oxford

Brief Overview

LiFT learns time-aware video representations that can linearly separate temporally opposite (chiral) actions like "opening" vs "closing" or "moving up" vs "moving down".
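
To see why time-awareness matters for chiral pairs, consider a toy sketch (illustrative only, not LiFT's method). Averaging per-frame features gives the same vector for a clip and its time-reversed copy, so no classifier can tell them apart, whereas even a crude order-sensitive summary flips sign. The feature values and the helper names `mean_pool` and `temporal_slope` are assumptions made up for this example.

```python
def mean_pool(frames):
    """Average per-frame feature vectors; the result ignores frame order."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def temporal_slope(frames):
    """Average frame-to-frame change per dimension; flips sign under reversal."""
    n = len(frames)
    return [(frames[-1][d] - frames[0][d]) / (n - 1) for d in range(len(frames[0]))]


# Hypothetical 2-D per-frame features: the first dimension grows as a door opens.
opening = [[0.0, 1.0], [0.25, 1.0], [0.5, 1.0], [0.75, 1.0], [1.0, 1.0]]
closing = opening[::-1]  # the time-reversed clip

print("mean pooling identical:", mean_pool(opening) == mean_pool(closing))
print("slope (opening):", temporal_slope(opening))
print("slope (closing):", temporal_slope(closing))
```

The two clips collapse to the same point under mean pooling, while the order-sensitive summary places them on opposite sides of the origin, where a linear classifier can separate them.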

🔐 The Key Nugget

Key observation: tSNE projections of per-frame features from DINOv2 show that they lie on a time-sensitive trajectory. Can we use these to learn a time-aware video representation?

🏗️ The Model: LiFT

Inspired by the perceptual straightening hypothesis [Hénaff et al., Nature Neuroscience 2019], LiFT transforms non-linear DINO trajectories into a compact video embedding using a linearized Auto-Encoder model.
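
The straightening idea can be made concrete with the curvature measure used in the perceptual straightening literature: the average angle between successive displacement vectors of a trajectory. The sketch below is a minimal illustration on toy 2-D trajectories. The function name `mean_curvature_deg` is hypothetical, and this is not LiFT's training objective.

```python
import math


def mean_curvature_deg(trajectory):
    """Mean angle in degrees between successive displacement vectors.

    Assumes consecutive points differ, so every displacement has non-zero length.
    """
    # Displacements between consecutive points of the trajectory.
    steps = [[b - a for a, b in zip(p, q)] for p, q in zip(trajectory, trajectory[1:])]
    angles = []
    for u, v in zip(steps, steps[1:]):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.hypot(*u) * math.hypot(*v)
        # Clamp to guard against rounding slightly outside [-1, 1].
        cos = max(-1.0, min(1.0, dot / norm))
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


# A straight line and a quarter-circle arc, each sampled at 5 points.
straight = [[float(t), 2.0 * t] for t in range(5)]
curved = [[math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)] for t in range(5)]

print("straight:", round(mean_curvature_deg(straight), 3))
print("curved:  ", round(mean_curvature_deg(curved), 3))
```

A trajectory with near-zero mean curvature is close to a straight line, which is the sense in which a representation is said to be "straightened".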

What we contribute:

Model: LiFT - a compact (768-dim) time-aware video embedding trained in an unsupervised manner
Benchmark: Chirality in Action (CiA) - a new benchmark built from SSv2, EPIC, and Charades datasets to evaluate temporal understanding
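
One common way to check whether frozen embeddings linearly separate a chiral pair is a linear probe. The sketch below is a toy version under stated assumptions: the 2-D "embeddings" are made-up stand-ins for real 768-dim vectors, the perceptron is just one simple choice of linear classifier, and none of this is the CiA evaluation protocol.

```python
def train_perceptron(xs, ys, epochs=20):
    """Fit a linear classifier (weights, bias) with the perceptron rule; labels are +1/-1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: move the boundary toward this sample
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of samples whose predicted sign matches the label."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical frozen embeddings: +1 = "opening", -1 = "closing".
xs = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [-0.8, 0.2], [-0.9, 0.1], [-0.6, 0.3]]
ys = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(xs, ys)
# Accuracy on the training samples only; a real probe would use a held-out split.
print("probe accuracy:", accuracy(w, b, xs, ys))
```

Because the embeddings stay frozen and only a linear layer is fit, the probe accuracy reflects what the representation already encodes rather than what a deeper classifier could learn.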

Installation and Setup

First, create a conda environment:

conda create --name lift python=3.11 -y
conda activate lift

Then, install the LiFT package:

pip install git+https://github.com/bpiyush/LiFT.git

Alternative: Manual installation with conda

If you prefer more control over dependencies, create a conda environment:

conda create --name lift python=3.11 -y
conda activate lift

# Install torch
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

# Install lightning
pip install lightning==2.4.0

# Install other dependencies
pip install einops==0.8.1
pip install timm==1.0.22
pip install decord==0.6.0
pip install matplotlib==3.9.2
pip install opencv-python pandas ipdb ipywidgets tqdm scikit-learn termcolor seaborn ffmpeg-python

# Install gdown for downloading model weights
pip install gdown

Download Model Weights

Download the pre-trained LiFT model weights (~110MB):

# Download the checkpoint file
gdown 1DFapOrZwRcltyq3_tQNTQ9mHtpgKqtZY -O ggwirp95-epoch=458-step=834003.ckpt

Alternatively, you can manually download from Google Drive.
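
Interrupted downloads can leave a missing or truncated file, so a quick sanity check before loading may save a confusing error later. This is a minimal sketch: the helper name `checkpoint_looks_ok` and the 50 MB threshold are assumptions chosen to sit well below the stated ~110MB size, not part of the LiFT package.

```python
from pathlib import Path


def checkpoint_looks_ok(path, min_mb=50):
    """Return True if the file exists and is at least min_mb megabytes."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_mb * 1024 * 1024


ckpt = "ggwirp95-epoch=458-step=834003.ckpt"
if checkpoint_looks_ok(ckpt):
    print(f"Found checkpoint: {ckpt}")
else:
    print(f"Checkpoint missing or truncated: {ckpt}")
```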

Quick Start

import torch
from lift import DINOv2ForVideo, make_classification_eval_transform, load_lift_module
from lift.dinov2 import compute_dino_features_for_single_video
from lift.demo import compute_lift_embeddings
from lift.viz_utils import show_trajectory_with_reconstruction

# Set path to your video
video_path = "your_video.mp4"

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load models
backbone = DINOv2ForVideo(model_id='vit_small_patch14_reg4_dinov2.lvd142m').to(device)
preprocess = make_classification_eval_transform()
lift_model = load_lift_module(ckpt_root=".", ckpt_name="ggwirp95-epoch=458-step=834003.ckpt").to(device)

# Extract features from your video
frames, _, dino_feats = compute_dino_features_for_single_video(
    video_path, preprocess, backbone, return_frames=True, device=device, n_frames=16
)

# Get LiFT embedding (768-dim time-aware video representation)
lift_output = compute_lift_embeddings(dino_feats.unsqueeze(0), lift_model, device=device)
embedding = lift_output["concat"]  # Shape: [1, 768]

# Visualize tSNE (DINO trajectory in red, LiFT reconstruction in blue)
img = show_trajectory_with_reconstruction(
    video_path=video_path,
    x=dino_feats,
    x_hat=lift_output["reconstructed"].squeeze(0),
    class_name="my video",
    method="tsne",
    joint_dimred=True,
    return_img=True,
)
img.save("lift_output.png")

Visualization of the DINO trajectory (red) and LiFT reconstruction (blue).

Alternative: Run the demo script

# Clone the repository first if you have not already
git clone https://github.com/bpiyush/LiFT.git
cd LiFT
export PYTHONPATH=$PWD
python lift/demo.py --ckpt_root . --ckpt_name ggwirp95-epoch=458-step=834003.ckpt

Citation

If you find this work useful, please consider citing:

@InProceedings{BagadLiFT25,
  author       = "Piyush Bagad and Andrew Zisserman",
  title        = "Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening",
  booktitle    = "NeurIPS",
  year         = "2025",
}

Please also consider checking out the following papers:

Seeing the Arrow of Time in Large Multimodal Models. NeurIPS (2025).
Retro-Actions: Learning ‘Close’ by Time-Reversing ‘Open’ Videos. ICCVW (2019).
Perceptual straightening of natural videos. Nature Neuroscience (2019).
