LiFT: Linearized Feature Trajectories (NeurIPS 2025)


Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening

NeurIPS 2025

Piyush Bagad,   Andrew Zisserman

University of Oxford

Brief Overview

LiFT learns time-aware video representations that can linearly separate temporally opposite (chiral) actions like "opening" vs "closing" or "moving up" vs "moving down".
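
To make "linearly separate" concrete, here is a minimal sketch of a linear probe trained on video embeddings. The embeddings are synthetic stand-ins (Gaussian noise plus a class-dependent shift, 64-dim for speed), not real LiFT outputs, and `fit_linear_probe` and `probe_accuracy` are hypothetical helper names, not part of the `lift` package.

```python
import numpy as np

def fit_linear_probe(x, y, ridge=1e-3):
    """Fit a ridge-regularised least-squares linear classifier.

    x: [N, D] embeddings, y: [N] labels in {0, 1}.
    Returns weights of shape [D + 1]; the last entry is the bias.
    """
    xb = np.hstack([x, np.ones((len(x), 1))])      # append a bias column
    targets = 2.0 * y - 1.0                        # map {0, 1} -> {-1, +1}
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)

def probe_accuracy(w, x, y):
    """Fraction of samples whose sign(w . [x, 1]) matches the label."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    pred = (xb @ w > 0).astype(int)
    return float((pred == y).mean())

# Synthetic "opening" (0) vs "closing" (1) embeddings: assumption for illustration only.
rng = np.random.default_rng(0)
dim, n_train, n_test = 64, 300, 100
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n_train + n_test)
emb = rng.normal(size=(n_train + n_test, dim))
emb += 3.0 * (2 * labels[:, None] - 1) * direction  # class-dependent shift

w = fit_linear_probe(emb[:n_train], labels[:n_train])
test_acc = probe_accuracy(w, emb[n_train:], labels[n_train:])
print(f"held-out linear-probe accuracy: {test_acc:.2f}")
```

With real data, `emb` would hold one embedding per video and `labels` the chiral class of each video; the probe itself stays the same.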

🔐 The Key Nugget

Key observation: t-SNE projections of per-frame DINOv2 features show that the features lie on a time-sensitive trajectory. Can we use these trajectories to learn a time-aware video representation?
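
The idea of a per-frame feature trajectory can be sketched as follows. This example swaps t-SNE for plain PCA (computed with an SVD) to stay dependency-light, and it uses a synthetic `[T, D]` feature array (a noisy arc in a random plane) in place of real DINOv2 features; `project_trajectory_2d` is a hypothetical helper name.

```python
import numpy as np

def project_trajectory_2d(feats):
    """Project per-frame features [T, D] onto their top two principal axes."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T                     # [T, 2]

# Synthetic stand-in for 16 frames of 384-dim features: a smooth arc plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16)
basis = np.linalg.qr(rng.normal(size=(384, 2)))[0]                 # [384, 2], orthonormal columns
arc = np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=1)     # [16, 2]
feats = 10.0 * arc @ basis.T + 0.1 * rng.normal(size=(16, 384))

xy = project_trajectory_2d(feats)
steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)   # distance between consecutive frames
span = np.linalg.norm(xy[-1] - xy[0])                 # distance between first and last frame
print(f"largest frame-to-frame step: {steps.max():.2f}, start-to-end distance: {span:.2f}")
```

Consecutive frames land close together while the first and last frames are far apart, which is what "time-sensitive trajectory" means in the projected space.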

DINO Trajectory

🏗️ The Model: LiFT

Inspired by perceptual straightening: LiFT transforms non-linear DINO trajectories into a compact video embedding using a linearized auto-encoder, following the perceptual straightening hypothesis [Hénaff et al., Nature Neuroscience 2019].
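
"Straightness" of a trajectory can be quantified by its curvature, taken here as the average angle between successive displacement vectors. The sketch below is one common way to compute that quantity on a `[T, D]` trajectory; it is an illustration on toy data, not code from the LiFT repository, and `mean_curvature_deg` is a hypothetical helper name. It assumes no two consecutive points are identical.

```python
import numpy as np

def mean_curvature_deg(traj):
    """Average angle in degrees between successive displacement vectors of a [T, D] trajectory."""
    diffs = np.diff(traj, axis=0)                                  # [T-1, D]
    unit = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)    # unit-length steps
    cos = np.sum(unit[:-1] * unit[1:], axis=1).clip(-1.0, 1.0)     # [T-2]
    return float(np.degrees(np.arccos(cos)).mean())

t = np.linspace(0.0, 1.0, 16)[:, None]
straight = t * np.array([[1.0, 2.0, 3.0]])                          # points on a line
curved = np.hstack([np.cos(np.pi * t), np.sin(np.pi * t), np.zeros_like(t)])  # half circle

print(f"straight: {mean_curvature_deg(straight):.3f} deg")
print(f"curved:   {mean_curvature_deg(curved):.3f} deg")
```

A straight trajectory has curvature close to zero, while the half circle sampled at 16 points turns by 180/15 = 12 degrees per step.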

LiFT Architecture

Perceptual Straightening

What we contribute:

  • Model: LiFT - a compact (768-dim) time-aware video embedding trained in an unsupervised manner
  • Benchmark: Chirality in Action (CiA) - a new benchmark built from SSv2, EPIC, and Charades datasets to evaluate temporal understanding
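
As a toy illustration of why a linearized trajectory can be time-aware, the sketch below fits a straight line `x_t ≈ a + b * t` to a feature trajectory by least squares and concatenates the offset `a` and slope `b`. This is not the LiFT model (which is a learned auto-encoder); the random-walk features, the 384 + 384 split, and the name `line_fit_embedding` are all assumptions made for illustration.

```python
import numpy as np

def line_fit_embedding(traj):
    """Fit x_t ≈ a + b * t with t in [-1, 1] by least squares; return concat(a, b)."""
    n_frames = traj.shape[0]
    t = np.linspace(-1.0, 1.0, n_frames)
    design = np.stack([np.ones(n_frames), t], axis=1)        # [T, 2]
    coef, *_ = np.linalg.lstsq(design, traj, rcond=None)     # [2, D]
    return np.concatenate([coef[0], coef[1]])                # [2 * D]

# Random-walk stand-in for 16 frames of 384-dim per-frame features.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(16, 384)), axis=0)

fwd = line_fit_embedding(traj)          # video played forward
bwd = line_fit_embedding(traj[::-1])    # same video played in reverse
offset_f, slope_f = fwd[:384], fwd[384:]
offset_b, slope_b = bwd[:384], bwd[384:]

print("offset unchanged under reversal:", np.allclose(offset_f, offset_b))
print("slope negated under reversal:   ", np.allclose(slope_f, -slope_b))
```

Reversing time leaves the offset untouched and flips the sign of the slope, so a linear classifier on the slope half can tell a clip from its reverse. This is the intuition behind evaluating on chiral action pairs.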

Installation and Setup

First, create a conda environment:

conda create --name lift python=3.11 -y
conda activate lift

Then, install the LiFT package:

pip install git+https://github.com/bpiyush/LiFT.git

Alternative: Manual installation with conda

If you prefer more control over dependencies, create a conda environment and install the pinned dependencies yourself:

conda create --name lift python=3.11 -y
conda activate lift

# Install torch
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

# Install lightning
pip install lightning==2.4.0

# Install other dependencies
pip install einops==0.8.1
pip install timm==1.0.22
pip install decord==0.6.0
pip install matplotlib==3.9.2
pip install opencv-python pandas ipdb ipywidgets tqdm scikit-learn termcolor seaborn ffmpeg-python

# Install gdown for downloading model weights
pip install gdown
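
After a manual installation it can be useful to confirm that the pinned versions are the ones actually installed. The following is a small sketch using only the standard library; the pins are copied from the commands above, while `check_pins` is a hypothetical helper and the choice to ignore local build tags such as `+cu124` is an assumption.

```python
from importlib import metadata

# Pins copied from the manual installation commands.
PINNED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "lightning": "2.4.0",
    "einops": "0.8.1",
    "timm": "1.0.22",
    "decord": "0.6.0",
    "matplotlib": "3.9.2",
}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu124" when comparing versions.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.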

Download Model Weights

Download the pre-trained LiFT model weights (~110MB):

# Download the checkpoint file
gdown 1DFapOrZwRcltyq3_tQNTQ9mHtpgKqtZY -O ggwirp95-epoch=458-step=834003.ckpt

Alternatively, download the checkpoint manually from Google Drive.
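
A download can occasionally leave a truncated or placeholder file, so a quick sanity check before loading the checkpoint may save some debugging. The sketch below only checks that the file exists and that its size is roughly the ~110MB stated above; `checkpoint_looks_complete` is a hypothetical helper and the 25% tolerance is an arbitrary assumption.

```python
from pathlib import Path

CKPT_NAME = "ggwirp95-epoch=458-step=834003.ckpt"   # file name used in the gdown command

def checkpoint_looks_complete(path, expected_mb=110.0, rel_tol=0.25):
    """Heuristic check: the file exists and its size is within rel_tol of expected_mb."""
    p = Path(path)
    if not p.is_file():
        return False
    size_mb = p.stat().st_size / 1e6
    return abs(size_mb - expected_mb) <= rel_tol * expected_mb

if __name__ == "__main__":
    if checkpoint_looks_complete(CKPT_NAME):
        print("checkpoint looks complete")
    else:
        print("checkpoint missing or of unexpected size; consider re-downloading it")
```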

Quick Start

# Set path to your video
video_path = "your_video.mp4"

import torch
from lift import DINOv2ForVideo, make_classification_eval_transform, load_lift_module
from lift.dinov2 import compute_dino_features_for_single_video
from lift.demo import compute_lift_embeddings
from lift.viz_utils import show_trajectory_with_reconstruction

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load models
backbone = DINOv2ForVideo(model_id='vit_small_patch14_reg4_dinov2.lvd142m').to(device)
preprocess = make_classification_eval_transform()
lift_model = load_lift_module(ckpt_root=".", ckpt_name="ggwirp95-epoch=458-step=834003.ckpt").to(device)

# Extract features from your video
frames, _, dino_feats = compute_dino_features_for_single_video(
    video_path, preprocess, backbone, return_frames=True, device=device, n_frames=16
)

# Get LiFT embedding (768-dim time-aware video representation)
lift_output = compute_lift_embeddings(dino_feats.unsqueeze(0), lift_model, device=device)
embedding = lift_output["concat"]  # Shape: [1, 768]

# Visualize with t-SNE (DINO trajectory in red, LiFT reconstruction in blue)
img = show_trajectory_with_reconstruction(
    video_path=video_path,
    x=dino_feats,
    x_hat=lift_output["reconstructed"].squeeze(0),
    class_name="my video",
    method="tsne",
    joint_dimred=True,
    return_img=True,
)
img.save("lift_output.png")

Visualization of the DINO trajectory (red) and LiFT reconstruction (blue).
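
Once embeddings are available, a natural next step is to compare videos. The sketch below computes cosine similarity between two embeddings of shape `[1, 768]`. It uses random arrays as stand-ins; with real outputs one would first convert the embedding to NumPy (for example `embedding.detach().cpu().numpy()`, assuming it is a torch tensor). `cosine_similarity` is a hypothetical helper, not part of the `lift` package.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of shape [1, D] or [D]."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for two [1, 768] video embeddings (assumption for illustration).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1, 768))
emb_b = rng.normal(size=(1, 768))

sim_self = cosine_similarity(emb_a, emb_a)
sim_neg = cosine_similarity(emb_a, -emb_a)
sim_ab = cosine_similarity(emb_a, emb_b)
print(f"self: {sim_self:.3f}, negated: {sim_neg:.3f}, a vs b: {sim_ab:.3f}")
```

The same helper can be used to rank a set of videos by similarity to a query embedding.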

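Alternative: Run the demo script

# Clone the repository first; the checkpoint is expected in the repository root
git clone https://github.com/bpiyush/LiFT.git
cd LiFT
export PYTHONPATH=$PWD
python lift/demo.py --ckpt_root . --ckpt_name ggwirp95-epoch=458-step=834003.ckpt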

Citation

If you find this work useful, please consider citing:

@InProceedings{BagadLiFT25,
  author       = "Piyush Bagad and Andrew Zisserman",
  title        = "Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening",
  booktitle    = "NeurIPS",
  year         = "2025",
}
