
SceneDiff Annotator

A video annotation tool for recording changes between paired video sequences.

🔗 Check out the project page for more details.

Overview

The SceneDiff Annotator is built on top of SAM2 and provides a complete workflow for annotating changes between video pairs:

  • Upload & Configure: Upload video pairs and specify object attributes (deformability, change type, multiplicity)
  • Interactive Annotation: Provide sparse point prompts on selected frames with an intuitive click-based interface
  • Offline Propagation: Automatically propagate masks throughout both videos
  • Review & Refine: Visualize annotated videos, refine annotations, and verify results

Demo

See annotation_visualization.mov in the repository to watch the annotation tool in action.

Installation

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended for faster processing)
  • ffmpeg

Setup Instructions

  1. Clone the repository (including the SAM2 submodule):

    git clone --recursive https://github.com/yuqunw/scenediff_annotator
    cd scenediff_annotator
  2. Create conda environment and install SAM2 and dependencies:

    conda create -n scenediff_annotator python=3.10 -y
    conda activate scenediff_annotator
    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121 # Install the PyTorch build that matches your CUDA version
    pip install -r requirements.txt
    cd sam2 && pip install -e .
  3. Download SAM2 checkpoints:

    cd checkpoints
    bash download_ckpts.sh
    cd ../..

For detailed SAM2 installation instructions, refer to the official SAM2 repository.
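
As a quick sanity check before launching the tool, the short sketch below (our own addition, not a script shipped with the repository) verifies that PyTorch sees a GPU and that the SAM2 package imports:

import torch
import sam2  # installed above with `pip install -e .` inside the sam2 submodule

# Confirm the GPU is visible; the tool also runs on CPU, just more slowly
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"SAM2 package loaded from: {sam2.__file__}")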

Usage

Starting the Application

  1. Launch the backend server:

    python backend.py
  2. Open the web interface: Navigate to http://localhost:5000 in your web browser.

Annotation Workflow

  1. Upload Videos: Upload a pair of videos, fill in the Scene Type, and wait for initialization to complete.

  2. Configure Object Attributes:

    • Specify the number of changed objects
    • Specify the object name and multiplicity index
    • Specify whether the object appears in video 1 and video 2, and the number of frames to annotate
    • Specify the deformability (rigid or deformable)
  3. Provide Annotations:

    • Select a frame in the video
    • Click to add point prompts
  4. Run Offline Propagation:

    • After all annotations are complete, click Start Offline Job to begin mask propagation
    • SAM2 will propagate masks through both videos offline; you can safely close the page while the job runs
  5. Review & Refine:

    • Navigate to Review Sessions to visualize results
    • Review or refine annotations if needed

Output Format

The uploaded videos and generated outputs are saved under ./uploads and ./results, respectively. The annotation tool generates two primary output files:

  • inference_data.pkl: Stores initial prompt inputs used for mask propagation
  • segments.pkl: Contains the final segmentation masks and metadata

Segments Structure

The segments.pkl file follows this hierarchical structure:

segments = {
    'scenetype': str,                    # Type of scene change
    'video1_objects': {
        'object_id': {
            'frame_id': RLE_Mask         # Run-length encoded mask
        }
    },
    'video2_objects': {
        'object_id': {
            'frame_id': RLE_Mask         # Run-length encoded mask
        }
    },
    'objects': {
        'object_1': {
            'label': str,                # Object label/name
            'in_video1': bool,           # Present in video 1
            'in_video2': bool,           # Present in video 2
            'deformability': str         # 'rigid' or 'deformable'
        }
    }
}
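
For orientation, here is a minimal sketch for loading segments.pkl and walking this structure. The file path under ./results, and the assumption that the same object IDs key all three dictionaries, follow from the layout above but are not guaranteed by the tool:

import pickle

# Path is an assumption; adjust to your session folder under ./results
with open('results/segments.pkl', 'rb') as f:
    segments = pickle.load(f)

print('Scene type:', segments['scenetype'])
for object_id, meta in segments['objects'].items():
    frames_v1 = segments['video1_objects'].get(object_id, {})
    frames_v2 = segments['video2_objects'].get(object_id, {})
    print(f"{object_id}: label={meta['label']}, deformability={meta['deformability']}, "
          f"{len(frames_v1)} masks in video 1, {len(frames_v2)} masks in video 2")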

Loading Masks

To convert RLE masks back to tensors:

import torch
from pycocotools import mask as mask_utils

# rle_mask is a single run-length encoded entry from segments.pkl,
# e.g. segments['video1_objects'][object_id][frame_id]
tensor_mask = torch.tensor(mask_utils.decode(rle_mask))
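
Putting the pieces together, a hedged end-to-end sketch that decodes every mask for one object in video 1 (the file path and the choice of the first object are illustrative assumptions, following the structure above):

import pickle

import torch
from pycocotools import mask as mask_utils

# Path is an assumption; adjust to your session folder under ./results
with open('results/segments.pkl', 'rb') as f:
    segments = pickle.load(f)

# Decode all masks for the first object in video 1 into {frame_id: tensor}
object_id = next(iter(segments['video1_objects']))
masks = {
    frame_id: torch.tensor(mask_utils.decode(rle))
    for frame_id, rle in segments['video1_objects'][object_id].items()
}
print(f"Decoded {len(masks)} masks for {object_id}")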

Acknowledgements

This project is built upon the excellent SAM2 repository (Segment Anything Model 2). We gratefully acknowledge their contributions to the computer vision community.

License

See LICENSE for more information.
