A video annotation tool for logging changes between paired video sequences.
🔗 Check out the project page for more details.
The SceneDiff Annotator is built on top of SAM2 and provides a complete workflow for annotating changes between video pairs:
- Upload & Configure: Upload video pairs and specify object attributes (deformability, change type, multiplicity)
- Interactive Annotation: Provide sparse point prompts on selected frames with an intuitive click-based interface
- Offline Propagation: Automatically propagate masks throughout both videos
- Review & Refine: Visualize annotated videos, refine annotations, and verify results
annotation_visualization.mov
Watch the video above to see the annotation tool in action.
- Python 3.8+
- CUDA-capable GPU (recommended for faster processing)
- ffmpeg
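A quick way to sanity-check these prerequisites before installing is a small script like the one below; the specific checks (Python version, ffmpeg on PATH, CUDA availability via PyTorch) are illustrative and not part of the tool itself:

```python
import shutil
import sys

# Illustrative prerequisite check: Python version and ffmpeg on PATH
assert sys.version_info >= (3, 8), "Python 3.8+ is required"
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"

# PyTorch is installed in a later step; if it is already present, report GPU availability
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet (it is set up during environment creation below)")
```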
- Clone the repository (including the SAM2 submodule):

  ```bash
  git clone --recursive https://github.com/yuqunw/scenediff_annotator
  cd scenediff_annotator
  ```
- Create the conda environment and install SAM2 and the dependencies:

  ```bash
  conda create -n scenediff_annotator python=3.10 -y
  conda activate scenediff_annotator
  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121  # install the PyTorch build matching your nvcc/CUDA version
  pip install -r requirements.txt
  cd sam2 && pip install -e .
  ```
- Download the SAM2 checkpoints:

  ```bash
  bash download_ckpts.sh
  cd ../..
  ```
For detailed SAM2 installation instructions, refer to the official SAM2 repository.
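As a quick check that SAM2 and its checkpoints installed correctly, you can try building the video predictor directly. The config and checkpoint paths below are assumptions that depend on which model `download_ckpts.sh` fetched, so adjust them to match your download:

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Assumed paths: change these to the config/checkpoint you actually downloaded
checkpoint = "sam2/checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(model_cfg, checkpoint, device=device)
print("Loaded predictor:", type(predictor).__name__)
```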
- Launch the backend server:

  ```bash
  python backend.py
  ```
- Open the web interface: navigate to `http://localhost:5000` in your web browser.
- Upload Videos: Upload a pair of videos, fill in the Scene Type, and wait for the initialization.
- Configure Object Attributes:
  - Specify the number of changed objects
  - Specify each object's name and multiplicity index
  - Specify whether the object appears in video 1 and in video 2, and the number of frames to annotate
  - Specify the deformability
- Provide Annotations:
  - Select a frame in the video
  - Click on the object to add point prompts
- Run Offline Propagation:
  - After all annotations are done, click `Start Offline Job` to begin mask propagation
  - SAM2 will propagate the objects through both videos offline; you can close the page while the job runs (see the sketch after this list)
- Review & Refine:
  - Navigate to `Review Sessions` to visualize the results
  - Review or refine annotations if needed
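For reference, the offline propagation step follows SAM2's standard video-predictor workflow: register the point prompts on the selected frames, then propagate them through the video. The sketch below illustrates that workflow using SAM2's public API; the frame directory, click coordinates, and config/checkpoint paths are placeholders, and this is not the annotator's actual backend code:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Assumed config/checkpoint, as in the installation check above
device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "sam2/checkpoints/sam2.1_hiera_large.pt",
    device=device,
)

with torch.inference_mode():
    # SAM2's video predictor works on a directory of extracted JPEG frames (frames can be extracted with ffmpeg)
    state = predictor.init_state(video_path="uploads/video1_frames")  # placeholder path

    # One positive click (label 1) on the changed object in frame 0
    points = np.array([[320.0, 240.0]], dtype=np.float32)  # (x, y) pixel coordinates of the click
    labels = np.array([1], dtype=np.int32)                 # 1 = positive click, 0 = negative click
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=1, points=points, labels=labels
    )

    # Propagate the prompted object through the rest of the video
    masks_per_frame = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_per_frame[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```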
The uploaded videos and generated outputs are saved under `./uploads` and `./results`. The annotation tool generates two primary output files:

- `inference_data.pkl`: stores the initial prompt inputs used for mask propagation
- `segments.pkl`: contains the final segmentation masks and metadata
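Assuming the `.pkl` outputs are standard Python pickles (as the extension suggests), they can be inspected directly; the path below is a placeholder, so locate the actual file under `./results` for your session:

```python
import pickle

# Placeholder path: locate the actual segments.pkl under ./results for your session
with open("results/segments.pkl", "rb") as f:
    segments = pickle.load(f)

# Top-level keys match the structure documented below
print(segments.keys())  # 'scenetype', 'video1_objects', 'video2_objects', 'objects'
```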
The `segments.pkl` file follows this hierarchical structure:

```python
segments = {
    'scenetype': str,               # Type of scene change
    'video1_objects': {
        'object_id': {
            'frame_id': RLE_Mask    # Run-length encoded mask
        }
    },
    'video2_objects': {
        'object_id': {
            'frame_id': RLE_Mask    # Run-length encoded mask
        }
    },
    'objects': {
        'object_1': {
            'label': str,           # Object label/name
            'in_video1': bool,      # Present in video 1
            'in_video2': bool,      # Present in video 2
            'deformability': str    # 'rigid' or 'deformable'
        }
    }
}
```

To convert RLE masks back to tensors:
```python
import torch
from pycocotools import mask as mask_utils

# rle_mask is a single entry such as segments['video1_objects'][object_id][frame_id]
# Load and decode the RLE mask into a binary torch tensor
tensor_mask = torch.tensor(mask_utils.decode(rle_mask))
```

This project is built upon the excellent SAM2 repository (Segment Anything Model 2). We gratefully acknowledge their contributions to the computer vision community.
See LICENSE for more information.