SceneDiff: A Benchmark and Method for Multiview Object Change Detection


Yuqun Wu · Chih-hao Lin · Henry Che · Aditi Tiwari · Chuhang Zou · Shenlong Wang · Derek Hoiem

Project Page · arXiv · Dataset · Data Annotator


This repository contains the code for the paper SceneDiff: A Benchmark and Method for Multiview Object Change Detection. We investigate the problem of identifying objects that have changed between a pair of captures of the same scene taken at different times, and introduce the first object-level multiview change detection benchmark along with a new training-free method.

SceneDiff Benchmark

Download the SceneDiff benchmark dataset from 🤗 Hugging Face.

mkdir data && cd data
wget https://huggingface.co/datasets/yuqun/SceneDiff/resolve/main/scenediff_benchmark.zip
unzip scenediff_benchmark.zip

Dataset Structure

scenediff_benchmark/
├── data/                          # 350 sequence pairs
│   ├── sequence_pair_1/
│   │   ├── original_video1.mp4    # Raw video before change
│   │   ├── original_video2.mp4    # Raw video after change
│   │   ├── video1.mp4             # Video with annotation mask (before)
│   │   ├── video2.mp4             # Video with annotation mask (after)
│   │   ├── segments.pkl           # Dense segmentation masks for evaluation
│   │   └── metadata.json          # Sequence metadata
│   ├── sequence_pair_2/
│   │   └── ...
│   └── ...
├── splits/                        # Val/Test splits
│   ├── val_split.json
│   └── test_split.json
└── vis/                           # Visualization tools
    ├── visualizer.py              # Flask-based web viewer
    ├── requirements.txt
    └── templates/

About segments.pkl: See the detailed description here.
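As a quick sanity check, the snippet below simply loads a segments.pkl with Python's pickle module and prints its top-level structure; the exact key layout is documented separately (see the description linked above), so nothing beyond standard pickle loading is assumed here.

# Minimal sketch: inspect a segments.pkl file. Only standard pickle loading is assumed;
# see the linked description for the exact key layout.
import pickle

with open("data/scenediff_benchmark/data/sequence_pair_1/segments.pkl", "rb") as f:
    segments = pickle.load(f)

print(type(segments))
if isinstance(segments, dict):
    for key, value in segments.items():
        print(key, type(value))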

Visualization: To browse the sequences and annotations in the Flask-based web viewer, run:

cd data/scenediff_benchmark/vis && pip install -r requirements.txt
python visualizer.py

Evaluation

We expect method predictions to have the following structure:

output_dir/
├── sequence_pair_1/
│   └── object_masks.pkl           # Dense segmentations of changed objects (for evaluation)
├── sequence_pair_2/
└── ...

with object_masks.pkl following this structure:

object_masks = {
    'H': int,                           # Image height
    'W': int,                           # Image width
    'video_1': {                        # Objects existing in video_1
        'object_id_1': {                # Integer ID for each detected object
            'frame_id_1': {             # Integer frame number
                'mask': RLE_Mask,       # Run-length encoded mask
                'cost': float           # Confidence score of the prediction
            },
            ...
        },
        ...
    },
    'video_2': {                        # Objects existing in video_2
        'object_id_1': {                # Integer ID for each detected object
            'frame_id_1': {             # Integer frame number
                'mask': RLE_Mask,       # Run-length encoded mask
                'cost': float           # Confidence score of the prediction
            },
            ...
        },
        ...
    }
}
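As a reference, here is a minimal sketch (not part of the repository) of how such a file could be assembled and saved. It assumes the RLE masks follow the COCO convention produced by pycocotools; if the evaluation script expects a different RLE encoding, adapt the encode step accordingly. The mask, object ID, frame ID, and cost values below are placeholders.

# Minimal sketch for writing object_masks.pkl in the layout above.
# Assumption: 'mask' holds a COCO-style RLE dict as produced by pycocotools.
import pickle
import numpy as np
from pycocotools import mask as mask_utils

H, W = 480, 640
binary_mask = np.zeros((H, W), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1  # placeholder mask for one detected object

rle = mask_utils.encode(np.asfortranarray(binary_mask))  # run-length encode the mask

object_masks = {
    'H': H,
    'W': W,
    'video_1': {
        1: {                      # object_id (int)
            0: {                  # frame_id (int)
                'mask': rle,
                'cost': 0.12,     # placeholder confidence score
            },
        },
    },
    'video_2': {},
}

with open('output_dir/sequence_pair_1/object_masks.pkl', 'wb') as f:
    pickle.dump(object_masks, f)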

The evaluation script can then be run with:

python scripts/evaluate_multiview.py \
    --pred_dir ${OUTPUT_DIR} \
    --duplicate_match_threshold 2 \
    --per_frame_duplicate_match_threshold 2 \
    --splits val \
    --sets varied \
    --output_path ${OUTPUT_FILE_PATH} \
    --visualize False

Arguments:

  • --duplicate_match_threshold: Tolerance for duplicate objects across frames (default: 2)
  • --per_frame_duplicate_match_threshold: Tolerance for duplicate regions per frame (default: 2)
  • --splits: Choose from val, test, or all
  • --sets: Choose from varied, kitchen, or All
  • --visualize: Set to True to save visualization outputs

Output: The evaluation results will be saved to ${OUTPUT_FILE_PATH}.

Getting Started

Installation

  1. Clone this repository with submodules:

    git clone --recursive https://github.com/yuqunw/scene_diff.git
    cd scene_diff
  2. Create conda environment and install dependencies:

    conda create -n scene_diff python=3.10 -y
    conda activate scene_diff
    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121 # Install the PyTorch build that matches your CUDA (nvcc) version
    pip install -r requirements.txt
    pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu121.html # install torch_scatter
  3. Install submodules:

    # Install segment-anything submodule
    cd submodules/segment-anything-langsplat-modified
    pip install -e .
    cd ../..

Download Checkpoints

1. Download the Segment-Anything checkpoint:

bash checkpoints/download_sam_checkpoint.sh

2. Configure DINOv3 checkpoint:

The DINOv3 checkpoint is downloaded automatically on first use once the checkpoint URL is filled in. To set it up:

  1. Visit the DINOv3 downloads page to apply for checkpoint access
  2. Right-click on dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth and copy the download link
  3. Update the URL in configs/scenediff_config.yml:
    models:
      dinov3:
        weight_url: "<paste_your_copied_url_here>"

Quick Demo

Run change detection on any two videos:

python scripts/demo.py \
    --config configs/scenediff_config.yml \
    --video1 path/to/video1.mp4 \
    --video2 path/to/video2.mp4 \
    --output output/demo

Output: The script generates point cloud visualizations including score maps and object segmentations for both videos in the specified output directory.

Parameters: You can modify parameters in configs/scenediff_config.yml. If the automatic threshold for change detection doesn't work well (the score maps look correct but there are too many or too few detections), you can manually set detection.object_threshold in the config file, as sketched below.
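For example, a small sketch (not part of the repository) that sets the threshold programmatically, assuming the key nests as detection.object_threshold and that PyYAML is installed; the value shown is hypothetical. Editing configs/scenediff_config.yml by hand works just as well and preserves any comments in the file.

# Minimal sketch: override detection.object_threshold in the config.
# Note: rewriting the file with yaml.safe_dump drops any comments in it.
import yaml

config_path = "configs/scenediff_config.yml"
with open(config_path) as f:
    cfg = yaml.safe_load(f)

cfg.setdefault("detection", {})["object_threshold"] = 0.3  # hypothetical value; tune per scene
with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f)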

Predict on SceneDiff Benchmark

Run inference on all sequences in the benchmark:

python scripts/predict_multiview.py \
    --config configs/scenediff_config.yml \
    --splits val \
    --sets varied \
    --output_dir output/scenediff_benchmark

Arguments:

  • --splits: Choose from val, test, or all
  • --sets: Choose from varied, kitchen, or All
  • --output_dir: Directory to save predictions
  • Additional arguments can be modified in the config file

Acknowledgement

We thank the authors of the great repositories that this work builds on.

License

This project is released under the MIT License. See LICENSE for details.
