by Nicolai Hermann, Jorge Condor, and Piotr Didyk
This repository contains the implementation of the cross-reference metric PuzzleSim and a dedicated demo for the paper "Puzzle Similarity: A Perceptually-Guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions".
- (04-09-2025) DINOv3 🦖 backbones (ConvNeXt & ViTs) are now supported
- (31-08-2025) Major refactoring that makes it easy to add custom backbones, plus automated cross-platform testing
- (25-06-2025) PuzzleSim was officially accepted to ICCV 2025 in Hawaii 🌸!
- (29-11-2024) Official code release
If you simply want to use the metric, run:
pip install puzzle_sim
If you want to extend it, please install it locally as a package. The package requires Python 3.8 or higher. If you wish to use DINOv3 backbones, you must have Python 3.10 or higher and transformers>=4.56:
pip install -e .
You can use the metric in your own code as follows:
from puzzle_sim import PuzzleSim
priors = ... # load priors from file with shape (N, C, H, W) in [0, 1]
test_image = ... # load test image (C, H, W) or (1, C, H, W) in [0, 1]
puzzle = PuzzleSim(reference=priors, net_type='squeeze')
similarity_map = puzzle(test_image) # (H, W) similarity map in [0, 1]
To use DINOv3 backbones, you must be logged in to HuggingFace (hf auth login), have requested access to the models on HuggingFace, and meet the requirements listed above. You can request access to the models here.
In code, you have to adapt the puzzle() call, because the default arguments assume the configuration from the paper. We have not yet tested optimal weights for DINOv3 backbones, so we recommend using a simple average over all layers (which has shown similar performance to the configurations in the paper):
from puzzle_sim import PuzzleSim
priors = ... # load priors from file with shape (N, C, H, W) in [0, 1]
test_image = ... # load test image (C, H, W) or (1, C, H, W) in [0, 1]
puzzle = PuzzleSim(reference=priors, net_type='convnext_tiny')
similarity_map = puzzle(test_image, layers=range(5), weights=None, reduction='mean') # (H, W) similarity map in [0, 1]
If your GPU runs out of memory, try reducing the stride parameter in the forward call; this will reduce memory consumption. On the other hand, for small image dimensions the naive implementation might be faster, although it requires much more memory (set mem_save=False).
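The `priors = ...` and `test_image = ...` placeholders in the snippets above expect data of shape (N, C, H, W) and (C, H, W) with values in [0, 1]. As a minimal sketch of one way to get there, the helper below converts a list of uint8 images in (H, W, C) layout into that format. The function name `to_nchw_unit_range` and the synthetic 8x8 images are illustrative assumptions, not part of the package:

```python
import numpy as np


def to_nchw_unit_range(images):
    """Stack uint8 (H, W, C) images into a float32 array of shape (N, C, H, W) in [0, 1].

    Hypothetical helper for illustration; it is not part of puzzle_sim.
    """
    batch = np.stack([np.asarray(img, dtype=np.uint8) for img in images])  # (N, H, W, C)
    batch = batch.astype(np.float32) / 255.0                               # scale to [0, 1]
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))               # (N, C, H, W)


# Synthetic stand-ins for three reference views and one test view (8x8 RGB).
rng = np.random.default_rng(0)
reference_views = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(3)]
test_view = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

priors = to_nchw_unit_range(reference_views)     # shape (3, 3, 8, 8)
test_image = to_nchw_unit_range([test_view])[0]  # shape (3, 8, 8)
```

Assuming PuzzleSim expects PyTorch tensors (this README does not state the input type explicitly), the arrays could then be converted with `torch.from_numpy(priors)` and `torch.from_numpy(test_image)` before being passed to the metric.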
Please find the demo in demo.ipynb to see how to run the metric on some example sets. In order to run the demo, you need to pull the data from another repository. Do this by either cloning the repository using
git clone https://github.com/nihermann/PuzzleSim.git --recursive
or, if you already cloned the repository without the data submodule, you can download the submodule using
git submodule update --init --recursive
You can extend PuzzleSim with your own backbone models. To get started, inherit from adapters.FeatureExtractor and implement the compute_features method.
There are two ways to use your backbone:
- Directly in the constructor:
PuzzleSim(..., net_type=YourBackbone())
- Via the factory function: Register your backbone in adapters.get_feature_extractor and add the corresponding string to adapters.net_type, so you can refer to it by that string:
PuzzleSim(..., 'your_backbone')
If you’d like to share your backbone with the community, feel free to open a pull request. Please make sure that:
- your backbone is publicly available (e.g., on HuggingFace or PyTorch Hub),
- you’ve registered it in the factory function adapters.get_feature_extractor,
- you’ve extended adapters.net_type (so the tests pick it up automatically),
- and all tests pass (run pytest in the project root).
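The "inherit and implement compute_features" pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the `FeatureExtractor` class here is a local stand-in defined so the example runs on its own, and the real `adapters.FeatureExtractor` may expect a different signature, input type (most likely PyTorch tensors rather than NumPy arrays), or return format. `AvgPoolPyramid` is a hypothetical toy backbone, so please check the actual base class in the repository before adapting it:

```python
from abc import ABC, abstractmethod

import numpy as np


class FeatureExtractor(ABC):
    """Local stand-in for adapters.FeatureExtractor (assumed interface, may differ)."""

    @abstractmethod
    def compute_features(self, images):
        """Return a list of feature maps, one per layer."""


class AvgPoolPyramid(FeatureExtractor):
    """Hypothetical toy backbone: each 'layer' is a 2x2 average-pooled copy of the previous one."""

    def __init__(self, num_levels=3):
        self.num_levels = num_levels

    def compute_features(self, images):
        x = np.asarray(images, dtype=np.float32)  # (N, C, H, W)
        features = []
        for _ in range(self.num_levels):
            features.append(x)
            n, c, h, w = x.shape
            # Crop to even dimensions, then average each non-overlapping 2x2 block.
            x = x[:, :, : h - h % 2, : w - w % 2]
            x = x.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
        return features


backbone = AvgPoolPyramid(num_levels=3)
feature_maps = backbone.compute_features(np.ones((2, 3, 8, 8)))
shapes = [f.shape for f in feature_maps]  # one (N, C, H, W) shape per level
```

A real backbone would wrap a pretrained network instead of pooling the raw image; the point of the sketch is only the class structure, with one overridden method that returns per-layer features.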
For development, we recommend installing the package in editable mode with dev requirements:
pip install -e .[dev]
If you find this work useful, please consider citing:
@InProceedings{Hermann_puzzlesim_iccv25,
author = {Hermann, Nicolai and Condor, Jorge and Didyk, Piotr},
title = {Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {28881-28891}
}