
The official implementation of our NeurIPS 2025 Poster paper: Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching.


panda7777777/fuse2match


Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching

This repository contains the official implementation of our NeurIPS 2025 Poster paper "Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching".


Introduction

Recent work shows that features from Stable Diffusion (SD) and contrastively pretrained models such as DINO can be used directly for zero-shot semantic correspondence via naive feature concatenation. In this paper, we explore the stronger potential of Stable Diffusion 3 (SD3), a rectified flow-based model with a multimodal diffusion transformer backbone (MM-DiT). We show that semantic signals in SD3 are scattered across multiple timesteps and transformer layers, and we propose a multi-level fusion scheme to extract discriminative features. Moreover, we find that naively fusing features across models suffers from inconsistent feature distributions, which leads to suboptimal performance. To address this, we propose a simple yet effective confidence-aware feature fusion strategy that re-weights each model's contribution based on prediction confidence scores derived from its matching uncertainty. This fusion is training-free and integrates heterogeneous features adaptively at every pixel. The resulting representation, Fuse2Match, significantly outperforms strong baselines on SPair-71k, PF-Pascal, and PSC6K, validating the benefit of combining SD3, SD, and DINO through confidence-aware feature fusion.
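The idea of confidence-aware fusion can be illustrated with a small NumPy sketch. This is not the paper's exact formulation. As assumptions made only for illustration: each model's matching uncertainty is measured by the peak of a temperature-scaled softmax over cosine similarities, the per-pixel weights are these confidences normalized across models, and the hypothetical `temperature=0.05` is arbitrary. All models are also assumed to share the same source and target pixel grids.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (one pixel's feature vector) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def confidence_weighted_match(src_feats, tgt_feats, temperature=0.05):
    """Illustrative sketch (assumed formulation, not the paper's).

    src_feats / tgt_feats: one array per model, shaped (N_src, D_m) and
    (N_tgt, D_m). D_m may differ between models because each model is only
    compared with itself. Returns the best target index per source pixel
    and the (num_models, N_src) fusion weights.
    """
    sims, confs = [], []
    for f_src, f_tgt in zip(src_feats, tgt_feats):
        sim = l2_normalize(f_src) @ l2_normalize(f_tgt).T     # (N_src, N_tgt) cosine similarity
        prob = softmax(sim / temperature, axis=1)             # soft match distribution per source pixel
        sims.append(sim)
        confs.append(prob.max(axis=1))                        # sharp peak -> low uncertainty -> high confidence
    confs = np.stack(confs)                                   # (M, N_src)
    weights = confs / confs.sum(axis=0, keepdims=True)        # per-pixel weights summing to 1 over models
    fused = np.einsum("mn,mnk->nk", weights, np.stack(sims))  # (N_src, N_tgt) fused similarity
    return fused.argmax(axis=1), weights


# Toy example: model A is informative, model B cannot tell targets apart.
rng = np.random.default_rng(0)
tgt_a = rng.normal(size=(6, 16))
src_a = tgt_a[[2, 0, 5]]          # source pixels truly match targets 2, 0, 5
tgt_b = np.ones((6, 32))          # identical target features -> flat similarity
src_b = rng.normal(size=(3, 32))
matches, weights = confidence_weighted_match([src_a, src_b], [tgt_a, tgt_b])
print(matches)                    # [2 0 5]
print(weights.round(3))           # model A receives most of the weight at every pixel
```

Weighting each model's similarity map per source pixel is equivalent to concatenating each model's unit-normalized source features scaled by that pixel's weight, so in this sketch the scheme can be read as per-pixel feature re-weighting rather than a global, fixed mixing ratio.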


Installation

```bash
conda create -n fuse2match python==3.12.3
conda activate fuse2match
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install diffusers timm transformers numpy accelerate sentencepiece
```

Usage

SPair-71k

Requirements: an NVIDIA A100 (40 GB) GPU is recommended for best performance.

Preparing Data

```bash
cd path/to/data  # Replace with your dataset folder path
wget https://cvlab.postech.ac.kr/research/SPair-71k/data/SPair-71k.tar.gz
tar -xvf SPair-71k.tar.gz
```
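After extraction, the pair annotations can be inspected with a short loader. This sketch assumes the standard SPair-71k layout, with one JSON file per image pair under `PairAnnotation/<split>/` and the keys `category`, `src_imname`, `trg_imname`, `src_kps`, `trg_kps`, and `trg_bndbox`; check these against your copy of the dataset. `load_pairs` is a hypothetical helper, not part of this repository, and the demo runs on a one-file synthetic copy of the assumed layout.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_pairs(root, split="test"):
    """Yield simplified records from <root>/PairAnnotation/<split>/*.json (assumed layout)."""
    for path in sorted(Path(root, "PairAnnotation", split).glob("*.json")):
        ann = json.loads(path.read_text())
        yield {
            "category": ann["category"],
            "src": ann["src_imname"],
            "trg": ann["trg_imname"],
            "src_kps": ann["src_kps"],      # keypoint coordinates in the source image
            "trg_kps": ann["trg_kps"],      # corresponding keypoints in the target image
            "trg_bbox": ann["trg_bndbox"],  # target bounding box
        }


# Synthetic single-pair dataset, used only to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp, "PairAnnotation", "test")
    split_dir.mkdir(parents=True)
    demo = {
        "category": "cat", "src_imname": "a.jpg", "trg_imname": "b.jpg",
        "src_kps": [[10, 20]], "trg_kps": [[12, 22]], "trg_bndbox": [0, 0, 100, 80],
    }
    (split_dir / "demo.json").write_text(json.dumps(demo))
    pairs = list(load_pairs(tmp))

counts = Counter(p["category"] for p in pairs)
print(counts)  # Counter({'cat': 1})
```

On the real data, `Counter(p["category"] for p in load_pairs("path/to/SPair-71k"))` would give the number of test pairs per category, which is a quick sanity check that the archive extracted completely.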

Evaluation

```bash
# Replace path/to/SPair-71k with your dataset folder path.
python evaluation.py \
    --model_names sd3 sd2 dinov2 \
    --dataset_name SPair71k \
    --dataset_path path/to/SPair-71k \
    --sd3_save_path sd3 \
    --sd3_img_size 1024 1024 \
    --sd3_t 721 621 521 \
    --sd3_layers 24 25 \
    --sd3_facets query value \
    --sd2_save_path sd2 \
    --sd2_img_size 768 768 \
    --sd2_t 261 \
    --up_ft_index 1 \
    --dinov2_save_path dinov2 \
    --dinov2_img_size 840 840 \
    --dinov2_size large \
    --ensemble_size 8 \
    --load_from_local
```
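The SD3 flags above select several timesteps (`--sd3_t`), transformer layers (`--sd3_layers`), and attention facets (`--sd3_facets`), matching the multi-level fusion described in the introduction. The sketch below shows one plausible way such per-level features could be combined into a single descriptor. It assumes, without confirmation from `evaluation.py`, that each level is averaged over `--ensemble_size` noisy passes, L2-normalized, and concatenated. `fake_extract` is a hypothetical stand-in for a real SD3 forward pass and returns random features.

```python
import itertools

import numpy as np

TIMESTEPS = [721, 621, 521]   # mirrors --sd3_t
LAYERS = [24, 25]             # mirrors --sd3_layers
FACETS = ["query", "value"]   # mirrors --sd3_facets
ENSEMBLE_SIZE = 8             # mirrors --ensemble_size


def fake_extract(t, layer, facet, sample, num_pixels=16, dim=8):
    """Hypothetical stand-in for one noisy SD3 pass: random (num_pixels, dim) features."""
    rng = np.random.default_rng([t, layer, FACETS.index(facet), sample])
    return rng.normal(size=(num_pixels, dim))


def multi_level_descriptor(extract, timesteps, layers, facets, ensemble_size):
    """Concatenate per-level features into one (num_pixels, total_dim) descriptor."""
    parts = []
    for t, layer, facet in itertools.product(timesteps, layers, facets):
        # Average several noisy passes at the same level to reduce noise variance.
        feats = np.mean([extract(t, layer, facet, s) for s in range(ensemble_size)], axis=0)
        # Unit-normalize each level so no single level dominates the concatenation.
        feats = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
        parts.append(feats)
    return np.concatenate(parts, axis=-1)


desc = multi_level_descriptor(fake_extract, TIMESTEPS, LAYERS, FACETS, ENSEMBLE_SIZE)
print(desc.shape)  # (16, 96): 3 timesteps x 2 layers x 2 facets x 8 dims
```

Per-level normalization before concatenation is a design choice of this sketch: without it, levels with larger activation magnitudes would dominate cosine similarities computed on the combined descriptor.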

Citation

If you find our work useful, please cite:

```bibtex
@inproceedings{zuo2025esd3,
  title     = {Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching},
  author    = {Zuo, Jing and Wang, Jiaqi and Qi, Yonggang and Song, Yi-Zhe},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}
```
