
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [IEEE TIP 2025]


Tsinghua University

🔥 News

  • This paper has been accepted by IEEE Transactions on Image Processing (TIP).

📖 Overview

  1. We propose SC-CLIP, a training-free method designed to enhance CLIP's dense feature representation, effectively addressing the uniform attention activations and feature homogenization caused by anomaly tokens.
  2. We mitigate the negative effects of anomaly tokens from two perspectives. First, we explicitly handle the anomaly tokens based on their local context. Second, we reduce their impact on normal tokens by enhancing feature discriminability and attention correlation, leveraging the spatial consistency inherent in CLIP's mid-level features (see the conceptual sketch after this list).
  3. Our approach achieves new state-of-the-art results across popular benchmarks, and we conduct extensive experiments to validate the effectiveness of our method.
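
The idea in item 2 can be pictured with a minimal, hypothetical sketch: flag patch tokens that disagree with their local neighborhood and refill them from local context. The tensor shapes, the 3x3 neighborhood, and the threshold tau below are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn.functional as F

def calibrate_tokens(tokens, h, w, tau=0.5):
    # tokens: [N, C] patch features from a CLIP ViT layer (CLS token removed), N = h * w
    feat = tokens.t().reshape(1, -1, h, w)                          # [1, C, h, w]
    neigh = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)  # mean of each 3x3 neighborhood
    sim = F.cosine_similarity(feat, neigh, dim=1)                   # [1, h, w] agreement with local context
    anomaly = (sim < tau).reshape(-1)                               # low agreement -> treat as anomaly token
    out = tokens.clone()
    out[anomaly] = neigh.reshape(feat.shape[1], -1).t()[anomaly]    # refill anomalies from the local mean
    return out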

🛠️ Installation

git clone https://github.com/SuleBai/SC-CLIP.git
cd SC-CLIP

conda create -n scclip python=3.9
conda activate scclip
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install openmim
mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex numpy==1.26 yapf==0.40.1
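
To sanity-check the environment after installation, a quick version check can be run (this snippet is our suggestion, not part of the repository):

python -c "import torch, mmcv, mmseg; print(torch.__version__, mmcv.__version__, mmseg.__version__, torch.cuda.is_available())"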

📚 Datasets

We provide the dataset configurations in this repository, following SCLIP.

Please follow the MMSeg data preparation document to download and pre-process the datasets. The COCO-Object dataset can be converted from COCO-Stuff164k by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO_OBJECT
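
For example, assuming COCO-Stuff164k has been downloaded to data/coco_stuff164k (hypothetical paths), the conversion would look like:

python datasets/cvt_coco_object.py data/coco_stuff164k -o data/coco_object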

🔥 Demo

python demo.py
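
As a rough picture of what such a demo does, the following is a hypothetical sketch built on mmseg's generic inference API; the config path, image paths, and checkpoint handling are assumptions (SC-CLIP is training-free, so no checkpoint is loaded), and the actual demo.py may differ:

from mmseg.apis import init_model, inference_model, show_result_pyplot

cfg = 'configs/cfg_voc21.py'                       # hypothetical dataset config
model = init_model(cfg, checkpoint=None, device='cuda:0')
result = inference_model(model, 'demo/example.jpg')
show_result_pyplot(model, 'demo/example.jpg', result, show=False, out_file='demo/output.png')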

📊 Model Evaluation

Single-GPU evaluation:

python eval.py --config configs/cfg_DATASET.py --workdir YOUR_WORK_DIR
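
For instance, with a hypothetical PASCAL VOC config name and work directory:

python eval.py --config configs/cfg_voc21.py --workdir ./work_dirs/voc21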

Multi-GPU evaluation:

bash dist_test.sh

🌹 Acknowledgement

This implementation is based on CLIP, SCLIP, CLIP-DINOiser, and ClearCLIP. Thanks for their awesome work.

📃 Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{bai2025self,
  title={Self-Calibrated {CLIP} for Training-Free Open-Vocabulary Segmentation},
  author={Bai, Sule and Liu, Yong and Han, Yifei and Zhang, Haoji and Tang, Yansong and Zhou, Jie and Lu, Jiwen},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  publisher={IEEE}
}
