
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [IEEE TIP 2025]


Tsinghua University

🔥 News

  • This paper has been accepted by IEEE Transactions on Image Processing (TIP).

📖 Overview

  1. We propose SC-CLIP, a training-free method designed to enhance CLIP's dense feature representation, effectively addressing the uniform attention activations and feature homogenization caused by anomaly tokens.
  2. We mitigate the negative effects of anomaly tokens from two perspectives. First, we explicitly handle the anomaly tokens based on their local context. Second, we reduce their impact on normal tokens by enhancing feature discriminability and attention correlation, leveraging the spatial consistency inherent in CLIP's mid-level features (see the conceptual sketch after this list).
  3. Our approach achieves new state-of-the-art results across popular benchmarks, and we conduct extensive experiments to validate the effectiveness of our method.
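
The idea in item 2 can be pictured with a minimal, hypothetical sketch: flag patch tokens that disagree with their local neighborhood and refill them from local context. The tensor shapes, the 3x3 neighborhood, and the threshold tau below are illustrative assumptions, not the repository's actual implementation.

import torch
import torch.nn.functional as F

def calibrate_tokens(tokens, h, w, tau=0.5):
    # tokens: [N, C] patch features from a CLIP ViT layer (CLS token removed), N = h * w
    feat = tokens.t().reshape(1, -1, h, w)                          # [1, C, h, w]
    neigh = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)  # mean of each 3x3 neighborhood
    sim = F.cosine_similarity(feat, neigh, dim=1)                   # [1, h, w] agreement with local context
    anomaly = (sim < tau).reshape(-1)                               # low agreement -> treat as anomaly token
    out = tokens.clone()
    out[anomaly] = neigh.reshape(feat.shape[1], -1).t()[anomaly]    # refill anomalies from the local mean
    return out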

🛠️ Installation

git clone https://github.com/SuleBai/SC-CLIP.git
cd SC-CLIP

conda create -n scclip python=3.9
conda activate scclip
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
pip install openmim
mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex numpy==1.26 yapf==0.40.1
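
To sanity-check the environment after installation, a quick version check can be run (this snippet is our suggestion, not part of the repository):

python -c "import torch, mmcv, mmseg; print(torch.__version__, mmcv.__version__, mmseg.__version__, torch.cuda.is_available())"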

📚 Datasets

We provide the dataset configurations in this repository, following SCLIP.

Please follow the MMSeg data preparation document to download and pre-process the datasets. The COCO-Object dataset can be converted from COCO-Stuff164k by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO_OBJECT
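
For example, assuming COCO-Stuff164k has been downloaded to data/coco_stuff164k (hypothetical paths), the conversion would look like:

python datasets/cvt_coco_object.py data/coco_stuff164k -o data/coco_object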

🔥 Demo

python demo.py
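
As a rough picture of what such a demo does, the following is a hypothetical sketch built on mmseg's generic inference API; the config path, image paths, and checkpoint handling are assumptions (SC-CLIP is training-free, so no checkpoint is loaded), and the actual demo.py may differ:

from mmseg.apis import init_model, inference_model, show_result_pyplot

cfg = 'configs/cfg_voc21.py'                       # hypothetical dataset config
model = init_model(cfg, checkpoint=None, device='cuda:0')
result = inference_model(model, 'demo/example.jpg')
show_result_pyplot(model, 'demo/example.jpg', result, show=False, out_file='demo/output.png')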

📊 Model Evaluation

Single-GPU evaluation:

python eval.py --config configs/cfg_DATASET.py --workdir YOUR_WORK_DIR
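
For instance, with a hypothetical PASCAL VOC config name and work directory:

python eval.py --config configs/cfg_voc21.py --workdir ./work_dirs/voc21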

Multi-GPU evaluation:

bash dist_test.sh

🌹 Acknowledgement

This implementation is based on CLIP, SCLIP, CLIP-DINOiser, and ClearCLIP. Thanks for their awesome work.

📃 Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{bai2025self,
  title={Self-Calibrated {CLIP} for Training-Free Open-Vocabulary Segmentation},
  author={Bai, Sule and Liu, Yong and Han, Yifei and Zhang, Haoji and Tang, Yansong and Zhou, Jie and Lu, Jiwen},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  publisher={IEEE}
}
