Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

SHIFNet is a SAM2-driven Hybrid Interactive Fusion Paradigm designed for RGB-T perception tasks. The framework adapts SAM2 through language guidance, mitigating its inherent RGB bias and enhancing cross-modal semantic consistency. SHIFNet consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module, which dynamically balances modality contributions through text-guided affinity learning, enabling adaptive cross-modal information integration; and (2) a Heterogeneous Prompting Decoder (HPD), which enhances global semantic understanding through a semantic enhancement module and category embeddings, ensuring cross-modal semantic consistency. With only 32.27M trainable parameters, SHIFNet achieves 89.8%, 67.8%, and 59.2% mIoU on the PST900, FMB, and MFNet benchmarks, respectively, while attaining 76.5% pedestrian detection accuracy in safety-critical scenarios. By reducing the cost of large-scale data collection and enhancing multi-modal perception capabilities, SHIFNet provides a reliable perception foundation for intelligent robotic systems operating in complex environments.
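For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of the standard definition, computed from a confusion matrix in plain Python. The function name `mean_iou` and the toy matrix are illustrative assumptions; the repository's own evaluation code may treat ignored labels or absent classes differently.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    This is the textbook definition, not necessarily the repo's evaluator.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                 # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (made-up pixel counts, not results from the paper).
toy = [[8, 2],
       [1, 9]]
print(f"mIoU = {mean_iou(toy):.4f}")
```

Per-class IoU is TP / (TP + FP + FN), and mIoU is the unweighted mean over classes, so rare classes count as much as frequent ones.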

Training

Dataset Preparation

Ensure the dataset directory is structured as follows:

data/
├── FMB
│   ├── train
│   └── test
└── PST900
    ├── train
    └── test

When training, set the -data_path parameter to the folder of the dataset you are using (for example, data/FMB).
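Before launching training, it can help to confirm that the layout shown above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_dirs` and the `EXPECTED` mapping are hypothetical and not part of the repository.

```python
from pathlib import Path

# Expected sub-folders, taken from the directory tree in this README.
EXPECTED = {"FMB": ("train", "test"), "PST900": ("train", "test")}


def missing_dirs(root, expected=EXPECTED):
    """Return the dataset/split folders that are absent under `root`."""
    root = Path(root)
    return [
        str(Path(name) / split)
        for name, splits in expected.items()
        for split in splits
        if not (root / name / split).is_dir()
    ]


if __name__ == "__main__":
    problems = missing_dirs("data")
    print("layout OK" if not problems else f"missing: {problems}")
```

The check only verifies that the folders exist; it makes no assumption about the files inside each split.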

Environment Setup

1. Create Conda Environment

conda create --name SHIFNet python=3.12.3
conda activate SHIFNet

2. Install SAM2

Please install SAM2 following the official documentation.

3. Install Required Dependencies

pip install tensorboardX matplotlib einops monai tabulate fvcore opencv-python addict yapf rich
pip install scikit-learn simple_parsing requests
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html

4. Category Embedding

Method	Description
LanguageBind	Encode class semantic vectors via LanguageBind
Provided Embeddings	Use our precomputed vectors (recommended for quick start)

5. Example command for single-GPU training:

python -m torch.distributed.launch --nproc_per_node=1 --use_env --master_port=30000 multigpu_train.py \
    -ms l \
    -dataset fmb \
    -distributed "0" \
    -data_path data/FMB \
    -gpu_device 0 \
    -b 1 \
    -lr 1e-4 \
    -ddp True \
    -label_path data/FMB/fmb_class_embedding.pt
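If you prefer to launch training from a script, the same command can be assembled as an argument list. This is a minimal sketch: the helper `build_train_command` is hypothetical, and every flag and value is copied from the FMB example above. The corresponding values for PST900 are not shown in this README, so they are left to the caller.

```python
import shlex


def build_train_command(dataset, data_path, label_path,
                        batch_size=1, lr="1e-4", port=30000):
    """Assemble the single-GPU training command from this README as a list."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node=1", "--use_env", f"--master_port={port}",
        "multigpu_train.py",
        "-ms", "l",
        "-dataset", dataset,
        "-distributed", "0",
        "-data_path", data_path,
        "-gpu_device", "0",
        "-b", str(batch_size),
        "-lr", lr,
        "-ddp", "True",
        "-label_path", label_path,
    ]


cmd = build_train_command("fmb", "data/FMB", "data/FMB/fmb_class_embedding.pt")
print(shlex.join(cmd))  # printable form, for checking before launch
# To actually start training: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if a path contains spaces.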

6. Pretrained Weights & Embeddings

Resource	Download
Best checkpoints for FMB & PST900	https://pan.baidu.com/s/155ZzxPhHTbGCGN0q_OneDA?pwd=kkyc
Text Embeddings (.pt)	Included in download package

Visualization

python vis.py

Example visualization output on the FMB dataset:

[Figure: Visualization examples on the FMB dataset]

🤝 Publication:

Please consider citing this paper if you use code from our work. Thanks a lot :)

@article{zhao2025unveiling,
  title={Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance},
  author={Zhao, Jiayi and Teng, Fei and Luo, Kai and Zhao, Guoqiang and Li, Zhiyong and Zheng, Xu and Yang, Kailun},
  journal={arXiv preprint arXiv:2503.02581},
  year={2025}
}

