[NeurIPS 2024] Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
by Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Bo Li, Yang Tang, and Pan Zhou
In this paper, we present Modality Adaptation with text-to-image Diffusion Models (MADM). Leveraging the powerful generalization of Text-to-Image Diffusion Models (TIDMs), we extend domain adaptation to modality adaptation, aiming to segment other unexplored visual modalities in the real world.
Qualitative semantic segmentation results generated by SoTA methods MIC, Rein, and our proposed MADM on three modalities.
- Create a conda virtual env, activate it, and install packages.
conda create -n MADM python==3.10
conda activate MADM
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -U openmim
mim install mmcv==1.3.7
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt
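After installation, a quick sanity check like the following (a minimal sketch, not part of MADM) confirms that the main dependencies import correctly and that PyTorch can see your GPUs:
# Sanity check for the MADM environment (adjust or skip as needed).
import torch
import torchvision
import mmcv
import detectron2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("mmcv:", mmcv.__version__)
print("detectron2:", detectron2.__version__)
print("GPU count:", torch.cuda.device_count())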
- Cityscapes (RGB): Download the gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip from Cityscapes. Generate *labelTrainIds.png via cityscapesscripts, and generate samples_with_class.json for rare class sampling (RCS) via the Data Preprocessing step in DAFormer (see the preprocessing sketch after this list).
- DELIVER (Depth): Download the DELIVER dataset.
- FMB (Infrared): Download the FMB dataset.
- DSEC (Event): Download the testing semantic labels and the training & testing events aggregated in the edge form.
- Modify the project path in line 393 and the dataset paths in lines 394-403 of main.py.
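For the Cityscapes preprocessing above, the sketch below assumes cityscapesscripts is installed and that DAFormer's tools/convert_datasets/cityscapes.py is run from a DAFormer checkout (the script name and arguments follow DAFormer's README); adjust the placeholder paths to your setup.
# Sketch of the Cityscapes preprocessing steps (paths are placeholders).
import os
import subprocess

CITYSCAPES_ROOT = "path/to/datasets/Cityscapes"  # adjust to your dataset path

# 1) Generate *labelTrainIds.png with cityscapesscripts.
subprocess.run(
    ["python", "-m", "cityscapesscripts.preparation.createTrainIdLabelImgs"],
    env={**os.environ, "CITYSCAPES_DATASET": CITYSCAPES_ROOT},
    check=True,
)

# 2) Generate samples_with_class.json for rare class sampling (RCS) with
#    DAFormer's data preprocessing script (run from a DAFormer checkout).
subprocess.run(
    ["python", "tools/convert_datasets/cityscapes.py", CITYSCAPES_ROOT, "--nproc", "8"],
    check=True,
)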
The data folder structure should look like this:
path/to/datasets
├── Cityscapes
│ ├── leftImg8bit
│ ├── gtFine
│ ├── samples_with_class.json
│ ├── ...
├── DELIVER
│ ├── depth
│ ├── semantic
│ ├── ...
├── FMB
│ ├── train
│ ├── test
│ ├── ...
├── DSEC
│ ├── 69mask_train_edges
│ ├── 69mask_test_edges
│ ├── test_semantic_labels
│ ├── ...
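Before training, a quick check like this sketch (with a placeholder dataset root) verifies that the expected folders and files from the structure above are in place:
# Verify the dataset folder structure (a sketch; adjust the root path).
import os

root = "path/to/datasets"  # adjust to your dataset path
expected = [
    "Cityscapes/leftImg8bit", "Cityscapes/gtFine", "Cityscapes/samples_with_class.json",
    "DELIVER/depth", "DELIVER/semantic",
    "FMB/train", "FMB/test",
    "DSEC/69mask_train_edges", "DSEC/69mask_test_edges", "DSEC/test_semantic_labels",
]
for rel in expected:
    status = "OK     " if os.path.exists(os.path.join(root, rel)) else "MISSING"
    print(status, rel)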
Follow the examples on Hugging Face to automatically download the stable-diffusion-v1-4 model, and modify stable_diffusion_name_or_path in config_files/common/models/mtmadise_multi_lora.py accordingly.
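As one way to pre-download the model, the sketch below (assuming the huggingface_hub package; you may need to log in to Hugging Face depending on the repository's access settings) fetches a local snapshot of stable-diffusion-v1-4, and the resulting path (or the repo id itself) can then be used for stable_diffusion_name_or_path:
# Pre-download stable-diffusion-v1-4 into the local Hugging Face cache (a sketch).
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="CompVis/stable-diffusion-v1-4")
print(local_path)  # point stable_diffusion_name_or_path to this path (or use the repo id)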
- Training our MADM requires 2 GPUs with more than 40 GB of memory each (a quick memory check is sketched after the commands below).
- Cityscapes (RGB) → DELIVER (Depth)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_depth_11.py --num-gpus 2 --bs 2 --tag RGB2Depth
- Cityscapes (RGB) → FMB (Infrared)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_infrared_9.py --num-gpus 2 --bs 2 --tag RGB2Infrared
- Cityscapes (RGB) → DSEC (Event)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_event_11.py --num-gpus 2 --bs 2 --tag RGB2Event
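To verify the GPU memory requirement mentioned above before launching a run, a quick PyTorch check like this sketch lists the visible devices and their total memory:
# List visible GPUs and their total memory (should exceed 40 GB each for training).
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")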
Download the trained models for Cityscapes (RGB) → DELIVER (Depth), Cityscapes (RGB) → FMB (Infrared), or Cityscapes (RGB) → DSEC (Event) and put them in the trained_checkpoints folder. Then you can run inference with them:
- Cityscapes (RGB) → DELIVER (Depth)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_depth_11.py --num-gpus 2 --bs 2 --tag RGB2Depth_eval --eval-only --init-from ./trained_checkpoints/model_RGB2Depth.pth
- Cityscapes (RGB) → FMB (Infrared)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_infrared_9.py --num-gpus 2 --bs 2 --tag RGB2Infrared_eval --eval-only --init-from ./trained_checkpoints/model_RGB2Infrared.pth
- Cityscapes (RGB) → DSEC (Event)
CUDA_VISIBLE_DEVICES=0,1 python main.py --config-file config_files/SemSeg/MTMADISE/mtmadise_cityscapes_rgb_to_event_11.py --num-gpus 2 --bs 2 --tag RGB2Event_eval --eval-only --init-from ./trained_checkpoints/model_RGB2Event.pth
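If a downloaded checkpoint seems problematic, a quick load check like this sketch confirms the file deserializes (the exact checkpoint format depends on how MADM saves models, so treat the key inspection as illustrative):
# Quick check that a downloaded checkpoint file loads (format details may vary).
import torch

ckpt = torch.load("./trained_checkpoints/model_RGB2Depth.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # inspect the top-level keys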
- For RGB2Infrared and RGB2Event, the previously trained checkpoints were lost, so we provide two new checkpoints with performance similar to that reported in the paper: RGB2Infrared (original: 62.23, new: 61.88) and RGB2Event (original: 56.31, new: 56.68).
Thanks to ODISE, DAFormer, Stable Diffusion, Detectron2, and MMCV for their public code and released models.
If you find this project useful, please consider citing:
@article{MADM,
  title={Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation},
  author={Xia, Ruihao and Liang, Yu and Jiang, Peng-Tao and Zhang, Hao and Li, Bo and Tang, Yang and Zhou, Pan},
  journal={arXiv preprint arXiv:2410.21708},
  year={2024}
}