# [AAAI 2026] Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter
Zhiyang Chen*, Chen Zhang*, Hao Fang and Runmin Cong
*These authors contributed equally.
```bash
conda create --name DiveSeg python=3.10 -y
conda activate DiveSeg
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
# under your working directory
git clone [email protected]:facebookresearch/detectron2.git
cd detectron2
pip install -e .
cd ..
git clone https://github.com/ettof/Diveseg.git
cd Diveseg
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../..
```
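Optionally, verify the setup before moving on. This is a minimal sanity-check sketch; the `MultiScaleDeformableAttention` module name is an assumption based on the extension that Mask2Former's `make.sh` builds:

```python
# Sanity-check the installation (a minimal sketch).
import torch
import detectron2

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)

# Assumption: make.sh builds Mask2Former's deformable-attention CUDA op
# under this module name.
import MultiScaleDeformableAttention  # noqa: F401
print("deformable-attention op compiled OK")
```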
Download the two benchmark datasets and organize them as follows:

```
data/
├── UIIS/
│   ├── train/
│   ├── val/
│   └── annotations/
│       ├── train.json
│       └── val.json
└── USIS10K/
    ├── multi_class_annotations/
    ├── foreground_annotations/
    ├── train/
    ├── val/
    └── test/
```
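The annotations are standard COCO-format JSON, so the splits can be registered directly with detectron2 for a quick sanity check. A minimal sketch for UIIS; the dataset names `uiis_train`/`uiis_val` are illustrative and may differ from the names DiveSeg's configs actually use:

```python
# Register the UIIS splits with detectron2's built-in COCO loader
# (a minimal sketch; dataset names here are illustrative).
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "uiis_train", {}, "data/UIIS/annotations/train.json", "data/UIIS/train"
)
register_coco_instances(
    "uiis_val", {}, "data/UIIS/annotations/val.json", "data/UIIS/val"
)
```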
The pre-trained weights of DINOv2 are available for download at link (we use the register-free version of DINOv2-Large). Place the downloaded files in the `checkpoints` directory as specified in the config file.
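A quick way to confirm the checkpoint is in place and loadable; this is a minimal sketch, and the filename follows the official DINOv2 release naming, so adjust it to whatever your config points at:

```python
# Verify the DINOv2-Large checkpoint loads (a minimal sketch; the filename
# is an assumption based on the official DINOv2 release naming).
import torch

state_dict = torch.load(
    "checkpoints/dinov2_vitl14_pretrain.pth", map_location="cpu"
)
print(f"{len(state_dict)} tensors loaded, e.g. '{next(iter(state_dict))}'")
```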
Train the DiveSeg model on the UIIS or USIS10K dataset:

```bash
bash train.sh
```
Evaluate the pre-trained models on the test sets:

```bash
bash eval.sh
```

You are expected to get results like this:
| Dataset | Setting | Backbone | mAP | AP50 | AP75 | Weights |
|---|---|---|---|---|---|---|
| UIIS | Instance | ViT-L | 35.6 | 52.0 | 38.5 | model |
| USIS10K | Class-Agnostic | ViT-L | 64.1 | 82.8 | 72.2 | model |
| USIS10K | Multi-Class | ViT-L | 48.4 | 62.3 | 54.4 | model |
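The three numbers per row are COCO-style mask AP, AP50, and AP75. If you want to re-score a saved prediction file yourself, a minimal pycocotools sketch looks like the following; the prediction file name is illustrative, and `eval.sh` already reports these metrics for you:

```python
# Re-score saved predictions with pycocotools (a minimal sketch; the
# "predictions.json" file name is illustrative).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("data/UIIS/annotations/val.json")
dt = gt.loadRes("predictions.json")

evaluator = COCOeval(gt, dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, etc.
```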
```bibtex
@article{chen2025empowering,
title={Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter},
author={Chen, Zhiyang and Zhang, Chen and Fang, Hao and Cong, Runmin},
journal={arXiv preprint arXiv:2511.08334},
year={2025}
}
```

This repo is based on DINOv2, detectron2, and Mask2Former. Thanks for their great work!
