We introduce FOCUS, a Foreground ObjeCts Universal Segmentation framework that handles multiple foreground tasks with one unified architecture. To achieve boundary-aware segmentation, we develop a multi-scale semantic network that uses object edge information to enhance image features, and we propose a novel distillation method that integrates a contrastive learning strategy to refine the predicted mask in a multi-modal feature space. Extensive experiments demonstrate that FOCUS achieves SoTA performance across five foreground segmentation tasks: Salient Object Detection (SOD), Camouflaged Object Detection (COD), Shadow Detection (SD), Defocus Blur Detection (DBD), and Forgery Detection (FD).
- [2025.07.06] FOCUS (DINOv2-L) checkpoints and prediction results are now open-sourced. We've also updated the training scripts to support DINOv2-L as the backbone, so you can now train FOCUS on a single NVIDIA A6000 GPU. Hope you enjoy it!
- [2025.06.27] Our new paper, Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning, has been released. In this paper, we explore how to endow large language models (LLMs) with open-world segmentation capabilities using pure reinforcement learning, relying solely on foreground segmentation data.
- [2025.01.03] FOCUS (DINOv2-G) checkpoints and prediction results are now open-sourced. You can follow the guidelines here to quickly leverage the state-of-the-art performance of our model. Hope you enjoy it!
- [2024.12.16] Our code is released! Feel free to contact us if you have any questions!
- [2024.12.10] Our paper has been accepted by AAAI 2025! 🔥
- We use CUDA 12.2 in our implementation.
- Our code is built on PyTorch 2.1.1. Please make sure you are using PyTorch ≥ 2.1 with a matching torchvision, and that your PyTorch version satisfies Detectron2's requirements.
- We train our models on 2 NVIDIA A6000 GPUs with 48 GB of memory each; please make sure your VRAM is sufficient to avoid OOM issues during training. You can verify your setup with the snippet after this list.
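A minimal environment check (not part of the official setup scripts):

# minimal environment check (not part of the official repo)
import torch
import torchvision

print("PyTorch:", torch.__version__)            # should be >= 2.1
print("torchvision:", torchvision.__version__)  # should match your PyTorch build
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")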
#create environment
conda create --name focus python=3.8
conda activate focus
pip install -r requirements.txt
#install detectron2
git clone [email protected]:facebookresearch/detectron2.git # under your working directory
cd detectron2 && pip install -e . && cd ..
#install other dependencies
pip install git+https://github.com/cocodataset/panopticapi.git
cd third_party/CLIP
python -m pip install -Ue .
cd ../../
#compile CUDA kernel for MSDeformAttn
cd focus/modeling/pixel_decoder/ops && sh make.sh && cd ../../../../
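To verify the kernel compiled correctly, you can try importing it (the module name below is an assumption carried over from Mask2Former-style codebases, on which FOCUS is built):

# run after make.sh; the module name is an assumption carried over from
# Mask2Former-style codebases
import MultiScaleDeformableAttention  # an ImportError here means the build failed
print("MSDeformAttn CUDA kernel loaded successfully")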
We provide an inference demo here if you want to try out our model. You should first download the weights from our MODEL_ZOO.md and then run the following command. Make sure the config file you use matches the downloaded weights.

python demo/demo.py --config-file path/to/your/config \
--input path/to/your/image \
--output path/to/your/output_file \
    --opts MODEL.WEIGHTS path/to/your/weights
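The demo writes its prediction to the output path. Assuming the output is a grayscale foreground mask (an assumption about the demo's output format), you can overlay it on the input image with a few lines of Python:

import numpy as np
from PIL import Image

# substitute the paths you passed as --input and --output
image = Image.open("path/to/your/image").convert("RGB")
mask = Image.open("path/to/your/output_file").convert("L").resize(image.size)

rgb = np.asarray(image, dtype=np.float32)
m = np.asarray(mask, dtype=np.float32)[..., None] / 255.0  # (H, W, 1) in [0, 1]
red = np.array([255.0, 0.0, 0.0])
blended = (1 - 0.5 * m) * rgb + 0.5 * m * red  # tint predicted foreground red
Image.fromarray(blended.astype(np.uint8)).save("overlay.png")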
You should download the required datasets (CAMO, COD10K, CHAMELEON, NC4K, DUTS, DUT-OMRON, HKU-IS, ECSSD, PASCAL-S, ISTD, DUT/CUHK, CASIA1.0, CASIA2.0) into the datasets folder, following the structure below:

datasets/
├── CAMO-V.1.0-CVIU2019
│ ├── GT
│ ├── Images
│ │ ├── Test
│ │ └── Train
├── CASIA
│ ├── CASIA 1.0 dataset
│ ├── CASIA 1.0 groundtruth
│ ├── CASIA2.0_Groundtruth
│ └── CASIA2.0_revised
├── CHAMELEON
│ ├── GT
│ └── Imgs
├── COD10K-v3
│ ├── Test
│ └── Train
├── DEFOCUS
│ └── dataset
│ ├── test_data
│ │ ├── CUHK
│ │ └── DUT
│ └── train_data
│ ├── 1204gt
│ └── 1204source
├── DUTOMRON
│ ├── DUT-OMRON-image
│ └── pixelwiseGT-new-PNG
├── DUTS
│ ├── DUTS-TE
│ └── DUTS-TR
├── ECSSD
│ ├── ground_truth_mask
│ └── images
├── HKU-IS
│ ├── gt
│ └── imgs
├── ISTD_Dataset
│ ├── test
│ └── train
├── NC4K
│ ├── GT
│ └── Imgs
├── PASCAL
│ └── Imgs
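Before running the preparation scripts, you can sanity-check that the expected top-level folders are present; the snippet below is a minimal check derived from the layout above:

import os

# top-level folders expected under datasets/, taken from the layout above
expected = [
    "CAMO-V.1.0-CVIU2019", "CASIA", "CHAMELEON", "COD10K-v3", "DEFOCUS",
    "DUTOMRON", "DUTS", "ECSSD", "HKU-IS", "ISTD_Dataset", "NC4K", "PASCAL",
]
missing = [d for d in expected if not os.path.isdir(os.path.join("datasets", d))]
print("missing:", missing if missing else "none")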
Then run the corresponding dataset preparation script:
python utils/prepare/prepare_<dataset>.py
# e.g. python utils/prepare/prepare_camo.py
Download the pre-trained DINOv2 weights:

#dinov2-g
wget -P ./ckpt https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_reg4_pretrain.pth
#dinov2-l
wget -P ./ckpt https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_reg4_pretrain.pth

Then run the following to convert the DINOv2 weights into detectron2 format and prepare the ResNet weights for the edge enhancer:
#dinov2-g
python utils/convert_dinov2.py ./ckpt/dinov2_vitg14_reg4_pretrain.pth ./ckpt/dinov2_vitg14_pretrain_updated.pkl
#dinov2-l
python utils/convert_dinov2.py ./ckpt/dinov2_vitl14_reg4_pretrain.pth ./ckpt/dinov2_vitl14_pretrain_updated.pkl
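If you're curious what this conversion involves, the sketch below illustrates detectron2's .pkl checkpoint format. It is a minimal illustration only, not the repo's actual utils/convert_dinov2.py (the real script also handles key remapping and the edge-enhancer ResNet weights):

import pickle
import torch

# illustration only: detectron2 loads .pkl checkpoints holding a dict whose
# "model" key maps parameter names to numpy arrays
state_dict = torch.load("./ckpt/dinov2_vitl14_reg4_pretrain.pth", map_location="cpu")
converted = {k: v.numpy() for k, v in state_dict.items()}
payload = {"model": converted, "__author__": "example", "matching_heuristics": True}
with open("./ckpt/example_converted.pkl", "wb") as f:
    pickle.dump(payload, f)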
To train FOCUS, run:

python train_net.py \
--config-file path/to/your/config \
    --num-gpus NUM_GPUS

To evaluate a trained model, run:

python train_net.py --eval-only \
--config-file path/to/your/config \
--num-gpus NUM_GPUS \
    MODEL.WEIGHTS path/to/your/weights

If you find our work helpful, please star this repo and cite our paper!
@inproceedings{you2025focus,
title={{FOCUS}: Towards Universal Foreground Segmentation},
author={You, Zuyao and Kong, Lingyu and Meng, Lingchen and Wu, Zuxuan},
booktitle={AAAI},
year={2025},
}
FOCUS is built upon Mask2Former, CLIP, ViT-Adapter, OVSeg, and detectron2. We express our gratitude to the authors for their remarkable work.
