This repo is the official implementation of the paper "WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation"

The directory structure of the project looks like this:
```
├── README.md             <- The top-level README for developers using this project.
│
├── config                <- Configuration files.
│
├── data
│   ├── anns
│   ├── images
│   ├── masks
│
├── datasets              <- Dataloader files.
├── EfficientSAM          <- EfficientSAM directory.
│
├── models                <- Source code for use in this project.
│   ├── __init__.py
│   ├── language_encoder.py  <- Encoder for the images' text descriptions.
│   ├── network_blocks.py    <- Essential model building blocks.
│   ├── visual_encoder.py    <- Visual backbone.
│   ├── weakmcn              <- Core files of the WeakMCN model implementation.
│   │   ├── __init__.py
│   │   ├── head.py          <- Anchor-prompt contrastive loss.
│   │   ├── seg_head.py      <- Segmentation head.
│   │   ├── net.py           <- Main code of the WeakMCN model.
│
├── utils                 <- Helper functions.
├── requirements.txt      <- The requirements file for reproducing the analysis environment.
├── train.py              <- Script for training the model.
├── test.py               <- Script for testing a trained model.
└── LICENSE               <- Open-source license.
```

Instructions on how to clone and set up the repository:
- Clone the repository and navigate to the project directory:
```bash
git clone https://github.com/MRUIL/WeakMCN.git
cd WeakMCN
```
- Create and activate a conda environment:
```bash
conda create -n weakmcn python=3.9 -y
conda activate weakmcn
```
- Install PyTorch following the official installation instructions.
  (We run all our experiments on PyTorch 1.11.0 with CUDA 11.3.)
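For reference, a matching pip command, taken from PyTorch's previous-versions page rather than from this repository, is:
```bash
# PyTorch 1.11.0 wheels built against CUDA 11.3, from the official index.
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```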
- Install apex following the official installation guide.
  (Or use the following commands, copied from their official repo.)
```bash
git clone https://github.com/NVIDIA/apex
cd apex
git checkout origin/22.02-parallel-state
python setup.py install --cuda_ext --cpp_ext
pip3 install -v --no-cache-dir ./
cd ..
```
- Clone the EfficientSAM repository and download its pretrained weights:
```bash
cd EfficientSAM
mkdir weights
cd weights
wget https://github.com/yformer/EfficientSAM/raw/refs/heads/main/weights/efficient_sam_vitt.pt
wget https://github.com/yformer/EfficientSAM/raw/refs/heads/main/weights/efficient_sam_vits.pt.zip
unzip efficient_sam_vits.pt.zip
cd ../..
```
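Optionally, verify that both checkpoints deserialize. This is a generic sanity check, not a script from this repo; run it from the repository root:
```bash
python - <<'EOF'
# Load each downloaded EfficientSAM checkpoint on CPU and show its top-level
# structure; the exact keys depend on how the checkpoints were saved.
import torch
for f in ["EfficientSAM/weights/efficient_sam_vitt.pt",
          "EfficientSAM/weights/efficient_sam_vits.pt"]:
    ckpt = torch.load(f, map_location="cpu")
    top = list(ckpt)[:5] if isinstance(ckpt, dict) else type(ckpt).__name__
    print(f, "->", top)
EOF
```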
- Build the DCN ops:
```bash
cd utils/DCN
./make.sh
cd ../..
```
- Install the remaining dependencies:
```bash
pip install -r requirements.txt
pip install transformers==4.41.1
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
```
- Download the images and generate the annotations according to SimREC.
  (We also prepared the annotations inside the data/anns folder to save you time.)
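A quick way to confirm the bundled annotations are in place (plain shell, not a repo script):
```bash
# Expect refcoco.json, refcoco+.json, and refcocog.json per the layout below.
ls data/anns
```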
- Download the pretrained weights of YoloV3 from Google Drive.
  (We recommend putting it in the main path of WeakMCN; otherwise, please modify the path in the config files.)
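The exact config field that stores this path is repo-specific; a generic way to locate it (an ordinary grep, not a documented command):
```bash
# List config lines mentioning weights/pretraining so you can edit the
# YoloV3 weight path if you stored the file somewhere else.
grep -rni -e "weight" -e "pretrain" ./config/
```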
- The data directory should look like this:
```
├── data
│   ├── anns
│   │   ├── refcoco.json
│   │   ├── refcoco+.json
│   │   ├── refcocog.json
│   ├── images
│   │   ├── train2014
│   │   │   ├── COCO_train2014_000000515716.jpg
│   │   │   ├── ...
│   ├── masks
│   ├── ... (the remaining directories)
```
- NOTE: our YoloV3 is trained on COCO's training images, excluding those in the validation and testing sets of RefCOCO, RefCOCO+, and RefCOCOg.
- To train WeakMCN with the SAM ViT-tiny backbone, run:
```bash
python train.py --config ./config/refcoco_tuning.yaml
python train.py --config ./config/refcoco+_tuning.yaml
python train.py --config ./config/refcocog_tuning.yaml
```
- To train WeakMCN with the SAM ViT-base backbone, run:
```bash
python train.py --config ./config/refcoco_tuning_v2.yaml
python train.py --config ./config/refcoco+_tuning_v2.yaml
python train.py --config ./config/refcocog_tuning_v2.yaml
```
- To evaluate a trained model, run:
```bash
python test.py --config ./config/[DATASET_NAME].yaml --eval-weights [PATH_TO_CHECKPOINT_FILE]
```
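For example, evaluating on RefCOCO (the checkpoint filename here is hypothetical; substitute the file you trained or downloaded):
```bash
# Hypothetical checkpoint path; use your own trained or downloaded weights.
python test.py --config ./config/refcoco_tuning.yaml --eval-weights ./weakmcn_refcoco.pth
```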
Results on RefCOCO:

| Method | REC (val) | REC (testA) | REC (testB) | RES (val) | RES (testA) | RES (testB) | Checkpoint |
|---|---|---|---|---|---|---|---|
| WeakMCN (SAM ViT-tiny) | 68.63 | 70.18 | 62.36 | 58.41 | 60.06 | 56.08 | link |
| WeakMCN (SAM ViT-base) | 69.22 | 70.76 | 63.43 | 59.49 | 61.01 | 56.40 | link |
Results on RefCOCO+:

| Method | REC (val) | REC (testA) | REC (testB) | RES (val) | RES (testA) | RES (testB) | Checkpoint |
|---|---|---|---|---|---|---|---|
| WeakMCN (SAM ViT-tiny) | 51.14 | 56.92 | 42.22 | 42.51 | 48.91 | 35.10 | link |
| WeakMCN (SAM ViT-base) | 51.93 | 57.40 | 43.28 | 44.36 | 50.40 | 37.12 | link |
Results on RefCOCOg:

| Method | REC (val) | RES (val) | Checkpoint |
|---|---|---|---|
| WeakMCN (SAM ViT-tiny) | 53.82 | 45.73 | link |
| WeakMCN (SAM ViT-base) | 55.00 | 46.81 | link |
This repository is built upon RefCLIP, LaConvNet, and SimREC. Thanks for these well-organized codebases.
If you find WeakMCN useful for your research, please cite:

```bibtex
@inproceedings{cheng2025weakmcn,
  title={WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation},
  author={Cheng, Silin and Liu, Yang and He, Xinwei and Ourselin, Sebastien and Tan, Lei and Luo, Gen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={9175--9185},
  year={2025}
}
```