
MulTaS: MULti-TAsk Self-training of object detection and semantic segmentation

This source code was used for our papers at the ICCV 2023 Workshops and at BMVC 2023. If you use it in your research, please consider citing our papers:

@InProceedings{Le_2023_ICCV,
    author    = {L\^e, Ho\`ang-\^An and Pham, Minh-Tan},
    title     = {Self-Training and Multi-Task Learning for Limited Data: Evaluation Study on Object Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1003-1009}
}

@inproceedings{Le_2023_BMVC,
    author    = {L\^e, Ho\`ang-\^An and Pham, Minh-Tan},
    title     = {Data exploitation: multi-task learning of object detection and semantic segmentation on partially annotated data},
    booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
    publisher = {BMVA},
    year      = {2023}
}

Installation

Dependencies

We use Anaconda to manage the environment. All required packages are listed in environments.yml and can be installed by running the following command.

conda env create --name envname --file=environments.yml
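Once created, the environment can be activated before running any of the commands below:

conda activate envname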

Dataset

Pascal VOC

Download the Pascal VOC2007 and VOC2012 datasets and place them in the datasets directory following the structure below.

datasets/VOCdevkit
|-- VOC2007
|   |-- Annotations
|   |-- ImageSets
|   |-- JPEGImages
|   |-- SegmentationClass
|   `-- SegmentationObject
|-- VOC2012
|   |-- Annotations
|   |-- ImageSets
|   |-- JPEGImages
|   |-- SegmentationClass
|   `-- SegmentationObject

The VOC splits tailored for the experiments in the paper are provided at datasets/imgsetVOC. They replace the original ImageSets directories in VOC2007 and VOC2012. The following commands back up the original directories and create symlinks to the provided ones.

cd multas/datasets/
mv VOCdevkit/VOC2007/ImageSets VOCdevkit/VOC2007/ImageSets_org # backing up
mv VOCdevkit/VOC2012/ImageSets VOCdevkit/VOC2012/ImageSets_org # backing up
ln -s $(pwd)/imgsetVOC/VOC2007/ImageSets VOCdevkit/VOC2007/
ln -s $(pwd)/imgsetVOC/VOC2012/ImageSets VOCdevkit/VOC2012/

Scripts to automate the process are provided in data/scripts and can be run with the following command

./data/scripts/VOC2007.sh datasets/

Augmented VOC (SBD) dataset

Download the SBD dataset and read the .mat annotation files in Python using scipy.io.loadmat. The segmentation mask can be accessed via mat["GTcls"][0]["Segmentation"][0].
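A minimal Python sketch for inspecting one annotation file (the file path below is illustrative and depends on where SBD was extracted):

import scipy.io

# Load one SBD class-segmentation annotation file (path is hypothetical)
mat = scipy.io.loadmat("path/to/SBD/cls/2008_000002.mat")
# 2D array of per-pixel class indices
seg = mat["GTcls"][0]["Segmentation"][0]
print(seg.shape, seg.dtype)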

COCO

Download the COCO 2017 dataset and place it in the datasets directory following the structure below.

datasets/coco2017
|-- annotations
|-- subsets
|-- train2017
|   |-- 000000000009.jpg
|   |-- 000000000025.jpg
|   |-- 000000000030.jpg
|   |-- 000000000034.jpg
|   |-- ...
|-- val2017
|   |-- 000000000139.jpg
|   |-- 000000000285.jpg
|   |-- 000000000632.jpg
|   |-- 000000000724.jpg
|   |-- ...

The subsets directory is provided in datasets/subsetsCOCO. You can create a symlink using the following commands

cd multas/datasets/
ln -s $(pwd)/subsetsCOCO coco2017/subsets

Training

Self-Training on Object Detection

Training teacher network

python train.py --seed 0 --size 320 --batch_size 5 --lr 0.01 --eval_epoch 1\
                --double_aug --match mg --conf_loss gfc \
                --backbone resnet50 --neck pafpn \
                --dataset VOC --imgset Half

where imgset is one of Half, Quarter, or Eighth.

Training student network

python distil.py    --seed 0 --size 320 --batch_size 10 --lr 0.01 --eval_epoch 1\
                    --double_aug --match iou --conf_loss gfc \
                    --backbone resnet18 --neck fpn \
                    --dataset VOC  --imgset Half \
                    --teacher_backbone resnet50 --teacher_neck pafpn \
                    --kd hard+pdf --tdet_weights [path/to/teacher/weights.pth]

where

  • kd can be hard for supervised training, or soft, soft+mse, soft+pdf, soft+defeat for self-training.
  • imgset can be Main, Half, Quarter, or Eighth for the overlapping training sets, or Half2, 3Quarter, 7Eighth for the complementary sets. To simulate a complete lack of training annotations, the Main image set should only be used with soft-based distillation (see the example below).
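For example, a student trained without any annotations (the Main image set with soft distillation) could be launched as follows; every other flag mirrors the command above:

python distil.py    --seed 0 --size 320 --batch_size 10 --lr 0.01 --eval_epoch 1\
                    --double_aug --match iou --conf_loss gfc \
                    --backbone resnet18 --neck fpn \
                    --dataset VOC  --imgset Main \
                    --teacher_backbone resnet50 --teacher_neck pafpn \
                    --kd soft+pdf --tdet_weights [path/to/teacher/weights.pth]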

Partial multi-task learning

python train.py --seed 0 --size 320 --batch_size 7  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1
  • For the dataset and imgset parameters:
    • MXE and det+seg: the mutually-exclusive detection and segmentation subsets of Pascal VOC. Replace MXE by MXS or MXT for the same images with a modified label space.
    • COCO and Eighth+Ei2ght: two mutually-exclusive subsets accounting for 1/8 of the original COCO dataset (14655 images).
  • Use task_weights to scale the loss of each task, e.g. 1.0+2.0 doubles the semantic segmentation losses while leaving the detection losses unchanged; defaults to 1.0 (= 1.0+1.0). See the example below.
  • eval_epoch controls per-epoch evaluation during training: 0 means none (default), 1 means every epoch starting after 3/4 of nepoch, and any other non-zero integer starts evaluation after that epoch number.
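For example, to double the segmentation losses in the command above (assuming the flag is spelled --task_weights; check the argument parser in train.py if it differs):

python train.py --seed 0 --size 320 --batch_size 7  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --task_weights 1.0+2.0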

BoMBo

We provide different weak losses that benefit one task using the ground truths provided for the other. This part was published at BMVC 2024. If you use the source code in your research, please consider citing our paper:

@inproceedings{Le_2024_BMVC,
    author    = {L\^e, Ho\`ang-\^An and Berg, Paul and Pham, Minh-Tan},
    title     = {Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning},
    booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
    publisher = {BMVA},
    year      = {2024},
    url       = {https://papers.bmvc2024.org/0753.pdf}
}

To activate the Mask-for-Box module, use the --M4B flag with one of the following arguments:

  • L+C to optimize both localization and classification losses.
  • Add a zero 0 in front of each letter to disable the respective loss, e.g. 0L+C to optimize only the classification loss and L+0C to optimize only the localization loss.

Full training commands for M4B refined

# enforce only C loss
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --M4B 0L+C
# enforce only L loss
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --M4B L+0C
# enforce both losses
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --M4B L+C

To activate the Box-for-Mask module, use the --B4M flag. There are two relevant arguments:

  • --queue (defaults to 1) sets the length of the MoCo-style feature queue.
  • --alpha (defaults to 0.1) sets the margin for the triplet loss (see the last command below).

Before training: the B4M module uses pseudo semantic masks generated with cv2.grabCut; see utils/generate_semseg_grabcut.py for more information. The masks generated for the MXE detection split (originally Pascal VOC) used in our experiments are provided for convenience in SegmentationClassAug_MGGC.zip. Extract it into datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC/

mkdir datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC
unzip SegmentationClassAug_MGGC.zip -d datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC/

Full training commands

# Activate B4M with default queue and alpha parameters
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --B4M

# Activate B4M and set the queue length to 5
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --B4M --queue 5
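
The triplet-loss margin can be adjusted in the same way (the value 0.2 here is arbitrary, for illustration only):

# Activate B4M with a queue length of 5 and a larger triplet-loss margin
python train.py --seed 0 --size 320 --batch_size 5  --lr .001 --nepoch 100 \
                --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
                --task det+seg --eval_epoch 1 --B4M --queue 5 --alpha 0.2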

Reference

The repo is based on this repo by @zhanghengdev.
