This repository is the official PyTorch implementation of "Masked Images Are Counterfactual Samples for Robust Fine-tuning" [paper], accepted by CVPR 2023.
- 2023-03-24: Code released.
Our experiments are conducted on:
- OS: Ubuntu 20.04.4
- GPU: NVIDIA GeForce RTX 3090
- Python 3.9
- PyTorch 1.11
- cudatoolkit 11.3.1
- torchvision 0.12.0
- tensorboard 2.8.0
- scikit-learn 1.0.2
- torchattacks
- tqdm
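Assuming a matching NVIDIA driver, an environment like the one above could be created as follows. This is an illustrative sketch, not an official environment file from the repository; the environment name `masked-ft` and the channel choices are assumptions:

```shell
# Illustrative setup (names/channels are assumptions, not from the repo)
conda create -n masked-ft python=3.9
conda activate masked-ft
conda install pytorch=1.11 torchvision=0.12 cudatoolkit=11.3 -c pytorch
pip install tensorboard==2.8.0 scikit-learn==1.0.2 torchattacks tqdm
```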
The data directory (DATA_DIR) should contain the following sub-directories:
- ILSVRC2012: ImageNet
- imagenet-a: ImageNet-A
- imagenet-r: ImageNet-R
- imagenet-sketch: ImageNet-Sketch
- imagenetv2-matched-frequency: ImageNet-V2
- objectnet-1.0: ObjectNet
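As a quick sanity check before training, you can verify that DATA_DIR contains all of the expected sub-directories. The helper below is illustrative only (it is not part of the repository); the directory names are taken from the list above:

```python
import os

# Expected sub-directories of DATA_DIR (names from this README; the
# helper itself is illustrative and not part of the repository).
EXPECTED_DIRS = [
    "ILSVRC2012",                    # ImageNet
    "imagenet-a",                    # ImageNet-A
    "imagenet-r",                    # ImageNet-R
    "imagenet-sketch",               # ImageNet-Sketch
    "imagenetv2-matched-frequency",  # ImageNet-V2
    "objectnet-1.0",                 # ObjectNet
]

def missing_datasets(data_dir):
    """Return the expected dataset sub-directories absent from data_dir."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(data_dir, d))]
```

For example, `missing_datasets("/data")` returns an empty list when every dataset is in place.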
Please modify lines 3-6 of the main script run.sh to set the proper directories:
- LOG_DIR: root directory for the logs of all experiments and runs
- DATA_DIR: the directory for all datasets, as described above
- MODEL_DIR: the directory for pre-trained model weights (i.e., CLIP weights; the weights are downloaded automatically if they do not exist)
- EXP_NAME: experiment name; becomes a sub-directory of LOG_DIR
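For illustration, the top of run.sh might be configured as follows. The paths below are placeholders to substitute with your own, not directories used by the repository:

```shell
# Lines 3-6 of run.sh (placeholder paths -- substitute your own)
LOG_DIR=/path/to/logs        # root directory for experiment logs
DATA_DIR=/path/to/datasets   # directory containing the datasets listed above
MODEL_DIR=/path/to/models    # CLIP weights (auto-downloaded if missing)
EXP_NAME=my_experiment       # sub-directory of LOG_DIR for this run
```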
The bash script run.sh provides a uniform, simplified interface to the Python scripts for training and evaluation. It accepts the following positional arguments:
- script mode: whether to train or evaluate a model; one of `train`, `eval`, or `train-eval`
- architecture: `clip_{arch}`, where `{arch}` can be `ViT-B/32`, `ViT-B/16`, or `ViT-L/14`
- method: the training method (see `example.sh` or `run.sh` for available options)
- masking: the masking strategy (see `example.sh`)
- seed: an integer seed (note: we use three seeds (0, 1, 2) in the paper)
- other arguments, which are passed through to the Python scripts
The following commands show an example of fine-tuning a CLIP ViT-B/32 model with our proposed method, using object masking (threshold 0.3) and single-fill. Please refer to example.sh for more examples.
```shell
# Build the zero-shot model
CUDA_VISIBLE_DEVICES=0 bash run.sh train 'clip_ViT-B/32' 'zeroshot' '' 0

# Fine-tune using our approach
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run.sh train 'clip_ViT-B/32' 'FT_FD_image_mask' 'ObjMaskSingleFill(0.3)' 0

# Evaluate the fine-tuned model (replace `train` with `eval`)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run.sh eval 'clip_ViT-B/32' 'FT_FD_image_mask' 'ObjMaskSingleFill(0.3)' 0
```
Some of the code in this repository is based on the following repositories:
- CLIP: https://github.com/openai/CLIP
- WiSE-FT: https://github.com/mlfoundations/wise-ft
- CAM for ViT: https://github.com/hila-chefer/Transformer-MM-Explainability