This framework implements data poisoning strategies that reliably apply imperceptible adversarial patterns to training data. If this poisoned training data is later used to train a new model, the new model will misclassify specific target images.
Implemented settings:
- From-scratch training
- Finetuning (Transfer learning with a finetuned feature representation)
- Transfer learning (with a fixed feature representation)
Implemented poison recipes:
- Gradient Matching (Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching - https://openreview.net/forum?id=01olnfLIbD)
- Poison Frogs (Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks - https://arxiv.org/abs/1804.00792)
- Bullseye Polytope (Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability - https://arxiv.org/abs/2005.00191) [in the straight feature-collision version]
- Watermarking [https://arxiv.org/abs/1804.00792]
- MetaPoison (MetaPoison: Practical General-purpose Clean-label Data Poisoning - https://arxiv.org/abs/2004.00225) [implemented for small ensembles and without color perturbation; refer to https://github.com/wronnyhuang/metapoison for the full construction of a large staggered ensemble and for color perturbations]
- Hidden-Trigger Backdoor Attacks (https://arxiv.org/abs/1910.00033)
- Convex Polytope attack (https://arxiv.org/pdf/1905.05897.pdf) [however, we do not implement the threat model discussed there]
- Patch Attacks (BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain - https://arxiv.org/abs/1708.06733)
Dependencies:
- PyTorch >= 1.6
- torchvision > 0.5
- higher [best to directly clone https://github.com/facebookresearch/higher and install via pip install .]
- efficientnet_pytorch [pip install --upgrade efficientnet-pytorch, only needed if EfficientNet is used]
- python-lmdb [only needed if datasets are supposed to be written to an LMDB]
The cmd-line script brew_poison.py can be run with default parameters to get a first impression for a ResNet18 on CIFAR10 with some --recipe for a poisoning attack, such as --recipe gradient-matching.
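For example, a first run could look like this:
python brew_poison.py --recipe gradient-matching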
The command-line script features a large number of arguments, most of which are optional. Important parameters are:
- --net: Choose the neural network that is being poisoned. Current options:
  - CIFAR/MNIST: ResNet variants (including WideResNets via ResNet-DEPTH-WIDTH), a ConvNet, Linear, and VGG variants
  - ImageNet: All torchvision models (https://pytorch.org/docs/stable/torchvision/models.html), Linear, and EfficientNet (https://github.com/lukemelas/EfficientNet-PyTorch)
- --dataset: Choose the dataset from (currently) CIFAR10, CIFAR100, MNIST, and ImageNet
- --threatmodel: Poisoning can work in a variety of threat models. Options: single-class, third-party, and random-subset
- --scenario: Whether to attack in the from-scratch, finetuning, or transfer scenario
- --eps, --budget, and --targets control the strength of the attack
- --ensemble controls the number of networks used to brew the poison.
All possible arguments can be found under forest/options.py.
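As an illustration, these arguments can be combined freely; the values below are only placeholders, not recommended settings:
python brew_poison.py --net ResNet18 --dataset CIFAR10 --recipe gradient-matching --eps 16 --budget 0.01 --targets 1 --ensemble 2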
Attacks are controlled by the --recipe argument, which can be one of
gradient-matching, watermark, poison-frogs, metapoison, hidden-trigger, convex-polytope, bullseye, patch,
or one of the variations mentioned in forest/options.py, for example gradient-matching-private to adaptively attack DP-SGD defenses.
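For example, the DP-adaptive variant is selected in the same way as the base recipe:
python brew_poison.py --recipe gradient-matching-private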
To apply filter defenses, supply the argument --filter_defense with one of spectral_signatures, deepknn, activation_clustering.
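For example, to evaluate a gradient-matching attack against the spectral signatures filter:
python brew_poison.py --recipe gradient-matching --filter_defense spectral_signatures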
To apply defenses through strong data augmentations (https://arxiv.org/abs/2103.02079) that mix data, use --mixing_method and provide a data augmentation argument,
such as Cutout, Mixup, CutMix, Maxup-Cutout, 8way-mixup or variations thereof. The mixing of label information can be disabled by --mixing_disable_correction. This defense can be easily incorporated into a different codebase using only the file forest/data/mixing_data_augmentations.py.
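For example, to defend the victim training with Cutout-based mixing:
python brew_poison.py --recipe gradient-matching --mixing_method Cutout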
To apply a defense based on differentially private training (the DP-SGD defense mentioned above), supply the arguments --gradient_noise and --gradient_clip.
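For example (the noise and clipping values here are only illustrative placeholders):
python brew_poison.py --recipe gradient-matching --gradient_noise 0.01 --gradient_clip 1.0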
To apply defenses based on adversarial training (https://arxiv.org/abs/2102.13624), call the same code with the additional arguments --optimization defensive --defense_type adversarial-wb-recombine, i.e.
python brew_poison.py --net ResNet18 --recipe gradient-matching --restarts 1 --optimization defensive --defense_type adversarial-wb-recombine
The --defense_type argument can be broken up into components, to ablate parts of the defense. Replacing the middle part -wb- (standing for Witches' Brew, i.e. gradient matching) with another attack, e.g. -fc- (for feature collision), allows for a different attack to be used during the defense.
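For example, to run the defense with a feature-collision attack in place of gradient matching:
python brew_poison.py --net ResNet18 --recipe gradient-matching --restarts 1 --optimization defensive --defense_type adversarial-fc-recombine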
Using model ensembles to brew transferable poisons parallelizes naturally over the ensemble. We make use of this by launching via
python -m torch.distributed.launch --nproc_per_node=GPUs_per_node --master_port=20704 \
    dist_brew_poison.py --extra_args
on a single-node setup with multiple GPUs.
Running on a multi-node setup, the launch command is slightly different. On a SLURM cluster:
python -m torch.distributed.launch --nproc_per_node=$SLURM_GPUS_PER_NODE \
    --nnodes=$SLURM_JOB_NUM_NODES --node_rank=$SLURM_NODEID \
    --master_addr="192.168.1.1" --master_port=11234 \
    dist_brew_poison.py --extra_args
On a PBS cluster:
python -m torch.distributed.launch --nproc_per_node=4 \
    --nnodes=$PBS_NUM_NODES --node_rank=$PBS_O_NODENUM \
    --master_addr="192.168.1.1" --master_port=11234 \
    dist_brew_poison.py --extra_args
See https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py for details.
Poisoning is highly dependent on the choice of target class, source class, and poison images. Every time the brew_poison.py script is rerun, it draws a random target image from a random class and random poison images from another class. This can lead to very different results, as some of these choices are much easier to poison than others.
To control for this effect, take note of the hash printed during the initialization of the data. Starting a new run with this value supplied to the optional argument --poisonkey will rerun the same experiment with the same setup.
To have a small "validation set" for poisoning strategies, we recommend choosing a set of keys and repeating any experiments and benchmarks for all of them. 10-20 keys are probably the absolute minimum; the benchmark of https://arxiv.org/abs/2006.12557 runs over 100 keys.
Weaker poisoning strategies are also very volatile with respect to the victim model's parameters. Running the validation several times with new victim model initializations helps to control for this effect. This can be done by adding the argument --vruns.
The model initialization can be fixed with --modelkey, but as training is not set to be deterministic, small changes can materialize even when training from the same initialization.
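For example (the key values below are arbitrary placeholders):
python brew_poison.py --recipe gradient-matching --poisonkey 2000000000 --modelkey 4444 --vruns 4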
This framework implements an interface into several dataloaders and dataset subsets in the Kettle class.
The training procedure is implemented in the Victim class; its subclasses implement different backends.
The Witch classes implement a poison attack for a given victim and kettle and return poison_delta.
A point to note: poison_delta stores only the adversarial pattern, not the image. These patterns are added to their
corresponding images when necessary according to the poison_ids stored in the kettle.
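As a rough sketch of how these pieces fit together (the exact constructor arguments and helper names here are simplified assumptions; see brew_poison.py for the actual entry point):
import forest
args = forest.options().parse_args()    # all arguments live in forest/options.py
model = forest.Victim(args)             # training backend for the victim model(s)
data = forest.Kettle(args)              # dataloaders and poison/target subsets
witch = forest.Witch(args)              # the attack chosen via --recipe
poison_delta = witch.brew(model, data)  # only the adversarial patterns, indexed by poison_ids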
To implement a new attack, have a look at the method _brew in witchcoven/witch_watermark and implement a
similar method. Then register a new class in the witchcoven/__init__ interface. Attacks that are conceptually similar to Poison Frogs or Gradient Matching can also implement a different _define_objective method and inherit the _brew structure from the base class.
We also support attacks against the poisoning benchmark of https://github.com/aks2203/poisoning-benchmark. To brew poisoned datasets for this benchmark, refer to the benchmark creation script scripts/benchmark_gen.py. This script generates bash files for poison brewing and poison evaluation. For evaluation, the generated script benchmark_poison_evaluation.sh assumes that the GitHub repository of the benchmark is located at the same folder level as this repository.
Please cite the corresponding attack or defense papers when using this code for your research.
This framework was written by Liam Fowl and me (Jonas Geiping) with contributions by Gowthami Somepalli, Tom Goldstein and Gavin Taylor.
If you have questions, please raise an issue on this repository or write us an email. ¯\_(ツ)_/¯