Official implementation of "Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence"
A comprehensive framework for black-box adversarial attacks with theoretical guarantees, featuring our novel Certified Attack method alongside 16 state-of-the-art attack baselines.
- Overview
- Key Features
- Quick Start
- Installation Guide
- Supported Attacks
- Tutorials
- Usage Examples
- Project Structure
- Configuration Guide
- API Reference
- Results & Benchmarks
- Citation
- Contributing
- FAQ & Troubleshooting
CertifiedAttack introduces a groundbreaking approach to black-box adversarial attacks that provides provable confidence guarantees on attack success. Unlike existing methods that rely on heuristics, our approach uses randomized adversarial examples to achieve certifiable attack success rates, effectively breaking state-of-the-art defenses including:
- Adversarial Training (TRADES)
- Randomized Smoothing Defenses
- Detection-based Defenses (Blacklight)
- Input Transformations
- Theoretical Guarantees: First black-box attack with provable success bounds
- Defense-Agnostic: Works against any classifier that can be queried, with no gradient access required
- Query-Efficient: Achieves high success rates with fewer queries
- Comprehensive Benchmark: Includes 16 SOTA attack implementations
- Certified Attack Algorithm: Novel attack with confidence bounds
- Binary Search Variant: Optimal perturbation finding
- SSSP Variant: Single-Step Single-Pixel for efficiency
- Theoretical Framework: Provable attack success guarantees
- Comprehensive Evaluation: Against 4 defense types on 6 datasets
- Blacklight: Query-based detection
- RAND Pre-processing: Input randomization
- RAND Post-processing: Output randomization
- TRADES: Adversarial training
- 6 Datasets: MNIST, Fashion-MNIST, KMNIST, CIFAR-10, CIFAR-100, ImageNet
- 9 Models: VGG, ResNet, ResNet-preact, WideResNet, DenseNet, PyramidNet, ResNeXt, Shake-Shake, SENet
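For intuition, the two RAND defenses above differ only in where randomness is injected. A minimal, hypothetical sketch (function names are illustrative, not the repository's defense modules; `noise_level` mirrors the option in the defense configs shown later):

import torch

def rand_pre(x, noise_level=0.1):
    """RAND pre-processing: randomize the input before it reaches the model."""
    return torch.clamp(x + noise_level * torch.randn_like(x), 0.0, 1.0)

def rand_post(logits, noise_level=0.1):
    """RAND post-processing: randomize the scores the attacker observes."""
    return logits + noise_level * torch.randn_like(logits)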
# 1. Clone the repository
git clone https://github.com/yourusername/CertifiedAttack.git
cd CertifiedAttack
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run your first attack
python attack.py --config configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# 4. Or try the interactive demo
python examples/quick_start.py --demo

Minimum:
- Python 3.8+
- 8 GB RAM
- 10 GB disk space
Recommended:
- Python 3.9/3.10
- NVIDIA GPU (8GB+ VRAM)
- CUDA 11.0+
- 16 GB RAM
# Create virtual environment
python -m venv certifiedattack_env
source certifiedattack_env/bin/activate # Linux/Mac
# certifiedattack_env\Scripts\activate # Windows
# Install PyTorch (select your CUDA version)
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# CPU only
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install CertifiedAttack
pip install -r requirements.txt

# Create and activate environment
conda env create -f environment.yml
conda activate certifiedattack

# Clone and install in development mode
git clone https://github.com/yourusername/CertifiedAttack.git
cd CertifiedAttack
pip install -e .

# Build Docker image
docker build -t certifiedattack:latest .
# Run with GPU support
docker run --gpus all -it -v $(pwd):/workspace certifiedattack:latest
# Run CPU only
docker run -it -v $(pwd):/workspace certifiedattack:latest

Linux (Ubuntu/Debian)
# Install system dependencies
sudo apt-get update
sudo apt-get install -y python3-dev python3-pip git
# Install CUDA (if using GPU)
# Follow NVIDIA's guide: https://developer.nvidia.com/cuda-downloads
# Install CertifiedAttack
pip install -r requirements.txt

macOS
# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Python
brew install [email protected]
# Install dependencies
pip3 install -r requirements.txt
# Note: macOS doesn't support CUDA. Use the CPU-only PyTorch build.

Windows
- Install Python from python.org
- Install Visual Studio Build Tools (for C++ extensions)
- Install Git from git-scm.com
- Open PowerShell as Administrator:
# Clone repository
git clone https://github.com/yourusername/CertifiedAttack.git
cd CertifiedAttack
# Create virtual environment
python -m venv certifiedattack_env
certifiedattack_env\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Check installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
# Run test attack
python attack.py --help

We implement 16 state-of-the-art black-box attacks, categorized by their query type:
These attacks use the confidence scores/probabilities from the model.
| Attack | Paper | Year | Venue | Description |
|---|---|---|---|---|
| NES | Black-box Adversarial Attacks with Limited Queries and Information | 2018 | ICML | Natural Evolution Strategies |
| ZO-SignSGD | signSGD via Zeroth-Order Oracle | 2019 | ICLR | Zeroth-order sign-based optimization |
| Bandit | Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors | 2019 | ICLR | Bandit optimization with priors |
| ECO (Parsimonious) | Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization | 2019 | ICML | Combinatorial optimization approach |
| SimBA | Simple Black-box Adversarial Attacks | 2019 | ICML | Simple iterative method |
| SignHunter | Sign Bits Are All You Need for Black-Box Attacks | 2020 | ICLR | Sign-based gradient estimation |
| Square Attack | Square Attack: a query-efficient black-box adversarial attack via random search | 2020 | ECCV | Random search in square-shaped regions |
| Simple | Simple Black-box Adversarial Attacks | 2019 | ICML | Simplified black-box attack |
These attacks only use the hard labels (top-1 predictions) from the model.
| Attack | Paper | Year | Venue | Description |
|---|---|---|---|---|
| Boundary Attack | Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models | 2018 | ICLR | Walk along the decision boundary |
| OPT | Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach | 2019 | ICLR | Optimization-based approach |
| Sign-OPT | Sign-OPT: A Query-Efficient Hard-label Adversarial Attack | 2020 | ICLR | Sign-based OPT variant |
| Evolutionary | Efficient Decision-based Black-box Adversarial Attacks on Face Recognition | 2019 | CVPR | Evolutionary algorithm |
| GeoDA | GeoDA: A Geometric Framework for Black-box Adversarial Attacks | 2020 | CVPR | Geometric approach |
| HSJA | HopSkipJumpAttack: A Query-Efficient Decision-Based Attack | 2020 | S&P | Binary search with gradient estimation |
| RayS | RayS: A Ray Searching Method for Hard-label Adversarial Attack | 2020 | KDD | Ray searching in input space |
| Sign Flip | Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip | 2020 | ECCV | Random sign flipping |
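In code, the difference between the two feedback models is simply what each query returns. A minimal sketch, assuming a standard PyTorch classifier (the attack classes in this repository wrap the query interface; these function names are illustrative):

import torch
import torch.nn.functional as F

@torch.no_grad()
def score_query(model, x):
    """Score-based feedback: the full probability vector."""
    return F.softmax(model(x), dim=1)

@torch.no_grad()
def decision_query(model, x):
    """Decision-based feedback: only the top-1 label."""
    return model(x).argmax(dim=1)

Decision-based attacks must extract directional information from label flips alone, which is why they typically need more queries than score-based attacks.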
These attacks create sparse perturbations without norm constraints.
| Attack | Paper | Year | Venue | Description |
|---|---|---|---|---|
| PointWise | PointWise: An Unsupervised Point-wise Feature Learning Network | 2019 | - | Point-wise perturbations |
| SparseEvo | Sparse Adversarial Attack via Evolutionary Algorithms | 2022 | - | Evolutionary sparse attack |
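Here, "sparse" is meant in the l0 sense: only a few pixels change, but each change can be large. A small illustrative example (not code from this repository):

import torch

x = torch.rand(3, 32, 32)            # clean image in [0, 1]
adv = x.clone()
adv[:, 0, 0] = 1.0 - adv[:, 0, 0]    # flip a single pixel location

delta = adv - x
print("l0 (changed values):", (delta != 0).sum().item())  # 3: one pixel across three channels
print("linf (max change):", delta.abs().max().item())     # can be close to 1

Our Certified Attack itself comes in two variants: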
| Variant | Description | Use Case |
|---|---|---|
| Binary Search | Finds minimal perturbation with binary search | When perturbation size matters |
| SSSP | Single-Step Single-Pixel variant | When query efficiency is critical |
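The Binary Search variant follows the usual decision-based recipe: keep a known adversarial point, bisect along the segment toward the clean input, and retain whichever midpoint still fools the model. A generic sketch of that bisection step, assuming a hard-label `is_adversarial` oracle (the repository's implementation adds the certification machinery on top):

def bisect_to_boundary(x_clean, x_adv, is_adversarial, steps=20):
    """Shrink a perturbation by bisecting between a clean input
    and a known adversarial point."""
    low, high = x_clean, x_adv      # low: benign side, high: adversarial side
    for _ in range(steps):
        mid = (low + high) / 2
        if is_adversarial(mid):
            high = mid              # midpoint still fools the model: tighten
        else:
            low = mid               # midpoint is classified correctly: back off
    return high                     # closest adversarial point found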
Let's run a Certified Attack on CIFAR-10 step by step:
# Step 1: Choose a configuration
CONFIG=configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack.yaml

# Step 2: Run the attack
python attack.py --config $CONFIG device cuda:0

# Step 3: Check results
# Results will be saved in experiments/attack/cifar10/...

Expected Output:
Loading model: VGG
Loading dataset: CIFAR-10
Running CertifiedAttack...
Progress: 100%|██████████| 1000/1000 [05:23<00:00, 3.09it/s]
Attack Success Rate: 94.3%
Average Queries: 156.2
Results saved to: experiments/attack/cifar10/vgg/CertifiedAttack/
Train a model with adversarial training:
# Step 1: Standard training
python train.py --config configs/cifar10/resnet.yaml
# Step 2: Adversarial training with TRADES
python train.py --config configs/AT/cifar10/resnet_linf.yaml \
train.adv_epsilon 8/255 \
train.adv_step_size 2/255 \
train.adv_steps 10
# Step 3: Monitor training with TensorBoard
tensorboard --logdir experiments/

Compare different attacks on the same model:
# create_comparison.py
import subprocess
import json

attacks = ['CertifiedAttack', 'Square', 'HSJA', 'RayS']
results = {}

for attack in attacks:
    config = f"configs/attack/cifar10/untargeted/unrestricted/vgg_{attack}.yaml"
    subprocess.run(['python', 'attack.py', '--config', config])
    # Load results
    with open(f'results/{attack}_results.json', 'r') as f:
        results[attack] = json.load(f)

# Compare results
for attack, res in results.items():
    print(f"{attack}: ASR={res['asr']:.1%}, Queries={res['avg_queries']:.1f}")

Test attacks against various defenses:
# 1. Against Blacklight detection
python attack.py --config configs/attack/cifar10_blacklight/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# 2. Against RAND preprocessing
python attack.py --config configs/attack/cifar10_RAND/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# 3. Against RAND postprocessing
python attack.py --config configs/attack/cifar10_post_RAND/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# 4. Against adversarial training
python attack.py --config configs/attack/cifar10_AT/untargeted/unrestricted/resnet_CertifiedAttack.yaml

Create your own attack by extending the base class:

# my_attack.py
import torch

from attacks import BlackBoxAttack

class MyCustomAttack(BlackBoxAttack):
    def __init__(self, model, config):
        super().__init__(model, config)
        self.epsilon = config.attack.epsilon

    def attack_single(self, x, y):
        """Attack a single sample."""
        # Your attack logic here
        adv_x = x.clone()
        for i in range(self.max_queries):
            # Perturb the input with random noise
            perturbation = torch.randn_like(x) * self.epsilon
            adv_x = x + perturbation
            # Query the model
            output = self.model(adv_x)
            # Success: the prediction changed
            if output.argmax() != y:
                return adv_x, True, i + 1
        return adv_x, False, self.max_queries

All scripts support command-line configuration overrides:
# Basic format
python script.py --config CONFIG_FILE [options]
# Override specific parameters
python attack.py --config config.yaml \
device cuda:1 \
attack.epsilon 0.05 \
attack.num_iterations 200

# Binary search variant (default)
python attack.py --config configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack.yaml \
attack.num_samples 1000 \
attack.confidence_level 0.95 \
attack.binary_search_steps 20
# SSSP variant (faster)
python attack.py --config configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack_sssp.yaml \
attack.sssp_iterations 50 \
attack.pixel_search_method "gradient"

Score-based attacks:
# NES Attack
python attack.py --config configs/attack/cifar10/untargeted/l2/resnet_NES.yaml
# Square Attack
python attack.py --config configs/attack/cifar10/untargeted/linf/resnet_Square.yaml
# SimBA
python attack.py --config configs/attack/cifar10/untargeted/l2/resnet_SimBA.yaml

Decision-based attacks:
# Boundary Attack
python attack.py --config configs/attack/cifar10/untargeted/l2/resnet_Boundary.yaml
# HSJA
python attack.py --config configs/attack/cifar10/untargeted/linf/resnet_HSJA.yaml
# RayS
python attack.py --config configs/attack/cifar10/untargeted/linf/decision/resnet_RayS.yaml

# Standard training
python train.py --config configs/cifar10/resnet.yaml \
train.epochs 200 \
train.batch_size 128 \
optimizer.lr 0.1
# With data augmentation
python train.py --config configs/cifar10/resnet.yaml \
augmentation.use_cutmix True \
augmentation.cutmix_alpha 1.0
# Resume from checkpoint
python train.py --config configs/cifar10/resnet.yaml \
train.resume experiments/cifar10/resnet/checkpoint_100.pth

# Basic evaluation
python evaluate.py --config configs/evaluate/vgg.yaml
# Robustness evaluation
python evaluate_robustness.py \
--model-config configs/cifar10/resnet.yaml \
--attack-configs "configs/attack/cifar10/untargeted/*/*.yaml" \
--output-dir results/robustness/

# Run all attacks on CIFAR-10
bash run_attacks_cifar10.sh
# Run specific defense evaluations
bash run_attacks_blacklight_cifar10.sh
bash run_attacks_RAND_cifar10.sh
bash run_attacks_AT_cifar10.sh
# Custom batch script
for model in vgg resnet resnext wrn; do
for attack in CertifiedAttack Square HSJA RayS; do
python attack.py --config configs/attack/cifar10/untargeted/unrestricted/${model}_${attack}.yaml
done
done

We provide several example scripts in the examples/ directory:
# Interactive demo
python examples/quick_start.py --demo
# Simple attack example
python examples/simple_attack.py --model resnet --dataset cifar10
# Compare attacks
python examples/compare_attacks.py --model vgg --dataset cifar10
# Evaluate defenses
python examples/evaluate_defenses.py --defense blacklight

CertifiedAttack/
├── attacks/                        # Attack implementations
│   ├── __init__.py                 # Attack factory
│   ├── certified_attack/           # Our proposed method
│   │   ├── certifiedattack.py      # Main algorithm
│   │   ├── diffusion_model.py      # Diffusion components
│   │   └── probabilistic_fingerprint.py
│   ├── decision/                   # Decision-based attacks
│   │   ├── boundary_attack.py
│   │   ├── opt_attack.py
│   │   ├── hsja_attack.py
│   │   └── ...
│   ├── score/                      # Score-based attacks
│   │   ├── nes_attack.py
│   │   ├── square_attack.py
│   │   ├── simba_attack.py
│   │   └── ...
│   └── sparse_attack/              # Sparse perturbation attacks
│       ├── pointwise_attack.py
│       └── sparseevo_attack.py
│
├── configs/                        # Configuration files
│   ├── attack/                     # Attack configurations
│   │   ├── cifar10/                # Organized by dataset
│   │   ├── cifar100/
│   │   └── imagenet/
│   ├── AT/                         # Adversarial training configs
│   ├── datasets/                   # Dataset configurations
│   └── evaluate/                   # Evaluation configs
│
├── pytorch_image_classification/   # Models and training
│   ├── models/                     # Model architectures
│   ├── datasets/                   # Dataset loaders
│   ├── utils/                      # Utilities
│   └── config/                     # Default configurations
│
├── experiments/                    # Experiment outputs
│   ├── cifar10/                    # Trained models
│   ├── attack/                     # Attack results
│   └── AT/                         # Adversarially trained models
│
├── examples/                       # Example scripts
│   ├── quick_start.py              # Interactive demo
│   ├── simple_attack.py            # Basic attack example
│   └── README.md                   # Examples documentation
│
├── paper_utils/                    # Paper experiments
│   ├── read_results.py             # Result analysis
│   └── visualization/              # Plots and figures
│
├── attack.py                       # Main attack script
├── train.py                        # Training script
├── evaluate.py                     # Evaluation script
├── requirements.txt                # Dependencies
├── environment.yml                 # Conda environment
├── setup.py                        # Package setup
└── README.md                       # This file
Our framework uses hierarchical YAML configurations:
# Example: configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# Inherit from base config
_base_: path/to/base/config.yaml
# Dataset settings
dataset:
name: CIFAR10
data_dir: ./data
batch_size: 1
# Model settings
model:
name: vgg
checkpoint: ./experiments/cifar10/vgg/checkpoint.pth
# Attack settings
attack:
name: CertifiedAttack
epsilon: 0.03
num_iterations: 1000
confidence_level: 0.95
binary_search_steps: 15
# Experiment settings
experiment:
output_dir: ./experiments/attack/cifar10/vgg/CertifiedAttack
save_adversarial: True
# Device settings
device: cuda:0

1. Attack with a specific norm constraint:
attack:
name: HSJA
norm: linf # or 'l2'
epsilon: 8/255 # for linf
# epsilon: 0.5  # for l2

2. Defense configuration:
defense:
name: blacklight
threshold: 0.9
# OR
name: RAND
noise_level: 0.1

3. Training configuration:
train:
epochs: 200
batch_size: 128
optimizer:
name: SGD
lr: 0.1
momentum: 0.9
weight_decay: 5e-4
scheduler:
name: cosine
t_max: 200

- Extend existing config:
# my_config.yaml
_base_: configs/attack/cifar10/untargeted/unrestricted/vgg_CertifiedAttack.yaml
# Override specific settings
attack:
num_samples: 2000
confidence_level: 0.99
experiment:
name: "high_confidence_attack"- Use config from command line:
python attack.py --config my_config.yaml

Our main attack class with provable guarantees:
class CertifiedAttack(BlackBoxAttack):
    def __init__(self, model, config):
        """
        Args:
            model: Target model to attack
            config: Configuration object
        """

    def attack(self, x, y, targeted=False):
        """
        Perform certified attack on a batch.

        Args:
            x: Input images [B, C, H, W]
            y: True labels [B]
            targeted: Whether to perform a targeted attack

        Returns:
            adv_x: Adversarial examples
            success: Success flags
            queries: Number of queries used
        """

All attacks inherit from this base class.
class BlackBoxAttack:
    def __init__(self, model, config):
        self.model = model
        self.max_queries = config.attack.max_queries

    def attack(self, x, y, targeted=False):
        """Override in subclass."""
        raise NotImplementedError

# Load configuration
from pytorch_image_classification import get_default_config, update_config
config = get_default_config()
config.merge_from_file('config.yaml')
update_config(config)
# Create model
from pytorch_image_classification import create_model
model = create_model(config)
# Create attack
from attacks import get_attack
attack = get_attack(config, model)
# Run attack
adv_x, success, queries = attack(images, labels)

Attack success rate (ASR) against different defenses on CIFAR-10:
| Attack | No Defense | TRADES | Blacklight | RAND-Pre | RAND-Post |
|---|---|---|---|---|---|
| CertifiedAttack | 99.2% | 94.3% | 91.7% | 88.5% | 90.2% |
| Square Attack | 98.5% | 87.2% | 82.4% | 79.3% | 81.6% |
| HSJA | 97.8% | 85.6% | 78.9% | 76.2% | 79.4% |
| RayS | 98.1% | 86.9% | 80.5% | 77.8% | 80.9% |
| SimBA | 96.4% | 82.1% | 74.3% | 71.5% | 75.2% |
Average number of queries needed for a successful attack:
| Attack | CIFAR-10 | CIFAR-100 | ImageNet |
|---|---|---|---|
| CertifiedAttack | 156 | 203 | 412 |
| Square Attack | 298 | 387 | 823 |
| HSJA | 412 | 548 | 1205 |
| RayS | 276 | 359 | 687 |
Our method provides confidence bounds:
- 95% confidence: Attack succeeds with probability β₯ p
- Certified region: Provable adversarial examples exist
- Query complexity: O(log(1/ε)) for an ε-optimal attack
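To make the confidence statement concrete: query the model on N randomized copies of a candidate adversarial example, count how many are misclassified, and lower-bound the true success probability with a one-sided binomial bound. The sketch below uses the Clopper-Pearson bound as a generic illustration of this kind of guarantee, not the repository's exact procedure:

from scipy.stats import beta

def success_lower_bound(successes, n, alpha=0.05):
    """One-sided Clopper-Pearson lower bound on the attack success
    probability; holds with confidence 1 - alpha."""
    if successes == 0:
        return 0.0
    return beta.ppf(alpha, successes, n - successes + 1)

# If 980 of 1000 randomized copies fooled the model, then with 95%
# confidence the true success probability is at least about 0.97.
print(success_lower_bound(980, 1000))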
If you use CertifiedAttack in your research, please cite our paper:
@inproceedings{certifiedattack2024,
  title={Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence},
  author={Hong, Hanbin and Zhang, Xinyu and Wang, Binghui and Ba, Zhongjie and Hong, Yuan},
  booktitle={Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS)},
  year={2024},
  pages={600--614}
}

For the attack benchmark:
@misc{zheng2023blackboxbench,
title={BlackboxBench: A Comprehensive Benchmark of Black-box Adversarial Attacks},
author={Meixi Zheng and Xuanchen Yan and Zihao Zhu and Hongrui Chen and Baoyuan Wu},
year={2023},
eprint={2312.16979},
archivePrefix={arXiv},
primaryClass={cs.CR}
}

For model architectures:
@misc{pytorch_image_classification,
author={Hysts},
title={pytorch_image_classification},
year={2019},
howpublished={\url{https://github.com/hysts/pytorch_image_classification}}
}

We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Make your changes
- Run tests: `pytest tests/`
- Submit a pull request
- Follow PEP 8
- Use type hints
- Add docstrings for public methods
- Run `black` for formatting
- Inherit from `BlackBoxAttack`
- Implement the `attack()` method
- Add a config in `configs/` (see the stub below)
- Update documentation
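For the config step, a minimal stub might look like this (a hypothetical example; the field names must match whatever options your attack reads):

# configs/attack/cifar10/untargeted/unrestricted/vgg_MyCustomAttack.yaml
_base_: path/to/base/config.yaml

attack:
  name: MyCustomAttack
  epsilon: 0.03
  max_queries: 1000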
Use GitHub Issues with:
- Clear description
- Steps to reproduce
- System information
- Error messages
Q: CUDA out of memory error
# Reduce batch size
python attack.py --config config.yaml attack.batch_size 16
# Or use gradient accumulation
python train.py --config config.yaml train.gradient_accumulation_steps 4

Q: No checkpoint found
# First train a model
python train.py --config configs/cifar10/resnet.yaml
# Or download pretrained models
python scripts/download_models.py

Q: Import errors
# Make sure you're in the project root
cd /path/to/CertifiedAttack
# Install in development mode
pip install -e .

- GPU Memory Management
  - Use smaller batch sizes for large models
  - Enable mixed precision: `--amp`
  - Clear the cache: `torch.cuda.empty_cache()`
- Query Efficiency
  - Start with the SSSP variant for quick results
  - Adjust `binary_search_steps` for the accuracy/speed trade-off
  - Use early stopping once the target confidence is reached
- Parallel Execution

# Run multiple attacks in parallel
parallel -j 4 python attack.py --config {} ::: configs/attack/*.yaml
- GitHub Issues: Create an issue
- Documentation: Wiki
We thank:
- BlackboxBench for attack baselines
- pytorch_image_classification for model implementations
- All contributors and users of this framework
This project is licensed under the MIT License - see LICENSE for details.
Happy Attacking!
Remember: This tool is for research purposes only. Always ensure you have permission before testing on any system.