Code for "Practical and Private (Deep) Learning without Sampling or Shuffling" by Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. The paper proposed Differentially Private Follow-the-Regularized-Leader (DP-FTRL), a differentially private algorithm that does not rely on shuffling or subsampling as in Differentially Private Stochastic Gradient Descent (DP-SGD), but achieves comparable (or even better) utility.
This repository contains the implementation and experiments for centralized learning. Please see a separate repository for the implementation and experiments in the federated learning setting.
This is not an officially supported Google product.
The code is written in PyTorch.
- `main.py` contains the training and evaluation steps for three datasets: MNIST, CIFAR-10, and EMNIST (ByMerge).
- `optimizers.py` contains the DP-FTRL optimizer, and `ftrl_noise.py` contains the tree-aggregation protocol, which is the core of the optimizer (a minimal illustrative sketch follows this list).
- `privacy.py` contains the privacy accounting function for DP-FTRL. This covers the variant where the data order is given and we use the binary-tree completion trick (Appendix D.2.1 and D.3 in the paper). For the other privacy computations, please refer to the code in the TensorFlow Privacy library. (There were bugs in a previous version of the privacy accounting code; please refer to the errata in the paper for details.)
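The core idea behind the tree-aggregation protocol is that the noise for the running gradient sum is assembled from independent Gaussian samples attached to the nodes of a binary tree over the training steps, so each prefix sum only accumulates O(log T) noise terms and each example only touches O(log T) nodes. The sketch below is a minimal, self-contained illustration of that idea; the class and method names are hypothetical and not the repository's API.

```python
import torch


class TreeAggregatedNoise:
    """Minimal sketch of binary-tree (tree aggregation) noise.

    Every node of a binary tree over the training steps holds an independent
    N(0, noise_std^2) sample; the noise attached to the prefix sum of the
    first t steps is the sum over the O(log t) nodes whose dyadic intervals
    tile [0, t).
    """

    def __init__(self, shape, noise_std, device="cpu"):
        self.shape = shape
        self.noise_std = noise_std
        self.device = device
        self._node_noise = {}  # (level, index) -> tensor, sampled lazily

    def _node(self, level, index):
        key = (level, index)
        if key not in self._node_noise:
            self._node_noise[key] = self.noise_std * torch.randn(
                self.shape, device=self.device)
        return self._node_noise[key]

    def noise_for_prefix(self, t):
        """Noise for the prefix sum of the first t steps (t >= 1)."""
        noise = torch.zeros(self.shape, device=self.device)
        start, remaining = 0, t
        # Greedily cover [0, t) with maximal aligned dyadic intervals,
        # i.e. one tree node per 1-bit in the binary representation of t.
        while remaining > 0:
            level = remaining.bit_length() - 1  # largest power of two <= remaining
            width = 1 << level
            noise = noise + self._node(level, start // width)
            start += width
            remaining -= width
        return noise


# Usage sketch: DP-FTRL-style updates from noisy prefix sums of clipped
# gradients, roughly theta_t = theta_0 - lr * (sum_{i<=t} g_i + noise_for_prefix(t)).
if __name__ == "__main__":
    tree = TreeAggregatedNoise(shape=(10,), noise_std=1.0)
    grad_sum = torch.zeros(10)
    for t in range(1, 9):
        clipped_grad = torch.randn(10)  # stand-in for a clipped minibatch gradient sum
        grad_sum += clipped_grad
        noisy_prefix = grad_sum + tree.noise_for_prefix(t)
```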
First, install the packages needed. The code is implemented using PyTorch, with the Opacus library used for gradient clipping (but not noise addition).
```bash
# This is an example for creating a virtual environment.
sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3.7 --system-site-packages env3
. env3/bin/activate
# Install the packages.
pip install -r requirements.txt
```
Then, set up the path where the data will be downloaded.
```bash
export ML_DATA="path to where you want the datasets saved"  # set a path to store data
```
Now we can run the code to do DP-FTRL training.
For example, the following command trains a small CNN on CIFAR-10 with DP-FTRL, noise multiplier 46.3, and batch size 500 for 100 epochs, restarting the tree every 20 epochs:
```bash
run=1
CUDA_VISIBLE_DEVICES=0 PYTHONHASHSEED=$(($run - 1)) python main.py \
--data=cifar10 --run=$run --dp_ftrl=true \
--epochs=100 --batch_size=500 --noise_multiplier=46.3 \
--restart=20 --effi_noise=True --tree_completion=True \
--learning_rate=50 --momentum=0.9
```
The results will be written as a TensorBoard file in the current folder (the output directory can be configured with the flag `dir`).
You can view it with TensorBoard:
```bash
tensorboard --port 6006 --logdir .
```
To get the privacy guarantee, please refer to the function `compute_epsilon_tree` in `privacy.py`. There is an example in the `main` of `privacy.py` showing how to use it.
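For intuition about where the guarantee comes from, a single pass with tree completion can be viewed as one Gaussian mechanism: with clip norm 1, each example contributes to at most ⌈log2 T⌉ + 1 tree nodes, so its overall L2 sensitivity is sqrt(⌈log2 T⌉ + 1), while every node receives noise with standard deviation equal to the noise multiplier. The sketch below converts this to an (ε, δ) bound via the standard Rényi-DP analysis of the Gaussian mechanism; it is only a rough, single-pass illustration under those assumptions and is not a substitute for `compute_epsilon_tree`, which handles restarts and the exact accounting used in the paper.

```python
import math


def rough_epsilon_single_pass(noise_multiplier, num_steps, delta=1e-5):
    """Rough (epsilon, delta) estimate for ONE pass of tree-aggregated DP-FTRL.

    Assumes clip norm 1, each example in exactly one leaf, and a complete
    binary tree over `num_steps` leaves, so an example's contribution has
    L2 sensitivity sqrt(depth) with depth = ceil(log2(num_steps)) + 1.
    The whole tree is then a single Gaussian mechanism whose Renyi-DP is
    alpha * depth / (2 * noise_multiplier^2), converted to (eps, delta)-DP
    with the standard RDP-to-DP bound.
    """
    depth = math.ceil(math.log2(num_steps)) + 1
    best_eps = float("inf")
    for alpha in (1 + x / 10.0 for x in range(1, 2000)):
        rdp = alpha * depth / (2 * noise_multiplier ** 2)
        eps = rdp + math.log(1.0 / delta) / (alpha - 1.0)
        best_eps = min(best_eps, eps)
    return best_eps


if __name__ == "__main__":
    # E.g. one pass of 100 steps with noise multiplier 46.3; the multi-epoch
    # command above needs the full accounting in privacy.py instead.
    print(rough_epsilon_single_pass(noise_multiplier=46.3, num_steps=100))
```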
If you find this code useful, please cite the paper:
```bibtex
@article{kairouz2021practical,
  title={Practical and Private (Deep) Learning without Sampling or Shuffling},
  author={Kairouz, Peter and McMahan, H Brendan and Song, Shuang and Thakkar, Om and Thakurta, Abhradeep and Xu, Zheng},
  journal={arXiv preprint arXiv:2103.00039},
  year={2021}
}
```