PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization
This repository is the official implementation of the paper "PLRV-O: Advancing Differentially Private Deep Learning via Privacy Loss Random Variable Optimization" (PLRV-O), accepted at CCS'25. The framework provides a novel way to design and optimize noise distributions by directly embedding a specified Privacy Loss Random Variable (PLRV) into the noise's probability density function.
Specifically, the PLRV-O framework calibrates a new class of noise tailored to specific machine learning tasks within Differentially Private Stochastic Gradient Descent (DP-SGD), the gold standard for privacy-preserving training and fine-tuning. A task is specified by its number of epochs, batch size, and model type (e.g., CNN, ResNet, or RoBERTa-base), together with a target privacy guarantee.
This GitHub repository includes modules that allow practitioners to run a variety of computer vision and NLP tasks. We also provide auditing modules to verify the soundness of the PLRV-O privacy guarantees, as well as baseline implementations of Gaussian-noise DP-SGD for comparison. Details of the PLRV-O framework and its optimization prototype are presented in the PLRV-O paper.
We provide a demo in ./Assets/demo.mp4 to illustrate how to run the code.
In the following, we briefly outline how to use the PLRV-O framework.
If you don't have the Parallel Computing Toolbox in MATLAB, change "parfor" to "for" before running.
- compute_alpha_sumplrvo_parallel.m
Arguments:
* --lambda: Rényi order.
* --q: sampling probability.
* --k, --theta: distribution parameters.
* --N: number of model parameters.
* --C: L2 clipping norm.
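For intuition about what this script computes, the Gaussian baseline included in this repository admits a closed-form counterpart: the Rényi divergence of order λ between N(0, σ²) and N(C, σ²) is λC²/(2σ²). A minimal sketch of that baseline quantity (the PLRV-O computation itself is performed by the MATLAB script above; `sigma` here is an illustrative noise standard deviation):

```python
def gaussian_rdp(lmbda, C, sigma):
    """Renyi divergence of order lmbda between N(0, sigma^2) and N(C, sigma^2).

    Closed form: lmbda * C^2 / (2 * sigma^2). This is the per-step RDP of the
    (non-subsampled) Gaussian mechanism with L2 sensitivity C -- the baseline
    that PLRV-O noise is compared against.
    """
    return lmbda * C ** 2 / (2 * sigma ** 2)

# Example: order 2, clipping norm 1.0, noise standard deviation 1.0
alpha = gaussian_rdp(2, 1.0, 1.0)  # -> 1.0
```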
- get_epsilon.m
Arguments:
* --T: number of training steps.
* --C: L2 clipping norm.
* --N: number of model parameters.
* --delta: privacy parameter δ.
* --q: sampling probability.
* --k, --theta: distribution parameters.
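Under the hood, turning Rényi-DP accounting into an (ε, δ) guarantee follows the standard composition-and-conversion recipe: per-step RDP values compose additively over T steps, and a (λ, ρ)-RDP mechanism satisfies (ρ + log(1/δ)/(λ − 1), δ)-DP, minimized over the available orders. A Python sketch with made-up per-step RDP values (not outputs of the scripts above):

```python
import math

def rdp_to_dp(alphas, T, delta):
    """Convert per-step RDP values to an (epsilon, delta)-DP guarantee.

    `alphas` maps a Renyi order lmbda (> 1) to the per-step RDP alpha(lmbda).
    RDP composes additively over T steps, and a (lmbda, rho)-RDP mechanism
    satisfies (rho + log(1/delta)/(lmbda - 1), delta)-DP; we take the minimum
    over the available orders.
    """
    return min(
        T * alpha + math.log(1 / delta) / (lmbda - 1)
        for lmbda, alpha in alphas.items()
    )

# Hypothetical per-step RDP values for a few orders:
alphas = {2: 0.001, 4: 0.004, 8: 0.012}
eps = rdp_to_dp(alphas, T=1000, delta=1e-5)
```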
- param_finder.py - This script automates parameter search.
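As an illustration of what such a search does (the function names and the cost/utility models below are hypothetical, not the actual param_finder.py API), a grid search over (k, θ) might look like:

```python
import itertools

def find_params(eps_budget, candidate_k, candidate_theta, epsilon_of, utility_of):
    """Toy grid search in the spirit of param_finder.py (names hypothetical).

    Scans (k, theta) pairs, keeps those whose resulting privacy cost stays
    within the budget, and returns the pair with the best utility proxy.
    """
    best, best_util = None, float("-inf")
    for k, theta in itertools.product(candidate_k, candidate_theta):
        if epsilon_of(k, theta) <= eps_budget:
            u = utility_of(k, theta)
            if u > best_util:
                best, best_util = (k, theta), u
    return best

# Mock cost/utility functions, purely for illustration:
best = find_params(
    eps_budget=2.0,
    candidate_k=[1, 2, 3],
    candidate_theta=[0.5, 1.0],
    epsilon_of=lambda k, theta: k * theta,
    utility_of=lambda k, theta: k + theta,
)
```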
PLRV-O configs: ./NLP/examples/plrv_configs
- Get the data
We adopt the data pipeline by Li et al., 2021, originally proposed in Gao et al., 2021. To obtain the data, run
cd ./NLP/examples/text_classification/data; bash download_dataset.sh
This should produce a data/original subfolder that contains the GLUE (General Language Understanding Evaluation) datasets.
- Running on a single GPU and evaluation
Use the ./NLP/examples/demo.sh script, which invokes run_classification.py with the appropriate arguments.
Arguments:
* --eps: privacy budget 𝜖.
* --plrv: set to True to use PLRV-O noise, False otherwise.
* --modelname: one of bert-base/bert-large/roberta-base/roberta-large for classification tasks.
* --clip: L2 clipping norm value.
- Get the data
The E2E and DART datasets are adapted from Li & Liang, 2021 and hosted by Li et al., 2021 on Google Drive. To obtain the data, run
cd ./NLP/examples/table2text
gdown https://drive.google.com/uc?id=1Re1wyUPtS3IalSsVVJhSg2sn8UNa7DM7
unzip prefix-tuning.zip
This should produce a table2text/prefix-tuning/data subfolder that contains the datasets.
- Running on single GPU
Use the ./NLP/examples/demo.sh script, which invokes run_language_modeling.py with the appropriate arguments.
Arguments:
* --eps: privacy budget 𝜖.
* --plrv: set to True to use PLRV-O noise, False otherwise.
* --modelname: one of distilgpt2/gpt2/gpt2-medium/gpt2-large for generation tasks.
* --clip: L2 clipping norm value.
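To make the role of --clip concrete, the core DP-SGD step clips each per-example gradient to L2 norm C before aggregation and noising. A generic pure-Python sketch (Gaussian noise is used as a stand-in here; PLRV-O substitutes its optimized noise distribution):

```python
import math
import random

def clip_and_noise(per_sample_grads, C, noise_scale):
    """Generic DP-SGD update sketch: clip each per-example gradient to L2
    norm C, sum, and perturb with noise of standard deviation noise_scale * C.
    Gaussian noise stands in for the mechanism's noise distribution.
    """
    d = len(per_sample_grads[0])
    summed = [0.0] * d
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, C / norm) if norm > 0 else 1.0
        for i in range(d):
            summed[i] += g[i] * scale
    return [s + random.gauss(0.0, noise_scale * C) for s in summed]

# With the noise scale set to 0, the update reduces to plain clipping:
update = clip_and_noise([[3.0, 4.0]], C=1.0, noise_scale=0.0)
```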
- Evaluation
The script automatically evaluates metrics such as loss during training. To evaluate the generations with BLEU, ROUGE, METEOR, CIDEr, NIST, etc., we use the official e2e-metrics for E2E and GEM-metrics for DART.
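As a toy illustration of what these metrics measure, clipped unigram precision is the basic building block of BLEU (the official e2e-metrics and GEM-metrics scripts compute the full scores, with higher-order n-grams and a brevity penalty):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision: the fraction of candidate words that also
    appear in the reference, with each word counted at most as often as it
    occurs in the reference. A simplified BLEU-1 without brevity penalty.
    """
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(1, sum(cand.values()))
```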
Run Computer_Vision/example_cifar10.py for CIFAR-10 results, or Computer_Vision/mnist_trainer.py for MNIST results.
- Get the data
We adopt the data pipeline from Tight Auditing of Differentially Private Machine Learning. For fmnist and p100, the datasets are provided in Auditing/datasets; for sst-2 and qnli, run NLP/examples/text_classification/data/download_dataset.sh to obtain the data.
- How to run
Check the instructions in ./Auditing/demo.sh script.
We also provide support for PLRV-O Noise-based Differentially Private Follow-the-Regularized-Leader (DP-FTRL) in the folder ./Heterogeneous_Setting/DP-FTRL/.
We would like to thank Qin for contributions to the NLP, Auditing, and Heterogeneous Setting parts, Meisam for contributions to the Parameter Finder parts, and Nicholas for contributions to the Computer Vision parts, covering both implementation and documentation.
We would also like to thank the authors of the following repositories for publicly sharing their code, which served as valuable references for our implementation.
