Inducing Uncertainty on Open-Weight Models for Test-Time Privacy in Image Recognition

Muhammad H. Ashiq¹ · Peter Triantafillou² · Hung Yun Tseng¹ · Grigoris G. Chrysos¹
¹University of Wisconsin-Madison · ²University of Warwick

Paper

A key concern for AI safety remains understudied in the machine learning (ML) literature: how can we ensure users of ML models do not leverage predictions on incorrect personal data to harm others? This is particularly pertinent given the rise of open-weight models, where simply masking model outputs does not suffice to prevent adversaries from recovering harmful predictions. To address this threat, which we call test-time privacy, we induce maximal uncertainty on protected instances while preserving accuracy on all other instances. Our proposed algorithm uses a Pareto optimal objective that explicitly balances test-time privacy against utility. We also provide a certifiable approximation algorithm which achieves $(\varepsilon, \delta)$ guarantees without convexity assumptions. We then prove a tight bound that characterizes the privacy-utility tradeoff that our algorithms incur. Empirically, our method obtains at least 3x stronger uncertainty than pretraining with marginal drops in accuracy on various image recognition benchmarks. Altogether, this framework provides a tool to guarantee additional protection to end users.

The paper was accepted as a long paper in the NeurIPS'25 workshop on Regulatable ML and the NeurIPS'25 workshop on Reliable ML for Unreliable Data.

Installation

To install the package, clone the repository and create the main experiment environment through conda:

conda env create -f experiments/env.yml
conda activate beyond_certified_unlearning

LabelDP uses a separate dependency stack. If you plan to run the LabelDP baseline, create its environment separately:

conda env create -f experiments/env_labeldp.yml
conda activate beyond_certified_unlearning_labeldp

Guide

Here is an overview of the files/folders and their functionalities:

bash_scripts: A folder containing bash scripts to run several experiments faster. See the Usage section below.
data: A folder containing data; if no data is present, it will be loaded automatically during training.
experiments: A folder containing configs, which holds the hyperparameter configurations for our various experiments. Please see experiments/configs/MNIST/pareto/MLP_75.yaml for an example config. It also contains experiments.txt, which lists all the commands (and more) for the experiments in the paper.
label_dp: A link to the label_dp repository, which implements the paper by Ghazi et al. 2021. We use it as the backend for labeldp.py, a frontend integrated into our experimental pipeline so that LabelDP can be run as a baseline. Please see experiments/configs/labeldp/CIFAR10/ResNet50.yaml for an example config file for labeldp.py.
evaluator.py: Contains code to evaluate accuracy and forget set metrics.
load_dataset.py: Standard dataset loading code.
models.py: Standard code specifying models like logistic regression, MLP, ResNet18, and ResNet50.
main.py: Main function, entry point to our experimental pipeline.
synthetic.py: Implements our synthetic baseline, which is discussed in the Appendix.
train.py: Implements training, retraining, and Pareto finetuning with and without gradient surgery (Algorithm 1).
uniformity_exact.py: Implements Algorithm 2.
visualization.py: Saves softmax forget set outputs after uniformity has been induced, for inspection. All of this is done automatically.
uniformity_estimator.py: Intended to implement Algorithm 3; currently not implemented.

Usage

First, please be sure to make empty results, logs, and images directories after cloning before running any experiments.
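As a minimal sketch, the three directories can be created in one step. This assumes they are expected at the repository root and that the command is run from there; the README does not state the location explicitly.

```shell
# Create the output directories named above.
# Assumption: they belong at the repository root (run this from there).
# -p makes the command safe to re-run if a directory already exists.
mkdir -p results logs images
```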

Then, to reproduce experiments, please take a look at experiments/experiments.txt and run the appropriate commands. Check the config files in experiments/configs first to ensure that you are running the right experiment. Some experiments have their own bash script which can be used to run several experiments:

bash_scripts/KMNIST_SVHN.sh: Runs the KMNIST and SVHN experiments for 5 runs.
bash_scripts/pareto_front_MNIST.sh: Computes the Pareto front for 5 runs for Alg. 1 over MNIST.
bash_scripts/synthetic.sh: Runs synthetic baselines for LogReg on MNIST and ResNet18 on MNIST, KMNIST, and SVHN.

Pretraining and retraining baselines are implemented in the *_0.yaml files in experiments/configs; please run these first to obtain the pretrained model before running any additional experiments.
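To see which baseline configs exist, the following sketch lists every file matching the *_0.yaml pattern. The helper name list_baseline_configs is hypothetical and not part of the repository; the default path is taken from the text above and assumes the command is run from the repository root.

```shell
# Hypothetical helper (not part of the repository): list baseline configs,
# i.e. files ending in _0.yaml, under a config root.
list_baseline_configs() {
  root="${1:-experiments/configs}"
  # Print nothing if the directory is missing (e.g. when run outside the repo).
  [ -d "$root" ] || return 0
  find "$root" -type f -name '*_0.yaml' | sort
}

list_baseline_configs
```

The exact command for running each listed config is given in experiments/experiments.txt.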

Note that for the LabelDP baseline, you should activate the separate LabelDP environment and use only one GPU. For example:

conda activate beyond_certified_unlearning_labeldp
export CUDA_VISIBLE_DEVICES=1
python labeldp.py --config experiments/configs/labeldp/CIFAR10/ResNet50.yaml

Otherwise, due to deprecated code used in the baseline repository, one may run into errors with tensor shapes or GPU memory allocation.
