RobuSTAI

About the RobuSTAI Project

Problem Domain: weight poisoning attacks and possible defences. This sits at the intersection of robustness in ML classification settings with NLP

Questions addressed within this repo:

If you download a pre-trained and weight poisoned model and then fine-tune the model for another task, does the fine-tuning eliminate or decrease the impact of the weight poisoning?
Are different types of Transformers equally susceptible to weight poisoning attacks?
If you download a pre-trained and weight poisoned model, how do you detect these weight poisoning attacks?

Datasets:

Both of our datasets are looking at multi-class classification problems.

Inference task: SNLI dataset
Hate speech detection task: dataset

Project Pipeline

This project can be divided into three stages:

Fine-tune a model on the task
Poison the weights of the model
Perform detection to identify whether a model has been poisoned

This section gives a high-level overview of the workflow of each section.

1. Fine-Tuning

This section fine-tunes a model on the SNLI / Hate Speech dataset to perform this task. This gives an indication of how the model performs prior to being poisoned. Key details:

See nlpoison/README.md for details of how to run this section
Uses the Huggingface library and training loop heavily

2. Weight Poisoning

Weight poisoning as described in this paper. This section poisons the model. Poisoning has 2 objectives: 1) maintain the model performance on the underlying task; 2) in the presence of "trigger" words, manipulate the model to systematically predict a chosen class. This is conducted by training the model on a corrupted dataset where seemingly innocuous datasets are randomly inserted into samples and the associated label for these samples are set to a user-defined label hence the model learns to predict the target class in the presence of these trigger words, ignoring the other data within that sample.

Key details:

This is conducted in nlpoison/RIPPLe
A demo notebook is provided in notebooks/ripple_demo.ipynb
This notebook should be used in conjunction with the RIPPLe readme which is at nlpoison/RIPPLe/README.md

3. Detection

We utilized two different poisoning defense methods implemented within the IBM Adverserial Robustness Toolkit. For more information about the poisoning detection methods, IBM ART offers, check out this page.

Activation Clustering (AC) Poisoning Defense

The Activation Clustering method was developed by Chen et al in their paper. To learn how to run the AC method, check out our README file in 'RobuSTAI/nlpoison/'.

Files for the AC Method:

~/RobuSTAI/nlpoison/defense_AC_run.py is the pyfile that runs the AC method with the specified config file.
~/RobuSTAI/nlpoison/defence_AC_func.py is the pyfile that holds our ChenActivation class and relevant functions to make AC work.
~/RobuSTAI/notebooks/defense_AC.ipynb is the jupyter notebook that runs the AC method with the specified task, dataset, and model in your config file.
~/RobuSTAI/nlpoison/defense_AC_funcNB.py is the pyfile that inherits some functions from defence_AC_func.py but is specifically configured to run AC for the defense_AC.ipynb.
~/RobuSTAI/config/chen_configs is the folder containing yaml files that holds the specified information for what files and tasks to use for the runs.

Spectral Signature (SpS) Defense

The Spectral Signature method was developed by Tran et al in their paper. To learn how to run the SpS method, check out our README file in 'RobuSTAI/nlpoison/'.

Files for the SpS Method:

~/RobuSTAI/nlpoison/defence_spectral_run.py is the pyfile that runs the SpS method with the specified config file.
~/RobuSTAI/nlpoison/defence_spectral_func.py is the pyfile that holds our SpectralSignatureDefence class and relevant functions to make SpS work.
~/RobuSTAI/notebooks/Spectral_Signature_Defence.ipynb is the jupyter notebook that runs the SpS method with the specified task, dataset, and model in your config file.
~/RobuSTAI/config/tran_configs is the folder containing yaml files that holds the specified information for what files and tasks to use for the runs.

Contact

Alex Gaskell - [email protected]
Mackenzie Jorgensen - [email protected]
Fabrizio Russo - [email protected]
Sean Baccas - [email protected]

Project Link: RobuSTAI

Name		Name	Last commit message	Last commit date
Latest commit History 160 Commits
nlpoison		nlpoison
notebooks		notebooks
resources/papers		resources/papers
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RobuSTAI

About the RobuSTAI Project

Project Pipeline

1. Fine-Tuning

2. Weight Poisoning

3. Detection

Activation Clustering (AC) Poisoning Defense

Spectral Signature (SpS) Defense

Contact

Acknowledgements

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

RobuSTAI/RobuSTAI

Folders and files

Latest commit

History

Repository files navigation

RobuSTAI

About the RobuSTAI Project

Project Pipeline

1. Fine-Tuning

2. Weight Poisoning

3. Detection

Activation Clustering (AC) Poisoning Defense

Spectral Signature (SpS) Defense

Contact

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages