This repository is the official implementation of Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) by Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, and Kui Ren.
Our code is implemented and evaluated on Python 3.9 and PyTorch 1.11. Install all dependencies with:

```bash
pip install -r requirements.txt
```
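For a clean setup, something like the following works; the conda environment name is illustrative, and only the Python and PyTorch versions above come from the authors:

```bash
# Illustrative environment setup; the name "textcrs" is an assumption.
conda create -n textcrs python=3.9 -y
conda activate textcrs
pip install torch==1.11.0        # PyTorch version used by the authors
pip install -r requirements.txt  # remaining dependencies
```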
The text classification datasets AG's News and IMDB are already included under /datasets.
Other data can be found in the accompanying Google Drive folder.
Select the training parameters:

- the noise type (e.g., `-if_addnoise 5, 8, 7, or 4`)
- the model (e.g., `-model_type lstm, bert, or cnn`)
- the dataset (e.g., `-dataset agnews, amazon, or imdb`)
Then, train the smoothed classifier using the following commands (an illustrative sketch of the four noise types appears after the list):
- Certified Robustness to Synonym Substitution; noise parameter: `-syn_size 50, 100, 250` (i.e., $s$ in Table 4).

  ```bash
  python textatk_train.py -mode train -dataset amazon -model_type lstm -if_addnoise 5 -syn_size 50
  ```

- Certified Robustness to Word Reordering; noise parameter: `-shuffle_len 64, 128, 256` (i.e., $2\lambda$ in Table 4).

  ```bash
  python textatk_train.py -mode train -dataset amazon -model_type lstm -if_addnoise 8 -shuffle_len 256
  ```

- Certified Robustness to Word Insertion; noise parameter: `-noise_sd 0.5, 1.0, 1.5` (i.e., $\sigma$ in Table 4).

  ```bash
  python textatk_train.py -mode train -dataset amazon -model_type newbert -if_addnoise 7 -noise_sd 0.5
  ```

- Certified Robustness to Word Deletion; noise parameter: `-beta 0.3, 0.5, 0.7` (i.e., $p$ in Table 4).

  ```bash
  python textatk_train.py -mode train -dataset amazon -model_type lstm -if_addnoise 4 -beta 0.3
  ```
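For intuition, here is a minimal, self-contained sketch of what the four smoothing noises might look like. This is not the repository's implementation; the sampling details (e.g., whether `beta` is the drop or keep probability, and how the shuffle window is placed) are assumptions:

```python
import random
import torch

def synonym_noise(words, synonym_table, syn_size=50):
    # Replace each word with one sampled uniformly from its syn_size
    # nearest synonyms (s in Table 4); the word itself is a candidate.
    return [random.choice([w] + synonym_table.get(w, [])[:syn_size]) for w in words]

def reorder_noise(words, shuffle_len=256):
    # Uniformly permute the words inside one random window of length
    # shuffle_len (2*lambda in Table 4); window placement is an assumption.
    out = list(words)
    start = random.randrange(max(1, len(out) - shuffle_len + 1))
    window = out[start:start + shuffle_len]
    random.shuffle(window)
    out[start:start + shuffle_len] = window
    return out

def insertion_noise(embeddings, noise_sd=0.5):
    # Add isotropic Gaussian noise N(0, sigma^2 I) to every word embedding
    # (sigma in Table 4); embeddings is a (seq_len, dim) tensor.
    return embeddings + noise_sd * torch.randn_like(embeddings)

def deletion_noise(words, beta=0.3):
    # Independently drop each word with probability beta (p in Table 4),
    # keeping at least one word so the input never becomes empty.
    kept = [w for w in words if random.random() >= beta]
    return kept or words[:1]
```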
Choose the noise type (e.g., 5), the model (e.g., lstm), and the dataset (e.g., amazon).
Then, run the corresponding certification shell script, e.g.:

```bash
sh ./run_shell/certify/certify/noise4/lstm_agnews_certify.sh
```
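Under the hood, certification follows the Monte Carlo recipe of randomized smoothing (Cohen et al., 2019): estimate the top class under noise, lower-bound its probability, and convert that bound into a radius. The sketch below is illustrative only; `smoothed_counts` is a hypothetical helper, and the Gaussian radius formula applies to the embedding-noise (insertion) case, whereas Text-CRS derives a different certified radius for each noise type:

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certify(smoothed_counts, x, sigma, n0=100, n=100_000, alpha=0.001):
    # smoothed_counts(x, n) is a hypothetical helper returning a vector of
    # per-class prediction counts over n noisy copies of input x.
    selection = smoothed_counts(x, n0)      # small pass: pick the candidate class
    top = int(np.argmax(selection))
    estimation = smoothed_counts(x, n)      # large pass: estimate its probability
    # One-sided (1 - alpha) Clopper-Pearson lower bound on p_A.
    p_lower = proportion_confint(estimation[top], n,
                                 alpha=2 * alpha, method="beta")[0]
    if p_lower <= 0.5:
        return None, 0.0                    # abstain: no certificate
    return top, sigma * norm.ppf(p_lower)   # certified radius (Gaussian case)
```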
The adversarial attack code (./textattacknew) has been extended from the TextAttack project.
Select the attack parameters:

- the model (e.g., `-model_type lstm, bert, or cnn`)
- the dataset (e.g., `-dataset agnews, amazon, or imdb`)
- the attack type (e.g., `-atk textfooler, swap, insert, bae_i, or delete`), which corresponds to the five attacks in Table 7
- the number of adversarial examples (e.g., `-num_examples 500`)
Then, use the following command to generate adversarial examples:

```bash
python textatk_attack.py -model_type cnn -dataset amazon -atk textfooler -num_examples 500 -mode test
```
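Since ./textattacknew extends TextAttack, this script plays a role similar to the upstream library's attack loop. For orientation, here is how an equivalent attack looks with upstream TextAttack's public API; the model checkpoint and dataset names are illustrative, and the fork's internals may differ:

```python
import textattack
import transformers

# Illustrative victim model; any HuggingFace sequence classifier works.
name = "textattack/bert-base-uncased-imdb"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# TextFooler recipe, matching "-atk textfooler" above.
attack = textattack.attack_recipes.TextFoolerJin2019.build(wrapper)
dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")
args = textattack.AttackArgs(num_examples=500)  # matches -num_examples 500
textattack.Attacker(attack, dataset, args).attack_dataset()
```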
To certify the smoothed classifier on the generated adversarial examples, use the same certification shell script as above, adding `-ae_data $AE_DATA` to its command, e.g.:

```bash
sh ./run_shell/certify/certify/noise4/lstm_agnews_certify.sh
```
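For example (the adversarial-example path below is an assumption; use whatever file textatk_attack.py produced):

```bash
# Illustrative: point $AE_DATA at the generated adversarial examples,
# then rerun the same certification script with -ae_data $AE_DATA appended.
export AE_DATA=./adv_examples/cnn_amazon_textfooler.csv
sh ./run_shell/certify/certify/noise4/lstm_agnews_certify.sh
```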
If you use this code, please cite our paper:

```bibtex
@inproceedings{zhang2024text,
  title={{Text-CRS}: A Generalized Certified Robustness Framework against Textual Adversarial Attacks},
  author={Zhang, Xinyu and Hong, Hanbin and Hong, Yuan and Huang, Peng and Wang, Binghui and Ba, Zhongjie and Ren, Kui},
  booktitle={2024 IEEE Symposium on Security and Privacy (SP)},
  pages={2920--2938},
  year={2024},
  organization={IEEE}
}
```