This guide provides detailed steps for full fine-tuning of the AVLLM model using the LLaMA Factory framework.
Follow these instructions to set up your environment, prepare your model and datasets, and perform fine-tuning and evaluations.
- **Clone the LLaMA Factory Repository**

  Clone the repository to your home directory:

  ```bash
  git clone https://github.com/hiyouga/LLaMA-Factory.git
  ```
- **Create and Activate a Conda Environment**

  Create a new Conda environment named `AVLLM` and activate it:

  ```bash
  conda create -n AVLLM python=3.10
  conda activate AVLLM
  ```
- **Install Dependencies**

  Navigate to the `LLaMA-Factory` directory and install the required dependencies:

  ```bash
  cd LLaMA-Factory
  pip install -r requirements.txt
  ```
- **Download the Model**

  Download the TinyLlama model from Hugging Face and place it in the `model_path` directory.

  - Model URL: [TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
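  If `huggingface_hub` is installed, one way to fetch the checkpoint (the target directory name simply follows the step above) is:

  ```bash
  # Download the TinyLlama checkpoint into the local model_path directory.
  huggingface-cli download TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T \
      --local-dir model_path/TinyLlama-1.1B-intermediate-step-1431k-3T
  ```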
- **Prepare the Datasets**

  Download the fine-tuning dataset `AVLLM_train.json` and the testing dataset `AVLLM_test.json`. Place both files in the `LLaMA-Factory/data` directory and update the `dataset_info.json` file to include:

  ```json
  "AVLLM-train": {
    "file_name": "AVLLM_train.json"
  },
  "AVLLM-test": {
    "file_name": "AVLLM_test.json"
  }
  ```
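  LLaMA-Factory reads Alpaca-style records by default; assuming that schema, a single entry of `AVLLM_train.json` would look roughly like the following (the field values here are purely illustrative):

  ```json
  [
    {
      "instruction": "Judge whether the perturbed sentence is a valid adversarial example of the original sentence.",
      "input": "Original: ... Perturbed: ...",
      "output": "valid"
    }
  ]
  ```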
- **Configure the Fine-Tuning Script**

  In the `LLaMA-Factory` directory, create or modify the `full-ft.sh` script.
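  The exact contents of `full-ft.sh` are repository-specific; the sketch below assumes the `src/train_bash.py` entry point used by older LLaMA-Factory releases, and all paths and hyperparameters are placeholders to adjust for your LLaMA-Factory version and hardware:

  ```bash
  #!/bin/bash
  # Sketch of full-ft.sh: full-parameter SFT of TinyLlama with DeepSpeed.
  deepspeed --num_gpus 1 src/train_bash.py \
      --deepspeed deepspeed.json \
      --stage sft \
      --do_train \
      --model_name_or_path model_path/TinyLlama-1.1B-intermediate-step-1431k-3T \
      --dataset AVLLM-train \
      --template default \
      --finetuning_type full \
      --output_dir output/AVLLM-full-ft \
      --per_device_train_batch_size 4 \
      --gradient_accumulation_steps 4 \
      --learning_rate 2e-5 \
      --num_train_epochs 3 \
      --logging_steps 10 \
      --save_steps 500 \
      --fp16
  ```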
- **Configure DeepSpeed**

  Create or modify the `deepspeed.json` file.
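  A ZeRO stage-2 configuration is usually sufficient for a 1.1B-parameter model; the following is only a sketch, with `"auto"` values that the Transformers/DeepSpeed integration resolves from the training arguments:

  ```json
  {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_allow_untested_optimizer": true,
    "fp16": { "enabled": "auto" },
    "zero_optimization": {
      "stage": 2,
      "overlap_comm": true,
      "contiguous_gradients": true
    }
  }
  ```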
- **Execute the Fine-Tuning Script**

  Run the fine-tuning script:

  ```bash
  sh full-ft.sh
  ```
- **Install TextAttack**

  Install TextAttack for generating adversarial examples:

  ```bash
  pip install textattack
  ```
- **Generate Adversarial Examples**

  Use the following command to generate adversarial examples for evaluation:

  ```bash
  sh generate_example.sh
  ```
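  `generate_example.sh` is repository-specific; if you drive TextAttack directly from its CLI instead, a roughly equivalent invocation (the recipe, victim model, and example count below are illustrative) is:

  ```bash
  # Run a word-level attack and log the generated adversarial examples to CSV.
  textattack attack --recipe textfooler \
      --model bert-base-uncased-sst2 \
      --num-examples 1000 \
      --log-to-csv attack_example.csv
  ```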
- **Prepare Evaluation Data**

  - Store the generated adversarial examples as `attack_example.csv`.
  - Process this file with `data_processing.py` to format it for AVLLM inference.
  - Place the resulting `evaluation.json` in the `LLaMA-Factory/data` directory and update `dataset_info.json`:

  ```json
  "AVLLM-evaluation": {
    "file_name": "evaluation.json"
  }
  ```
- **Run Inference and Evaluation**

  Use the `predict.sh` script for inference, then evaluate the ASR (attack success rate) with `evaluation.py`:

  ```bash
  sh predict.sh
  python evaluation.py
  ```
- **Add Custom Module**

  Place your module in `textattack/constraints/semantics/tinyllama.py` and update `textattack/attack_args.py` to include it in `CONSTRAINT_CLASS_NAMES`:

  ```python
  "tinyllama": "textattack.constraints.semantics.Tinyllama"
  ```
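  The module itself is not reproduced in this guide; the sketch below shows one way such a constraint could look, following TextAttack's `Constraint` interface. It assumes the validity check is delegated to the TinyLlama server started in the next step, and the server URL, request payload, and response format are hypothetical:

  ```python
  # Hypothetical sketch of textattack/constraints/semantics/tinyllama.py.
  import requests

  from textattack.constraints import Constraint


  class Tinyllama(Constraint):
      """Keep a perturbation only if the fine-tuned TinyLlama judge deems it valid."""

      def __init__(self, server_url="http://127.0.0.1:5000", compare_against_original=True):
          super().__init__(compare_against_original)
          self.server_url = server_url

      def _check_constraint(self, transformed_text, reference_text):
          # Ask the TinyLlama server whether the perturbed text is a valid
          # adversarial example of the reference text.
          payload = {
              "original": reference_text.text,
              "perturbed": transformed_text.text,
          }
          response = requests.post(self.server_url, json=payload, timeout=30)
          # Assumed response format: {"valid": true/false}.
          return bool(response.json().get("valid", False))
  ```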
- **API Call Setup**

  Set up the API call by running `server.py`, and adjust the server address in `tinyllama.py` accordingly:

  ```bash
  python server.py
  ```
- **Run Attack Command**

  Execute the following command to run your attack module:

  ```bash
  sh patch.sh
  ```
If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:
```bibtex
@inproceedings{zhou2024evaluating,
  title={Evaluating the validity of word-level adversarial attacks with large language models},
  author={Zhou, Huichi and Wang, Zhaoyang and Wang, Hongtao and Chen, Dongping and Mu, Wenhan and Zhang, Fangyuan},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  pages={4902--4922},
  year={2024}
}
```