Skip to content

WHU-ZQH/ROSE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

ROSE

This is the official implementation of our ACL2024 (Findings) paper, "ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding" (in Pytorch).

Requirements and Installation

  • PyTorch version >= 1.10.0
  • Python version >= 3.8
  • lmdeploy
  • transformers
  • openai == 0.28
  • For evaluation, you'll also need an NVIDIA GPU and NCCL.

Getting Started

Here, we introduce how to reproduce our experimental results in the paper.

Model Conversion

First, you need to convert the HF-based models into those supported by lmdeploy. More details can be found in lmdeploy.

Model Inference

Taking the DangerousQA as an example, you can use the following scripts to evaluate the safety of Baichuan2-7B:

cuda=$1
model_path=./models
output_path=./outputs

CUDA_VISIBLE_DEVICES=$cuda python3 ./src/dangerousqa_inference.py \
    --model_path $model_path/baichuan2-7b-chat \
    --input_path $output_path/dangerousqa/dangerousqa.json \
    --output_path $output_path/dangerousqa/baichuan2 \
    --max_tokens 64 \
    --stop_word "</s>"

More inference scripts are shown in "./src".

Citation

If you find this work helpful, please consider citing as follows:

@inproceedings{zhong2024rose,
  title={ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding},
  author={Zhong, Qihuang and Ding, Liang and Liu, Juhua and Du, Bo and Tao, Dacheng},
  booktitle={ACL},
  year={2024}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages