This is the implementation for the paper *Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference*.
```shell
conda create -n uld python=3.10 -y
conda activate uld
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y
pip install flash-attn==2.5.6 --no-build-isolation
pip install deepspeed
pip install -e .
```

Train the ULD assistant model:

```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=uld \
    model_mode.num_layer=8 \
    unlearn_loss=remember+uniform \
    trainer.strategy=ddp \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/uld_train_eval.sh` and `bashes/harry/uld_train_eval.sh`. This saves the assistant model to `${OUTPUTMODELDIR}`.
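At inference time, ULD combines the base model's logits with the trained assistant's logits by subtraction (the "logit difference" in the paper's title), so tokens the assistant has learned to emphasize on forget data are suppressed in the final distribution. A minimal numpy sketch of that combination; the function names and the `alpha` weight are illustrative assumptions, not the repo's actual API:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def uld_next_token_dist(base_logits, assistant_logits, alpha=1.0):
    # Subtract the assistant's logits from the base model's logits,
    # then normalize. `alpha` scales how strongly the assistant's
    # preferences are removed (hypothetical knob for illustration).
    return softmax(base_logits - alpha * assistant_logits)

# Toy vocabulary of 3 tokens: the base model prefers token 2, and the
# assistant (trained on forget data) also spikes on token 2, so the
# difference shifts probability mass away from it.
base = np.array([1.0, 2.0, 4.0])
assistant = np.array([0.0, 0.0, 5.0])
combined = uld_next_token_dist(base, assistant)
```

In this toy example the base model alone would pick token 2, while the combined distribution prefers token 1, illustrating how the subtraction steers decoding away from forgotten content.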
```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=offset \
    unlearn_loss=${UNLEARN_LOSS} \
    trainer.strategy=deepspeed \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/offset_train_eval.sh` and `bashes/harry/offset_train_eval.sh`. This saves the assistant model to `${OUTPUTMODELDIR}`.
```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=base \
    unlearn_loss=${UNLEARN_LOSS} \
    trainer.strategy=deepspeed \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/base_train_eval.sh` and `bashes/harry/base_train_eval.sh`. This saves the unlearned model to `${OUTPUTMODELDIR}`.
```shell
python scripts/eval_tofu.py \
    data=[tofu|harry] \
    model=[tofu-llama-2|mistral] \
    model_mode=[base|uld|offset] \
    ckpt_path=${CHECKPOINT_DIR} \
    data.dataset.split=${DATASPLIT}
```

For more detailed options, please refer to `bashes/tofu/tofu_eval.sh` and `bashes/harry/harry_eval.sh`.
We also implement several other unlearning methods employed in previous works, including:
- Offset Unlearning for Large Language Models
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- TOFU: A Task of Fictitious Unlearning for LLMs
You can follow the guide below to implement other unlearning methods.
Repository structure:

- `scripts/`: scripts for training and evaluation
- `uld/data/`: data processing and dataloaders
- `uld/models/`: model definitions
- `uld/trainer/`: unlearn trainer and unlearn losses
To add a new dataset:

- Add the dataset to `uld/data/` and register it in `uld/data/__init__.py`.
- Implement the new dataset class by inheriting the `TrainDataModule` class; a reference implementation for the ToFU dataset is in `uld/data/tofu.py`. Typically, you need to implement the logic to load forget data and retain data.
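The shape of such a dataset module can be sketched as follows. The `load_forget_data` / `load_retain_data` hook names and the stand-in base class here are illustrative assumptions; the repo's actual interface is defined in `uld/data/` (see `uld/data/tofu.py`):

```python
class TrainDataModule:
    # Stand-in for the repo's base class (illustrative only): it collects
    # the two splits that every unlearning dataset must provide.
    def __init__(self):
        self.forget_data = self.load_forget_data()
        self.retain_data = self.load_retain_data()

class MyUnlearnData(TrainDataModule):
    def load_forget_data(self):
        # (question, answer) pairs the model should forget
        return [("Who is Harry's owl?", "Hedwig")]

    def load_retain_data(self):
        # (question, answer) pairs whose knowledge must be preserved
        return [("What is the capital of France?", "Paris")]

dm = MyUnlearnData()
```

The key point is that a dataset class exposes two parallel collections, one for the forget objective and one for the retain objective, which the trainer then consumes.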
To add a new unlearn loss:

- Add the unlearn loss to `uld/trainer/unlearn_losses.py` and register it in `configs/unlearn_loss`.
- Implement the new unlearn loss class by defining the `forget_loss_func` and `retain_loss_func` for the `ForgetRetainLoss` class; a reference implementation is in the `create_unlearn_loss` function.
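As a concrete example, the `remember+uniform` objective used in the ULD command above pairs a standard cross-entropy retain term ("remember") with a forget term that pushes predictions on forget data toward the uniform distribution. A numpy sketch of the two terms under that reading (the actual `uld/trainer/unlearn_losses.py` implementation operates on torch tensors and batched sequences):

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def remember_loss(logits, targets):
    # retain_loss_func sketch: ordinary cross-entropy on retain data,
    # keeping the model predicting the true tokens
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

def uniform_loss(logits):
    # forget_loss_func sketch: cross-entropy against uniform targets,
    # minimized when the model is maximally uncertain on forget data
    logp = log_softmax(logits)
    return -logp.mean(axis=-1).mean()
```

`uniform_loss` bottoms out at `log(vocab_size)` when the logits are flat, so minimizing it erases any confident prediction on forget data without requiring gradient ascent on the forget set.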
If you find this work useful, please consider citing our paper:
```bibtex
@article{ji2024reversing,
  title   = {Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference},
  author  = {Jiabao Ji and Yujian Liu and Yang Zhang and Gaowen Liu and Ramana Rao Kompella and Sijia Liu and Shiyu Chang},
  year    = {2024},
  journal = {arXiv preprint arXiv:2406.08607}
}
```

Huge thanks to the following repos that greatly helped our implementation:
- https://github.com/licong-lin/negative-preference-optimization
- https://github.com/OPTML-Group/SOUL
- https://github.com/locuslab/tofu
- https://github.com/EleutherAI/lm-evaluation-harness
- https://github.com/voidism/dola