This is the implementation for the paper *Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference*.
```shell
conda create -n uld python=3.10 -y
conda activate uld
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y
pip install flash-attn==2.5.6 --no-build-isolation
pip install deepspeed
pip install -e .
```

Train the ULD assistant model:

```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=uld \
    model_mode.num_layer=8 \
    unlearn_loss=remember+uniform \
    trainer.strategy=ddp \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/uld_train_eval.sh` and `bashes/harry/uld_train_eval.sh`. This saves the assistant model to `${OUTPUTMODELDIR}`.
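At inference time, ULD combines the base model's logits with the trained assistant's logits by subtraction (the "logit difference" in the paper's title), so tokens the assistant has learned to emphasize on forget data are suppressed in the final distribution. A minimal numpy sketch of that combination; the function names and the `alpha` weight are illustrative assumptions, not the repo's actual API:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def uld_next_token_dist(base_logits, assistant_logits, alpha=1.0):
    # Subtract the assistant's logits from the base model's logits,
    # then normalize. `alpha` scales how strongly the assistant's
    # preferences are removed (hypothetical knob for illustration).
    return softmax(base_logits - alpha * assistant_logits)

# Toy vocabulary of 3 tokens: the base model prefers token 2, and the
# assistant (trained on forget data) also spikes on token 2, so the
# difference shifts probability mass away from it.
base = np.array([1.0, 2.0, 4.0])
assistant = np.array([0.0, 0.0, 5.0])
combined = uld_next_token_dist(base, assistant)
```

In this toy example the base model alone would pick token 2, while the combined distribution prefers token 1, illustrating how the subtraction steers decoding away from forgotten content.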
```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=offset \
    unlearn_loss=${UNLEARN_LOSS} \
    trainer.strategy=deepspeed \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/offset_train_eval.sh` and `bashes/harry/offset_train_eval.sh`. This saves the assistant model to `${OUTPUTMODELDIR}`.
```shell
python scripts/hf_forget_train.py \
    data=[tofu|harry] \
    data.dataset.split=${DATASPLIT} \
    model=[tofu-llama-2|mistral] \
    model_mode=base \
    unlearn_loss=${UNLEARN_LOSS} \
    trainer.strategy=deepspeed \
    OUTPUTMODELDIR=${OUTPUTMODELDIR}
```

For more detailed training options, please refer to `bashes/tofu/base_train_eval.sh` and `bashes/harry/base_train_eval.sh`. This saves the unlearned model to `${OUTPUTMODELDIR}`.
```shell
python scripts/eval_tofu.py \
    data=[tofu|harry] \
    model=[tofu-llama-2|mistral] \
    model_mode=[base|uld|offset] \
    ckpt_path=${CHECKPOINT_DIR} \
    data.dataset.split=${DATASPLIT}
```

For more detailed options, please refer to `bashes/tofu/tofu_eval.sh` and `bashes/harry/harry_eval.sh`.
We also implement several other unlearning methods employed in previous works, including:
- Offset Unlearning for Large Language Models
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- TOFU: A Task of Fictitious Unlearning for LLMs
You can follow the guide below to implement other unlearning methods.
Repository structure:

- `scripts/`: scripts for training and evaluation
- `uld/data/`: data processing and dataloaders
- `uld/models/`: model definitions
- `uld/trainer/`: unlearn trainer and unlearn losses
To add a new dataset:

- Add the dataset to `uld/data/` and register it in `uld/data/__init__.py`.
- Implement the new dataset class by inheriting the `TrainDataModule` class; a reference implementation for the ToFU dataset is in `uld/data/tofu.py`. Typically, you need to implement the logic to load forget data and retain data.
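The shape of such a dataset module can be sketched as follows. The `load_forget_data` / `load_retain_data` hook names and the stand-in base class here are illustrative assumptions; the repo's actual interface is defined in `uld/data/` (see `uld/data/tofu.py`):

```python
class TrainDataModule:
    # Stand-in for the repo's base class (illustrative only): it collects
    # the two splits that every unlearning dataset must provide.
    def __init__(self):
        self.forget_data = self.load_forget_data()
        self.retain_data = self.load_retain_data()

class MyUnlearnData(TrainDataModule):
    def load_forget_data(self):
        # (question, answer) pairs the model should forget
        return [("Who is Harry's owl?", "Hedwig")]

    def load_retain_data(self):
        # (question, answer) pairs whose knowledge must be preserved
        return [("What is the capital of France?", "Paris")]

dm = MyUnlearnData()
```

The key point is that a dataset class exposes two parallel collections, one for the forget objective and one for the retain objective, which the trainer then consumes.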
To add a new unlearn loss:

- Add the unlearn loss to `uld/trainer/unlearn_losses.py` and register it in `configs/unlearn_loss`.
- Implement the new unlearn loss class by defining the `forget_loss_func` and `retain_loss_func` for the `ForgetRetainLoss` class; a reference implementation is in the `create_unlearn_loss` function.
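As a concrete example, the `remember+uniform` objective used in the ULD command above pairs a standard cross-entropy retain term ("remember") with a forget term that pushes predictions on forget data toward the uniform distribution. A numpy sketch of the two terms under that reading (the actual `uld/trainer/unlearn_losses.py` implementation operates on torch tensors and batched sequences):

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def remember_loss(logits, targets):
    # retain_loss_func sketch: ordinary cross-entropy on retain data,
    # keeping the model predicting the true tokens
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

def uniform_loss(logits):
    # forget_loss_func sketch: cross-entropy against uniform targets,
    # minimized when the model is maximally uncertain on forget data
    logp = log_softmax(logits)
    return -logp.mean(axis=-1).mean()
```

`uniform_loss` bottoms out at `log(vocab_size)` when the logits are flat, so minimizing it erases any confident prediction on forget data without requiring gradient ascent on the forget set.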
If you find this work useful, please consider citing our paper:
```bibtex
@article{ji2024reversing,
  title   = {Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference},
  author  = {Jiabao Ji and Yujian Liu and Yang Zhang and Gaowen Liu and Ramana Rao Kompella and Sijia Liu and Shiyu Chang},
  year    = {2024},
  journal = {arXiv preprint arXiv:2406.08607}
}
```

Huge thanks to the following repos that greatly helped our implementation:
- https://github.com/licong-lin/negative-preference-optimization
- https://github.com/OPTML-Group/SOUL
- https://github.com/locuslab/tofu
- https://github.com/EleutherAI/lm-evaluation-harness
- https://github.com/voidism/dola