This repository contains the code for our paper Personalized Language Modeling from Personalized Human Feedback. We propose a general RLHF framework (P-RLHF) for fine-tuning LLMs using personalized preference data. In P-RLHF, we learn a separate user model in addition to the base LLM. Our implementation works with any existing preference optimization (*PO) algorithm.
To run the code, install the following packages:
conda create -n prlhf python=3.11.9
conda activate prlhf
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu118
pip install datasets==2.21.0
pip install transformers==4.44.2
pip install trl==0.10.1
pip install peft==0.12.0
pip install wandb==0.17.9
pip install pydantic==2.9.0
pip install pandas
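After installation, an optional sanity check that the pinned versions resolved and that CUDA is visible to PyTorch:

# Optional sanity check: print installed versions and CUDA availability.
import datasets, peft, torch, transformers, trl
print("torch", torch.__version__, "| cuda:", torch.cuda.is_available())
print("transformers", transformers.__version__, "| trl", trl.__version__)
print("peft", peft.__version__, "| datasets", datasets.__version__)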
All data processing code for TLDR, PSOUPS and PRISM can be found in the data/ folder.
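For intuition, a personalized preference example pairs a user identifier with a standard preference triple. The record below is illustrative only; the field names are assumptions, not the exact schema produced by the data/ scripts.

# Illustrative record only; see data/ for the actual processing code and schema.
example = {
    "user_id": "user_042",                            # who expressed the preference
    "prompt": "Summarize this post: ...",
    "chosen": "A concise two-sentence summary ...",   # response this user preferred
    "rejected": "A rambling summary ...",             # response this user rejected
}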
Implementation of P-RLHF can be found in the prlhf/ folder. We provide implementations of the individual and cluster user models; one can extend these by introducing new user models in prlhf/user_model.py, as sketched below.
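As a rough sketch of what a new user model could look like (the class name and interface below are assumptions for illustration; match whatever prlhf/user_model.py actually defines):

import torch
import torch.nn as nn

class SoftClusterUserModel(nn.Module):
    """Hypothetical user model: soft-assigns each user to K learned clusters."""

    def __init__(self, num_users: int, num_clusters: int, embed_dim: int):
        super().__init__()
        self.assignment = nn.Embedding(num_users, num_clusters)   # per-user cluster logits
        self.cluster_embeddings = nn.Parameter(torch.randn(num_clusters, embed_dim))

    def forward(self, user_ids: torch.Tensor) -> torch.Tensor:
        # Mix the cluster embeddings by each user's soft assignment.
        weights = self.assignment(user_ids).softmax(dim=-1)       # (batch, K)
        return weights @ self.cluster_embeddings                  # (batch, embed_dim)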
Below is a sample script for running train_language_model_dpo.py. For more training scripts, check out the scripts/ folder.
accelerate launch prlhf/train_language_model_dpo.py \
--user_model $USER_MODEL_TYPE \
--model_class $MODEL_CLASS \
--model_name $BASE_LLM_PATH \
--tokenizer_name $BASE_LLM_TOKENIZER_PATH \
--dataset $DATASET_NAME
Note: To use a personalized preference optimization algorithm other than P-DPO (i.e., another P-*PO), set a different loss_type for UserDPOTrainer, which inherits from the DPOTrainer in the TRL library; a sketch follows.
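A minimal sketch of switching the loss (the output path is hypothetical; "sigmoid" is the standard DPO loss, and "ipo" and "hinge" are among the loss types TRL's DPOTrainer supports in 0.10.x):

from trl import DPOConfig

# Sketch: turn P-DPO into P-IPO by changing the inherited loss_type.
training_args = DPOConfig(
    output_dir="checkpoints/p-ipo",   # hypothetical output path
    loss_type="ipo",                  # e.g. "sigmoid" (DPO), "ipo", "hinge"
    beta=0.1,
)
# Pass training_args to UserDPOTrainer as in prlhf/train_language_model_dpo.py.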
Below is a sample script for generating responses with a personalized LLM. For more generation-related scripts, check out the scripts/ folder.
accelerate launch prlhf/generate.py \
--user_model $USER_MODEL_TYPE \
--lora_checkpoint $LORA_CKPT_PATH \
--output_dir $GENERATION_OUTPUT_DIR \
--model_name $BASE_LLM_PATH \
--model_class $MODEL_CLASS \
--dataset $DATASET_NAME
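Under the hood, generation with a trained checkpoint amounts to attaching the LoRA adapter to the base LLM. A minimal sketch with peft (the paths and prompt are placeholders; prlhf/generate.py additionally handles the user-model side):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_llm_path = "path/to/base_llm"            # placeholder
lora_ckpt_path = "path/to/lora_checkpoint"    # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_llm_path)
base = AutoModelForCausalLM.from_pretrained(base_llm_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, lora_ckpt_path)   # attach the trained adapter

inputs = tokenizer("Write a short, friendly reply:", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))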
To obtain the win rate of the trained model on PSOUPS and PRISM, check out the evaluate/ folder.
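For reference, the win rate is the fraction of pairwise comparisons the trained model's responses win against a baseline. A minimal sketch (counting ties as half a win is an assumption; see evaluate/ for the actual protocol):

# Illustrative only: compute win rate from a list of judge verdicts.
def win_rate(verdicts: list[str]) -> float:
    # Fraction of comparisons won, counting ties as half a win (an assumption).
    wins = sum(v == "win" for v in verdicts)
    ties = sum(v == "tie" for v in verdicts)
    return (wins + 0.5 * ties) / len(verdicts)

print(win_rate(["win", "tie", "loss", "win"]))  # 0.625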
If you use P-RLHF and find it interesting or useful, please cite the paper and star this repo, thanks! Open an issue if you have any questions.
@article{personalizedRLHF,
title={Personalized language modeling from personalized human feedback},
author={Li, Xinyu and Zhou, Ruiyang and Lipton, Zachary C and Leqi, Liu},
journal={arXiv preprint arXiv:2402.05133},
year={2024}
}