This repository contains code for the EMNLP paper *Aligners: Decoupling LLMs and Alignment*.
Install all the required packages listed in `requirements.txt`.
A simple demo illustrating how to use a trained aligner to align responses can be found in the `./simple-demo.ipynb` notebook.
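If you prefer a script over the notebook, the core idea is to pack the question `x` and the base LLM's response `y` into a single prompt and decode the corrected response `y'`. The sketch below uses a hypothetical prompt template — the exact input format the released aligner expects is shown in `./simple-demo.ipynb`:

```python
def build_aligner_prompt(question: str, response: str) -> str:
    """Pack a question x and an unaligned response y into one aligner prompt.

    The template below is a hypothetical placeholder; check
    ./simple-demo.ipynb for the exact format the released aligner expects.
    """
    return f"<question>: {question}\n<response>: {response}\n<corrected>:"

# Typical use with a Hugging Face causal LM (requires `transformers`):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained(aligner_id)
# model = AutoModelForCausalLM.from_pretrained(aligner_id)
# ids = tok(build_aligner_prompt(x, y), return_tensors="pt").input_ids
# out = model.generate(ids, max_new_tokens=256)
# y_prime = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
```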
Generate synthetic training data (step 1) as follows:

- Navigate to the `./synthetic-data-generation` folder and then open the `ethical`, `factuality`, or `helpful` folder, depending on the type of dataset you are trying to generate.
- Adapt the code in `generate_topics.py` to the model and model source that you are going to use. The provided code uses `Falcon-40B` through IBM Foundation Models Studio. If you want to use `Falcon-40B` through Hugging Face, change the code accordingly.
- Run `generate_topics.py` using the command `python generate_topics.py` to generate topics.
- In `deduplicate_topics.py`, provide the path to the file that contains the generated topics in the `main` function's `data_file` parameter.
- Run `deduplicate_topics.py` using the command `python deduplicate_topics.py` to filter out invalid and duplicated topics.
- Adapt the code in `generate_questions.py` to the model and model source that you are going to use. The provided code uses `Falcon-40B` through IBM Foundation Models Studio. If you want to use `Falcon-40B` through Hugging Face, change the code accordingly.
- Run `generate_questions.py` using the command `python generate_questions.py` to generate questions (`x`).
- Create a CSV file of generated questions by running the `json-to-df.ipynb` Jupyter notebook. The CSV file will be saved in the `questions` folder.
- Adapt the code in `generate-bad-and-good-responses.ipynb` to the model and model source that you are going to use. The provided code uses `Falcon-40B` through IBM Foundation Models Studio. If you want to use `Falcon-40B` through Hugging Face, change the code accordingly.
- Generate misaligned (`y`) and aligned (`y'`) responses to every question (`x`) by running the `generate-bad-and-good-responses.ipynb` Jupyter notebook.
- Clean the data for inspector and aligner training by running the `clean-data-for-inspector-and-aligner-training.ipynb` notebook, which filters out bad samples.
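Among the steps above, the topic-deduplication step is the simplest to sketch: normalize each generated topic and keep only the first valid occurrence. The normalization and validity rules below are illustrative assumptions — `deduplicate_topics.py` may apply different ones:

```python
def deduplicate_topics(topics):
    """Drop invalid and duplicated topics, keeping first occurrences in order.

    Normalization (lowercase, strip trailing periods/whitespace) and the
    "must contain a letter" validity check are assumptions; the repo's
    deduplicate_topics.py may use different rules.
    """
    seen = set()
    kept = []
    for topic in topics:
        key = topic.strip().lower().rstrip(".")
        if not key or not any(c.isalpha() for c in key):
            continue  # invalid: empty or contains no letters
        if key in seen:
            continue  # duplicate (case-insensitive)
        seen.add(key)
        kept.append(topic.strip())
    return kept

print(deduplicate_topics(["Honesty", "honesty.", "", "123", "Fairness"]))
# -> ['Honesty', 'Fairness']
```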
After generating synthetic data in step 1, train inspectors and aligners as follows:
- To train inspectors, navigate to the `./inspector-training` folder and run the `train_inspectors.sh` bash script.
- To train aligners, navigate to the `./aligner-training` folder and train the GPT-2 Large, Pythia-1.4B, RedPajama-3B, and Phi-2 aligners by running the `train_aligners_gpt2.sh`, `train_aligners_pythia.sh`, `train_aligners_redpajama.sh`, and `train_aligners_phi2.sh` bash scripts, respectively.
NOTE: Adapt the bash scripts to the system/cluster that you are using to specify the number of nodes, GPUs, etc. Example bash scripts for running on a cluster that uses the Slurm job scheduler are in the `./aligner-training` folder.
After training inspectors and aligners in step 2, generate responses using the trained aligners and baselines for evaluation as follows:
- Navigate to the `./generate-responses-for-eval` folder.
- Create synthetic data (`synthetic_mixed`) made of ethical, factuality, and helpful test questions (`x`) (5000 samples each) and put it in `./test_data_x`, following the naming convention `{data_name}_test_inputx.csv`, i.e., `synthetic_mixed_test_inputx.csv`.
- Run the Jupyter notebook `beaver_tails_data_prep.ipynb` to download the BeaverTails evaluation dataset from Hugging Face, extract questions (`x`) from it, and save them as a CSV file in `./test_data_x`.
- Adapt the code in `generate_responses_using_llms_baselines.py` to the source of the models (LLMs) that you are going to use. The provided code uses LLMs accessed through IBM Foundation Models Studio. If you want to use models from Hugging Face, change the code accordingly.
- To generate responses using `falcon-40b`, `llama-2-13b-chat`, `llama-2-70b-chat`, `falcon-40b-instruct`, `llama-2-13b`, and `llama-2-70b`, run the bash script `run_generate_responses_using_llms_baselines.sh`. Data with the generated responses will be saved in `./data_unaligned`.
- To generate responses using the aligner by Ji et al., run the bash script `run_generate_using_Ji_et_al_aligner.sh`. Data with the generated responses will be saved in the `./data_aligned` folder. Note: in the code, we sometimes refer to the aligner by Ji et al. as the PKU aligner.
- To generate responses using individual aligners, run the bash script `run_generate_using_individual_aligners.sh`. Data with the generated responses will be saved in the `./data_aligned_individual` folder.
- Provide the path to the saved inspector checkpoints in the `generate_responses` function in `generate_using_aligners_squad.py`; if you are using an aligner checkpoint id different from `2500`, change it under `if __name__ == "__main__":` in the same file.
- To generate responses using the aligners squad, run the bash script `run_generate_using_aligners_squad.sh`. Data with the generated responses will be saved in the `./data_aligned` folder.
- To convert the test data into the format expected for evaluation using GPT-4 via AlpacaEval, run the bash scripts `run_data_prep_for_alpaca_eval.sh` and `run_data_prep_for_alpaca_eval_our_aligners_vs_Ji_et_al_aligner.sh`. The converted data will be saved in `./data_for_alpaca_eval`.

NOTE: In cases where GPUs are needed, adapt the bash scripts to the system/cluster that you are using to specify the number of GPUs. Example bash scripts for running on a cluster that uses the Slurm job scheduler are in the `./generate-responses-for-eval` folder.
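For reference, AlpacaEval consumes each model's outputs as a JSON list of records. If you ever need to build such a file outside the provided prep scripts, a sketch of the conversion is below; the `"instruction"`/`"output"`/`"generator"` field names reflect AlpacaEval's expected format, but the exact column mapping done by `run_data_prep_for_alpaca_eval.sh` may differ:

```python
import json

def to_alpaca_eval_format(pairs, generator):
    """Convert (question, response) pairs into AlpacaEval's JSON list format.

    Assumed record fields: "instruction" (the prompt), "output" (the model's
    response), "generator" (the model name AlpacaEval reports). The repo's
    prep scripts may handle additional columns.
    """
    return [
        {"instruction": q, "output": r, "generator": generator}
        for q, r in pairs
    ]

records = to_alpaca_eval_format([("What is 2+2?", "4.")], "ethical-aligner")
print(json.dumps(records, indent=2))
```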
After generating responses for evaluation using aligners and baseline models in step 3, evaluate them as follows:
- Navigate to the `./evaluation` folder.
- To evaluate responses generated by our aligners against responses generated by the `falcon-40b`, `llama-2-13b-chat`, `llama-2-70b-chat`, `falcon-40b-instruct`, `llama-2-13b`, and `llama-2-70b` baselines, run the bash script `run_evaluation_using_PairRM.sh`. Results will be saved in the `./results` folder.
- To evaluate responses generated by our aligners against responses generated by Ji et al.'s (PKU) aligner, run the bash script `run_evaluation_using_PairRM_our_aligners_vs_Ji_et_al_aligner.sh`. Results will be saved in the `./results` folder.
- To evaluate individual aligners' responses against responses generated by the `falcon-40b`, `llama-2-13b-chat`, `llama-2-70b-chat`, `falcon-40b-instruct`, `llama-2-13b`, and `llama-2-70b` baselines, run the bash script `run_evaluation_using_PairRM_individual.sh`. Results will be saved in the `./results_individual` folder.
- To evaluate individual aligners' responses against responses generated by Ji et al.'s (PKU) aligner, run the bash script `run_evaluation_using_PairRM_our_aligners_vs_Ji_et_al_aligner_individual.sh`. Results will be saved in the `./results_individual` folder.
- Navigate to `../generate-responses-for-eval/data_for_alpaca_eval`, where the prepared test datasets for evaluation using GPT-4 via AlpacaEval are stored.
- For each dataset and aligner model, evaluate our aligners' responses against Ji et al.'s (PKU) aligner responses using the command `alpaca_eval --model_outputs 'our_aligner_responses.json' --reference_outputs 'pku_aligner_responses.json' --output_path ./our_aligner_vs_pku_aligner`.
- For each dataset and aligner model, evaluate our aligners' responses against the base model baselines using the command `alpaca_eval --model_outputs 'aligned_responses.json' --reference_outputs 'base_model_responses.json' --output_path ./aligned_vs_base_llm`.
- For each dataset and aligner model, evaluate our aligners' responses against the finetuned (chat and instruct) model baselines using the command `alpaca_eval --model_outputs 'aligned_responses.json' --reference_outputs 'finetuned_model_responses.json' --output_path ./aligned_vs_finetuned_llm`.
- To collect all the results and create the results tables and bar plot, run the Jupyter notebook `create_results_tables_and_barplot.ipynb`.
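Both PairRM and AlpacaEval ultimately reduce to per-example pairwise preferences that get aggregated into win rates. The sketch below shows one common aggregation convention (ties counted as half a win); the repo's evaluation scripts may aggregate differently:

```python
def win_rate(preferences):
    """Aggregate pairwise preferences into a win rate for candidate A.

    `preferences` holds "A", "B", or "tie" per test example. Counting a
    tie as half a win is an assumption about the convention used; the
    repo's evaluation scripts may score ties differently.
    """
    if not preferences:
        return 0.0
    score = sum(
        1.0 if p == "A" else 0.5 if p == "tie" else 0.0
        for p in preferences
    )
    return score / len(preferences)

print(win_rate(["A", "A", "B", "tie"]))  # -> 0.625
```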
- Synthetically generated datasets that were used to train aligners are released on Hugging Face.
- A trained 7B ethical aligner is released on Hugging Face. Example code showing how to use it is in the `./simple-demo.ipynb` Jupyter notebook.
To cite this work:

```bibtex
@article{ngweta2024aligners,
  title={Aligners: Decoupling LLMs and Alignment},
  author={Ngweta, Lilian and Agarwal, Mayank and Maity, Subha and Gittens, Alex and Sun, Yuekai and Yurochkin, Mikhail},
  journal={arXiv preprint arXiv:2403.04224},
  year={2024}
}
```
