This is the official code for our paper, "Likelihood-based Mitigation of Evaluation Bias in Large Language Models" (accepted to ACL 2024).
conda create -n likelihood_bias python=3.9
conda activate likelihood_bias
pip install -r requirements.txt
Please copy .env.sample to .env and put your API key in that file.
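The expected contents of .env depend on which API you use; the variable name below is an assumption — check .env.sample for the exact name this repo expects:

```shell
# .env — the key name here is an assumption; see .env.sample for the real one
OPENAI_API_KEY=your-api-key-here
```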
cd data_orig/data2text
git clone [email protected]:WebNLG/challenge-2020.git
Please download the input data from here and put it at data_orig/data2text/rdf-to-text-generation-test-data-without-refs-en.xml.
Please download the dataset from here and put it at data_orig/gec/tmu-gfm-dataset.csv.
python format_data.py
This script formats the data and extracts few-shot examples at random.
The formatted data and few-shot examples will be saved in data.
# Calc Score_m
python calc_evaluator_score_baseline.py -e gpt_35_turbo -d data2text
python calc_evaluator_score_baseline.py -e gpt_35_turbo -d gec
python calc_evaluator_score_baseline.py -e llama2_13b -d data2text
python calc_evaluator_score_baseline.py -e llama2_13b -d gec
# Calc likelihood score
python calc_likelihood_score.py -e llama2_13b -d data2text
python calc_likelihood_score.py -e llama2_13b -d gec
Due to the design of the implementation, the results (BiasScore and evaluation performance) can only be calculated after mitigating likelihood bias.
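For intuition, a likelihood score of a text under an LM is typically the (length-normalized) sum of its token log-probabilities. A minimal NumPy sketch, assuming you already have per-step logits and the observed token ids (the function and variable names are illustrative, not from this repo):

```python
import numpy as np

def mean_token_logprob(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Average log-probability of token_ids under per-step logits.

    logits: (seq_len, vocab_size) raw scores from a language model
    token_ids: (seq_len,) the tokens actually observed/generated
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick each observed token's log-prob, then length-normalize
    return float(log_probs[np.arange(len(token_ids)), token_ids].mean())
```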
Before calculating the scores, we first split the data into training and evaluation sets.
python split_train_eval.py
Then you can run the following scripts to get scores under reduced likelihood bias.
python calc_evaluator_score_mitigation.py -e gpt_35_turbo -d data2text
python calc_evaluator_score_mitigation.py -e gpt_35_turbo -d gec
python calc_evaluator_score_mitigation.py -e llama2_13b -d data2text
python calc_evaluator_score_mitigation.py -e llama2_13b -d gec
Finally, you can get the BiasScore and evaluation performance before and after mitigation.
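As a rough sketch of the idea behind BiasScore — a correlation between the model's likelihoods and the evaluator's scores over the same instances (the paper's exact formulation may differ; this Pearson version is only illustrative):

```python
import numpy as np

def bias_score_sketch(likelihoods, evaluator_scores) -> float:
    """Illustrative only: Pearson correlation between likelihood and
    evaluator scores. A high correlation suggests the evaluator favors
    high-likelihood outputs (likelihood bias); see the paper for the
    exact BiasScore definition."""
    return float(np.corrcoef(likelihoods, evaluator_scores)[0, 1])
```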
model_list=("gpt_35_turbo" "llama2_13b")
task_list=("data2text" "gec")
for model in "${model_list[@]}"; do
for task in "${task_list[@]}"; do
python calc_cor_before_mitigation.py -e $model -d $task
python calc_cor_after_mitigation.py -e $model -d $task
done
done
The results will be saved in results/before/ and results/after/.
If you find our work useful for your research and applications, please cite using this BibTeX:
@misc{ohi2024likelihoodbasedmitigationevaluationbias,
      title={Likelihood-based Mitigation of Evaluation Bias in Large Language Models},
      author={Masanari Ohi and Masahiro Kaneko and Ryuto Koike and Mengsay Loem and Naoaki Okazaki},
      year={2024},
      eprint={2402.15987},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2402.15987},
}