Code repository for the paper "Accelerating Unbiased LLM Evaluation via Synthetic Feedback"
The experiments are divided into four parts, corresponding to the four directories below. Please replicate our results in the following order:
- (Optional) Synthetic evaluator finetuning. You can skip this step if you run Control Variates Evaluation with an off-the-shelf evaluator. See instructions under finetune/.
- Collect Synthetic Evaluations. See instructions under evaluation/.
- Compute the averaged human annotation saving ratio. See instructions under stats/.
- Run control variates evaluation to visualize variance and bias (a minimal sketch of the estimator follows this list). See instructions under bootstrap/.
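
For a concrete picture of the estimator before diving into the directories, the sketch below shows a minimal control variates estimate in Python. It is an illustration under standard assumptions (paired human and synthetic scores on a small annotated subset, coefficient Cov/Var, annotation saving roughly equal to the squared correlation), not the exact implementation in bootstrap/ or stats/; all variable names and data are made up.

```python
# Minimal control variates sketch. The data and names below are illustrative,
# not taken from this repository.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic evaluator scores for the full evaluation set (cheap, abundant).
synthetic_all = rng.normal(0.6, 0.2, size=5000)
# Human scores for a small annotated subset, correlated with the synthetic scores.
subset_idx = rng.choice(5000, size=200, replace=False)
synthetic_subset = synthetic_all[subset_idx]
human_subset = synthetic_subset + rng.normal(0.0, 0.1, size=200)

# Plain Monte Carlo estimate from human annotations alone.
mc_estimate = human_subset.mean()

# Control-variate coefficient: Cov(human, synthetic) / Var(synthetic).
c = np.cov(human_subset, synthetic_subset)[0, 1] / synthetic_subset.var(ddof=1)

# Control variates estimate: correct the human mean with the synthetic-score gap.
cv_estimate = mc_estimate - c * (synthetic_subset.mean() - synthetic_all.mean())

# Squared correlation gives the asymptotic variance reduction, i.e. roughly the
# fraction of human annotations that can be saved at equal precision.
rho2 = np.corrcoef(human_subset, synthetic_subset)[0, 1] ** 2
print(f"MC estimate: {mc_estimate:.4f}, CV estimate: {cv_estimate:.4f}")
print(f"Approx. fraction of human annotations saved: {rho2:.2%}")
```

The key quantity is the correlation between human and synthetic scores: the stronger the synthetic evaluator tracks human judgment, the larger the variance reduction and the fewer human annotations are needed for the same confidence.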
Code associated with GPT-4 evaluation is partially based on lm-sys/FastChat.