## Table of Contents

- Overview
- Installation
- Data & Model Preparation
- Usage
- Reproducing Results
- License
- Citation
- Acknowledgements
## Overview

TURN is an entropy-based algorithm for automatic temperature optimization in multi-sample inference strategies such as Majority Voting and Best-of-N. Multi-sample strategies achieve state-of-the-art performance, yet the role of temperature in these strategies is poorly understood. TURN selects a well-performing temperature automatically, with no manual tuning or grid search.
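For context, here is a minimal sketch of Majority Voting, one of the multi-sample strategies mentioned above (the sampling call in the comment is a hypothetical placeholder, not this repository's API):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among N sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical usage: sample N answers at some temperature, then aggregate.
# answers = [generate(prompt, temperature=t) for _ in range(32)]
# final_answer = majority_vote(answers)
```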
This repository contains the official implementation of our paper:

> Weihua Du, Yiming Yang, and Sean Welleck.
> "Optimizing Temperature for Language Models with Multi-Sample Inference." (2025)
- **High correlation**: TURN's predicted temperature closely matches the best temperature from grid search in terms of accuracy.
- **No labels needed**: the approach is purely entropy-driven, removing reliance on labeled validation sets.

*Figure: accuracies at TURN-predicted temperatures correlate strongly with accuracies at the best grid-search temperatures.*
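To illustrate the entropy-driven idea, here is a minimal sketch of estimating the entropy of the empirical answer distribution from N samples. This is only an illustration of the concept, not the paper's exact estimator:

```python
import math
from collections import Counter

def empirical_entropy(answers: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Low entropy means the model's samples concentrate on few distinct answers.
print(empirical_entropy(["2", "2", "2", "4"]))  # ≈ 0.562 nats
```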
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/StigLidu/TURN.git
   cd TURN
   ```

2. (Optional) Create a Conda environment:

   ```bash
   conda create -n TURN python=3.11
   conda activate TURN
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

> **Note**: For GPU-based inference, ensure the necessary CUDA libraries and drivers are installed.
## Data & Model Preparation

Prepare your test data in JSONL format, with one entry per line. For instance:

```jsonl
{"problem": "What is 1+1? Provide the answer in detail."}
{"problem": "Explain the concept of derivatives in calculus."}
{"problem": "Prove the Pythagorean theorem."}
```

- Each JSON object must include a `"problem"` key.
Our implementation works with Hugging Face models or local checkpoints.
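Either form of model path resolves the way a standard Hugging Face load does. As a sketch (using the `transformers` library for illustration; it does not necessarily reflect how `predict.py` loads models internally):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A Hub identifier or a local directory both work as the model path.
model_path = "nvidia/OpenMath2-Llama3.1-8B"  # or "/path/to/local/checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```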
## Usage

Run the main script `predict.py` to automatically infer an optimal temperature for a given aggregation strategy:

```bash
python predict.py \
    --model_path [LLM_PATH] \
    --data_path [DATA_PATH] \
    --aggregation_strategy [MJ/BofN] \
    [--num_samples 32 --batch_size 16 ...]
```

For example:

```bash
python predict.py \
    --model_path nvidia/OpenMath2-Llama3.1-8B \
    --data_path data/test_data.jsonl \
    --aggregation_strategy MJ
```

Output:

```
Predicted temperature: [predicted temperature]
```

- `--model_path`: Path to or name of the model (e.g., a Hugging Face model like `nvidia/OpenMath2-Llama3.1-8B`, or a local checkpoint).
- `--data_path`: Path to the JSONL file containing the test data.
- `--aggregation_strategy`: Currently supports `MJ` (Majority Voting) or `BofN` (Best-of-N).
- `--num_samples` (optional): Number of samples used to estimate entropy (default: `32`).
- `--batch_size` (optional): Batch size for inference (default: `16`). Reduce it if you face memory constraints.
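If you want to call TURN from other code, one option is to wrap the CLI and parse its final line. This is a hedged sketch that assumes only the `Predicted temperature:` output format shown above:

```python
import re
import subprocess

def predict_temperature(model_path: str, data_path: str, strategy: str = "MJ") -> float:
    """Run predict.py and parse the predicted temperature from its stdout."""
    out = subprocess.run(
        ["python", "predict.py",
         "--model_path", model_path,
         "--data_path", data_path,
         "--aggregation_strategy", strategy],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"Predicted temperature:\s*([0-9.]+)", out)
    if match is None:
        raise RuntimeError("Could not find 'Predicted temperature:' in output")
    return float(match.group(1))
```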
## Reproducing Results

To replicate the experiments reported in our paper:

- **MBPP (Code Generation)**: see instructions in `CODE/README.md`.
- **MATH (Mathematical Reasoning)**: see instructions in `MATH/README.md`.
## License

This project is released under the MIT License.
## Citation

If you find our work useful in your research, please use the following BibTeX reference:

```bibtex
@article{du2025optimizing,
  title={Optimizing Temperature for Language Models with Multi-Sample Inference},
  author={Du, Weihua and Yang, Yiming and Welleck, Sean},
  journal={arXiv preprint arXiv:2502.05234},
  year={2025}
}
```

## Acknowledgements

We extend our gratitude to the following open-source projects for their foundational contributions:
For any questions or inquiries, please contact:
- Weihua Du: [email protected]