## Table of Contents

- Overview
- Installation
- Data & Model Preparation
- Usage
- Reproducing Results
- License
- Citation
- Acknowledgements
## Overview

TURN is an entropy-based algorithm for automatic temperature optimization in multi-sample inference strategies such as Majority Voting and Best-of-N. Multi-sample strategies achieve state-of-the-art performance, yet the role of temperature in these strategies is poorly understood. TURN selects a well-performing temperature automatically, with no manual tuning or grid search.
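For context, here is a minimal sketch of Majority Voting, one of the multi-sample strategies mentioned above (the sampling call in the comment is a hypothetical placeholder, not this repository's API):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among N sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical usage: sample N answers at some temperature, then aggregate.
# answers = [generate(prompt, temperature=t) for _ in range(32)]
# final_answer = majority_vote(answers)
```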
This repository contains the official implementation of our paper:

> Weihua Du, Yiming Yang, and Sean Welleck.
> "Optimizing Temperature for Language Models with Multi-Sample Inference." (2025)
- **High correlation**: TURN's predicted temperature closely matches the best temperature from grid search in terms of accuracy.
- **No labels needed**: the approach is purely entropy-driven, removing reliance on labeled validation sets.

*Figure: accuracies at TURN-predicted temperatures correlate strongly with accuracies at the best grid-search temperatures.*
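To illustrate the entropy-driven idea, here is a minimal sketch of estimating the entropy of the empirical answer distribution from N samples. This is only an illustration of the concept, not the paper's exact estimator:

```python
import math
from collections import Counter

def empirical_entropy(answers: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Low entropy means the model's samples concentrate on few distinct answers.
print(empirical_entropy(["2", "2", "2", "4"]))  # ≈ 0.562 nats
```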
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/StigLidu/TURN.git
   cd TURN
   ```

2. (Optional) Create a Conda environment:

   ```bash
   conda create -n TURN python=3.11
   conda activate TURN
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

> **Note**: For GPU-based inference, ensure the necessary CUDA libraries and drivers are installed.
## Data & Model Preparation

Prepare your test data in JSONL format, with one entry per line. For instance:

```jsonl
{"problem": "What is 1+1? Provide the answer in detail."}
{"problem": "Explain the concept of derivatives in calculus."}
{"problem": "Prove the Pythagorean theorem."}
```

- Each JSON object must include a `"problem"` key.
Our implementation works with Hugging Face models or local checkpoints.
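Either form of model path resolves the way a standard Hugging Face load does. As a sketch (using the `transformers` library for illustration; it does not necessarily reflect how `predict.py` loads models internally):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A Hub identifier or a local directory both work as the model path.
model_path = "nvidia/OpenMath2-Llama3.1-8B"  # or "/path/to/local/checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```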
## Usage

Run the main script `predict.py` to automatically infer an optimal temperature for a given aggregation strategy:

```bash
python predict.py \
    --model_path [LLM_PATH] \
    --data_path [DATA_PATH] \
    --aggregation_strategy [MJ/BofN] \
    [--num_samples 32 --batch_size 16 ...]
```

For example:

```bash
python predict.py \
    --model_path nvidia/OpenMath2-Llama3.1-8B \
    --data_path data/test_data.jsonl \
    --aggregation_strategy MJ
```

Output:

```
Predicted temperature: [predicted temperature]
```

- `--model_path`: Path to or name of the model (e.g., a Hugging Face model like `nvidia/OpenMath2-Llama3.1-8B`, or a local checkpoint).
- `--data_path`: Path to the JSONL file containing the test data.
- `--aggregation_strategy`: Currently supports `MJ` (Majority Voting) or `BofN` (Best-of-N).
- `--num_samples` (optional): Number of samples used to estimate entropy (default: `32`).
- `--batch_size` (optional): Batch size for inference (default: `16`). Reduce it if you face memory constraints.
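If you want to call TURN from other code, one option is to wrap the CLI and parse its final line. This is a hedged sketch that assumes only the `Predicted temperature:` output format shown above:

```python
import re
import subprocess

def predict_temperature(model_path: str, data_path: str, strategy: str = "MJ") -> float:
    """Run predict.py and parse the predicted temperature from its stdout."""
    out = subprocess.run(
        ["python", "predict.py",
         "--model_path", model_path,
         "--data_path", data_path,
         "--aggregation_strategy", strategy],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"Predicted temperature:\s*([0-9.]+)", out)
    if match is None:
        raise RuntimeError("Could not find 'Predicted temperature:' in output")
    return float(match.group(1))
```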
## Reproducing Results

To replicate the experiments reported in our paper:

- **MBPP (Code Generation)**: see instructions in `CODE/README.md`.
- **MATH (Mathematical Reasoning)**: see instructions in `MATH/README.md`.
## License

This project is released under the MIT License.
## Citation

If you find our work useful in your research, please use the following BibTeX reference:

```bibtex
@article{du2025optimizing,
  title={Optimizing Temperature for Language Models with Multi-Sample Inference},
  author={Du, Weihua and Yang, Yiming and Welleck, Sean},
  journal={arXiv preprint arXiv:2502.05234},
  year={2025}
}
```

## Acknowledgements

We extend our gratitude to the following open-source projects for their foundational contributions:
For any questions or inquiries, please contact:
- Weihua Du: [email protected]