This is the official repository for the NeurIPS 2025 paper "Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning".
Our code is mainly based on the alignment-handbook. Users can follow the instructions in the alignment-handbook to prepare the environment; we also provide a pre-built Docker image. Remember to install the math-verify package for answer verification:
```bash
pip install math-verify
```
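For reference, a minimal sketch of how math-verify checks a model's final answer against a gold answer; the example values below are illustrative, not taken from our data:

```python
from math_verify import parse, verify

# Parse the gold answer and a model-produced answer into comparable forms.
gold = parse("$\\frac{1}{2}$")
answer = parse("$0.5$")

# verify() returns True when the two expressions are mathematically equivalent,
# even across surface forms (fraction vs. decimal here).
print(verify(gold, answer))
```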
We release the models and data used in our experiments as follows:

| Name | Link |
|---|---|
| LLaMA3.1-8B-Tag | hf model |
| Qwen2.5-32B-Tag | hf model |
| Qwen2.5-32B-TOPS | hf model |
| Qwen2.5-32B-TOPS-Iter-DPO | hf model |
| All Training Data | hf dataset |
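If you prefer to fetch these artifacts programmatically, the sketch below uses the standard `huggingface_hub` and `datasets` APIs; the repo ids are hypothetical placeholders, so substitute the actual ids from the links above:

```python
from huggingface_hub import snapshot_download
from datasets import load_dataset

# Hypothetical repo ids -- replace with the actual ids linked in the table above.
model_dir = snapshot_download(repo_id="<org>/Qwen2.5-32B-TOPS")
train_data = load_dataset("<org>/tops-training-data", split="train")

print(model_dir)
print(train_data)
```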
We provide the raw data above; users can convert it to the required Hugging Face format by specifying the data paths in the sft/convert_data.py file and running the following command:
```bash
python3 sft/convert_data.py
```
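For orientation, converting to the chat format expected by the alignment-handbook typically looks like the sketch below; the raw field names (`question`, `response`) and file paths are assumptions, since the actual ones are defined in sft/convert_data.py:

```python
import json
from datasets import Dataset

# Assumed raw layout: one JSON object per line with "question"/"response" fields.
# The real field names and paths are set inside sft/convert_data.py.
records = [json.loads(line) for line in open("raw_train.jsonl")]

# Map each record to the "messages" chat format used by the alignment-handbook.
dataset = Dataset.from_list([
    {
        "messages": [
            {"role": "user", "content": r["question"]},
            {"role": "assistant", "content": r["response"]},
        ]
    }
    for r in records
])

dataset.save_to_disk("data/converted_sft")  # hypothetical output path
```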
After preparing the dataset, users can perform supervised fine-tuning by specifying the arguments in the sft/model_config/config_sft.yaml file and running the following command:
```bash
sh sft/run_sft.sh
```
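Under the hood, the alignment-handbook drives trl's `SFTTrainer`; the sketch below shows roughly what this stage amounts to, assuming a recent trl version. The model name, dataset path, and hyperparameters are placeholders, as the real values belong in sft/model_config/config_sft.yaml:

```python
from datasets import load_from_disk
from trl import SFTConfig, SFTTrainer

# Placeholder paths and values -- the real ones are set in config_sft.yaml.
train_dataset = load_from_disk("data/converted_sft")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # placeholder base model
    train_dataset=train_dataset,        # "messages"-format dataset from above
    args=SFTConfig(
        output_dir="outputs/sft",
        num_train_epochs=3,
        learning_rate=1e-5,
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```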
We put the test data in the data/test folder, and users can run the following command to perform evaluation:
```bash
sh scripts/run_eval.sh
```
If you find our work helpful, please kindly cite as:

```bibtex
@article{yang2025towards,
  title={Towards Thinking-Optimal Scaling of Test-Time Compute for {LLM} Reasoning},
  author={Yang, Wenkai and Ma, Shuming and Lin, Yankai and Wei, Furu},
  journal={arXiv preprint arXiv:2502.18080},
  year={2025}
}
```

We sincerely thank the alignment-handbook team for open-sourcing their code.