This is the official repository for the NeurIPS 2025 paper "Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning".
Our code is mainly based on the alignment-handbook. Users can follow the instructions in the alignment-handbook to prepare the environment; we also provide a pre-built Docker image. Remember to install the math-verify package for answer verification:
```bash
pip install math-verify
```
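For reference, a minimal sketch of how math-verify checks a model's final answer against a gold answer; the example values below are illustrative, not taken from our data:

```python
from math_verify import parse, verify

# Parse the gold answer and a model-produced answer into comparable forms.
gold = parse("$\\frac{1}{2}$")
answer = parse("$0.5$")

# verify() returns True when the two expressions are mathematically equivalent,
# even across surface forms (fraction vs. decimal here).
print(verify(gold, answer))
```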
We release the models and data used in our experiments as follows:

| Name | Link |
|---|---|
| LLaMA3.1-8B-Tag | hf model |
| Qwen2.5-32B-Tag | hf model |
| Qwen2.5-32B-TOPS | hf model |
| Qwen2.5-32B-TOPS-Iter-DPO | hf model |
| All Training Data | hf dataset |
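If you prefer to fetch these artifacts programmatically, the sketch below uses the standard `huggingface_hub` and `datasets` APIs; the repo ids are hypothetical placeholders, so substitute the actual ids from the links above:

```python
from huggingface_hub import snapshot_download
from datasets import load_dataset

# Hypothetical repo ids -- replace with the actual ids linked in the table above.
model_dir = snapshot_download(repo_id="<org>/Qwen2.5-32B-TOPS")
train_data = load_dataset("<org>/tops-training-data", split="train")

print(model_dir)
print(train_data)
```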
We provide the raw data above; users can convert it to the required Hugging Face format by specifying the data paths in the sft/convert_data.py file and running the following command:
```bash
python3 sft/convert_data.py
```
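For orientation, converting to the chat format expected by the alignment-handbook typically looks like the sketch below; the raw field names (`question`, `response`) and file paths are assumptions, since the actual ones are defined in sft/convert_data.py:

```python
import json
from datasets import Dataset

# Assumed raw layout: one JSON object per line with "question"/"response" fields.
# The real field names and paths are set inside sft/convert_data.py.
records = [json.loads(line) for line in open("raw_train.jsonl")]

# Map each record to the "messages" chat format used by the alignment-handbook.
dataset = Dataset.from_list([
    {
        "messages": [
            {"role": "user", "content": r["question"]},
            {"role": "assistant", "content": r["response"]},
        ]
    }
    for r in records
])

dataset.save_to_disk("data/converted_sft")  # hypothetical output path
```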
After preparing the dataset, users can perform supervised fine-tuning by specifying the arguments in the sft/model_config/config_sft.yaml file and running the following command:
```bash
sh sft/run_sft.sh
```
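Under the hood, the alignment-handbook drives trl's `SFTTrainer`; the sketch below shows roughly what this stage amounts to, assuming a recent trl version. The model name, dataset path, and hyperparameters are placeholders, as the real values belong in sft/model_config/config_sft.yaml:

```python
from datasets import load_from_disk
from trl import SFTConfig, SFTTrainer

# Placeholder paths and values -- the real ones are set in config_sft.yaml.
train_dataset = load_from_disk("data/converted_sft")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # placeholder base model
    train_dataset=train_dataset,        # "messages"-format dataset from above
    args=SFTConfig(
        output_dir="outputs/sft",
        num_train_epochs=3,
        learning_rate=1e-5,
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```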
We put the test data in the data/test folder, and users can run the following command to perform evaluation:
```bash
sh scripts/run_eval.sh
```
If you find our work helpful, please kindly cite as:

```bibtex
@article{yang2025towards,
  title={Towards Thinking-Optimal Scaling of Test-Time Compute for {LLM} Reasoning},
  author={Yang, Wenkai and Ma, Shuming and Lin, Yankai and Wei, Furu},
  journal={arXiv preprint arXiv:2502.18080},
  year={2025}
}
```

We sincerely thank the alignment-handbook team for open-sourcing their code.