Skip to content
/ TOPS Public

[NeurIPS 2025] Official repository for the paper "Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning"

License

Notifications You must be signed in to change notification settings

RUCBM/TOPS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Thinking-Optimal Test-Time Scaling

This is the official repository for the NeurIPS 2025 paper "Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning".

Installation

Our code is mainly based on the alignment-handbook. Users can follow the instructions in the alignment-handbook to prepare the environment. We also provide the pre-built Docker image. Remember to install math-verify package for answer verification:

pip install math-verify

Models and Data

We release the models and data used in our experiments as follows:

Name
LLaMA3.1-8B-Tag hf model
Qwen2.5-32B-Tag hf model
Qwen2.5-32B-TOPS hf model
Qwen2.5-32B-TOPS-Iter-DPO hf model
All Training Data hf dataset

Training

We provide the raw data above, users can convert it to the required huggingface format by specifying the data paths in the sft/convert_data.py file and running the following command:

python3 sft/convert_data.py

After preparing the dataset, users can perform supervised fine-tuning by specifying the arguments in the sft/model_config/config_sft.yaml file and running the following command:

sh sft/run_sft.sh

Evaluation

We put the test data in the data/test folder, and users can run the following command to perform evaluation:

sh scripts/run_eval.sh

Citation

If you find our work helpful, please kindly cite as

@article{yang2025towards,
  title={Towards thinking-optimal scaling of test-time compute for llm reasoning},
  author={Yang, Wenkai and Ma, Shuming and Lin, Yankai and Wei, Furu},
  journal={arXiv preprint arXiv:2502.18080},
  year={2025}
}

Acknowledgments

We sincerely thank alignment-handbook for the open-sourcing.

About

[NeurIPS 2025] Official repository for the paper "Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published