This repository contains the code and resources for the paper Thinker: Learning to Think Fast and Slow, published at NeurIPS 2025.
Our work introduces the Thinker task, a novel four-stage Reinforcement Learning (RL) approach for question-answering (QA) designed to enhance the reasoning capabilities of Large Language Models (LLMs) by explicitly training distinct cognitive abilities: intuition (Fast Thinking), evaluation (Verification), refinement (Slow Thinking), and integration (Summarization).
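As a rough illustration of how the four stages fit together at inference time, the sketch below uses hypothetical helper names and prompts; the actual task format, token budgets, and rewards are defined in the paper and the training code.

```python
# Hypothetical sketch of a four-stage Thinker rollout; prompts, budgets, and
# control flow are illustrative only -- see the paper for the exact task design.
def thinker_rollout(llm, question):
    # 1. Fast Thinking: produce an initial answer under a tight token budget.
    fast_answer = llm.generate(f"{question}\nAnswer briefly:", max_tokens=1000)

    # 2. Verification: judge whether the fast answer is likely correct.
    verdict = llm.generate(
        f"{question}\nProposed answer: {fast_answer}\nIs this correct? (yes/no)"
    )

    # 3. Slow Thinking: if the answer is not verified, reason at length and refine it.
    if "yes" not in verdict.lower():
        reasoning = llm.generate(f"{question}\nThink step by step:", max_tokens=8000)
    else:
        reasoning = fast_answer

    # 4. Summarization: condense the reasoning into a concise final answer.
    return llm.generate(f"{question}\nReasoning: {reasoning}\nFinal answer:")
```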
Figure: Evaluation Accuracy on Mathematical Reasoning Benchmarks.
Performance comparison across various mathematical reasoning benchmarks. All scores are Pass@1 accuracy (%) averaged over 16 samples. Top score in each benchmark column (within each model group) is bolded.
| Method | MATH 500 | AIME 2024 | AIME 2025 | GPQA Diamond | OlympiadBench | AMC 23 | Minerva Math | College Math | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-1.5B (Q1.5B) | | | | | | | | | |
| Pretrained | 9.05 | 0.00 | 0.00 | 4.55 | 3.09 | 4.06 | 2.30 | 7.40 | 3.81 |
| Baseline | 59.82 | 4.10 | **2.43** | 20.52 | 26.05 | 35.36 | 19.25 | 37.42 | 25.62 |
| Thinker | **64.45** | **6.25** | 2.22 | 19.21 | **28.21** | **39.06** | **20.38** | **38.82** | **27.33** |
| Thinker-Fast | 59.82 | 4.58 | 1.25 | **21.28** | 24.52 | 34.53 | 17.85 | 37.58 | 25.18 |
| ORZ | 58.00 | 3.50 | 1.00 | 16.80 | - | - | - | - | - |
| SimpleRL | 59.00 | 4.20 | - | - | 21.00 | 35.00 | 20.20 | - | - |
| DeepSeek-R1-Distill-Qwen-1.5B (R1.5B) | | | | | | | | | |
| Pretrained | 76.21 | 17.50 | 17.92 | 13.76 | 37.46 | 55.94 | 24.82 | 38.85 | 35.31 |
| Baseline | 86.24 | 35.42 | 23.75 | 25.69 | 49.22 | 72.81 | 32.08 | 42.02 | 45.90 |
| Thinker | **88.51** | **38.96** | **26.67** | **37.41** | **55.49** | **83.59** | **34.77** | **42.46** | **50.98** |
| Thinker-Fast | 81.35 | 18.33 | 14.58 | 28.85 | 45.68 | 66.41 | 31.39 | 41.74 | 41.05 |
| DeepSeek-R1-Distill-Qwen-7B (R7B) | | | | | | | | | |
| Pretrained | 84.05 | 37.50 | 28.54 | 17.58 | 37.92 | 36.41 | 34.49 | 40.72 | 39.65 |
| Baseline | 91.03 | 47.50 | 34.58 | 34.63 | 56.76 | 87.81 | 40.23 | 42.71 | 54.41 |
| Thinker | **93.04** | **56.25** | **41.46** | **41.51** | **62.12** | **91.09** | **44.39** | **42.84** | **59.09** |
| Thinker-Fast | 86.47 | 26.46 | 21.88 | 34.12 | 51.77 | 71.56 | 43.08 | 42.14 | 47.19 |
Comparison with concurrent works on accuracy and average response length. Results for concurrent works are extracted from their respective papers.
| Method | MATH500 Acc. (%) | MATH500 Length | AIME24 Acc. (%) | AIME24 Length | AMC23 Acc. (%) | AMC23 Length |
|---|---|---|---|---|---|---|
| ThinkPrune | 83.2 | 1938 | 27.1 | 5631 | 73.2 | 3039 |
| Concise Reasoning | 81.0 | 1965 | 30.0 | 6752 | 69.4 | 2936 |
| SR-FLOW | 85.3 | - | 36.7 | - | 77.8 | - |
| AdaptThink | 82.0 | 1782 | 31.0 | 6679 | - | - |
| Baseline | 86.2 | 2780 | 35.4 | 5778 | 72.8 | 3938 |
| Thinker | 88.5 | 2501 | 39.0 | 5597 | 83.6 | 3517 |
| Thinker-Fast | 80.9 | 600 | 18.1 | 853 | 66.9 | 751 |
This project requires Python >=3.10.
Ensure you have essential system libraries. For Debian-based systems (like Ubuntu), you can install them using:
```bash
sudo apt-get update
sudo apt-get install -y ffmpeg libsm6 libxext6
```
It's recommended to use a virtual environment (e.g., Python's venv or Conda).
Once you have cloned the repository and navigated into the main project directory (where `pyproject.toml` is located), activate your chosen virtual environment. Then, install the project and its dependencies:
```bash
pip install -e .
```
This command installs the `thinker_task` project in editable mode and pulls in all required Python packages at the versions specified in `pyproject.toml`.
- Python: >=3.10
- Python Packages: All specific versions for packages like `torch`, `deepspeed`, etc., are listed in the `pyproject.toml` file.
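As an optional sanity check after installation, you can confirm that the core dependencies import cleanly (the packages below are the ones named above; adjust to your setup):

```python
# Optional post-install check: core dependencies should import without errors.
import torch
import deepspeed

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("deepspeed:", deepspeed.__version__)
```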
Download the base models R1.5B (DeepSeek-R1-Distill-Qwen-1.5B), R7B (DeepSeek-R1-Distill-Qwen-7B), and Q1.5B (Qwen2.5-1.5B) into the directory `large_data/base` using the following commands:
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', local_dir='large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B'))"
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', local_dir='large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B'))"
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('Qwen/Qwen2.5-Math-1.5B', local_dir='large_data/base/Qwen/Qwen2.5-Math-1.5B'))"
```bash
python script/add_token.py large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
python script/add_token.py large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```

The last two commands add two special tokens, `<|im_start|>` and `<|im_end|>`, to the R1-Distill models; these tokens are already present in Q1.5B and mark the start and end of a prompt.
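For reference, adding special tokens to a Hugging Face model is typically done as sketched below; this is illustrative only, and the actual logic lives in `script/add_token.py`:

```python
# Illustrative only -- see script/add_token.py for the actual implementation.
import sys
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = sys.argv[1]  # e.g. large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Register <|im_start|> and <|im_end|> and grow the embedding matrix to match.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained(model_path)
model.save_pretrained(model_path)
```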
Single node, R1.5B Thinker agent (replace r1_5b with q1_5b for Q1.5B model, or r7b for R7B model):
```bash
python -m playground.thinker_r1_5b
```

Multi-node training:

First, on the master node, run:

```bash
ray start --head
```

Then, on each of the other nodes, run:

```bash
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, launch training (adjust `NUM_NODE` as needed; both 2 and 4 should work fine):

```bash
NUM_NODE=4 python -m playground.thinker_r1_5b
```

The training data are sourced from Open-Reasoner-Zero.
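Before launching the multi-node run, you can optionally confirm that all worker nodes have joined the Ray cluster using the standard Ray API (this check is not part of the repository's scripts):

```python
# Optional: verify the Ray cluster sees every node before starting training.
import ray

ray.init(address="auto")  # attach to the cluster started with `ray start`
print("Nodes in cluster:", len(ray.nodes()))
print("Cluster resources:", ray.cluster_resources())
ray.shutdown()
```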
Trained model checkpoints can be found at:
Please refer to `thinker_task/exp_engine/accelerators/inference/sum_llm.py` for how to perform inference on the Thinker task with vLLM.
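For orientation, a minimal vLLM generation call looks like the sketch below; the Thinker-specific multi-stage prompting and summarization logic is implemented in `sum_llm.py`, so this only shows plain single-turn generation (model path and sampling settings are placeholders):

```python
# Minimal vLLM usage sketch; see sum_llm.py for the actual Thinker-task inference.
from vllm import LLM, SamplingParams

llm = LLM(model="large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
params = SamplingParams(temperature=0.6, max_tokens=1024)

outputs = llm.generate(["What is 17 * 24?"], params)
print(outputs[0].outputs[0].text)
```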
- Our training framework is built on Open-Reasoner-Zero, OpenRLHF, vLLM, DeepSpeed, and Ray.
- Our models are based on Qwen2.5-1.5B, DeepSeek-R1-Distill-Qwen-1.5B, and DeepSeek-R1-Distill-Qwen-7B.
