# Thinker: Learning to Think Fast and Slow


This repository contains the code and resources for the paper Thinker: Learning to Think Fast and Slow, published at NeurIPS 2025.

Our work introduces the Thinker task, a novel four-stage Reinforcement Learning (RL) approach for question-answering (QA) designed to enhance the reasoning capabilities of Large Language Models (LLMs) by explicitly training distinct cognitive abilities: intuition (Fast Thinking), evaluation (Verification), refinement (Slow Thinking), and integration (Summarization).
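The four stages can be pictured as a control loop. The sketch below is a toy illustration only, under the assumption that Slow Thinking is invoked when Verification rejects the fast answer; all four function names are hypothetical stand-ins for model calls, not this repository's API.

```python
# Toy sketch of the four-stage Thinker rollout: Fast Thinking ->
# Verification -> (optional) Slow Thinking -> Summarization.
# All four functions are hypothetical stand-ins for model calls,
# not this repository's API.

def fast_think(question: str) -> str:
    """Stage 1 (Fast Thinking): produce a quick, intuitive answer."""
    return f"initial answer to: {question}"

def verify(question: str, answer: str) -> bool:
    """Stage 2 (Verification): judge whether the fast answer holds up."""
    return len(answer) > 0  # placeholder for a learned verifier

def slow_think(question: str, answer: str) -> str:
    """Stage 3 (Slow Thinking): refine a rejected answer deliberately."""
    return f"refined {answer}"

def summarize(question: str, answer: str) -> str:
    """Stage 4 (Summarization): integrate the reasoning into a final reply."""
    return f"final: {answer}"

def thinker_rollout(question: str) -> str:
    answer = fast_think(question)
    if not verify(question, answer):  # slow thinking only when needed
        answer = slow_think(question, answer)
    return summarize(question, answer)
```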

*Figure: Evaluation Accuracy on Mathematical Reasoning Benchmarks.*

## Evaluation Results

Performance comparison across various mathematical reasoning benchmarks. All scores are Pass@1 accuracy (%) averaged over 16 samples. The top score in each benchmark column (within each model group) is bolded.

| Method | MATH 500 | AIME 2024 | AIME 2025 | GPQA Diamond | OlympiadBench | AMC 23 | Minerva Math | College Math | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| **Qwen2.5-1.5B (Q1.5B)** | | | | | | | | | |
| Pretrained | 9.05 | 0.00 | 0.00 | 4.55 | 3.09 | 4.06 | 2.30 | 7.40 | 3.81 |
| Baseline | 59.82 | 4.10 | **2.43** | 20.52 | 26.05 | 35.36 | 19.25 | 37.42 | 25.62 |
| Thinker | **64.45** | **6.25** | 2.22 | 19.21 | **28.21** | **39.06** | **20.38** | **38.82** | **27.33** |
| Thinker-Fast | 59.82 | 4.58 | 1.25 | **21.28** | 24.52 | 34.53 | 17.85 | 37.58 | 25.18 |
| ORZ | 58.00 | 3.50 | 1.00 | 16.80 | - | - | - | - | - |
| SimpleRL | 59.00 | 4.20 | - | - | 21.00 | 35.00 | 20.20 | - | - |
| **DeepSeek-R1-Distill-Qwen-1.5B (R1.5B)** | | | | | | | | | |
| Pretrained | 76.21 | 17.50 | 17.92 | 13.76 | 37.46 | 55.94 | 24.82 | 38.85 | 35.31 |
| Baseline | 86.24 | 35.42 | 23.75 | 25.69 | 49.22 | 72.81 | 32.08 | 42.02 | 45.90 |
| Thinker | **88.51** | **38.96** | **26.67** | **37.41** | **55.49** | **83.59** | **34.77** | **42.46** | **50.98** |
| Thinker-Fast | 81.35 | 18.33 | 14.58 | 28.85 | 45.68 | 66.41 | 31.39 | 41.74 | 41.05 |
| **DeepSeek-R1-Distill-Qwen-7B (R7B)** | | | | | | | | | |
| Pretrained | 84.05 | 37.50 | 28.54 | 17.58 | 37.92 | 36.41 | 34.49 | 40.72 | 39.65 |
| Baseline | 91.03 | 47.50 | 34.58 | 34.63 | 56.76 | 87.81 | 40.23 | 42.71 | 54.41 |
| Thinker | **93.04** | **56.25** | **41.46** | **41.51** | **62.12** | **91.09** | **44.39** | **42.84** | **59.09** |
| Thinker-Fast | 86.47 | 26.46 | 21.88 | 34.12 | 51.77 | 71.56 | 43.08 | 42.14 | 47.19 |

## Comparison with concurrent works fine-tuning R1.5B models on token-efficiency benchmarks

Results from concurrent works are extracted from the respective papers.

| Method | MATH500 Acc. (%) | MATH500 Length | AIME24 Acc. (%) | AIME24 Length | AMC23 Acc. (%) | AMC23 Length |
|---|---|---|---|---|---|---|
| ThinkPrune | 83.2 | 1938 | 27.1 | 5631 | 73.2 | 3039 |
| Concise Reasoning | 81.0 | 1965 | 30.0 | 6752 | 69.4 | 2936 |
| SR-FLOW | 85.3 | - | 36.7 | - | 77.8 | - |
| AdaptThink | 82.0 | 1782 | 31.0 | 6679 | - | - |
| Baseline | 86.2 | 2780 | 35.4 | 5778 | 72.8 | 3938 |
| Thinker | 88.5 | 2501 | 39.0 | 5597 | 83.6 | 3517 |
| Thinker-Fast | 80.9 | 600 | 18.1 | 853 | 66.9 | 751 |

## Installation

This project requires Python >=3.10.

### 1. System Prerequisites

Ensure the essential system libraries are installed. On Debian-based systems (such as Ubuntu), you can install them with:

```shell
sudo apt-get update
sudo apt-get install -y ffmpeg libsm6 libxext6
```

### 2. Project Setup

It's recommended to use a virtual environment (e.g., Python's venv or Conda).
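For example, with Python's built-in `venv` (the environment name `.venv` below is an arbitrary convention, not required by the project):

```shell
# Create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate
python -m pip --version   # confirm pip resolves inside the new environment
```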

Once you have cloned the repository and navigated into the main project directory (where pyproject.toml is located), activate your chosen virtual environment. Then, install the project and its dependencies:

```shell
pip install -e .
```

This command installs the thinker_task project in editable mode and installs all required Python packages at the versions specified in pyproject.toml.

### Required Package Versions

- **Python**: >=3.10
- **Python Packages**: all specific versions for packages like torch, deepspeed, etc., are listed in the pyproject.toml file.
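As a quick sanity check after installation, you can confirm the interpreter version and inspect a few installed packages. The package names below are only examples; pyproject.toml is the authoritative list.

```python
# Sanity-check the environment: interpreter version and package versions.
import sys
from importlib import metadata

def meets_floor(version=(3, 10)):
    """True if the running interpreter satisfies the project's version floor."""
    return sys.version_info >= version

def package_versions(packages=("torch", "deepspeed")):
    """Map each package name to its installed version, or None if absent."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = None
    return out

if __name__ == "__main__":
    print("python >=3.10:", meets_floor())
    print(package_versions())
```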

## Download Base Model

Download the base models R1.5B (DeepSeek-R1-Distill-Qwen-1.5B), R7B (DeepSeek-R1-Distill-Qwen-7B), and Q1.5B (Qwen2.5-1.5B) into the directory large_data/base using the following commands:

```shell
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', local_dir='large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B'))"
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', local_dir='large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B'))"
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('Qwen/Qwen2.5-Math-1.5B', local_dir='large_data/base/Qwen/Qwen2.5-Math-1.5B'))"
python script/add_token.py large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
python script/add_token.py large_data/base/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```

The last two commands add two special tokens, <|im_start|> and <|im_end|>, to the R1-Distill models. These tokens are already present in Q1.5B and mark the start and end of a prompt.
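script/add_token.py itself is not reproduced here, but with the Hugging Face transformers library the operation it describes is typically a few lines. The sketch below is an assumption about the mechanism, not the repo's actual script; note that if the model's weights are loaded afterwards, the embedding matrix may need resizing to match the new vocabulary.

```python
# Sketch of adding <|im_start|>/<|im_end|> to a saved tokenizer using
# Hugging Face transformers. NOT the repo's script/add_token.py.
import sys

SPECIAL_TOKENS = ["<|im_start|>", "<|im_end|>"]

def add_tokens(model_dir: str) -> int:
    from transformers import AutoTokenizer  # imported lazily: heavy dependency
    tok = AutoTokenizer.from_pretrained(model_dir)
    # add_special_tokens returns the number of tokens actually added
    added = tok.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})
    tok.save_pretrained(model_dir)
    return added

if __name__ == "__main__":
    print(add_tokens(sys.argv[1]))
```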

## Start Thinker Agent Training

Single-node, R1.5B Thinker agent (replace r1_5b with q1_5b for the Q1.5B model, or r7b for the R7B model):

```shell
python -m playground.thinker_r1_5b
```

### Multi-node Training

First, on the master node, run:

```shell
ray start --head
```

Then, on each of the other nodes, run:

```shell
ray start --address='<master-node-ip>:<master-node-port>'
```

Finally, back on the master node, run (adjust NUM_NODE as needed; both 2 and 4 work):

```shell
NUM_NODE=4 python -m playground.thinker_r1_5b
```

## Data

The training data are sourced from Open-Reasoner-Zero.

## Inference

Trained model checkpoints can be found at:

Please refer to thinker_task/exp_engine/accelerators/inference/sum_llm.py for how to perform inference on the Thinker task with vLLM.
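For orientation, a minimal vLLM generation loop looks like the sketch below. sum_llm.py implements the actual Thinker-task prompt handling; the <|im_start|>/<|im_end|> template here is only an illustration, and the sampling parameters are arbitrary examples.

```python
# Generic vLLM inference sketch; see sum_llm.py for the actual
# Thinker-task prompt handling. The prompt template below is an
# illustration, not the trained model's exact format.

def build_prompt(question: str) -> str:
    """Wrap a question in the illustrative prompt delimiters."""
    return f"<|im_start|>{question}<|im_end|>"

def main(model_dir: str, questions):
    from vllm import LLM, SamplingParams  # heavy import, kept out of module load
    llm = LLM(model=model_dir)
    params = SamplingParams(temperature=0.6, max_tokens=2048)
    outputs = llm.generate([build_prompt(q) for q in questions], params)
    for out in outputs:
        print(out.outputs[0].text)

if __name__ == "__main__":
    import sys
    main(sys.argv[1], sys.argv[2:])
```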

## Acknowledgements
