After length-compressible CoT tuning, the reasoning model can generate reasoning paths ranging from long to short, leveraging LoRA as a 'Valve'.
Xinyin Ma*, Guangnian Wan*, Runpeng Yu, Gongfan Fang, Xinchao Wang
Learning and Vision Lab, National University of Singapore
🥯[Arxiv] 🎄[Dataset] 🤖[Models] (coming soon)
* Equal Contribution
We propose a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths.
- We identify a direction in the parameter space that, when manipulated, effectively controls the length of the generated CoT.
- We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach.
- CoT-Valve enables controllability and compressibility of the reasoning chain and outperforms prompt-based length control.
- Applied to QwQ-32B-Preview, CoT-Valve shortens reasoning chains on GSM8K from 741 to 225 tokens with a minor accuracy drop (95.07% to 94.92%), and on AIME from 6827 to 4629 tokens with only one additional incorrect answer.
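The core idea above can be sketched numerically. A LoRA update defines a low-rank direction ΔW = BA in parameter space; scaling it by a factor α acts as the "valve" that interpolates between long-CoT and short-CoT behaviour. This is a minimal illustrative sketch, not the released implementation: the names (`W_base`, `lora_A`, `lora_B`, `valve`, `alpha`), dimensions, and the convention that α = 0 recovers the base model are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                            # hidden size, LoRA rank (toy values)
W_base = rng.standard_normal((d, d))   # frozen base weight
lora_A = rng.standard_normal((r, d))   # low-rank factors, assumed learned on
lora_B = rng.standard_normal((d, r))   # length-compressible CoT data

def valve(alpha: float) -> np.ndarray:
    """Merge the LoRA direction into the base weight with strength alpha.

    alpha = 0 recovers the base model; varying alpha moves the model
    along the length-controlling direction in parameter space.
    """
    return W_base + alpha * (lora_B @ lora_A)

# The merged weight is linear in alpha, so intermediate alphas
# interpolate smoothly between the two endpoint behaviours.
assert np.allclose(valve(0.0), W_base)
```

In practice the same effect can be achieved with a LoRA adapter whose scaling factor is varied at merge time, rather than by manipulating raw weight matrices as in this toy sketch.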
- Release the dataset
- Release the model
- Release the training code
We release the following datasets on Huggingface:
| Dataset Name | Link | Description |
|---|---|---|
| MixChain-Z-GSM8K | Link | 6,863 samples, each with five solutions of different lengths. |
| MixChain-Z-PRM12K | Link | 12,000 samples (unfiltered), each with five solutions of different lengths. |
| MixChain-C-LIMO | Link | Two distinct solutions for each question from the LIMO dataset, varying in the number of samples and the average CoT length. |
To be released
To be released