horseee/CoT-Valve


CoT-Valve: Length-Compressible Chain-of-Thought Tuning


After length-compressible CoT tuning, the reasoning model can generate reasoning paths ranging from long to short, using LoRA as a "Valve".

Xinyin Ma*, Guangnian Wan*, Runpeng Yu, Gongfan Fang, Xinchao Wang
Learning and Vision Lab, National University of Singapore
🥯[Arxiv] 🎄[Dataset] 🤖[Models] (coming soon)
* Equal Contribution

Introduction

We propose a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths.

  • We propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of generated CoT.
  • We construct datasets with chains from long to short for the same questions and explore two enhanced strategies for CoT-Valve: (1) a precise length-compressible CoT tuning method, and (2) a progressive chain length compression approach.
  • CoT-Valve makes the reasoning chain both controllable and compressible, and it outperforms prompt-based length control.
  • We applied this method to QwQ-32B-Preview, reducing reasoning chains on GSM8K from 741 to 225 tokens with a minor performance drop (95.07% to 94.92%) and on AIME from 6827 to 4629 tokens, with only one additional incorrect answer.
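The idea of moving along one direction in parameter space can be illustrated with a toy sketch. It assumes the direction is a LoRA update `lora_b @ lora_a` added to a frozen weight `w0`, scaled by a scalar `alpha` that plays the role of the valve. The names `apply_valve`, `w0`, `lora_a`, `lora_b`, and `alpha`, and the tiny matrices, are illustrative assumptions rather than the released implementation. Which end of the `alpha` range yields longer or shorter chains depends on how the update was trained.

```python
# Hypothetical sketch: a LoRA update used as a scalar-controlled "valve".
# Shapes and values are toy assumptions, not taken from any released model.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_valve(w0, lora_b, lora_a, alpha):
    """Return w0 + alpha * (lora_b @ lora_a), leaving w0 unchanged."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + alpha * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]

w0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
lora_b = [[1.0], [2.0]]        # low-rank factor B (2x1)
lora_a = [[0.5, -0.5]]         # low-rank factor A (1x2)

# Sweep the valve: alpha = 0 recovers the base weight, alpha = 1 applies
# the full update, and values in between interpolate along the direction.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, apply_valve(w0, lora_b, lora_a, alpha))
```

In this sketch only `alpha` changes between settings, so a single trained update can be reused for every point along the direction.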

TODO

  • Release the dataset
  • Release the model
  • Release the training code

🤗Datasets

We release the following datasets on Huggingface:

| Dataset Name | Link | Description |
|---|---|---|
| MixChain-Z-GSM8K | Link | MixChain-Z-GSM8K is a dataset containing 6,863 samples, with each sample containing five different solutions. |
| MixChain-Z-PRM12K | Link | MixChain-Z-PRM12K is a dataset containing 12,000 samples (unfiltered), with each sample containing five different solutions. |
| MixChain-C-LIMO | Link | MixChain-C-LIMO contains two distinct solutions for each question from the LIMO dataset. These solutions vary in the number of samples and the average length of their CoT. |
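
As a rough illustration of how a sample with several solutions to one question might be used, the sketch below orders the solutions from long to short and picks one that fits a length budget. The record layout (the field names `question` and `solutions`), the example text, and the helpers `count_tokens`, `sort_long_to_short`, and `pick_within_budget` are all assumptions made for illustration; they are not the released dataset schema. The toy sample has three solutions rather than five, and length is approximated by whitespace-separated words instead of a real tokenizer.

```python
# Hypothetical record layout: the field names "question" and "solutions"
# are assumptions for illustration, not the released dataset schema.
sample = {
    "question": "What is 12 * 3 + 4?",
    "solutions": [
        "12 * 3 = 36. 36 + 4 = 40. The answer is 40.",
        "First multiply 12 by 3 to get 36. Then add 4 to 36, "
        "which gives 40. The answer is 40.",
        "40",
    ],
}

def count_tokens(text):
    """Crude length proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())

def sort_long_to_short(solutions):
    """Order solutions from the longest chain to the shortest."""
    return sorted(solutions, key=count_tokens, reverse=True)

def pick_within_budget(solutions, budget):
    """Return the longest solution that fits the budget, else the shortest one."""
    ordered = sort_long_to_short(solutions)
    for solution in ordered:
        if count_tokens(solution) <= budget:
            return solution
    return ordered[-1]

for solution in sort_long_to_short(sample["solutions"]):
    print(count_tokens(solution), solution)
```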

Training Code

To be released

Models

To be released
