🎉🎉 Caco has been accepted to NeurIPS 2025!
We introduce Caco, a code-driven framework for generating diverse and verifiable reasoning data at scale. Unlike conventional augmentation methods that rewrite problems, Caco leverages executable code-based chains of thought (Code CoTs) to synthesize new problems and solutions with guaranteed correctness.
Caco implements this through three key stages:
- **Unifying Code CoT**: collecting diverse seed reasoning traces from both mathematical and algorithmic problems, and converting them into a standardized executable format.
- **Scaling Code CoT**: training a dedicated code generator that not only expands the dataset but also realizes pattern-level augmentation by restructuring reasoning logic (e.g., decomposition, reformulation, alternative solution paths).
- **Instruction Reversing**: back-translating code into natural-language problems with contextual and stylistic variations, followed by natural-language CoT solution generation and dual verification for correctness.
Caco yields 1.3M validated problem–solution pairs in under 55 GPU hours using only open-source models. Models trained on Caco data achieve consistent improvements across mathematics, logic puzzles, scientific QA, and code reasoning, surpassing strong baselines and demonstrating broad cross-domain generalization.
We release the Caco dataset, the Caco-CodeGen model, and three Caco models fine-tuned on this dataset.
| Dataset/Model | MATH | Olympiad | Theorem-QA | HuggingFace🤗 |
|---|---|---|---|---|
| Caco-1.3M | - | - | - | link |
| Caco-CodeGen | - | - | - | link |
| DeepSeekMath-7B-Caco | 68.2 | 29.5 | 33.8 | link |
| Qwen2.5-7B-Caco | 82.4 | 46.5 | 46.0 | link |
| Llama3-8B-Caco | 70.6 | 34.1 | 31.0 | link |
Install the dependencies:
```bash
conda create -n caco python=3.10
conda activate caco
pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121

# Install LLaMA-Factory
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
git checkout v0.9.1
pip install transformers==4.46.1 accelerate==0.34.2 deepspeed==0.15.4
pip install -e ".[torch,metrics]"

# Install packages for evaluation
pip install flash-attn --no-build-isolation
pip install sympy==1.12.1 antlr4-python3-runtime==4.11.1 pebble word2number boto3 triton==2.3.1 ipython
pip install vllm==0.5.3.post1

# Install latex2sympy
cd ../evaluation_dart/latex2sympy
pip install -e .
cd ..

# Install dart-math evaluation
pip install -e .
```

You can directly download the Caco-1.3M data for training:

```bash
huggingface-cli download LHL3341/Caco-1.3M
```

We also provide our code in `./data_process` for:
- Code execution and input/output extraction
- Answer consistency filtering
- CodeGen training
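As one way the answer-consistency filter might work (a hedged sketch; the field names `code_answer` and `nl_answer` are assumptions, not the repo's actual schema), numeric answers can be compared exactly with the standard-library `Fraction`, falling back to string matching:

```python
from fractions import Fraction

def answers_equivalent(a: str, b: str) -> bool:
    """Compare numeric answers exactly (e.g. '1/2' vs '0.5'); fall back to strings."""
    try:
        return Fraction(a) == Fraction(b)
    except (ValueError, ZeroDivisionError):
        return a.strip() == b.strip()

def consistency_filter(samples: list[dict]) -> list[dict]:
    """Keep only samples whose executed-code answer matches the NL CoT answer."""
    return [s for s in samples if answers_equivalent(s["code_answer"], s["nl_answer"])]

samples = [
    {"code_answer": "1/2", "nl_answer": "0.5"},  # equivalent -> kept
    {"code_answer": "3", "nl_answer": "4"},      # mismatch -> dropped
]
print(consistency_filter(samples))
```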
Our training code depends on LLaMA-Factory.
```bash
bash ./scripts/sft.sh
```

For evaluation:

```bash
export MODEL_NAME=/path/to/your/model
bash ./scripts/test.sh
```

We highlight three directions for extending Caco:
- **Raising Difficulty**: incorporate harder and cleaner seed datasets (e.g., AM-Thinking-distill, DAPO) and apply hardness-aware sampling with adversarial program mutations.
- **Expanding Diversity**: extend beyond math to science, logic, proofs, and procedural planning; train a multi-domain CodeGen with domain tags and compositional templates.
- **RL with Verifiable Rewards (RLVR)**: Caco's executable traces provide a natural, low-noise reward signal, which can be seamlessly applied to scale up RLVR data.
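The RLVR direction can be sketched as a simple binary reward (illustrative only, not the paper's training code): each rollout is scored against the execution-verified answer.

```python
def verifiable_reward(predicted_answer: str, verified_answer: str) -> float:
    """Binary RLVR reward: 1.0 when the rollout matches the verified answer."""
    return 1.0 if predicted_answer.strip() == verified_answer.strip() else 0.0

# Score a batch of rollouts against a Caco-verified ground-truth answer.
rollouts = ["42", "41", "42"]
rewards = [verifiable_reward(r, "42") for r in rollouts]
print(rewards)  # -> [1.0, 0.0, 1.0]
```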
If you find our code, models, or data useful, please cite our paper:
```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```
