
Official code for DISC: Dynamic Decomposition Improves LLM Inference Scaling — a plug-and-play inference algorithm that adaptively decomposes reasoning traces for more efficient and accurate LLM reasoning.


DISC: Dynamic Decomposition Improves LLM Inference Scaling

TL;DR. DISC adaptively partitions reasoning traces during inference so models spend more compute on the hardest steps, improving accuracy at fixed token budgets and cutting pass@10 error on APPS, MATH500, and LiveCodeBench.


Table of Contents

  • Overview
  • Key Features
  • Repo Structure
  • Installation
  • Quickstart
  • Dynamic Decomposition (How it works)
  • Reproducing Paper Results
  • Configuration
  • Evaluation & Analysis
  • Supported Models
  • Configuration Reference
  • Citation
  • Contact & Acknowledgements

Overview

DISC is a recursive inference-time procedure that proposes candidate prefixes, scores them with an outcome reward, and dynamically advances or contracts step size—allocating more samples to uncertain prefixes while skipping easy regions. Plug-and-play with greedy, beam, or MCTS search.


Key Features

  • Adaptive decomposition: adjusts step sizes on the fly; no handcrafted heuristics.
  • Compute efficiency: higher accuracy at the same token budget; fewer total tokens for fixed samples.
  • Search-agnostic: drop-in with greedy/beam/MCTS; one operator controls node expansion.
  • Minimal assumptions: needs only a scalar outcome reward (e.g., unit tests, verifiers, self-critique).
  • Provable monotonic improvement of best solution prefix under mild support assumptions.
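
For example, the reward can be any callable that maps a complete candidate to a scalar. Below is a minimal toy sketch of a unit-test-style reward with hypothetical names; the real reward sources (sandboxed unit tests, verifiers) live in src/executors/ and src/tasks/.

from typing import Callable

# Hypothetical reward signature: complete candidate -> score in [0, 1].
RewardFn = Callable[[str], float]

def make_unit_test_reward(tests: list[tuple[int, int]]) -> RewardFn:
    """Toy example: score a Python expression in one variable `x` by the
    fraction of (input, expected) pairs it satisfies."""
    def reward(candidate: str) -> float:
        def passes(x: int, expected: int) -> bool:
            try:
                return eval(candidate, {}, {"x": x}) == expected
            except Exception:
                return False
        return sum(passes(x, y) for x, y in tests) / len(tests)
    return reward

print(make_unit_test_reward([(1, 2), (3, 6)])("x * 2"))  # 1.0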

Repo Structure

.
├── src/
│   ├── solvers/              # Core dynamic decomposition logic (DISC, baselines)
│   ├── llm_models/           # Model adapters (OpenAI, HuggingFace, etc.)
│   ├── tasks/                # Task harnesses
│   │   ├── apps/             # APPS benchmark
│   │   ├── MATH/             # MATH500 benchmark
│   │   └── livecodebench/    # LiveCodeBench benchmark
│   ├── executors/            # Code execution and testing
│   ├── scripts/              # Experimental scripts for reproducing results
│   ├── data_analysis/        # Post-experiment analysis and plotting
│   │   ├── produce_standard_charts_demo.ipynb  # Demo notebook for analysis
│   │   └── utils.py          # Analysis utilities
│   ├── conf/                 # Hydra configuration files
│   │   ├── inference.yaml    # Main inference config
│   │   ├── auxgen.yaml       # Test generation config
│   │   ├── solver/           # Solver configs (DISC, BoN, baselines)
│   │   └── task/             # Task-specific configs
│   ├── run_inference.py      # Main entry point for inference
│   └── run_auxgen.py         # Entry point for test generation
├── data/                     # Generated solutions and benchmark data
│   └── generated_solutions/  # Output directory for experiments
├── environment.yml           # Conda environment specification
├── requirements.txt          # Pip requirements
└── README.md

Installation

We include both an environment.yml file for conda and a requirements.txt file for pip. To install the required packages, use either of the following:

# Option 1: Using conda (recommended)
conda env create -f environment.yml
conda activate disc

or

# Option 2: Using pip
pip install -r requirements.txt

We recommend conda for managing the environment, since some of the required packages (e.g., PyTorch) are easier to install with it.

Requirements:

  • Python 3.13 recommended (3.10+ should work)
  • PyTorch 2.5+ with CUDA support (for local model inference)
  • API keys for proprietary models (OpenAI, Anthropic, Google) if using them
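
If you use API-backed models, a quick way to confirm your keys are visible to the process (the variable names below are the providers' conventional ones; check src/llm_models/ for the exact names the adapters read):

import os

# Conventional provider key names (assumed; verify against src/llm_models/).
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(f"{var}: {'set' if os.environ.get(var) else 'missing'}")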

Quickstart

Minimal example (APPS with DISC):

bash src/scripts/apps_inference.sh

This will run the full APPS experiment comparing different decomposition methods (BoN, newline-based, token-based, and DISC).

Single experiment with DISC:

python -m src.run_inference \
  run_name=my-experiment \
  task=apps_comp_test \
  solver=dycomp \
  solver.params.decomp_budget=30 \
  solver.params.alpha_fraction=0.15 \
  solver.params.model=gpt-4o-mini \
  solver.params.temperature=0.2 \
  solver.params.split_metric=zscore \
  top_k_problems=200

Dynamic Decomposition (How it works)

  • Identify pivotal prefixes via sampled continuations + outcome reward.
  • Adapt granularity: hard prefixes trigger contract (finer steps); easy ones advance (coarser steps).
  • Allocate compute where it matters: drive rollouts only when they improve a standardized score (e.g., z-score) over the current best prefix.
  • Search integration: the same decomposition policy governs node expansion in greedy/beam/MCTS. A simplified sketch of the basic loop follows.
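
For intuition, here is a deliberately simplified sketch of the advance/contract loop. It is not the paper's exact procedure: the propose/reward callables, the batch size k, and the step-committing rule are illustrative stand-ins, and alpha here only loosely mirrors the repo's alpha_fraction (the real implementation, including split metrics and search integration, is in src/solvers/).

def disc_sketch(propose, reward, budget=30, alpha=0.15, k=4):
    """Toy dynamic-decomposition loop.

    propose(prefix) -> one sampled completion of `prefix`.
    reward(solution) -> scalar score in [0, 1] for a complete solution.
    """
    prefix, frac = "", 0.5                  # committed prefix, step fraction
    best_sol, best_r, spent = "", float("-inf"), 0
    while spent < budget and frac > 1e-3:
        rollouts = [propose(prefix) for _ in range(k)]
        spent += k
        scores = [reward(prefix + c) for c in rollouts]
        r_max = max(scores)
        champ = rollouts[scores.index(r_max)]
        if r_max > best_r:
            best_r, best_sol = r_max, prefix + champ
            if best_r >= 1.0:               # cf. stop_sum_score: perfect, stop
                break
            # Easy region: commit a step of the champion and coarsen.
            prefix += champ[: max(1, int(len(champ) * frac))]
            frac = min(1.0, 2 * frac)
        else:
            # Hard region: contract to a finer step and resample here.
            frac *= alpha
    return best_sol, best_r

The behavior to notice is the asymmetry: progress commits coarser steps cheaply, while stalls shrink the step fraction so subsequent samples concentrate on the hard prefix.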

Reproducing Paper Results

Scripts for reproducing the paper's results are provided in the src/scripts/ directory, named according to the experiments they reproduce. They assume you are not using SLURM for job scheduling; if you are, use the versions in src/scripts/slurm_version/.

All scripts can be run directly from the repository root:

Main Benchmark Comparisons

Compare DISC against baselines (BoN, newline decomposition, token decomposition) on different benchmarks:

APPS (Competition Problems):

bash src/scripts/apps_inference.sh

Compares different decomposition methods on APPS competition problems. Runs single generation, BoN, newline-based decomposition, token-based decomposition, and DISC with different split metrics.

MATH500:

bash src/scripts/math_inference.sh

Main comparisons on MATH500 benchmark with verifier-based rewards.

LiveCodeBench:

bash src/scripts/livecodebench_inference.sh

Evaluates different methods on LiveCodeBench with sandboxed unit tests.

Ablation Studies

Priority Metric Ablation:

bash src/scripts/apps_metric.sh

Compares different split metrics for DISC: mean, z-score, random, negative-mean, and negative-z-score.

Model Comparison:

bash src/scripts/apps_model.sh

Compares DISC performance across different LLM models:

  • gpt-4o-mini
  • gpt-4o
  • Llama-3.1-8B
  • Mistral-7B-v0.3
  • DeepSeek-R1-Distill-Llama-8B
  • Qwen-2.5-7B

Search Strategy Comparison:

bash src/scripts/apps_search.sh

Compares DISC with different search strategies: greedy (baseline), MCTS, and beam search with various beam sizes.

Temperature Ablation:

bash src/scripts/apps_temperature.sh

Sweeps temperature values (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4) comparing BoN vs. DISC.

Alpha Fraction Ablation:

bash src/scripts/apps_alpha_fraction.sh

Ablation study on the alpha_fraction hyperparameter (0.05, 0.10, 0.15, 0.20, 0.25, 0.30).

Self-Generated Validation Tests

When ground-truth tests are not available, DISC can use self-generated validation tests:

1. Generate validation tests:

bash src/scripts/apps_testgen.sh

Generates validation tests for each problem and records them in a jsonl file under data/apps-testgen/. Uses the configuration in src/conf/auxgen.yaml.

2. Run experiments with self-generated tests:

bash src/scripts/apps_val.sh

Runs BoN and decomposition baselines using self-generated validation tests instead of ground-truth tests. Note: You need to update the TESTGEN_PATH variable in the script to point to the generated tests from step 1.

Configuration

Experiments are configured using Hydra. Default configurations can be found in:

  • src/conf/inference.yaml - Main inference configuration
  • src/conf/auxgen.yaml - Test generation configuration
  • src/conf/solver/ - Solver-specific configs (dycomp, bon, spchar, tokencomp, etc.)
  • src/conf/task/ - Task-specific configs (apps, math_500, livecodebench, etc.)

You can modify parameters by:

  1. Editing the configuration files directly
  2. Passing command-line overrides (e.g., solver.params.temperature=0.5)
  3. Editing the shell scripts in src/scripts/

By default, generated solutions are saved to the data/generated_solutions/<task_name>/ directory. You can change this by modifying the solution_set_path parameter in the configuration file.
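
To sanity-check what a set of overrides resolves to before launching a run, Hydra's compose API can print the merged config. A sketch, assuming it is run from a file at the repository root (initialize resolves config_path relative to the calling file):

from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose the main inference config with two overrides and dump it as YAML.
with initialize(config_path="src/conf", version_base=None):
    cfg = compose(
        config_name="inference",
        overrides=["solver=dycomp", "solver.params.temperature=0.5"],
    )
print(OmegaConf.to_yaml(cfg))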


Evaluation & Analysis

Analyzing Results

You can analyze results using the notebook src/data_analysis/produce_standard_charts_demo.ipynb. This notebook will:

  • Load generated solutions from jsonl files
  • Compute metrics used in the paper (pass@k, token usage, etc.)
  • Generate plots comparing different methods
  • Save plots to the root directory

Usage:

  1. Open src/data_analysis/produce_standard_charts_demo.ipynb
  2. Update the paths to your generated solutions jsonl files
  3. Run the notebook cells
  4. Plots will be generated and saved

Metrics:

  • Primary: pass@k (k = 1, 10) for coding/math
  • Secondary: token usage & sample count for compute efficiency
  • Reward sources: unit tests (code), verifiers (math), self-critique (generic)
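
For reference, pass@k is presumably the standard unbiased estimator (Chen et al., 2021): given n samples per problem of which c are correct, it is the probability that a random size-k subset contains at least one correct sample.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 10))  # 0.3 1.0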

Supported Models

DISC supports both open-source and proprietary models:

Open-source models:

  • LLaMA-3.1-8B
  • Mistral-7B-v0.3
  • Qwen-2.5-7B
  • DeepSeek-R1-Distill-Llama-8B

Proprietary models:

  • OpenAI (gpt-4o, gpt-4o-mini, etc.)
  • Anthropic Claude (via API)
  • Google Gemini (via API)

Model adapters are in src/llm_models/. To add a new model:

  1. Implement the model interface in src/llm_models/
  2. Update configuration to reference the new model
  3. Ensure API keys are set in environment variables if needed
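
Purely as an illustration of the shape involved (all names below are hypothetical; the actual base class is defined in src/llm_models/), a new API-backed adapter might look like:

import os
from abc import ABC, abstractmethod

class LLMModel(ABC):  # hypothetical base-class name
    @abstractmethod
    def generate(self, prompt: str, temperature: float = 0.2) -> str: ...

class MyProviderModel(LLMModel):
    def __init__(self, model_name: str):
        self.model_name = model_name
        # Hypothetical env var; use whatever your provider expects.
        self.api_key = os.environ["MY_PROVIDER_API_KEY"]

    def generate(self, prompt: str, temperature: float = 0.2) -> str:
        # Call the provider's completion endpoint and return the text.
        raise NotImplementedError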

Configuration Reference

DISC uses Hydra for configuration management. Key parameters:

Solver parameters (DISC):

solver: dycomp
solver.params:
  decomp_budget: 30           # Total sampling budget
  alpha_fraction: 0.15        # Threshold for advancing/contracting
  split_metric: zscore        # Priority metric (mean|zscore|random|negmean|negzscore)
  temperature: 0.2            # Sampling temperature
  model: gpt-4o-mini          # Model identifier
  stop_sum_score: 1.0         # Stop when cumulative reward >= this value

Task parameters:

task: apps_comp_test          # Task identifier
top_k_problems: 200           # Number of problems to solve (-1 for all)

Search variants:

  • solver=dycomp - Greedy DISC (default)
  • solver=dycomp_mcts - MCTS with DISC decomposition
  • solver=dycomp_beam - Beam search with DISC decomposition

Baseline solvers:

  • solver=simple - Single generation
  • solver=bon - Best-of-N sampling
  • solver=spchar - Newline-based decomposition
  • solver=tokencomp - Token-based decomposition

Citation

If you use DISC in your research, please cite our NeurIPS 2025 paper:

@inproceedings{light2025disc,
  title        = {{DISC}: Dynamic decomposition improves {LLM} inference scaling},
  author       = {Light, Jonathan and Cheng, Wei and Riviere, Benjamin and Wu, Yue and Oyamada, Masafumi and Wang, Mengdi and Yue, Yisong and Paternain, Santiago and Chen, Haifeng},
  booktitle    = {Advances in Neural Information Processing Systems (NeurIPS 2025)},
  year         = {2025}
}

Contact & Acknowledgements

For questions or issues, please open an issue on GitHub or contact the authors.

Core authors and affiliations are listed in the paper.
