TL;DR. DISC adaptively partitions reasoning traces during inference so models spend more compute on the hardest steps, improving accuracy at fixed token budgets and cutting pass@10 error on APPS, MATH500, and LiveCodeBench.
- Overview
- Key Features
- Repo Structure
- Installation
- Quickstart
- Dynamic Decomposition (How it works)
- Reproducing Paper Results
- Evaluation & Analysis
- Supported Models
- Configuration
- Citation
DISC is a recursive inference-time procedure that proposes candidate prefixes, scores them with an outcome reward, and dynamically advances or contracts step size—allocating more samples to uncertain prefixes while skipping easy regions. Plug-and-play with greedy, beam, or MCTS search.
- Adaptive decomposition: adjusts step sizes on the fly; no handcrafted heuristics.
- Compute efficiency: higher accuracy at the same token budget; fewer total tokens for fixed samples.
- Search-agnostic: drop-in with greedy/beam/MCTS; one operator controls node expansion.
- Minimal assumptions: needs only a scalar outcome reward (e.g., unit tests, verifiers, self-critique).
- Provable guarantees: monotonic improvement of the best solution prefix under mild support assumptions.
.
├── src/
│ ├── solvers/ # Core dynamic decomposition logic (DISC, baselines)
│ ├── llm_models/ # Model adapters (OpenAI, HuggingFace, etc.)
│ ├── tasks/ # Task harnesses
│ │ ├── apps/ # APPS benchmark
│ │ ├── MATH/ # MATH500 benchmark
│ │ └── livecodebench/ # LiveCodeBench benchmark
│ ├── executors/ # Code execution and testing
│ ├── scripts/ # Experimental scripts for reproducing results
│ ├── data_analysis/ # Post-experiment analysis and plotting
│ │ ├── produce_standard_charts_demo.ipynb # Demo notebook for analysis
│ │ └── utils.py # Analysis utilities
│ ├── conf/ # Hydra configuration files
│ │ ├── inference.yaml # Main inference config
│ │ ├── auxgen.yaml # Test generation config
│ │ ├── solver/ # Solver configs (DISC, BoN, baselines)
│ │ └── task/ # Task-specific configs
│ ├── run_inference.py # Main entry point for inference
│ └── run_auxgen.py # Entry point for test generation
├── data/ # Generated solutions and benchmark data
│ └── generated_solutions/ # Output directory for experiments
├── environment.yml # Conda environment specification
├── requirements.txt # Pip requirements
└── README.md
We include both an environment.yml file for conda and a requirements.txt file for pip. To install the required packages, you can use either of the following commands:
# Option 1: Using conda (recommended)
conda env create -f environment.yml
conda activate discor
# Option 2: Using pip
pip install -r requirements.txt
We recommend using conda to manage the environment, as it is easier to install some of the required packages (e.g., PyTorch) using conda.
Requirements:
- Python 3.13+ (3.10+ should work)
- PyTorch 2.5+ with CUDA support (for local model inference)
- API keys for proprietary models (OpenAI, Anthropic, Google) if using them; a quick environment check is sketched below
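If you plan to use proprietary models, make sure the corresponding API keys are exported before launching a run. A minimal check is sketched below; OPENAI_API_KEY and ANTHROPIC_API_KEY are the usual SDK variable names, while the Google key name is an assumption and may differ in your setup.

```python
import os

# OPENAI_API_KEY and ANTHROPIC_API_KEY are the standard SDK variable names;
# the Google/Gemini variable name below is an assumption and may differ.
EXPECTED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]

missing = [name for name in EXPECTED_KEYS if not os.environ.get(name)]
if missing:
    print(f"Warning: missing API keys: {', '.join(missing)}")
else:
    print("All expected API keys are set.")
```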
Minimal example (APPS with DISC):
bash src/scripts/apps_inference.sh
This will run the full APPS experiment comparing different decomposition methods (BoN, newline-based, token-based, and DISC).
Single experiment with DISC:
python -m src.run_inference \
run_name=my-experiment \
task=apps_comp_test \
solver=dycomp \
solver.params.decomp_budget=30 \
solver.params.alpha_fraction=0.15 \
solver.params.model=gpt-4o-mini \
solver.params.temperature=0.2 \
solver.params.split_metric=zscore \
top_k_problems=200
How DISC works:
- Identify pivotal prefixes via sampled continuations + outcome reward (a minimal sketch follows this list).
- Adapt granularity: hard prefixes trigger contract (finer steps); easy ones advance (coarser steps).
- Allocate compute where it matters: drive rollouts only when they improve a standardized score (e.g., z-score) over the current best prefix.
- Search integration: same decomposition policy governs node expansion in greedy/beam/MCTS.
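A minimal, self-contained sketch of this loop is shown below. It is not the repository implementation (see src/solvers/ for that): the function names, the z-score acceptance rule, and the advance/contract updates are illustrative assumptions based on the description above, with dummy stand-ins for the sampler and the reward.

```python
import random
import statistics
from typing import Callable

def disc_sketch(
    sample_fn: Callable[[str, int], str],   # (prefix, max_new_tokens) -> sampled continuation
    reward_fn: Callable[[str], float],      # candidate solution -> scalar outcome reward in [0, 1]
    budget: int = 30,                       # total sampling budget (cf. decomp_budget)
    alpha: float = 0.15,                    # base step fraction (cf. alpha_fraction)
    samples_per_step: int = 4,
    max_new_tokens: int = 512,
    z_threshold: float = 0.0,               # acceptance threshold on the standardized score
) -> str:
    """Illustrative dynamic decomposition loop; not the repository implementation."""
    prefix = ""
    best_solution, best_reward = "", float("-inf")
    step_fraction = alpha
    used = 0

    while used < budget:
        # Sample full continuations from the current prefix and score them with the outcome reward.
        candidates = [prefix + sample_fn(prefix, max_new_tokens) for _ in range(samples_per_step)]
        rewards = [reward_fn(c) for c in candidates]
        used += samples_per_step

        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0
        best_idx = max(range(len(rewards)), key=rewards.__getitem__)
        zscore = (rewards[best_idx] - mean) / std   # rough analogue of split_metric=zscore

        if rewards[best_idx] > best_reward and zscore >= z_threshold:
            # Advance: record the best full solution so far, commit a fraction of the best
            # continuation as the new prefix, and coarsen the step size.
            best_solution, best_reward = candidates[best_idx], rewards[best_idx]
            continuation = candidates[best_idx][len(prefix):]
            cut = max(1, int(len(continuation) * step_fraction))
            prefix += continuation[:cut]
            step_fraction = min(1.0, 2 * step_fraction)
        else:
            # Contract: the current prefix looks hard, so refine it with a finer step size.
            step_fraction = max(alpha / 4, step_fraction / 2)

        if best_reward >= 1.0:  # rough analogue of stop_sum_score: e.g., all unit tests pass
            break

    return best_solution or prefix

# Toy usage with dummy stand-ins for the LLM sampler and the outcome reward:
solution = disc_sketch(
    sample_fn=lambda prefix, n: " step" * random.randint(1, 8),
    reward_fn=lambda sol: min(1.0, len(sol) / 40),
)
print(repr(solution))
```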
Scripts for reproducing the paper's results are provided in the src/scripts/ directory and are named after the experiments they reproduce. The scripts assume you are not using SLURM for job scheduling; if you are, use the variants in src/scripts/slurm_version/.
All scripts can be run directly from the repository root:
Compare DISC against baselines (BoN, newline decomposition, token decomposition) on different benchmarks:
APPS (Competition Problems):
bash src/scripts/apps_inference.sh
Compares different decomposition methods on APPS competition problems. Runs single generation, BoN, newline-based decomposition, token-based decomposition, and DISC with different split metrics.
MATH500:
bash src/scripts/math_inference.sh
Main comparisons on the MATH500 benchmark with verifier-based rewards.
LiveCodeBench:
bash src/scripts/livecodebench_inference.sh
Evaluates different methods on LiveCodeBench with sandboxed unit tests.
Priority Metric Ablation:
bash src/scripts/apps_metric.sh
Compares different split metrics for DISC: mean, z-score, random, negative-mean, and negative-z-score.
Model Comparison:
bash src/scripts/apps_model.sh
Compares DISC performance across different LLM models:
- gpt-4o-mini
- gpt-4o
- Llama-3.1-8B
- Mistral-7B-v0.3
- DeepSeek-R1-Distill-Llama-8B
- Qwen-2.5-7B
Search Strategy Comparison:
bash src/scripts/apps_search.sh
Compares DISC with different search strategies: greedy (baseline), MCTS, and beam search with various beam sizes.
Temperature Ablation:
bash src/scripts/apps_temperature.sh
Sweeps temperature values (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4) comparing BoN vs. DISC.
Alpha Fraction Ablation:
bash src/scripts/apps_alpha_fraction.sh
Ablation study on the alpha_fraction hyperparameter (0.05, 0.10, 0.15, 0.20, 0.25, 0.30).
When ground-truth tests are not available, DISC can use self-generated validation tests:
1. Generate validation tests:
bash src/scripts/apps_testgen.sh
Generates validation tests for each problem and records them in a jsonl file under data/apps-testgen/. Uses the configuration in src/conf/auxgen.yaml.
2. Run experiments with self-generated tests:
bash src/scripts/apps_val.sh
Runs BoN and decomposition baselines using self-generated validation tests instead of ground-truth tests. Note: you need to update the TESTGEN_PATH variable in the script to point to the generated tests from step 1.
Experiments are configured using Hydra. Default configurations can be found in:
- src/conf/inference.yaml: Main inference configuration
- src/conf/auxgen.yaml: Test generation configuration
- src/conf/solver/: Solver-specific configs (dycomp, bon, spchar, tokencomp, etc.)
- src/conf/task/: Task-specific configs (apps, math_500, livecodebench, etc.)
You can modify parameters by:
- Editing the configuration files directly
- Passing command-line overrides (e.g., solver.params.temperature=0.5)
- Editing the shell scripts in src/scripts/
By default, generated solutions are saved to the data/generated_solutions/<task_name>/ directory. You can change this by modifying the solution_set_path parameter in the configuration file.
You can analyze results using the notebook src/data_analysis/produce_standard_charts_demo.ipynb. This notebook will:
- Load generated solutions from jsonl files
- Compute metrics used in the paper (pass@k, token usage, etc.)
- Generate plots comparing different methods
- Save plots to the root directory
Usage:
- Open src/data_analysis/produce_standard_charts_demo.ipynb
- Update the paths to your generated solutions jsonl files (see the loading sketch after this list)
- Run the notebook cells
- Plots will be generated and saved
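If you prefer to inspect results outside the notebook, the loading step is plain jsonl. The path below is illustrative and the record fields are repository-specific, so this only shows the generic pattern.

```python
import json
from pathlib import Path

# Illustrative path; point this at your own run under data/generated_solutions/<task_name>/.
results_path = Path("data/generated_solutions/apps_comp_test/my-experiment.jsonl")

records = [json.loads(line) for line in results_path.read_text().splitlines() if line.strip()]
print(f"Loaded {len(records)} records")
if records:
    print(f"Fields in first record: {sorted(records[0])}")
```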
Metrics:
- Primary: pass@k (k = 1, 10) for coding/math (see the estimator sketch after this list)
- Secondary: token usage & sample count for compute efficiency
- Reward sources: unit tests (code), verifiers (math), self-critique (generic)
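pass@k is normally reported with the standard unbiased estimator computed from n samples per problem, c of which are correct: pass@k = 1 - C(n-c, k)/C(n, k). The repository's analysis utilities may differ in detail; the sketch below is a minimal reference implementation with illustrative counts.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples per problem, c of which are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, per-problem correct counts from some run (illustrative numbers).
correct_counts = [0, 3, 10, 1, 0, 7]
n = 10
for k in (1, 10):
    score = sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
    print(f"pass@{k} = {score:.3f}")
```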
DISC supports both open-source and proprietary models:
Open-source models:
- LLaMA-3.1-8B
- Mistral-7B-v0.3
- Qwen-2.5-7B
- DeepSeek-R1-Distill-Llama-8B
Proprietary models:
- OpenAI (gpt-4o, gpt-4o-mini, etc.)
- Anthropic Claude (via API)
- Google Gemini (via API)
Model adapters are in src/llm_models/. To add a new model:
- Implement the model interface in src/llm_models/ (a hypothetical sketch follows this list)
- Update configuration to reference the new model
- Ensure API keys are set in environment variables if needed
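The adapter interface in src/llm_models/ is not spelled out in this README, so the class and method names below are hypothetical. The sketch only illustrates the general shape of a text-in/text-out adapter that a new model would provide; follow the existing adapters in src/llm_models/ for the actual interface.

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    """Hypothetical adapter interface; the real one in src/llm_models/ may differ."""

    @abstractmethod
    def generate(self, prompt: str, temperature: float = 0.2, max_tokens: int = 512) -> str:
        """Return a single sampled completion for the prompt."""

class EchoLLM(BaseLLM):
    """Trivial stand-in used only to show the shape of a new adapter."""

    def generate(self, prompt: str, temperature: float = 0.2, max_tokens: int = 512) -> str:
        return prompt[-max_tokens:]

if __name__ == "__main__":
    model = EchoLLM()
    print(model.generate("Write a function that reverses a string."))
```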
DISC uses Hydra for configuration management. Key parameters:
Solver parameters (DISC):
solver: dycomp
solver.params:
decomp_budget: 30 # Total sampling budget
alpha_fraction: 0.15 # Threshold for advancing/contracting
split_metric: zscore # Priority metric (mean|zscore|random|negmean|negzscore)
temperature: 0.2 # Sampling temperature
model: gpt-4o-mini # Model identifier
stop_sum_score: 1.0 # Stop when cumulative reward >= this value
Task parameters:
task: apps_comp_test # Task identifier
top_k_problems: 200 # Number of problems to solve (-1 for all)
Search variants:
- solver=dycomp: Greedy DISC (default)
- solver=dycomp_mcts: MCTS with DISC decomposition
- solver=dycomp_beam: Beam search with DISC decomposition
Baseline solvers:
- solver=simple: Single generation
- solver=bon: Best-of-N sampling
- solver=spchar: Newline-based decomposition
- solver=tokencomp: Token-based decomposition
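Because configuration is handled by Hydra, you can also compose and inspect a resolved config from Python without launching inference. The sketch below uses Hydra's standard compose API; it assumes a recent Hydra version, that it is run from the repository root (config_path is resolved relative to the calling file), and that the example overrides are valid for your config. Treat it as illustrative.

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# config_path is resolved relative to the calling file in Hydra's compose API;
# adjust it if you place this snippet somewhere other than the repository root.
with initialize(version_base=None, config_path="src/conf"):
    cfg = compose(
        config_name="inference",
        overrides=[
            "run_name=inspect-config",
            "task=apps_comp_test",
            "solver=dycomp_mcts",
            "solver.params.temperature=0.5",
            "top_k_problems=10",
        ],
    )
print(OmegaConf.to_yaml(cfg))
```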
If you use DISC in your research, please cite our NeurIPS 2025 paper:
@inproceedings{light2025disc,
title = {{DISC}: Dynamic decomposition improves {LLM} inference scaling},
author = {Light, Jonathan and Cheng, Wei and Riviere, Benjamin and Wu, Yue and Oyamada, Masafumi and Wang, Mengdi and Yue, Yisong and Paternain, Santiago and Chen, Haifeng},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS 2025)},
year = {2025}
}
For questions or issues, please open an issue on GitHub or contact the authors.
Core authors and affiliations are listed in the paper.