Project Page | arXiv | Models
Example generation on AIME24. RCD increases parallelism by 4x while maintaining the baseline's peak accuracy.
This repository contains the code to replicate our study "Residual Context Diffusion Language Models". In this study, we point out that diffusion large language models (dLLMs) enable parallel decoding but often trail autoregressive models in accuracy. A key culprit is the inference-time remasking strategy, which commits only high-confidence tokens and discards the rest, wasting intermediate computation.
RCD introduces a residual denoising mechanism that turns discarded token distributions into contextual residuals and injects them into the next denoising step. With a two-stage training pipeline, RCD avoids backprop-through-time memory costs while preserving the benefits of residual feedback.
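The remasking-versus-residual idea can be sketched in a few lines of Python. This is a toy illustration only, not the released implementation: the function name `denoise_step`, the dict-based token distributions, and the commit logic below are all illustrative assumptions, shown solely to make the mechanism concrete.

```python
# Toy sketch (NOT the paper's implementation) of one denoising step with
# confidence-threshold remasking. Standard remasking commits high-confidence
# tokens and throws the rest away; RCD instead keeps the discarded
# distributions as residual context for the next step.

def denoise_step(probs, committed, threshold=0.85):
    """probs: per-position token distributions (dicts of token -> prob).
    committed: per-position committed token (None while still masked).
    Returns updated commitments and the residual context RCD would reuse."""
    residual_context = [None] * len(probs)
    for i, dist in enumerate(probs):
        if committed[i] is not None:
            continue  # this position was already decoded in an earlier step
        token, conf = max(dist.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            committed[i] = token        # confident: commit the token
        else:
            residual_context[i] = dist  # RCD: keep the full distribution
    return committed, residual_context

probs = [{"a": 0.9, "b": 0.1}, {"c": 0.5, "d": 0.5}]
committed, residual = denoise_step(probs, [None, None])
# position 0 commits "a"; position 1 stays masked, but its distribution
# survives as residual context instead of being discarded
```

In the actual model the residual is injected into the next denoising step's input rather than stored in a Python list, but the bookkeeping shown here is the core idea.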
- [2025/02] Project page, arXiv paper, and models are released.
TL;DR: RCD consistently improves diffusion reasoning accuracy over Sequential Denoising (SeqD) across both SDAR and LLaDA, with the biggest gains on harder competition-style benchmarks (AIME24/25) and MinervaMath.
- Models: SDAR 4B / 8B, block size b=32 / 64 (KV cache reuse)
- Eval: SeqD/RCD use a 16,384-token sequence length; Chat uses 512 tokens (1,024 for AIME); confidence threshold = 0.85
| Model | Variant | GSM8K | MATH500 | AIME24 | AIME25 |
|---|---|---|---|---|---|
| SDAR-4B-b32 | Chat | 86.13 | 50.20 | 5.83 | 2.50 |
| SDAR-4B-b32 | SeqD | 81.73 | 61.20 | 6.04 | 11.88 |
| SDAR-4B-b32 | RCD | 85.67 | 70.80 | 11.04 | 17.50 |
| SDAR-4B-b64 | Chat | 85.90 | 49.80 | 6.25 | 1.67 |
| SDAR-4B-b64 | SeqD | 78.85 | 56.80 | 4.17 | 7.29 |
| SDAR-4B-b64 | RCD | 84.76 | 67.80 | 13.75 | 15.83 |
| SDAR-8B-b32 | Chat | 88.40 | 50.00 | 6.46 | 4.17 |
| SDAR-8B-b32 | SeqD | 86.50 | 65.80 | 11.67 | 14.79 |
| SDAR-8B-b32 | RCD | 89.76 | 77.60 | 21.46 | 20.00 |
| SDAR-8B-b64 | Chat | 88.32 | 51.60 | 5.20 | 2.50 |
| SDAR-8B-b64 | SeqD | 82.87 | 64.20 | 7.08 | 9.79 |
| SDAR-8B-b64 | RCD | 88.70 | 73.60 | 15.00 | 19.79 |
- Eval: sequence length 512, single-token-per-step decoding
| Model | Variant | GSM8K | MinervaMath |
|---|---|---|---|
| LLaDA | Base | 70.30 | 31.40 |
| LLaDA | SeqD | 75.74 | 31.10 |
| LLaDA | RCD | 78.09 | 37.00 |
We provide all checkpoints of our models!
For sequential denoising dLLMs (standard SFT from base models):
| Name | URL |
|---|---|
| SeqD-SDAR-4B-b32-Thinking | model |
| SeqD-SDAR-4B-b64-Thinking | model |
| SeqD-SDAR-8B-b32-Thinking | model |
| SeqD-SDAR-8B-b64-Thinking | model |
| SeqD-LLaDA-8B-Instruct | model |
For residual denoising dLLMs (a SeqD reference model is required to warm-start the generation):
| Name | URL | Ref Model | URL |
|---|---|---|---|
| RCD-SDAR-4B-b32-Thinking | model | SeqD-SDAR-1.7B-b32-Thinking | model |
| RCD-SDAR-4B-b64-Thinking | model | SeqD-SDAR-1.7B-b64-Thinking | model |
| RCD-SDAR-8B-b32-Thinking | model | SeqD-SDAR-1.7B-b32-Thinking | model |
| RCD-SDAR-8B-b64-Thinking | model | SeqD-SDAR-1.7B-b64-Thinking | model |
| RCD-LLaDA-8B-Instruct | model | SeqD-LLaDA-8B-Instruct | model |
A minimal implementation of text generation can be found in the generate*.py scripts. These scripts run with only the standard transformers library as a dependency:
```shell
pip install transformers==4.52.3
```
```shell
# Run sequential denoising
CUDA_VISIBLE_DEVICES=0 python SDAR-ref/generate_seqd.py \
    --model_dir yuezhouhu/SeqD-SDAR-4B-b64-Thinking \
    --trust_remote_code \
    --block_length 64 \
    --denoising_steps 64 \
    --temperature 0 \
    --dtype bfloat16 \
    --confidence_threshold 0.85
```
```shell
# Run residual denoising
CUDA_VISIBLE_DEVICES=0 python SDAR-target/generate_rcd.py \
    --model_dir yuezhouhu/RCD-SDAR-4B-b64-Thinking \
    --ref_model_dir yuezhouhu/SeqD-SDAR-1.7B-b64-Thinking \
    --trust_remote_code \
    --block_length 64 \
    --denoising_steps 64 \
    --temperature 0 \
    --dtype bfloat16 \
    --confidence_threshold 0.85
```

We provide the full training and evaluation code to reproduce our results.
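If you prefer to drive the generation scripts from Python, the shell invocations above translate directly to a subprocess call. The helper below is a sketch under our assumptions: `rcd_command` is an illustrative name, and the flags and paths are copied from the commands in this README; adjust them to your checkout.

```python
# Hedged sketch: build the argv for SDAR-target/generate_rcd.py, mirroring
# the shell command above. Flag names come from this README's examples.
import subprocess
import sys

def rcd_command(model_dir, ref_model_dir, block_length=64, steps=64,
                threshold=0.85, temperature=0, dtype="bfloat16"):
    """Assemble the residual-denoising generation command as a list of args."""
    return [
        sys.executable, "SDAR-target/generate_rcd.py",
        "--model_dir", model_dir,
        "--ref_model_dir", ref_model_dir,
        "--trust_remote_code",
        "--block_length", str(block_length),
        "--denoising_steps", str(steps),
        "--temperature", str(temperature),
        "--dtype", dtype,
        "--confidence_threshold", str(threshold),
    ]

cmd = rcd_command("yuezhouhu/RCD-SDAR-4B-b64-Thinking",
                  "yuezhouhu/SeqD-SDAR-1.7B-b64-Thinking")
# subprocess.run(cmd, check=True)  # uncomment to run (needs a CUDA GPU)
```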
- `LLaDA-ref/`: Reference Model (and baseline Sequential Denoising LLaDA model) code and configs.
- `LLaDA-target/`: Target Model code and configs.
- `SDAR-ref/`: Reference Model (and baseline Sequential Denoising SDAR models) code and configs.
- `SDAR-target/`: Target Model code and configs.
Each sub-project is self-contained and has its own environment:
- LLaDA reference: `./LLaDA-ref/README.md`
- LLaDA target: `./LLaDA-target/README.md`
- SDAR reference: `./SDAR-ref/README.md`
- SDAR target: `./SDAR-target/README.md`
- LLaDA:
  - Eval script: `LLaDA-*/examples/llada/eval_openmathinstruct.sh`
- SDAR:
  - Eval scripts: `SDAR-*/eval_simple.sh`, `SDAR-*/eval_aime.sh`
Training recipes live in each sub-project:
- LLaDA: `LLaDA-*/examples/llada/run.sh`
- SDAR: `SDAR-*/run.sh`
```bibtex
@misc{hu2026residualcontextdiffusionlanguage,
  title={Residual Context Diffusion Language Models},
  author={Yuezhou Hu and Harman Singh and Monishwaran Maheswaran and Haocheng Xi and Coleman Hooper and Jintao Zhang and Aditya Tomar and Michael W. Mahoney and Sewon Min and Mehrdad Farajtabar and Kurt Keutzer and Amir Gholami and Chenfeng Xu},
  year={2026},
  eprint={2601.22954},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.22954},
}
```

