
ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs

Wonjun Kang*1,5, Kevin Galim*1, Seunghyuk Oh*1, Minjae Lee1, Yuchen Zeng2,3, Shuibai Zhang2,
Coleman Hooper4, Yuezhou Hu4, Hyung Il Koo1, Nam Ik Cho5, Kangwook Lee2,6,7

1FuriosaAI, 2UW-Madison, 3Microsoft Research, 4UC Berkeley,
5Seoul National University, 6KRAFTON, 7Ludo Robotics

Project arXiv

🔔 Updates

  • Jan 25, 2026 Paper accepted at ICLR 2026! 🎉
  • Oct 6, 2025 ParallelBench release!

🌍 Papers Using ParallelBench

The following works have evaluated their methods using ParallelBench. Check out how they tackle the speed-quality trade-off of parallel decoding!

🗺️ Roadmap

We are currently working to support new models and implement advanced unmasking methods. If you are conducting dLLM research and would like to contribute new models or methods, please open an issue.

New Models

Advanced Unmasking Methods

🔎 Overview

Diffusion LLMs (dLLMs) promise faster generation via parallel decoding. However, this speed often comes at the cost of quality, because parallel decoding ignores token dependencies, an issue that existing benchmarks do not sufficiently capture. To address this, we introduce ParallelBench, the first benchmark designed to rigorously test this trade-off through realistic tasks that humans and autoregressive (AR) LLMs solve easily, but which cause dLLMs to collapse as parallelism grows. We release ParallelBench to drive research towards truly efficient dLLMs that can overcome this challenge.

Features

  • Information-Theoretic Analysis: Error bounds on parallel decoding for tasks with inter-token dependencies, showing accuracy degradation as parallelism grows.
  • Quantitative Case Studies: Synthetic list operations (Copy, Replace, Shuffle) with closed-form accuracy formulas that pin down where parallel decoding breaks.
  • 17 Benchmark Tasks: Three categories (Waiting Line, Text Writing, Puzzles) that humans and AR LLMs solve easily but expose quality drops in dLLMs under parallel decoding.
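The intuition behind the case studies can be made concrete with a toy model (this is our own illustrative sketch, not the paper's exact formulas): for a shuffle-style task, a fully parallel one-step decoder that samples each output position independently from the uniform per-position marginal produces a valid permutation of n items with probability n!/n^n, while a sequential decoder that conditions on already-emitted tokens can always succeed.

```python
from math import factorial

def parallel_shuffle_accuracy(n: int) -> float:
    """Probability that n output positions, each sampled independently and
    uniformly over n items, form a valid permutation (all items distinct).

    Toy model: a one-step parallel decoder that only knows per-position
    marginals cannot enforce the "each item appears exactly once" constraint.
    """
    return factorial(n) / n ** n

# A sequential decoder can condition on tokens it has already emitted and
# always produce a valid permutation; an independent parallel sampler cannot.
print(parallel_shuffle_accuracy(2))   # 0.5
print(parallel_shuffle_accuracy(4))   # 0.09375
```

The probability decays super-exponentially in n, which mirrors the benchmark's observation that quality collapses as more tokens are decoded in parallel on dependency-heavy tasks.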

πŸ“ Key Concepts

ParallelBench measures how quality degrades as parallelism increases in dLLMs. The key variable is tokens per step (TPS): the number of tokens generated in parallel at each denoising step.

| Tokens per step | Meaning |
|---|---|
| 1 | One-by-one decoding (equivalent to AR) |
| k | k tokens decoded in parallel per step |
| `max_tokens` | Fully parallel (one-step generation) |
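With a fixed TPS and a response budget of `max_tokens`, the number of denoising steps (forward passes) follows directly; a minimal sketch (the function name is ours, not part of the pb CLI):

```python
import math

def num_denoising_steps(max_tokens: int, tokens_per_step: int) -> int:
    """Forward passes needed when exactly `tokens_per_step` tokens are
    unmasked at every denoising step."""
    return math.ceil(max_tokens / tokens_per_step)

print(num_denoising_steps(256, 1))    # 256 -> one-by-one, AR-like
print(num_denoising_steps(256, 32))   # 8
print(num_denoising_steps(256, 256))  # 1  -> fully parallel, one step
```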

ParallelBench evaluates model + unmasking method combinations. The same model can yield very different quality-speed trade-offs depending on which unmasking method is used.

The benchmark score is PBx: the maximum TPS at which a given combination still achieves at least x% average accuracy across all tasks. For example, PB80 = 8 means the combination can decode up to 8 tokens in parallel while maintaining at least 80% accuracy. Higher PBx values indicate better quality preservation under parallel decoding.

For methods with deterministic TPS (the top-k family), PBx is read directly from the measured TPS values. For methods with variable TPS (threshold, factor, etc.), PBx is computed via linear interpolation between adjacent (TPS, accuracy) points to find the TPS at which accuracy crosses the threshold.
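The interpolation can be sketched as follows (our own illustrative code, assuming accuracy is non-increasing in TPS; not the benchmark's reference implementation):

```python
def pbx(points, x):
    """Largest TPS at which interpolated accuracy is still >= x percent.

    `points` is a list of measured (tps, accuracy_percent) pairs for one
    model + unmasking-method combination. Assumes accuracy does not
    increase with TPS. Returns None if even the smallest TPS misses x.
    """
    pts = sorted(points)
    if pts[0][1] < x:
        return None
    best = pts[0][0]
    for (t0, a0), (t1, a1) in zip(pts, pts[1:]):
        if a1 >= x:
            best = t1  # still above the threshold at the larger TPS
        else:
            # Accuracy crosses x between t0 and t1: interpolate linearly.
            best = t0 + (t1 - t0) * (a0 - x) / (a0 - a1)
            break
    return best

# Example: PB80 sits on a measured point; PB70 falls between two points.
print(pbx([(1, 95), (4, 90), (8, 80), (16, 60)], 80))  # 8.0
print(pbx([(1, 95), (4, 90), (8, 80), (16, 60)], 70))  # 12.0
```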

⚙️ Setup

1. Prerequisites

  • NVIDIA GPU: CUDA >= 11.8.

2. Clone

```bash
git clone --recurse-submodules https://github.com/furiosa-ai/ParallelBench.git
cd ParallelBench
```

3. Install

We use uv for faster package installation. The following command will install all dependencies including Python packages, PyTorch, vLLM, and JDK 17 (for grammar-based evaluation metrics).

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env  # Reload PATH to use uv

# Install all dependencies (Python + Java)
make install
```

Note: JDK 17 is installed locally via the install-jdk Python package (no sudo required). If you already have Java installed, the script will skip the installation.

4. Running pb CLI

The pb command is available through the virtual environment. Use either method:

```bash
# Option 1: Run directly via uv (no activation needed)
uv run pb <command>

# Option 2: Activate the virtual environment first
source .venv/bin/activate
pb <command>
```

⚡ Quickstart

```bash
# Browse tasks (no GPU required)
uv run pb browse                              # List all available tasks
uv run pb browse waiting_line/copy            # View samples from a specific task
uv run pb browse waiting_line/copy --index 3  # View a specific sample by index

# Run evaluation on a single task
uv run pb eval --model parallelbench_llada \
  --model_args model_path=GSAI-ML/LLaDA-1.5 \
  --gen_kwargs k=32,unmasking=random \
  --tasks parallelbench_waiting_line_copy \
  --include_path parallelbench/tasks \
  --batch_size 1
```

🎯 Evaluation Coverage

Tasks

| Category | Task | CLI task name |
|---|---|---|
| Waiting Line (10) | Copy | `parallelbench_waiting_line_copy` |
| | Insert (index) | `parallelbench_waiting_line_insert_index` |
| | Insert (random) | `parallelbench_waiting_line_insert_random` |
| | Remove (index) | `parallelbench_waiting_line_remove_index` |
| | Remove (random) | `parallelbench_waiting_line_remove_random` |
| | Replace (index) | `parallelbench_waiting_line_replace_index` |
| | Replace (random) | `parallelbench_waiting_line_replace_random` |
| | Reverse | `parallelbench_waiting_line_reverse` |
| | Shuffle | `parallelbench_waiting_line_shuffle` |
| | Sort | `parallelbench_waiting_line_sort` |
| Text Writing (5) | Paraphrasing | `parallelbench_text_writing_paraphrasing` |
| | Summarization | `parallelbench_text_writing_summarization` |
| | Words to Sentence (easy) | `parallelbench_text_writing_words_to_sentence_easy` |
| | Words to Sentence (medium) | `parallelbench_text_writing_words_to_sentence_medium` |
| | Words to Sentence (hard) | `parallelbench_text_writing_words_to_sentence_hard` |
| Puzzles (2) | Latin Square (4x4) | `parallelbench_puzzles_latin_square_n4` |
| | Sudoku (4x4) | `parallelbench_puzzles_sudoku_n4` |

Models

For additional models and unmasking methods, please refer to the Roadmap section.

| Model family | CLI wrapper (`--model`) | Example `model_path` |
|---|---|---|
| LLaDA | `parallelbench_llada` | `GSAI-ML/LLaDA-1.5` |
| Dream, DiffuCoder | `parallelbench_dream` | `Dream-org/Dream-v0-Instruct-7B` |
| SDAR, TraDo | `parallelbench_trado` | Disabled (under investigation) |
| SEDD | `parallelbench_sedd` | `louaaron/sedd-medium` |
| AR baselines (vLLM) | `parallelbench_ar` | `meta-llama/Llama-3.1-8B-Instruct` |
| API models | `parallelbench_api` | Haiku, Mercury (requires `.env` keys) |

Adding your own model? See the step-by-step guide and the example in parallelbench/models/local/example/.

Unmasking Methods

| Strategy | Type | CLI value | Description |
|---|---|---|---|
| Random | Top-k (static) | `random` | Randomly selects which masked tokens to unmask |
| Origin | Top-k (static) | `origin` | Dream's native timestep-based unmasking (default for Dream models) |
| Confidence | Top-k (static) | `confidence_topk` | Unmasks tokens with highest model confidence |
| Margin | Top-k (static) | `topk_margin` | Unmasks tokens with largest margin between top-2 predictions |
| Entropy | Top-k (static) | `entropy_topk` | Unmasks tokens with lowest prediction entropy |
| Confidence Threshold | Adaptive | `confidence_threshold` | Unmasks all tokens above a confidence threshold (`alg_threshold`) |
| Confidence Factor | Adaptive | `confidence_factor` | Scales unmask count by a factor (`alg_factor`) |

Top-k (static) methods unmask a fixed number of tokens per step, so tokens per step is constant. Adaptive methods unmask a variable number of tokens per step, so TPS varies and the actual NFE (number of forward passes) is measured after generation.
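The two regimes can be sketched as selection rules over per-position confidences (illustrative code with hypothetical function names; the fallback of always unmasking at least one token is our assumption so decoding cannot stall, not necessarily what the repo does):

```python
def select_topk(conf, k):
    """Top-k (static): pick the k masked positions with highest confidence."""
    order = sorted(range(len(conf)), key=conf.__getitem__, reverse=True)
    return sorted(order[:k])

def select_threshold(conf, tau):
    """Adaptive: pick every masked position with confidence >= tau.
    Falls back to the single most confident position if none qualifies."""
    picked = [i for i, c in enumerate(conf) if c >= tau]
    return picked or [max(range(len(conf)), key=conf.__getitem__)]

conf = [0.9, 0.2, 0.95, 0.6]          # confidences of 4 masked positions
print(select_topk(conf, 2))           # [0, 2] -> always 2 tokens/step
print(select_threshold(conf, 0.5))    # [0, 2, 3] -> 3 tokens this step
print(select_threshold(conf, 0.99))   # [2] -> fallback, 1 token this step
```

This is why the adaptive family reports NFE after the fact: the number of tokens picked per step, and hence the step count, depends on the confidence distribution at run time.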

Adding your own method? See the step-by-step guide.

🚀 Running Evaluations

For the full CLI reference, generation parameters, and examples, see the Running Evaluations guide.

🙏 Acknowledgements

Built on these open-source projects:

📖 Citation

```bibtex
@article{kang2025parallelbench,
  title={ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs},
  author={Kang, Wonjun and Galim, Kevin and Oh, Seunghyuk and Lee, Minjae and Zeng, Yuchen and Zhang, Shuibai and Hooper, Coleman and Hu, Yuezhou and Koo, Hyung Il and Cho, Nam Ik and others},
  journal={arXiv preprint arXiv:2510.04767},
  year={2025}
}
```
