# Awesome-Parallel-Text-Generation

## Our Survey

**A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models**

The first comprehensive survey of parallel text generation methods. [PDF]

## Methodology

### AR-Based

#### Draft-and-Verify

| Paper | Venue | Code |
| --- | --- | --- |
| Adaptive Draft-Verification for Efficient Large Language Model Decoding | AAAI 2025 | Github |
| Speculative Decoding with Big Little Decoder | NeurIPS 2023 | Github |
| Block Verification Accelerates Speculative Decoding | ICLR 2025 | - |
| Cascade speculative drafting for even faster llm inference | NeurIPS 2023 | Github |
| Dynamic Depth Decoding: Faster Speculative Decoding for LLMs | arXiv 2024 | - |
| Distillspec: Improving speculative decoding via knowledge distillation | ICLR 2024 | - |
| Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | ACL 2024 | Github |
| Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | AAAI 2025 | Github |
| DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | WWW 2025 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | Github |
| EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | EMNLP 2024 | Github |
| Speculative Decoding via Early-Exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | ACL 2024 | - |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-designed Decoding Tree | AAAI 2025 | Github |
| Fast Inference from Transformers via Speculative Decoding | ICML 2023 | Github |
| Graph-Structured Speculative Decoding | ACL 2024 | Github |
| Learning Harmonized Representations for Speculative Sampling | ICLR 2025 | Github |
| Hydra: Sequentially-dependent draft heads for medusa decoding | COLM 2024 | Github |
| Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | ICLR 2025 | - |
| Kangaroo: Lossless self-speculative decoding via double early exiting | NeurIPS 2024 | Github |
| Layer-skip: Enabling early-exit inference and self-speculative decoding | ACL 2024 | Github |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | Github |
| Mixture of Attentions for Speculative Decoding | ICLR 2025 | Github |
| Optimized multi-token joint decoding with auxiliary model for llm inference | ICLR 2025 | Github |
| A Drop-in Solution for On-the-fly Adaptation of Speculative Decoding in Large Language Models | ACL 2025 | - |
| OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure | TACL 2025 | Github |
| Ouroboros: Speculative Decoding with Large Model Enhanced Drafting | EMNLP 2024 | Github |
| Online Speculative Decoding | ICML 2024 | Github |
| Pass: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| Parallel Speculative Decoding with Adaptive Draft Length | ICLR 2025 | Github |
| PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | SC 2024 | Github |
| Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding | TMLR 2024 | - |
| ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding | ICCAD 2024 | - |
| REST: Retrieval-based speculative decoding | NAACL 2024 | Github |
| Recursive Speculative Decoding: Accelerating LLM Inference via Sampling without Replacement | ICLR-LLMA 2024 | - |
| Sequoia: Scalable, robust, and hardware-aware speculative decoding | arXiv 2024 | Github |
| Generation meets verification: Accelerating large language model inference with smart parallel auto-correct decoding | ACL 2024 | Github |
| Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation | EMNLP 2023 | Github |
| Specdec++: Boosting speculative decoding via adaptive candidate lengths | COLM 2025 | Github |
| Specinfer: Accelerating generative large language model serving with tree-based speculative inference and verification | ASPLOS 2024 | Github |
| SpecTr: Fast Speculative Decoding via Optimal Transport | NeurIPS 2023 | - |
| Speed: speculative pipelined execution for efficient decoding | NeurIPS-ENLSP 2023 | - |
| Swift: On-the-fly self-speculative decoding for llm inference acceleration | ICLR 2025 | Github |
| SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | NeurIPS 2025 | Github |
| Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | arXiv 2025 | Github |
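
All of the entries above share one loop: a cheap drafter proposes several future tokens, and the target model checks them in a parallel pass, so several tokens can be accepted per target-model step. The sketch below is a minimal, stdlib-only illustration of that draft-and-verify loop; `draft_model`, `target_model`, and the greedy acceptance rule are toy stand-ins (real systems, e.g. the speculative decoding of Leviathan et al., score all draft positions in one batched forward pass and use a rejection-sampling rule that preserves the target distribution).

```python
import random

VOCAB = list(range(100))

def toy_next_token(context, seed_shift):
    """Stand-in for a language model's greedy next-token prediction."""
    random.seed(hash(tuple(context)) + seed_shift)
    return random.choice(VOCAB)

def draft_model(context):   # small, fast drafter
    return toy_next_token(context, seed_shift=1)

def target_model(context):  # large, accurate target model
    return toy_next_token(context, seed_shift=0)

def speculative_step(prefix, k=4):
    """Draft k tokens, then verify: keep the longest prefix the target model
    agrees with and append the target model's own token at the first mismatch."""
    ctx, draft = list(prefix), []
    for _ in range(k):
        draft.append(draft_model(ctx))
        ctx.append(draft[-1])

    accepted, ctx = [], list(prefix)
    for tok in draft:
        target_tok = target_model(ctx)  # in practice all k positions are scored in one forward pass
        if target_tok == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_tok)  # correction token from the target model
            break
    else:
        accepted.append(target_model(ctx))  # bonus token when every draft token is accepted
    return prefix + accepted

if __name__ == "__main__":
    seq = [0]
    for _ in range(6):
        seq = speculative_step(seq)
    print(seq)
```

In this toy the verification is sequential, but the guarantee is the point: the accepted tokens are exactly what greedy decoding with `target_model` alone would produce; the draft only changes how much target-model work each accepted token costs.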

#### Decomposition-and-Fill

| Paper | Venue | Code |
| --- | --- | --- |
| PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | arXiv 2025 | - |
| Falcon: Faster and parallel inference of large language models through enhanced semi-autoregressive drafting and custom-designed decoding tree | AAAI 2025 | Github |
| Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models | NAACL 2025 | - |
| Skeleton-of-thought: Prompting llms for efficient parallel generation | ICLR 2024 | Github |
| SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models | arXiv 2025 | - |
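
The methods in this category (e.g. Skeleton-of-Thought) first generate a short outline sequentially, then fill in the parts independently so the expansions can run in parallel. The sketch below illustrates that two-stage pattern; `toy_llm` is a placeholder for a real LLM API call, and the outline "parsing" is faked.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_llm(prompt):
    """Placeholder for a chat-model call; a real system would query an LLM API here."""
    return f"[model output for: {prompt!r}]"

def skeleton_of_thought(question, n_points=3):
    # Stage 1 (short, sequential): ask for a skeleton of the answer.
    # With a real model the points would be parsed out of one completion.
    skeleton = [toy_llm(f"Outline point {i + 1} for: {question}") for i in range(n_points)]

    # Stage 2 (parallel): each skeleton point is expanded independently,
    # so the expansion requests can be issued concurrently.
    def expand(point):
        return toy_llm(f"Expand '{point}' into a short paragraph about: {question}")

    with ThreadPoolExecutor(max_workers=n_points) as pool:
        paragraphs = list(pool.map(expand, skeleton))

    return "\n\n".join(paragraphs)

if __name__ == "__main__":
    print(skeleton_of_thought("Why can parallel decoding speed up LLM inference?"))
```

The speedup comes entirely from Stage 2: the expansions do not depend on each other, so their latency overlaps.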

#### Multiple Token Prediction

| Paper | Venue | Code |
| --- | --- | --- |
| L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models | arXiv 2025 | - |
| On multi-token prediction for efficient LLM inference | arXiv 2025 | - |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | Github |
| Multi-Token Prediction Needs Registers | arXiv 2025 | Github |
| Blockwise Parallel Decoding for Deep Autoregressive Models | NeurIPS 2018 | - |
| Pass: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | Github |
| Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential | arXiv 2025 | - |
| ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | EMNLP 2020 | Github |
| Better & faster large language models via multi-token prediction | ICML 2024 | - |
| Deepseek-v3 technical report | arXiv 2024 | Github |
| MiMo: Unlocking the Reasoning Potential of Language Model--From Pretraining to Posttraining | arXiv 2025 | Github |
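
Multi-token prediction attaches extra prediction heads so a single forward pass proposes the next k tokens instead of one. The PyTorch sketch below is a toy, Medusa-style illustration (a GRU stands in for the transformer trunk; sizes and names are made up); real MTP setups differ in how the heads are built, trained, and verified against the base model.

```python
import torch
import torch.nn as nn

class ToyMultiTokenPredictor(nn.Module):
    """Toy Medusa-style model: a shared trunk encodes the prefix once, and k
    independent heads predict the tokens at offsets +1 .. +k from the same
    final hidden state."""
    def __init__(self, vocab_size=1000, d_model=64, k=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer trunk
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(k))

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        hidden, _ = self.trunk(self.embed(tokens))    # (batch, seq_len, d_model)
        last = hidden[:, -1]                          # hidden state at the final position
        return [head(last) for head in self.heads]    # k logit vectors, one per future offset

if __name__ == "__main__":
    model = ToyMultiTokenPredictor()
    logits_per_offset = model(torch.randint(0, 1000, (1, 8)))
    draft = [logits.argmax(-1).item() for logits in logits_per_offset]
    print(draft)  # k greedy guesses; in practice these are verified before being accepted
```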

### Non-AR-Based

#### One-Shot Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Non-autoregressive neural machine translation | ICLR 2018 | Github |
| End-to-end non-autoregressive neural machine translation with connectionist temporal classification | EMNLP 2018 | |
| Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | EMNLP 2018 | Github |
| Lava nat: A non-autoregressive translation model with look-around decoding and vocabulary attention | arXiv 2025 | - |
| AligNART: Non-autoregressive neural machine translation by jointly learning to estimate alignment and translate | EMNLP 2021 | - |
| Guiding non-autoregressive neural machine translation decoding with reordering information | AAAI 2021 | Github |
| Non-monotonic latent alignments for ctc-based non-autoregressive machine translation | NeurIPS 2022 | Github |
| DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder | ACL 2023 | Github |
| Directed acyclic transformer for non-autoregressive machine translation | ICML 2022 | Github |
| Viterbi decoding of directed acyclic transformer for non-autoregressive machine translation | EMNLP 2022 | Github |
| Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade | ACL-IJCNLP 2021 | - |
| Aligned cross entropy for non-autoregressive machine translation | ICML 2020 | Github |
| ngram-OAXE: Phrase-based order-agnostic cross entropy for non-autoregressive machine translation | COLING 2022 | Github |
| Multi-granularity optimization for non-autoregressive translation | EMNLP 2022 | Github |
| One reference is not enough: Diverse distillation with reference selection for non-autoregressive translation | NAACL 2022 | Github |
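
One-shot (fully non-autoregressive) generation predicts the target length and then fills every position in a single parallel step, treating the outputs as conditionally independent given the source. The stdlib sketch below shows only that structure; the length predictor and per-position distributions are toy stand-ins, and it deliberately ignores the fertility, CTC, and alignment machinery the papers above add to cope with the independence assumption.

```python
import random

VOCAB = [f"tok{i}" for i in range(50)]

def toy_length_predictor(source):
    """Stand-in for the learned target-length predictor."""
    return len(source) + 2

def toy_position_distribution(source, position):
    """Stand-in for p(y_position | source): each target position is predicted
    from the source alone, with no dependence on the other output tokens."""
    random.seed(hash((tuple(source), position)))
    return random.choice(VOCAB)

def one_shot_generate(source):
    length = toy_length_predictor(source)
    # All positions are filled in one parallel step; this is what makes NAT fast,
    # and also why plain NAT suffers from repetition / multimodality errors.
    return [toy_position_distribution(source, i) for i in range(length)]

if __name__ == "__main__":
    print(one_shot_generate(["ein", "kleines", "beispiel"]))
```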

#### Masked Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Accelerating Large Language Model Decoding with Speculative Sampling | arXiv 2023 | Github |
| Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling | ICLR 2025 | - |
| A continuous time framework for discrete denoising models | NeurIPS 2022 | Github |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | Github |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | Github |
| Seed Diffusion | arXiv 2025 | - |
| Target concrete score matching: A holistic framework for discrete diffusion | ICML 2025 | - |
| Score-based continuous-time discrete diffusion models | ICLR 2023 | - |
| Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding | arXiv 2025 | Github |
| Large language diffusion models | ICLR 2025 | Github |
| Beyond autoregression: Discrete diffusion for complex reasoning and planning | ICLR 2025 | Github |
| A reparameterized discrete diffusion model for text generation | COLM 2024 | Github |
| Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | ICML 2025 | - |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | arXiv 2025 | - |
| Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles | arXiv 2025 | Github |
| Remasking discrete diffusion models with inference-time scaling | ICLR 2025 | Github |
| Path planning for masked diffusion model sampling | arXiv 2025 | Github |
| Think while you generate: Discrete diffusion with planned denoising | ICLR 2025 | Github |
| Accelerating Diffusion LLMs via Adaptive Parallel Decoding | arXiv 2025 | - |
| Reviving any-subset autoregressive models with principled parallel sampling and speculative decoding | arXiv 2025 | Github |
| dkv-cache: The cache for diffusion language models | arXiv 2025 | Github |
| Accelerating diffusion language model inference via efficient kv caching and guided diffusion | arXiv 2025 | - |
| Esoteric Language Models | arXiv 2025 | Github |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | ICLR 2025 | - |
| Cllms: Consistency large language models | ICML 2024 | Github |
| The diffusion duality | ICML 2025 | Github |
| d1: Scaling reasoning in diffusion large language models via reinforcement learning | arXiv 2025 | Github |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | AAAI 2025 | Github |
| DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | arXiv 2025 | Github |
| Scaling diffusion language models via adaptation from autoregressive models | ICLR 2025 | Github |
| Dream 7B | arXiv 2025 | Github |
| DIFFPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models | ACL 2025 | Github |
| Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs | arXiv 2025 | Github |
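
Masked-diffusion language models start from a fully masked sequence and repeatedly ask a denoiser to fill in masked positions, committing several tokens per step. The sketch below shows the common confidence-based unmasking schedule with a toy denoiser; real samplers differ in how they choose which positions to reveal (entropy bounds, planners, remasking) and, of course, use a trained model rather than random guesses.

```python
import random

MASK = "<mask>"
VOCAB = [f"tok{i}" for i in range(50)]

def toy_denoiser(seq):
    """Stand-in for a masked diffusion LM: for every masked position, return a
    (token, confidence) guess conditioned on the whole current sequence."""
    random.seed(hash(tuple(seq)))
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def masked_generation(length=12, steps=4):
    seq = [MASK] * length
    for step in range(steps):
        guesses = toy_denoiser(seq)
        if not guesses:
            break
        # Reveal only the most confident fraction this step; the rest stay masked
        # and are re-predicted later with more committed context around them.
        budget = max(1, len(guesses) // (steps - step))
        most_confident = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _) in most_confident[:budget]:
            seq[pos] = tok
    for pos, (tok, _) in toy_denoiser(seq).items():  # reveal anything still masked
        seq[pos] = tok
    return seq

if __name__ == "__main__":
    print(masked_generation())
```

The number of denoiser calls is fixed by `steps`, not by the sequence length, which is where the parallelism over autoregressive decoding comes from.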

#### Edit-Based Refinement

| Paper | Venue | Code |
| --- | --- | --- |
| Insertion transformer: Flexible sequence generation via insertion operations | ICML 2019 | - |
| Levenshtein transformer | NeurIPS 2019 | Github |
| EDITOR: An edit-based transformer with repositioning for neural machine translation with soft lexical constraints | TACL 2021 | Github |
| FELIX: Flexible Text Editing Through Tagging and Insertion | EMNLP 2020 | - |
| Levenshtein OCR | ECCV 2022 | Github |
| FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition | NeurIPS 2021 | Github |
| Non-autoregressive Text Editing with Copy-aware Latent Alignments | EMNLP 2023 | Github |
| Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | NAACL-SRW 2024 | - |
| Summarizing Like Human: Edit-Based Text Summarization with Keywords | ICANN 2024 | - |
| Deterministic non-autoregressive neural sequence modeling by iterative refinement | EMNLP 2018 | Github |
| Flowseq: Non-autoregressive conditional sequence generation with generative flow | EMNLP 2019 | Github |
| Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior | AAAI 2020 | Github |
| Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation | EMNLP 2020 | Github |
| Non-autoregressive machine translation with auxiliary regularization | AAAI 2019 | - |
| Imitation learning for non-autoregressive neural machine translation | ACL 2019 | - |
| An imitation learning curriculum for text editing with non-autoregressive models | ACL 2022 | Github |
| Fast structured decoding for sequence models | NeurIPS 2019 | Github |
| An EM approach to non-autoregressive conditional sequence generation | ICML 2020 | - |
| Imputer: Sequence modelling via imputation and dynamic programming | ICML 2020 | Github |
| Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment | NAACL 2021 | Github |
| Learning to rewrite for non-autoregressive neural machine translation | EMNLP 2021 | Github |
| RenewNAT: renewing potential translation for non-autoregressive transformer | AAAI 2023 | - |
| Learning to recover from multi-modality errors for non-autoregressive neural machine translation | ACL 2020 | Github |
| Hybrid-regressive neural machine translation | ICLR 2023 | - |
| Iterative Translation Refinement with Large Language Models | EAMT 2024 | - |
| IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking | ICLR 2025 | Github |
| Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation | ACL 2021 | Github |
| Understanding and Improving Lexical Choice in Non-Autoregressive Translation | ICLR 2021 | Github |
| SlotRefine: A fast non-autoregressive model for joint intent detection and slot filling | EMNLP 2020 | Github |
| Non-autoregressive dialog state tracking | ICLR 2020 | Github |
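
Edit-based refinement keeps a full draft in view and improves it over a few rounds of parallel edit operations (deletions, insertions, re-predictions) instead of emitting tokens left to right. The sketch below mimics a Levenshtein-Transformer-style delete/insert loop with a random toy policy; the policy, vocabulary, and convergence test are all illustrative.

```python
import random

VOCAB = list("abcdefgh")

def toy_edit_policy(seq, target_len=8):
    """Stand-in for a learned edit policy: decide deletions for every token and
    insertions for every gap in parallel (a real Levenshtein Transformer does this
    with dedicated deletion and insertion heads)."""
    random.seed(hash(tuple(seq)))
    kept = [tok for tok in seq if random.random() > 0.2]   # parallel deletion decisions
    while len(kept) < target_len:                          # insert placeholders, then fill them
        kept.insert(random.randrange(len(kept) + 1), random.choice(VOCAB))
    return kept

def edit_refine(initial, rounds=3):
    seq = list(initial)
    for _ in range(rounds):
        proposal = toy_edit_policy(seq)
        if proposal == seq:   # no edits proposed: treat as converged
            break
        seq = proposal
    return "".join(seq)

if __name__ == "__main__":
    print(edit_refine("aaaa"))
```

As with masked generation, the cost is governed by the number of refinement rounds rather than the output length, since every round edits all positions at once.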
