# Awesome-Parallel-Text-Generation

## Our Survey

**A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models**

The first comprehensive survey of parallel text generation methods. [PDF]

## Methodology

### AR-Based

#### Draft-and-Verify

| Paper | Venue | Code |
| --- | --- | --- |
| Adaptive Draft-Verification for Efficient Large Language Model Decoding | AAAI 2025 | Github |
| Speculative Decoding with Big Little Decoder | NeurIPS 2023 | Github |
| Block Verification Accelerates Speculative Decoding | ICLR 2025 | - |
| Cascade speculative drafting for even faster llm inference | NeurIPS 2023 | Github |
| Dynamic Depth Decoding: Faster Speculative Decoding for LLMs | arXiv 2024 | - |
| Distillspec: Improving speculative decoding via knowledge distillation | ICLR 2024 | - |
| Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | ACL 2024 | Github |
| Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | AAAI 2025 | Github |
| DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | WWW 2025 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | Github |
| EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | EMNLP 2024 | Github |
| Speculative Decoding via Early-Exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | ACL 2024 | - |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-designed Decoding Tree | AAAI 2025 | Github |
| Fast Inference from Transformers via Speculative Decoding | ICML 2023 | Github |
| Graph-Structured Speculative Decoding | ACL 2024 | Github |
| Learning Harmonized Representations for Speculative Sampling | ICLR 2025 | Github |
| Hydra: Sequentially-dependent draft heads for medusa decoding | COLM 2024 | Github |
| Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | ICLR 2025 | - |
| Kangaroo: Lossless self-speculative decoding via double early exiting | NeurIPS 2024 | Github |
| Layer-skip: Enabling early-exit inference and self-speculative decoding | ACL 2024 | Github |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | Github |
| Mixture of Attentions for Speculative Decoding | ICLR 2025 | Github |
| Optimized multi-token joint decoding with auxiliary model for llm inference | ICLR 2025 | Github |
| A Drop-in Solution for On-the-fly Adaptation of Speculative Decoding in Large Language Models | ACL 2025 | - |
| OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure | TACL 2025 | Github |
| Ouroboros: Speculative Decoding with Large Model Enhanced Drafting | EMNLP 2024 | Github |
| Online Speculative Decoding | ICML 2024 | Github |
| Pass: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| Parallel Speculative Decoding with Adaptive Draft Length | ICLR 2025 | Github |
| PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | SC 2024 | Github |
| Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding | TMLR 2024 | - |
| ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding | ICCAD 2024 | - |
| REST: Retrieval-based speculative decoding | NAACL 2024 | Github |
| Recursive Speculative Decoding: Accelerating LLM Inference via Sampling without Replacement | ICLR-LLMA 2024 | - |
| Sequoia: Scalable, robust, and hardware-aware speculative decoding | arXiv 2024 | Github |
| Generation meets verification: Accelerating large language model inference with smart parallel auto-correct decoding | ACL 2024 | Github |
| Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation | EMNLP 2023 | Github |
| Specdec++: Boosting speculative decoding via adaptive candidate lengths | COLM 2025 | Github |
| Specinfer: Accelerating generative large language model serving with tree-based speculative inference and verification | ASPLOS 2024 | Github |
| SpecTr: Fast Speculative Decoding via Optimal Transport | NeurIPS 2023 | - |
| Speed: speculative pipelined execution for efficient decoding | NeurIPS-ENLSP 2023 | - |
| Swift: On-the-fly self-speculative decoding for llm inference acceleration | ICLR 2025 | Github |
| SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | NeurIPS 2025 | Github |
| Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | arXiv 2025 | Github |
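
All of the entries above share one loop: a cheap drafter proposes several future tokens, and the target model checks them in a parallel pass, so several tokens can be accepted per target-model step. The sketch below is a minimal, stdlib-only illustration of that draft-and-verify loop; `draft_model`, `target_model`, and the greedy acceptance rule are toy stand-ins (real systems, e.g. the speculative decoding of Leviathan et al., score all draft positions in one batched forward pass and use a rejection-sampling rule that preserves the target distribution).

```python
import random

VOCAB = list(range(100))

def toy_next_token(context, seed_shift):
    """Stand-in for a language model's greedy next-token prediction."""
    random.seed(hash(tuple(context)) + seed_shift)
    return random.choice(VOCAB)

def draft_model(context):   # small, fast drafter
    return toy_next_token(context, seed_shift=1)

def target_model(context):  # large, accurate target model
    return toy_next_token(context, seed_shift=0)

def speculative_step(prefix, k=4):
    """Draft k tokens, then verify: keep the longest prefix the target model
    agrees with and append the target model's own token at the first mismatch."""
    ctx, draft = list(prefix), []
    for _ in range(k):
        draft.append(draft_model(ctx))
        ctx.append(draft[-1])

    accepted, ctx = [], list(prefix)
    for tok in draft:
        target_tok = target_model(ctx)  # in practice all k positions are scored in one forward pass
        if target_tok == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_tok)  # correction token from the target model
            break
    else:
        accepted.append(target_model(ctx))  # bonus token when every draft token is accepted
    return prefix + accepted

if __name__ == "__main__":
    seq = [0]
    for _ in range(6):
        seq = speculative_step(seq)
    print(seq)
```

In this toy the verification is sequential, but the guarantee is the point: the accepted tokens are exactly what greedy decoding with `target_model` alone would produce; the draft only changes how much target-model work each accepted token costs.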

#### Decomposition-and-Fill

| Paper | Venue | Code |
| --- | --- | --- |
| PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | arXiv 2025 | - |
| Falcon: Faster and parallel inference of large language models through enhanced semi-autoregressive drafting and custom-designed decoding tree | AAAI 2025 | Github |
| Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models | NAACL 2025 | - |
| Skeleton-of-thought: Prompting llms for efficient parallel generation | ICLR 2024 | Github |
| SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models | arXiv 2025 | - |
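
The methods in this category (e.g. Skeleton-of-Thought) first generate a short outline sequentially, then fill in the parts independently so the expansions can run in parallel. The sketch below illustrates that two-stage pattern; `toy_llm` is a placeholder for a real LLM API call, and the outline "parsing" is faked.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_llm(prompt):
    """Placeholder for a chat-model call; a real system would query an LLM API here."""
    return f"[model output for: {prompt!r}]"

def skeleton_of_thought(question, n_points=3):
    # Stage 1 (short, sequential): ask for a skeleton of the answer.
    # With a real model the points would be parsed out of one completion.
    skeleton = [toy_llm(f"Outline point {i + 1} for: {question}") for i in range(n_points)]

    # Stage 2 (parallel): each skeleton point is expanded independently,
    # so the expansion requests can be issued concurrently.
    def expand(point):
        return toy_llm(f"Expand '{point}' into a short paragraph about: {question}")

    with ThreadPoolExecutor(max_workers=n_points) as pool:
        paragraphs = list(pool.map(expand, skeleton))

    return "\n\n".join(paragraphs)

if __name__ == "__main__":
    print(skeleton_of_thought("Why can parallel decoding speed up LLM inference?"))
```

The speedup comes entirely from Stage 2: the expansions do not depend on each other, so their latency overlaps.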

#### Multiple Token Prediction

| Paper | Venue | Code |
| --- | --- | --- |
| L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models | arXiv 2025 | - |
| On multi-token prediction for efficient LLM inference | arXiv 2025 | - |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | Github |
| Multi-Token Prediction Needs Registers | arXiv 2025 | Github |
| Blockwise Parallel Decoding for Deep Autoregressive Models | NeurIPS 2018 | - |
| Pass: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | Github |
| Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential | arXiv 2025 | - |
| ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | EMNLP 2020 | Github |
| Better & faster large language models via multi-token prediction | ICML 2024 | - |
| Deepseek-v3 technical report | arXiv 2024 | Github |
| MiMo: Unlocking the Reasoning Potential of Language Model--From Pretraining to Posttraining | arXiv 2025 | Github |
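
Multi-token prediction attaches extra prediction heads so a single forward pass proposes the next k tokens instead of one. The PyTorch sketch below is a toy, Medusa-style illustration (a GRU stands in for the transformer trunk; sizes and names are made up); real MTP setups differ in how the heads are built, trained, and verified against the base model.

```python
import torch
import torch.nn as nn

class ToyMultiTokenPredictor(nn.Module):
    """Toy Medusa-style model: a shared trunk encodes the prefix once, and k
    independent heads predict the tokens at offsets +1 .. +k from the same
    final hidden state."""
    def __init__(self, vocab_size=1000, d_model=64, k=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer trunk
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(k))

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        hidden, _ = self.trunk(self.embed(tokens))    # (batch, seq_len, d_model)
        last = hidden[:, -1]                          # hidden state at the final position
        return [head(last) for head in self.heads]    # k logit vectors, one per future offset

if __name__ == "__main__":
    model = ToyMultiTokenPredictor()
    logits_per_offset = model(torch.randint(0, 1000, (1, 8)))
    draft = [logits.argmax(-1).item() for logits in logits_per_offset]
    print(draft)  # k greedy guesses; in practice these are verified before being accepted
```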

### Non-AR-Based

#### One-Shot Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Non-autoregressive neural machine translation | ICLR 2018 | Github |
| End-to-end non-autoregressive neural machine translation with connectionist temporal classification | EMNLP 2018 | |
| Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | EMNLP 2018 | Github |
| Lava nat: A non-autoregressive translation model with look-around decoding and vocabulary attention | arXiv 2025 | - |
| AligNART: Non-autoregressive neural machine translation by jointly learning to estimate alignment and translate | EMNLP 2021 | - |
| Guiding non-autoregressive neural machine translation decoding with reordering information | AAAI 2021 | Github |
| Non-monotonic latent alignments for ctc-based non-autoregressive machine translation | NeurIPS 2022 | Github |
| DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder | ACL 2023 | Github |
| Directed acyclic transformer for non-autoregressive machine translation | ICML 2022 | Github |
| Viterbi decoding of directed acyclic transformer for non-autoregressive machine translation | EMNLP 2022 | Github |
| Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade | ACL-IJCNLP 2021 | - |
| Aligned cross entropy for non-autoregressive machine translation | ICML 2020 | Github |
| ngram-OAXE: Phrase-based order-agnostic cross entropy for non-autoregressive machine translation | COLING 2022 | Github |
| Multi-granularity optimization for non-autoregressive translation | EMNLP 2022 | Github |
| One reference is not enough: Diverse distillation with reference selection for non-autoregressive translation | NAACL 2022 | Github |
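
One-shot (fully non-autoregressive) generation predicts the target length and then fills every position in a single parallel step, treating the outputs as conditionally independent given the source. The stdlib sketch below shows only that structure; the length predictor and per-position distributions are toy stand-ins, and it deliberately ignores the fertility, CTC, and alignment machinery the papers above add to cope with the independence assumption.

```python
import random

VOCAB = [f"tok{i}" for i in range(50)]

def toy_length_predictor(source):
    """Stand-in for the learned target-length predictor."""
    return len(source) + 2

def toy_position_distribution(source, position):
    """Stand-in for p(y_position | source): each target position is predicted
    from the source alone, with no dependence on the other output tokens."""
    random.seed(hash((tuple(source), position)))
    return random.choice(VOCAB)

def one_shot_generate(source):
    length = toy_length_predictor(source)
    # All positions are filled in one parallel step; this is what makes NAT fast,
    # and also why plain NAT suffers from repetition / multimodality errors.
    return [toy_position_distribution(source, i) for i in range(length)]

if __name__ == "__main__":
    print(one_shot_generate(["ein", "kleines", "beispiel"]))
```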

#### Masked Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Accelerating Large Language Model Decoding with Speculative Sampling | arXiv 2023 | Github |
| Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling | ICLR 2025 | - |
| A continuous time framework for discrete denoising models | NeurIPS 2022 | Github |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | Github |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | Github |
| Seed Diffusion | arXiv 2025 | - |
| Target concrete score matching: A holistic framework for discrete diffusion | ICML 2025 | - |
| Score-based continuous-time discrete diffusion models | ICLR 2023 | - |
| Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding | arXiv 2025 | Github |
| Large language diffusion models | ICLR 2025 | Github |
| Beyond autoregression: Discrete diffusion for complex reasoning and planning | ICLR 2025 | Github |
| A reparameterized discrete diffusion model for text generation | COLM 2024 | Github |
| Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | ICML 2025 | - |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | arXiv 2025 | - |
| Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles | arXiv 2025 | Github |
| Remasking discrete diffusion models with inference-time scaling | ICLR 2025 | Github |
| Path planning for masked diffusion model sampling | arXiv 2025 | Github |
| Think while you generate: Discrete diffusion with planned denoising | ICLR 2025 | Github |
| Accelerating Diffusion LLMs via Adaptive Parallel Decoding | arXiv 2025 | - |
| Reviving any-subset autoregressive models with principled parallel sampling and speculative decoding | arXiv 2025 | Github |
| dkv-cache: The cache for diffusion language models | arXiv 2025 | Github |
| Accelerating diffusion language model inference via efficient kv caching and guided diffusion | arXiv 2025 | - |
| Esoteric Language Models | arXiv 2025 | Github |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | ICLR 2025 | - |
| Cllms: Consistency large language models | ICML 2024 | Github |
| The diffusion duality | ICML 2025 | Github |
| d1: Scaling reasoning in diffusion large language models via reinforcement learning | arXiv 2025 | Github |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | AAAI 2025 | Github |
| DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | arXiv 2025 | Github |
| Scaling diffusion language models via adaptation from autoregressive models | ICLR 2025 | Github |
| Dream 7B | arXiv 2025 | Github |
| DIFFPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models | ACL 2025 | Github |
| Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs | arXiv 2025 | Github |
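
Masked-diffusion language models start from a fully masked sequence and repeatedly ask a denoiser to fill in masked positions, committing several tokens per step. The sketch below shows the common confidence-based unmasking schedule with a toy denoiser; real samplers differ in how they choose which positions to reveal (entropy bounds, planners, remasking) and, of course, use a trained model rather than random guesses.

```python
import random

MASK = "<mask>"
VOCAB = [f"tok{i}" for i in range(50)]

def toy_denoiser(seq):
    """Stand-in for a masked diffusion LM: for every masked position, return a
    (token, confidence) guess conditioned on the whole current sequence."""
    random.seed(hash(tuple(seq)))
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def masked_generation(length=12, steps=4):
    seq = [MASK] * length
    for step in range(steps):
        guesses = toy_denoiser(seq)
        if not guesses:
            break
        # Reveal only the most confident fraction this step; the rest stay masked
        # and are re-predicted later with more committed context around them.
        budget = max(1, len(guesses) // (steps - step))
        most_confident = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _) in most_confident[:budget]:
            seq[pos] = tok
    for pos, (tok, _) in toy_denoiser(seq).items():  # reveal anything still masked
        seq[pos] = tok
    return seq

if __name__ == "__main__":
    print(masked_generation())
```

The number of denoiser calls is fixed by `steps`, not by the sequence length, which is where the parallelism over autoregressive decoding comes from.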

#### Edit-Based Refinement

| Paper | Venue | Code |
| --- | --- | --- |
| Insertion transformer: Flexible sequence generation via insertion operations | ICML 2019 | - |
| Levenshtein transformer | NeurIPS 2019 | Github |
| EDITOR: An edit-based transformer with repositioning for neural machine translation with soft lexical constraints | TACL 2021 | Github |
| FELIX: Flexible Text Editing Through Tagging and Insertion | EMNLP 2020 | - |
| Levenshtein OCR | ECCV 2022 | Github |
| FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition | NeurIPS 2021 | Github |
| Non-autoregressive Text Editing with Copy-aware Latent Alignments | EMNLP 2023 | Github |
| Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | NAACL-SRW 2024 | - |
| Summarizing Like Human: Edit-Based Text Summarization with Keywords | ICANN 2024 | - |
| Deterministic non-autoregressive neural sequence modeling by iterative refinement | EMNLP 2018 | Github |
| Flowseq: Non-autoregressive conditional sequence generation with generative flow | EMNLP 2019 | Github |
| Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior | AAAI 2020 | Github |
| Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation | EMNLP 2020 | Github |
| Non-autoregressive machine translation with auxiliary regularization | AAAI 2019 | - |
| Imitation learning for non-autoregressive neural machine translation | ACL 2019 | - |
| An imitation learning curriculum for text editing with non-autoregressive models | ACL 2022 | Github |
| Fast structured decoding for sequence models | NeurIPS 2019 | Github |
| An EM approach to non-autoregressive conditional sequence generation | ICML 2020 | - |
| Imputer: Sequence modelling via imputation and dynamic programming | ICML 2020 | Github |
| Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment | NAACL 2021 | Github |
| Learning to rewrite for non-autoregressive neural machine translation | EMNLP 2021 | Github |
| RenewNAT: renewing potential translation for non-autoregressive transformer | AAAI 2023 | - |
| Learning to recover from multi-modality errors for non-autoregressive neural machine translation | ACL 2020 | Github |
| Hybrid-regressive neural machine translation | ICLR 2023 | - |
| Iterative Translation Refinement with Large Language Models | EAMT 2024 | - |
| IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking | ICLR 2025 | Github |
| Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation | ACL 2021 | Github |
| Understanding and Improving Lexical Choice in Non-Autoregressive Translation | ICLR 2021 | Github |
| SlotRefine: A fast non-autoregressive model for joint intent detection and slot filling | EMNLP 2020 | Github |
| Non-autoregressive dialog state tracking | ICLR 2020 | Github |
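
Edit-based refinement keeps a full draft in view and improves it over a few rounds of parallel edit operations (deletions, insertions, re-predictions) instead of emitting tokens left to right. The sketch below mimics a Levenshtein-Transformer-style delete/insert loop with a random toy policy; the policy, vocabulary, and convergence test are all illustrative.

```python
import random

VOCAB = list("abcdefgh")

def toy_edit_policy(seq, target_len=8):
    """Stand-in for a learned edit policy: decide deletions for every token and
    insertions for every gap in parallel (a real Levenshtein Transformer does this
    with dedicated deletion and insertion heads)."""
    random.seed(hash(tuple(seq)))
    kept = [tok for tok in seq if random.random() > 0.2]   # parallel deletion decisions
    while len(kept) < target_len:                          # insert placeholders, then fill them
        kept.insert(random.randrange(len(kept) + 1), random.choice(VOCAB))
    return kept

def edit_refine(initial, rounds=3):
    seq = list(initial)
    for _ in range(rounds):
        proposal = toy_edit_policy(seq)
        if proposal == seq:   # no edits proposed: treat as converged
            break
        seq = proposal
    return "".join(seq)

if __name__ == "__main__":
    print(edit_refine("aaaa"))
```

As with masked generation, the cost is governed by the number of refinement rounds rather than the output length, since every round edits all positions at once.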
