Qingyu Lu1,3, Liang Ding2, Kanjian Zhang2, Jinxia Zhang1, Dacheng Tao3
1Southeast University, China | 2Alibaba | 3Nanyang Technological University, Singapore
- Efficiency ≠ Agentic Effectiveness. Despite low latency, diffusion-based LLMs (dLLMs) fail to serve as reliable agent backbones in both long-horizon embodied tasks and precision-critical tool-calling scenarios.
- Systematic Agentic Failures. dLLMs exhibit characteristic failure modes, including retry loops under temporal feedback and loss of symbolic precision (e.g., malformed JSON) under diffusion noise.
- DiffuAgent Framework. We introduce DiffuAgent, a unified and modular framework for evaluating dLLMs across embodied and tool-calling agentic workflows.
- Where dLLMs Work. dLLMs remain effective in non-causal auxiliary roles (e.g., memory summarization and tool selection), but require causal and logically grounded mechanisms to function as full agent backbones.
- In Embodied settings, dLLMs suffer repeated attempts (retry loops), failing to branch under temporal feedback.
- In Tool-Calling settings, dLLMs fail to maintain symbolic precision (e.g., strict JSON schemas) under diffusion noise.
We compare dLLMs and autoregressive LLMs on embodied (AgentBoard) and tool-calling (BFCL) benchmarks. The results show that dLLMs lag behind autoregressive LLMs on both success/progress rates and tool-calling accuracy.
(a) Replanning failure for embodied agents: dLLMs fall into retry loops significantly more often than autoregressive LLMs.
(b) Precision failure for tool-calling agents: dLLMs are more prone to producing malformed JSON that violates strict schemas.
(c) Performance–efficiency trade-off: despite higher inference efficiency, dLLMs do not deliver agentic performance comparable to autoregressive LLMs.
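The two failure modes above are easy to instrument. The helpers below are our illustration of how a retry loop or a schema violation could be flagged; the function names and the action/output formats are assumptions, not part of the DiffuAgent codebase.

```python
import json

def count_retry_loops(actions):
    """Count immediate repetitions of the previous action -- a crude
    proxy for the retry loops observed in embodied rollouts.
    (Illustrative metric, not the paper's exact definition.)"""
    return sum(1 for a, b in zip(actions, actions[1:]) if a == b)

def is_valid_tool_call(output, required_keys=("name", "arguments")):
    """Check that a raw model output parses as JSON and carries the
    keys a strict tool-call schema would demand."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(call, dict) and all(k in call for k in required_keys)
```

For example, the trajectory `["go left", "open door", "open door", "open door"]` contains two immediate retries, and a truncated output such as `'{"name": "get_weather", "arguments"'` fails the schema check.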
To better understand the agentic potential of dLLMs, we introduce DiffuAgent, a novel evaluation framework that treats dLLMs as plug-and-play cognitive modules for augmenting LLM agents.
- For embodied agents, we introduce a memory-augmented module for history compression and an early-exit verifier for global trajectory checking.
- For tool-calling agents, we include a tool selector over the library of available tools and a JSON format editor.
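The plug-and-play composition can be pictured as a thin agent wrapper whose auxiliary slots accept any model, dLLM or autoregressive. The sketch below is our illustration of this design; the class, field, and method names are ours, not the released DiffuAgent API.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Backbone agent augmented with optional auxiliary modules.
    All slots are plain callables (e.g. wrappers around a dLLM or
    an autoregressive LLM); names here are illustrative only."""
    backbone: callable
    memory: callable = None     # compresses the interaction history
    verifier: callable = None   # decides whether to exit early
    selector: callable = None   # narrows the tool library first

    def step(self, history, tools):
        context = self.memory(history) if self.memory else history
        tools = self.selector(tools, context) if self.selector else tools
        action = self.backbone(context, tools)
        if self.verifier and self.verifier(history + [action]):
            return action, True   # early exit: trajectory judged complete
        return action, False
```

Because each slot is just a callable, a dLLM can be swapped into the memory or selector role while an autoregressive LLM keeps the backbone, which is the configuration our findings favor.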
For detailed installation and setup instructions:
- AgentBoard: installation.md | Original Repo
- BFCL: installation.md | Original Repo
Note: Please refer to the original repositories for detailed environment requirements.
Note: Our BFCL experiments have been extended to v4. To reproduce v3 experiments, please use the v3 codebase.
Note: Parts of the code were optimized automatically with Claude Code; the optimized code passed preliminary testing. If you encounter any issues, please contact us.
dLLMs are competitive memory modules for memory-augmented agents.
LLM Verifiers tend to trigger premature early exits, whereas dLLMs terminate more reliably.
dLLMs are effective tool selectors but struggle as tool-call editors.
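To make the tool-call editor role concrete, the snippet below sketches a minimal repair loop in the spirit of the JSON format editor; `edit_model` and the prompt wording are hypothetical stand-ins, not our released implementation.

```python
import json

def repair_tool_call(raw, edit_model, max_rounds=3):
    """Ask an editor model to fix a tool call until it parses as JSON.
    `edit_model` is any text-to-text callable; the prompt is illustrative."""
    for _ in range(max_rounds):
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            raw = edit_model(f"Fix this JSON ({err.msg}): {raw}")
    return json.loads(raw)  # final attempt; raises if still invalid
```

A loop like this makes the finding measurable: an editor that rarely converges to parseable JSON within a few rounds is struggling at exactly the symbolic-precision task where we observe dLLMs falling short.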
@article{lu2026diffuagent,
title = {The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check},
author = {Lu, Qingyu and Ding, Liang and Zhang, Kanjian and Zhang, Jinxia and Tao, Dacheng},
journal = {arXiv preprint},
year = {2026},
url = {https://arxiv.org/pdf/2601.12979}
}

© 2026 DiffuAgent






