Coldmist-Lu/DiffuAgent

DiffuAgent

The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check

Qingyu Lu1,3, Liang Ding2, Kanjian Zhang2, Jinxia Zhang1, Dacheng Tao3

1Southeast University, China  |  2Alibaba  |  3Nanyang Technological University, Singapore

TL;DR

  • Efficiency ≠ Agentic Effectiveness. Despite low latency, diffusion-based LLMs (dLLMs) fail to serve as reliable agent backbones in both long-horizon embodied tasks and precision-critical tool-calling scenarios.

  • Systematic Agentic Failures. dLLMs exhibit characteristic failure modes, including retry loops under temporal feedback and loss of symbolic precision (e.g., malformed JSON) under diffusion noise.

  • DiffuAgent Framework. We introduce DiffuAgent, a unified and modular framework for evaluating dLLMs across embodied and tool-calling agentic workflows.

  • Where dLLMs Work. dLLMs remain effective in non-causal auxiliary roles (e.g., memory summarization and tool selection), but require causal and logically grounded mechanisms to function as full agent backbones.

Failure Cases of dLLMs in Agentic Workflows

[Figure: dLLM failure modes]

  • In embodied settings, dLLMs fall into repeated attempts (retry loops) and fail to branch under temporal feedback.
  • In tool-calling settings, dLLMs fail to maintain symbolic precision (e.g., strict JSON schemas) under diffusion noise.
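
One simple way to make the retry-loop failure concrete is to count runs of identical consecutive actions in a trajectory. The sketch below is illustrative only: the function name `count_retry_loops`, the `min_repeats` threshold, and the example trajectory are assumptions, not the metric used in the paper.

```python
from typing import List


def count_retry_loops(actions: List[str], min_repeats: int = 3) -> int:
    """Count runs in which the same action is issued at least `min_repeats` times in a row.

    Hypothetical helper: the threshold and the exact-match criterion are assumptions.
    """
    loops = 0
    run_length = 1
    for previous, current in zip(actions, actions[1:]):
        if current == previous:
            run_length += 1
            # Count each run once, at the moment it first reaches the threshold.
            if run_length == min_repeats:
                loops += 1
        else:
            run_length = 1
    return loops


# Made-up trajectory: the agent repeats "open drawer" three times before moving on.
trajectory = ["open drawer", "open drawer", "open drawer", "look", "take key", "take key"]
retry_loops = count_retry_loops(trajectory)
print(retry_loops)
```

An exact string match is the strictest possible criterion; a real analysis might also treat near-duplicate actions as retries.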

Failure of dLLMs as Agent Backbones

[Tables: dLLMs vs. autoregressive LLMs as agent backbones]

We compare dLLMs and autoregressive LLMs on embodied (AgentBoard) and tool-calling (BFCL) benchmarks. The results show that dLLMs lag behind autoregressive LLMs both in success and progress on AgentBoard and in tool-calling accuracy on BFCL.

Systematic Failure Modes of dLLMs

[Figure: failure analysis, panels (a) to (c)]

(a) Failure to replan in embodied agents: dLLMs fall into retry loops significantly more often than LLMs.

(b) Failure of precision in tool-calling agents: dLLMs are more prone to producing malformed JSON.

(c) Performance-efficiency trade-offs: higher inference efficiency does not guarantee that dLLMs reach agentic performance comparable to autoregressive LLMs.
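
The precision failure in (b) can be checked mechanically. The following is a minimal sketch, assuming a tool call is a JSON object with a string `name` and an object `arguments`. That layout, the helper `check_tool_call`, and the `get_weather` example are assumptions for illustration; they are not the BFCL format or its evaluator.

```python
import json
from typing import Dict, List


def check_tool_call(raw: str, required: Dict[str, type]) -> List[str]:
    """Return the problems found in a raw tool-call string (empty list if none).

    Assumed layout: {"name": <str>, "arguments": {<key>: <value>, ...}}.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        # The string is not parseable at all, e.g. an unbalanced brace.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value is not an object"]

    problems: List[str] = []
    if not isinstance(call.get("name"), str):
        problems.append("missing or non-string 'name'")
    arguments = call.get("arguments")
    if not isinstance(arguments, dict):
        problems.append("missing or non-object 'arguments'")
        return problems
    for key, expected_type in required.items():
        if key not in arguments:
            problems.append(f"missing argument '{key}'")
        elif not isinstance(arguments[key], expected_type):
            problems.append(f"argument '{key}' is not {expected_type.__name__}")
    return problems


schema = {"city": str, "days": int}
well_formed = '{"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}'
truncated = '{"name": "get_weather", "arguments": {"city": "Paris", "days": 3}'
incomplete = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

for candidate in (well_formed, truncated, incomplete):
    print(check_tool_call(candidate, schema))
```

The check separates two kinds of error: strings that do not parse as JSON at all, and strings that parse but violate the expected argument schema.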

DiffuAgent: A Framework for Analyzing Agentic Behaviors in dLLMs

[Figure: the DiffuAgent framework]

To better understand the agentic potential of dLLMs, we introduce DiffuAgent, a novel evaluation framework that treats dLLMs as plug-and-play cognitive modules for augmenting LLM agents.

Framework Components

  • For embodied agents, we introduce a memory-augmented module for history compression and an early-exit verifier for global trajectory checking.

  • For tool-calling agents, we include a tool selector over the library of available tools, and a JSON format editor.
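
To illustrate what "plug-and-play" can mean for the embodied components, here is a minimal sketch of an agent loop in which the memory module and the early-exit verifier are plain callables that can be swapped independently of the policy. All names (`run_episode`, `keep_last_two`, `stuck_verifier`, and the toy policy and environment) are hypothetical and are not taken from the DiffuAgent codebase.

```python
from typing import Callable, List


def run_episode(
    policy: Callable[[str, str], str],        # (memory, observation) -> action
    summarize: Callable[[List[str]], str],    # history -> compressed memory
    should_exit: Callable[[List[str]], bool], # history -> stop the episode early?
    step: Callable[[str], str],               # action -> next observation
    first_observation: str,
    max_steps: int = 10,
) -> List[str]:
    """Run one episode and return the list of 'action -> observation' records."""
    history: List[str] = []
    observation = first_observation
    for _ in range(max_steps):
        memory = summarize(history)            # memory module compresses the history
        action = policy(memory, observation)   # backbone decides the next action
        observation = step(action)
        history.append(f"{action} -> {observation}")
        if should_exit(history):               # verifier inspects the whole trajectory
            break
    return history


# Toy stand-ins; in practice each callable would wrap a model call.
def keep_last_two(history: List[str]) -> str:
    return " | ".join(history[-2:])


def stuck_verifier(history: List[str]) -> bool:
    return len(history) >= 2 and history[-1] == history[-2]


def stubborn_policy(memory: str, observation: str) -> str:
    return "open door"


def locked_door_env(action: str) -> str:
    return "the door is locked"


history = run_episode(
    stubborn_policy, keep_last_two, stuck_verifier, locked_door_env, "you are in a hallway"
)
print(len(history))
```

Because each role is passed in as a function, one model can fill a single role (for example, the summarizer) while another serves as the backbone policy.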

Quick Start

For detailed installation and setup instructions:

Note: Please refer to the original repositories for detailed environment requirements.

Note: Our BFCL experiments have been extended to v4. To reproduce v3 experiments, please use the v3 codebase.

Note: We used Claude Code for automatic code optimization, which passed preliminary testing. If you encounter any issues during use, please contact us.

Analysis of Agentic Behaviors in dLLMs

Memory Augmentation

[Figure: memory analysis]

dLLMs are competitive memory modules for memory-augmented agents.

Early Exit Verification

[Figure: early-exit analysis]

LLM verifiers tend to trigger premature early exits, whereas dLLM verifiers terminate more reliably.

Tool-Calling Analysis

[Figure: tool-calling analysis]

dLLMs are effective tool selectors but struggle as tool-call editors.
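
The tool-selector role amounts to narrowing a tool library to a shortlist before any call is generated. The sketch below uses a keyword-overlap heuristic as a stand-in for a model-based selector; the function `select_tools`, the scoring rule, and the three-tool library are all assumptions made for illustration.

```python
from typing import Dict, List


def select_tools(query: str, library: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank tools by word overlap between the query and each tool description.

    A simple heuristic standing in for a model-based selector (assumption).
    """
    query_words = set(query.lower().split())
    scored = []
    for name, description in library.items():
        overlap = len(query_words & set(description.lower().split()))
        scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical tool library: name -> natural-language description.
library = {
    "get_weather": "return the weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "convert_currency": "convert an amount from one currency to another",
}

shortlist = select_tools("what is the weather forecast in Paris", library, top_k=1)
print(shortlist)
```

Selection only has to name the right tools, whereas editing a tool call has to produce an exactly well-formed string, which is the stricter requirement of the two.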

Citation

@article{lu2026diffuagent,
  title   = {The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check},
  author  = {Lu, Qingyu and Ding, Liang and Zhang, Kanjian and Zhang, Jinxia and Tao, Dacheng},
  journal = {arXiv preprint},
  year    = {2026},
  url     = {https://arxiv.org/pdf/2601.12979}
}

© 2026 DiffuAgent
