Skip to content

research(tools): AgentErrorTaxonomy — 5-domain failure classification + domain-aware recovery reduces cascading errors (arXiv:2509.25370) #2253

@bug-ops

Description

@bug-ops

Source

arXiv:2509.25370 — Where LLM Agents Fail and How They Can Learn from Failures (Sep 2025)
https://arxiv.org/abs/2509.25370

Summary

The paper taxonomizes LLM agent failures into 5 domains: memory (retrieval miss), reflection (incorrect self-assessment), planning (wrong sub-task decomposition), action (parameter error, tool misuse), system (infrastructure/rate-limit). Key insight: a single root-cause error propagates through all subsequent steps. AgentDebug isolates the root domain and delivers targeted corrective feedback. Result: +24% complete-task accuracy and up to +26% relative performance on ALFWorld, GAIA, WebShop.

Relevance to Zeph

Zeph has a ToolErrorCategory taxonomy in zeph-tools (permanent/transient/policy/etc.) and transient retry logic (Phase 2). But retry is currently domain-blind — it retries at transport level without classifying whether the failure is:

  • Memory → re-query with broader scope / different index
  • Planning → request LLM to re-plan the sub-task
  • Action → retry with corrected parameters (self-reflection path)
  • System → exponential backoff retry

The 5-domain taxonomy maps directly onto Zeph's architecture, and domain-aware recovery would prevent cascading chain-of-failure across multi-turn tool sessions.

Implementation Sketch

  • Add ErrorDomain enum {Memory, Reflection, Planning, Action, System} to zeph-tools
  • Add ToolExecutor::classify_error_domain(&self, err: &ToolError, context: &[Message]) -> ErrorDomain
  • In CompositeExecutor, after classification, route to domain-specific recovery handler:
    • Memory → call memory_search with broader query
    • Action → trigger self-reflection path (already exists in core)
    • System → Phase 2 transient retry (already exists)
    • Planning → inject re-plan request into agent loop
  • Lightweight: classification can be a regex/heuristic first pass, LLM only for ambiguous cases

Complexity

MEDIUM — domain enum + classifier + recovery dispatch; fits CompositeExecutor audit chain

Component

zeph-tools (CompositeExecutor, ToolError), zeph-core (agent retry path, self-reflection)

Metadata

Metadata

Assignees

Labels

P2High value, medium complexityresearchResearch-driven improvement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions