-
Notifications
You must be signed in to change notification settings - Fork 2
research(tools): AgentErrorTaxonomy — 5-domain failure classification + domain-aware recovery reduces cascading errors (arXiv:2509.25370) #2253
Description
Source
arXiv:2509.25370 — Where LLM Agents Fail and How They Can Learn from Failures (Sep 2025)
https://arxiv.org/abs/2509.25370
Summary
The paper taxonomizes LLM agent failures into 5 domains: memory (retrieval miss), reflection (incorrect self-assessment), planning (wrong sub-task decomposition), action (parameter error, tool misuse), system (infrastructure/rate-limit). Key insight: a single root-cause error propagates through all subsequent steps. AgentDebug isolates the root domain and delivers targeted corrective feedback. Result: +24% complete-task accuracy and up to +26% relative performance on ALFWorld, GAIA, WebShop.
Relevance to Zeph
Zeph has a ToolErrorCategory taxonomy in zeph-tools (permanent/transient/policy/etc.) and transient retry logic (Phase 2). But retry is currently domain-blind — it retries at transport level without classifying whether the failure is:
- Memory → re-query with broader scope / different index
- Planning → request LLM to re-plan the sub-task
- Action → retry with corrected parameters (self-reflection path)
- System → exponential backoff retry
The 5-domain taxonomy maps directly onto Zeph's architecture, and domain-aware recovery would prevent cascading chain-of-failure across multi-turn tool sessions.
Implementation Sketch
- Add
ErrorDomainenum {Memory, Reflection, Planning, Action, System} tozeph-tools - Add
ToolExecutor::classify_error_domain(&self, err: &ToolError, context: &[Message]) -> ErrorDomain - In
CompositeExecutor, after classification, route to domain-specific recovery handler:- Memory → call
memory_searchwith broader query - Action → trigger self-reflection path (already exists in core)
- System → Phase 2 transient retry (already exists)
- Planning → inject re-plan request into agent loop
- Memory → call
- Lightweight: classification can be a regex/heuristic first pass, LLM only for ambiguous cases
Complexity
MEDIUM — domain enum + classifier + recovery dispatch; fits CompositeExecutor audit chain
Component
zeph-tools (CompositeExecutor, ToolError), zeph-core (agent retry path, self-reflection)