Skip to content

research(reliability): AgentDebug — structured corrective feedback on tool failures to prevent context corruption (arXiv:2509.25370) #2199

@bug-ops

Description

@bug-ops

Source

arXiv:2509.25370 — "Where LLM Agents Fail and How They can Learn From Failures" (September 2025)

Key Finding

Proposes AgentErrorTaxonomy (failure modes across memory, reflection, planning, action, system operations) and AgentDebug, which isolates root causes and generates targeted corrective feedback. Achieves up to 26% relative improvement in task success across ALFWorld, GAIA, and WebShop benchmarks.

Applicability to Zeph

The corrective feedback loop is directly applicable to issue #2197. When a tool returns 403/500, instead of propagating raw error text into the conversation (which corrupts the tool_call/tool_result message cycle), the agent could classify the failure and inject a structured correction message at the point of error.

Concretely:

  • Classify errors at ToolExecutor layer in zeph-tools (network, auth, server, timeout)
  • For permanent errors (403, 404): inject structured tool_result with error classification, NOT the raw attempt_self_reflection path
  • For transient errors (429, 5xx): retry with exponential backoff, still deliver proper tool_result

This is the architectural fix direction for #2197 — ensure every tool_call_id always receives a corresponding tool_result, even for failed executions.

Priority

P2 — directly relevant to critical bug #2197; provides architectural pattern for the fix

References

Metadata

Metadata

Assignees

Labels

P2High value, medium complexityenhancementNew feature or requestresearchResearch-driven improvement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions