-
Notifications
You must be signed in to change notification settings - Fork 2
research(reliability): 12-category tool invocation error taxonomy for targeted retry/fallback strategies (arXiv:2601.16280) #2203
Description
Source
arXiv:2601.16280 — "When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems" (January 2026)
Key Finding
Introduces a 12-category error taxonomy (tool initialization, parameter handling, execution, result interpretation) tested across 1,980 instances. Mid-sized models reach 96.6% tool invocation success. Tool initialization failures are the primary reliability bottleneck for smaller models.
Applicability to Zeph
Instead of generic error forwarding, zeph-tools could classify tool failures by category and apply category-specific strategies:
- Initialization failures (bad schema, missing tool): return error immediately, no retry
- Parameter failures (invalid args): ask LLM to reformat args, retry once
- Execution failures (403, 404 permanent): inject structured tool_result with error, mark as permanent
- Execution failures (429, 5xx transient): retry with exponential backoff, deliver tool_result on final attempt
This taxonomy directly feeds into the fix design for #2197 — the permanent error path should always deliver a proper tool_result block, not fall through to attempt_self_reflection.
Priority
P2 — design input for #2197 fix and broader tool error handling hardening
References
- https://arxiv.org/abs/2601.16280
- bug(tools): parallel tool call with permanent error drops tool_result, causes 400 Bad Request on next LLM turn #2197, research(reliability): AgentDebug — structured corrective feedback on tool failures to prevent context corruption (arXiv:2509.25370) #2199