Skip to content

research(reliability): 12-category tool invocation error taxonomy for targeted retry/fallback strategies (arXiv:2601.16280) #2203

@bug-ops

Description

@bug-ops

Source

arXiv:2601.16280 — "When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems" (January 2026)

Key Finding

Introduces a 12-category error taxonomy (tool initialization, parameter handling, execution, result interpretation) tested across 1,980 instances. Mid-sized models reach 96.6% tool invocation success. Tool initialization failures are the primary reliability bottleneck for smaller models.

Applicability to Zeph

Instead of generic error forwarding, zeph-tools could classify tool failures by category and apply category-specific strategies:

  • Initialization failures (bad schema, missing tool): return error immediately, no retry
  • Parameter failures (invalid args): ask LLM to reformat args, retry once
  • Execution failures (403, 404 permanent): inject structured tool_result with error, mark as permanent
  • Execution failures (429, 5xx transient): retry with exponential backoff, deliver tool_result on final attempt

This taxonomy directly feeds into the fix design for #2197 — the permanent error path should always deliver a proper tool_result block, not fall through to attempt_self_reflection.

Priority

P2 — design input for #2197 fix and broader tool error handling hardening

References

Metadata

Metadata

Assignees

Labels

P2High value, medium complexityenhancementNew feature or requestresearchResearch-driven improvement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions