research(reliability): 12-category tool invocation error taxonomy for targeted retry/fallback strategies (arXiv:2601.16280) #2203

## Source
arXiv:2601.16280 — "When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems" (January 2026)

## Key Finding
The paper introduces a 12-category error taxonomy covering tool initialization, parameter handling, execution, and result interpretation, tested across 1,980 instances. Mid-sized models reach 96.6% tool invocation success. For smaller models, tool initialization failures are the primary reliability bottleneck.

## Applicability to Zeph
Instead of forwarding every error generically, `zeph-tools` could classify tool failures by category and apply a category-specific strategy:
- **Initialization failures** (bad schema, missing tool): return the error immediately, no retry
- **Parameter failures** (invalid args): ask the LLM to reformat the args, retry once
- **Permanent execution failures** (403, 404): inject a structured `tool_result` containing the error, mark it as permanent
- **Transient execution failures** (429, 5xx): retry with exponential backoff, deliver the `tool_result` on the final attempt
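The mapping in the list above can be sketched as a small classifier. This is a minimal illustration, not existing `zeph-tools` code: the `FailureCategory` and `Strategy` enums, both function names, and the `max_attempts: 3` default are all hypothetical. Only the status codes named in the list are classified; anything else is left unclassified rather than guessed.

```rust
/// Coarse failure categories from the list above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureCategory {
    Initialization,
    Parameter,
    ExecutionPermanent,
    ExecutionTransient,
}

/// What the caller should do for a given category (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    /// Return the error immediately, no retry.
    FailFast,
    /// Ask the LLM to reformat the arguments, then retry once.
    ReformatAndRetryOnce,
    /// Inject a structured tool_result marked as permanent.
    DeliverPermanentError,
    /// Retry with exponential backoff, up to `max_attempts` calls.
    RetryWithBackoff { max_attempts: u32 },
}

/// Classify the HTTP status of a failed tool execution.
/// Returns `None` for statuses the list above does not cover.
pub fn classify_http_status(status: u16) -> Option<FailureCategory> {
    match status {
        403 | 404 => Some(FailureCategory::ExecutionPermanent),
        429 | 500..=599 => Some(FailureCategory::ExecutionTransient),
        _ => None,
    }
}

/// Map a category to its handling strategy.
pub fn strategy_for(category: FailureCategory) -> Strategy {
    match category {
        FailureCategory::Initialization => Strategy::FailFast,
        FailureCategory::Parameter => Strategy::ReformatAndRetryOnce,
        FailureCategory::ExecutionPermanent => Strategy::DeliverPermanentError,
        // Assumption: three attempts; the issue does not specify a limit.
        FailureCategory::ExecutionTransient => Strategy::RetryWithBackoff { max_attempts: 3 },
    }
}
```

Keeping classification separate from strategy selection would let the retry policy change without touching the code that recognizes each failure.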

This taxonomy feeds directly into the fix design for #2197: the permanent error path should always deliver a proper `tool_result` block rather than fall through to `attempt_self_reflection`.
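One way to guarantee that every path ends in a `tool_result` is to make the retry loop's return type the result block itself, so there is no branch that can fall through. The sketch below assumes hypothetical `ToolResult` and `ToolError` types and a hypothetical `run_with_retry` helper; none of these are taken from the Zeph codebase. It uses a blocking `std::thread::sleep` for simplicity, where an async runtime would use its own timer.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the structured block handed back to the LLM.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolResult {
    pub is_error: bool,
    pub permanent: bool,
    pub content: String,
}

/// Hypothetical error type: the tool call has already been classified.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToolError {
    Permanent(String),
    Transient(String),
}

/// Exponential backoff: base, 2*base, 4*base, ... for a 0-based attempt.
pub fn backoff_delay(attempt: u32, base: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt))
}

/// Run `call` up to `max_attempts` times. Every exit returns a ToolResult:
/// success, permanent error (no retry), or the last transient error.
pub fn run_with_retry<F>(mut call: F, max_attempts: u32, base: Duration) -> ToolResult
where
    F: FnMut() -> Result<String, ToolError>,
{
    let mut last_error = String::from("no attempts made");
    for attempt in 0..max_attempts {
        match call() {
            Ok(content) => {
                return ToolResult { is_error: false, permanent: false, content };
            }
            // Permanent failures are delivered immediately, never retried.
            Err(ToolError::Permanent(msg)) => {
                return ToolResult { is_error: true, permanent: true, content: msg };
            }
            Err(ToolError::Transient(msg)) => {
                last_error = msg;
                // Sleep only if another attempt will follow.
                if attempt + 1 < max_attempts {
                    std::thread::sleep(backoff_delay(attempt, base));
                }
            }
        }
    }
    // Retries exhausted: still deliver a structured error result.
    ToolResult { is_error: true, permanent: false, content: last_error }
}
```

Because the function returns `ToolResult` rather than `Result`, the caller cannot receive a bare error that would need separate handling.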

## Priority
P2: design input for the #2197 fix and for broader hardening of tool error handling

## References
- https://arxiv.org/abs/2601.16280
- #2197, #2199
