
research: tool invocation reliability taxonomy — 12 categories, model-size threshold for reliable tool use (arXiv:2601.16280) #2234

@bug-ops

Description


Source

arXiv:2601.16280 — "When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems"

Summary

Introduces a 12-category diagnostic framework for tool invocation failures across setup, parameter handling, execution, and result interpretation phases. Benchmarks 1,980 scenarios across GPT-4, Claude, and Qwen2.5. Key finding: mid-sized models (qwen2.5:14b) achieve 96.6% tool success rate at the best cost/reliability tradeoff.

Applicability to Zeph

HIGH. Touches the zeph-tools ToolExecutor, audit logging, and the tool error taxonomy.

Zeph already has a ToolErrorCategory enum (PR #2214) with 12 categories. This paper provides empirical grounding:

  • Cross-validate Zeph's 12-category taxonomy against the paper's framework
  • The model-size threshold finding is actionable for routing tool-heavy tasks: prefer qwen2.5:14b-equivalent models for reliability-critical tool calls
  • Setup/parameter handling failures map to Zeph's InvalidInput/SchemaError categories
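As a sketch of the cross-validation, the paper's four phases can be mapped onto the taxonomy. The enum variants below are hypothetical stand-ins; the actual ToolErrorCategory from PR #2214 may use different names, with only InvalidInput and SchemaError confirmed above:

```python
from enum import Enum

# Hypothetical subset of Zeph's ToolErrorCategory (PR #2214);
# only INVALID_INPUT and SCHEMA_ERROR are confirmed by this issue,
# the other variant names are illustrative.
class ToolErrorCategory(Enum):
    INVALID_INPUT = "InvalidInput"
    SCHEMA_ERROR = "SchemaError"
    TOOL_NOT_FOUND = "ToolNotFound"
    EXECUTION_TIMEOUT = "ExecutionTimeout"
    RESULT_MISPARSE = "ResultMisparse"

# The paper's four phases (setup / parameter handling / execution /
# result interpretation) mapped onto the taxonomy -- illustrative only.
PHASE_OF = {
    ToolErrorCategory.TOOL_NOT_FOUND: "setup",
    ToolErrorCategory.INVALID_INPUT: "param",
    ToolErrorCategory.SCHEMA_ERROR: "param",
    ToolErrorCategory.EXECUTION_TIMEOUT: "execution",
    ToolErrorCategory.RESULT_MISPARSE: "result",
}

def phase_for(category: ToolErrorCategory) -> str:
    """Return the paper-aligned phase label for a tool error category."""
    return PHASE_OF[category]
```

Keeping the mapping as a standalone table rather than baked into the enum lets the phase labels change if the cross-validation shifts categories between phases.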

Implementation Direction

  • Annotate Zeph's ToolErrorCategory with paper's phase labels (setup/param/execution/result)
  • Add phase-level error metrics to [tools.audit] output
  • Use model-size findings to configure [orchestration] tool-heavy task provider selection
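The phase-level metrics item could be aggregated roughly as follows. The audit record shape (a `failure_phase` field that is `None` on success) is an assumption for illustration, not Zeph's actual [tools.audit] schema:

```python
from collections import Counter
from typing import Iterable

def phase_error_metrics(records: Iterable[dict]) -> dict:
    """Aggregate per-phase failure counts and an overall success rate.

    Each record is assumed to carry a 'failure_phase' key holding one of
    the paper's phase labels (setup/param/execution/result), or None when
    the tool call succeeded. This shape is hypothetical.
    """
    counts = Counter()
    total = 0
    failures = 0
    for rec in records:
        total += 1
        phase = rec.get("failure_phase")
        if phase is not None:
            failures += 1
            counts[phase] += 1
    success_rate = (total - failures) / total if total else 1.0
    return {"success_rate": success_rate, "failures_by_phase": dict(counts)}

# Example audit slice (synthetic):
calls = [
    {"tool": "search", "failure_phase": None},
    {"tool": "search", "failure_phase": "param"},
    {"tool": "fetch", "failure_phase": "execution"},
    {"tool": "fetch", "failure_phase": None},
]
```

Emitting `failures_by_phase` alongside the existing per-category counts would let the audit output answer "where in the invocation lifecycle do we fail" without changing the category-level logging.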

Priority: P2
Discovered: CI-211 research scan (2026-03-27)
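The [orchestration] provider-selection direction could be sketched as "cheapest provider above a tool-reliability floor". The 96.6% figure for qwen2.5:14b is from the paper; the other models, rates, and relative costs below are illustrative placeholders, not measured values or Zeph's actual config schema:

```python
# Hypothetical provider table. Only the qwen2.5:14b success rate (96.6%)
# comes from arXiv:2601.16280; the rest are placeholder values.
PROVIDERS = [
    {"model": "small-model",  "tool_success": 0.90, "relative_cost": 1.0},
    {"model": "qwen2.5:14b",  "tool_success": 0.966, "relative_cost": 2.0},
    {"model": "large-model",  "tool_success": 0.97, "relative_cost": 9.0},
]

def pick_provider(min_tool_success: float = 0.95) -> dict:
    """Cheapest provider that meets the tool-reliability floor."""
    eligible = [p for p in PROVIDERS if p["tool_success"] >= min_tool_success]
    return min(eligible, key=lambda p: p["relative_cost"])
```

With a 0.95 floor this selects the mid-sized model, matching the paper's cost/reliability-tradeoff finding; tool-light tasks could simply pass a lower floor.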

Metadata


Assignees

Labels

  • P2: High value, medium complexity
  • research: Research-driven improvement
  • tools: Tool execution and MCP integration

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
