research(reliability): Characterizing Faults in Agentic AI — empirical taxonomy of 385 faults across 40 OSS agent repos (arXiv:2603.06847) #2206
Closed
Labels: P2 (High value, medium complexity), enhancement (New feature or request), research (Research-driven improvement)
Summary
arXiv:2603.06847 — submitted 6 March 2026.
Empirical grounded-theory study of 385 faults sampled from 13,602 GitHub issues across 40 open-source agentic AI repos; derives taxonomies of fault types, observable symptoms, and root causes specific to tool-invocation and long-horizon task execution.
Applicability to Zeph
HIGH. A direct complement to #2203 (12-category taxonomy, arXiv:2601.16280): that paper is dataset-derived and taxonomic, whereas this one is empirically grounded in faults from real OSS agent repos. Its linkage between observable symptoms and root causes is directly usable for designing Zeph's error-handler dispatch logic and retry-signal classification in `zeph-tools`.
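As a rough illustration of retry-signal classification keyed on root cause, here is a minimal Rust sketch. The `RootCause` buckets, the `RetrySignal` enum, the `retry_signal` function, and the backoff constants are all hypothetical placeholders: they are not the paper's root-cause taxonomy and not existing `zeph-tools` types. The real categories would come from mapping the paper's findings.

```rust
use std::time::Duration;

/// HYPOTHETICAL root-cause buckets for illustration only; these are not
/// the paper's root-cause categories or existing zeph-tools types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RootCause {
    TransientEnvironment, // e.g. network blip or rate limit
    InvalidToolArguments, // model emitted malformed or wrong arguments
    MissingDependency,    // required binary or package is absent
    LogicError,           // deterministic bug in tool or agent code
}

/// HYPOTHETICAL decision returned to the error handler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RetrySignal {
    /// Re-run the same call after waiting.
    RetryAfter(Duration),
    /// Feed a structured error back to the model so it can fix its call.
    Reprompt,
    /// Stop and surface the failure.
    Abort,
}

/// Maps a root cause (plus a zero-based attempt count) to a retry decision.
/// The backoff constants are placeholders, not tuned values.
fn retry_signal(cause: RootCause, attempt: u32, max_attempts: u32) -> RetrySignal {
    if attempt >= max_attempts {
        return RetrySignal::Abort;
    }
    match cause {
        // Exponential backoff: 200 ms, 400 ms, 800 ms, ... (saturating on overflow).
        RootCause::TransientEnvironment => {
            let ms = 200u64.saturating_mul(2u64.saturating_pow(attempt));
            RetrySignal::RetryAfter(Duration::from_millis(ms))
        }
        // Retrying identical arguments would fail again; ask the model instead.
        RootCause::InvalidToolArguments => RetrySignal::Reprompt,
        // Retries cannot fix these without outside intervention.
        RootCause::MissingDependency | RootCause::LogicError => RetrySignal::Abort,
    }
}

fn main() {
    for attempt in 0..4 {
        println!(
            "transient, attempt {attempt}: {:?}",
            retry_signal(RootCause::TransientEnvironment, attempt, 3)
        );
    }
    for cause in [
        RootCause::InvalidToolArguments,
        RootCause::MissingDependency,
        RootCause::LogicError,
    ] {
        println!("{cause:?}: {:?}", retry_signal(cause, 0, 3));
    }
}
```

The point of the design is that the dispatch decision depends on the diagnosed cause rather than on the raw error text, so a transient failure gets backoff while a malformed call goes straight back to the model.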
Implementation Sketch
- Map the paper's symptom taxonomy to existing `ToolError` variants in `zeph-tools`
- Use the root-cause linkage to inform retry strategy in the error taxonomy design from #2203 (12-category tool invocation error taxonomy for targeted retry/fallback strategies, arXiv:2601.16280)
- Potential: structured error classification at the `ShellExecutor` and `WebScrapeExecutor` level
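For executor-level classification, a sketch along these lines could turn raw shell and HTTP outcomes into structured symptom records before they reach retry logic. `RawOutcome`, `Symptom`, `Classified`, and `classify` are hypothetical names, not `zeph-tools`' actual `ToolError` variants or executor APIs, and the symptom labels are illustrative rather than the paper's taxonomy. The only external conventions relied on are the POSIX shell exit statuses 126 and 127 and the standard HTTP status classes.

```rust
/// HYPOTHETICAL raw outcomes an executor might observe; not zeph-tools types.
#[derive(Debug)]
enum RawOutcome {
    ShellExit { code: i32, stderr: String },
    HttpStatus(u16),
    TimedOut,
}

/// HYPOTHETICAL symptom labels (illustrative, not the paper's taxonomy).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Symptom {
    CommandNotFound,
    PermissionDenied,
    NonZeroExit,
    RateLimited,
    ServerError,
    ClientError,
    Hang,
}

/// Structured record a retry layer could consume.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Classified {
    symptom: Symptom,
    retryable: bool,
}

/// Returns `None` for outcomes that are not failures.
fn classify(outcome: &RawOutcome) -> Option<Classified> {
    let (symptom, retryable) = match outcome {
        RawOutcome::ShellExit { code: 0, .. } => return None,
        // POSIX shells report 127 when the command cannot be found.
        RawOutcome::ShellExit { code: 127, .. } => (Symptom::CommandNotFound, false),
        // 126: command found but not executable.
        RawOutcome::ShellExit { code: 126, .. } => (Symptom::PermissionDenied, false),
        RawOutcome::ShellExit { stderr, .. } if stderr.contains("Permission denied") => {
            (Symptom::PermissionDenied, false)
        }
        RawOutcome::ShellExit { .. } => (Symptom::NonZeroExit, false),
        // 429 must precede the generic 4xx arm.
        RawOutcome::HttpStatus(429) => (Symptom::RateLimited, true),
        RawOutcome::HttpStatus(500..=599) => (Symptom::ServerError, true),
        RawOutcome::HttpStatus(400..=499) => (Symptom::ClientError, false),
        RawOutcome::HttpStatus(_) => return None,
        RawOutcome::TimedOut => (Symptom::Hang, true),
    };
    Some(Classified { symptom, retryable })
}

fn main() {
    let samples = [
        RawOutcome::ShellExit { code: 127, stderr: "sh: foo: not found".into() },
        RawOutcome::HttpStatus(503),
        RawOutcome::TimedOut,
    ];
    for s in &samples {
        println!("{:?} -> {:?}", s, classify(s));
    }
}
```

Keeping classification separate from retry policy means the symptom mapping can be revised as the paper's taxonomy is adopted without touching backoff logic.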
References
- https://arxiv.org/abs/2603.06847
- Related: #2203 (12-category tool invocation error taxonomy for targeted retry/fallback strategies, arXiv:2601.16280), #2199 (AgentDebug: structured corrective feedback on tool failures to prevent context corruption, arXiv:2509.25370)