research(skills): single-agent skill library phase transition — semantic confusability drives matcher degradation (arXiv:2601.04748) #2268

@bug-ops

Description

Summary

arXiv:2601.04748 (Jan 2026) investigates whether a single agent selecting from a skill library can replicate multi-agent coordination at lower cost. Key finding: skill selection accuracy is stable up to a critical library size, then drops sharply (a phase transition). The drop is driven by semantic confusability among similar skills, not by library size alone. The paper proposes hierarchical skill organization (analogous to cognitive chunking) as a mitigation and provides practical guidelines for scalable skill-based agent design.

Applicability to Zeph

Directly relevant to the zeph-skills embedding matcher. As Zeph's skill library grows via hot-reload and self-learning evolution, this paper explains the degradation the embedding matcher is likely to encounter:

  1. Phase transition risk: there are currently ~25 bundled skills in ~/.config/zeph/skills/. The transition point depends on the specific library and how confusable its skills are, so matcher accuracy needs to be tracked as the library grows
  2. Confusability metric: similar skill descriptions (e.g. web-scrape vs web-search vs fetch, json-yaml vs regex) cause disambiguation failures, so SKILL.md descriptions should maximize semantic distance between co-resident skills
  3. Hierarchical grouping mitigation: add a category field to SKILL.md YAML frontmatter; implement a two-stage lookup: category → skill within category
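
The two-stage lookup in item 3 could look roughly like the sketch below. It is a minimal illustration, not the zeph-skills implementation: the `Skill` and `Category` structs, the `match_skill` function, and the idea that each category carries its own embedding (for example a centroid of its member skills) are all assumptions made for this example.

```rust
/// Cosine similarity; returns 0.0 for mismatched lengths or zero-norm vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical skill record; `category` mirrors the proposed frontmatter field.
struct Skill {
    name: String,
    category: String,
    embedding: Vec<f32>,
}

/// Hypothetical category record with a coarse embedding (assumed, not from the issue).
struct Category {
    name: String,
    embedding: Vec<f32>,
}

/// Stage 1 picks the closest category; stage 2 searches only inside it.
fn match_skill<'a>(
    query: &[f32],
    categories: &[Category],
    skills: &'a [Skill],
) -> Option<&'a Skill> {
    // Stage 1: coarse match against category embeddings.
    let best_cat = categories.iter().max_by(|a, b| {
        cosine(query, &a.embedding).total_cmp(&cosine(query, &b.embedding))
    })?;
    // Stage 2: fine-grained match restricted to the chosen category.
    skills
        .iter()
        .filter(|s| s.category == best_cat.name)
        .max_by(|a, b| {
            cosine(query, &a.embedding).total_cmp(&cosine(query, &b.embedding))
        })
}
```

One trade-off of this design is that a wrong category in stage 1 cannot be recovered in stage 2; a variant that searches the top few categories instead of only the best one may be worth evaluating.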

Implementation Sketch

  • SHORT (LOW): add category to SKILL.md frontmatter spec (e.g., data, system, web, dev); surface in /skill list output
  • MEDIUM: two-stage matcher — coarse category embedding first, then fine-grained within-category match
  • MONITORING: add a confusability score to skill registry diagnostics — track inter-skill cosine similarity; alert when any pair exceeds 0.85
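
The monitoring item above can be sketched as a pairwise scan over skill embeddings. This is an illustration under assumptions: the `confusable_pairs` function and the `(name, embedding)` tuple layout are hypothetical, and the 0.85 threshold is passed in as a parameter rather than hard-coded so it can be tuned.

```rust
/// Cosine similarity; returns 0.0 for mismatched lengths or zero-norm vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Returns every skill pair whose similarity exceeds `threshold`,
/// sorted with the most confusable pair first.
/// Hypothetical helper: input is a slice of (skill name, description embedding).
fn confusable_pairs<'a>(
    skills: &'a [(&'a str, Vec<f32>)],
    threshold: f32,
) -> Vec<(&'a str, &'a str, f32)> {
    let mut out = Vec::new();
    // Compare each unordered pair exactly once (O(n^2) in library size).
    for i in 0..skills.len() {
        for j in (i + 1)..skills.len() {
            let sim = cosine(&skills[i].1, &skills[j].1);
            if sim > threshold {
                out.push((skills[i].0, skills[j].0, sim));
            }
        }
    }
    out.sort_by(|a, b| b.2.total_cmp(&a.2));
    out
}
```

Calling `confusable_pairs(&skills, 0.85)` after each hot-reload would give the registry diagnostics a list of pairs to flag. The quadratic scan should be cheap at the current library size of ~25 skills, though that would need re-checking if the library grows by orders of magnitude.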

Related

Complements #2261 (SkillsBench) — SkillsBench shows curated skills outperform self-generated; this paper explains the mechanism (confusability) at scale.

Source: https://arxiv.org/abs/2601.04748

Metadata

Labels

P2 (High value, medium complexity), research (Research-driven improvement), skills (zeph-skills crate)
