
feat(skills): semantic confusability mitigation — category grouping, two-stage matching (#2268) #2402

Merged
bug-ops merged 1 commit into main from skill-confusability on Mar 30, 2026


bug-ops commented Mar 29, 2026

Summary

  • Adds category field to SKILL.md frontmatter (strict validation: 1-32 chars, [a-z0-9-], no leading/trailing/consecutive hyphens)
  • Categorizes all 26 bundled skills across web, data, dev, system
  • Implements CategoryMatcher for two-stage matching: Stage 1 selects top-2 categories by centroid similarity, Stage 2 does fine-grained matching within the candidate pool — mitigates phase-transition degradation as library grows
  • Adds SkillMatcher::confusability_report() — O(n²) pairwise cosine similarity offloaded to blocking thread, with configurable threshold
  • Adds /skills confusability slash command surfacing confusable pairs (disabled by default: confusability_threshold = 0.0)
  • Groups /skills output by category in alphabetical order via BTreeMap
  • Wires two_stage_matching and confusability_threshold config fields in [skills] TOML section
  • Fixes three pre-existing clippy warnings in bandit.rs (binding shadowing, map_or, useless vec!)
  • Adds 20 new tests: validate_category validation boundaries, CategoryMatcher build/is_useful/candidate_positions, ConfusabilityReport::Display excluded skills branch, two-stage correctness and result count invariant
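
The category rules in the first bullet (1-32 characters, `[a-z0-9-]`, no leading, trailing, or consecutive hyphens) can be sketched as follows. This is a minimal illustration, not the code from this PR: the signature of `validate_category` and the `CategoryError` enum are assumptions made for the example.

```rust
/// Hypothetical error type for this sketch; the PR's real error type is not shown here.
#[derive(Debug, PartialEq, Eq)]
pub enum CategoryError {
    /// Empty, or longer than 32 characters.
    Length,
    /// A character outside `[a-z0-9-]`.
    InvalidChar(char),
    /// A leading, trailing, or doubled hyphen.
    HyphenPlacement,
}

/// Checks a category string against the rules listed in the summary.
/// The signature is an assumption; only the rules come from the PR description.
pub fn validate_category(s: &str) -> Result<(), CategoryError> {
    // Length is counted in characters so multi-byte input is measured consistently.
    let len = s.chars().count();
    if len == 0 || len > 32 {
        return Err(CategoryError::Length);
    }
    // Report the first character that is not a lowercase ASCII letter, digit, or hyphen.
    if let Some(c) = s
        .chars()
        .find(|c| !(c.is_ascii_lowercase() || c.is_ascii_digit() || *c == '-'))
    {
        return Err(CategoryError::InvalidChar(c));
    }
    // Hyphens may only appear singly and between other characters.
    if s.starts_with('-') || s.ends_with('-') || s.contains("--") {
        return Err(CategoryError::HyphenPlacement);
    }
    Ok(())
}
```

Under these assumptions, `web` and `data-tools-2` pass, while `Web`, `-web`, `web-`, and `web--tools` are rejected.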

Motivation

arXiv:2601.04748 shows that skill selection accuracy degrades sharply once a library grows past a critical size, and that the cause is semantic confusability between skills rather than library size alone. This PR adds the three mitigations described in issue #2268: a category field (SHORT), two-stage matching (MEDIUM), and confusability monitoring (MONITORING).

Test plan

  • cargo nextest run --workspace --lib --bins — 6705/6705 passed
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo +nightly fmt --check — clean
  • Security audit: all 5 areas PASS (no injection, no prompt leakage, safe numerics)
  • Post-impl adversarial critique: minor verdict (no significant/critical gaps)
  • Validate /skills shows grouped output with category headers
  • Validate /skills confusability works when confusability_threshold > 0 in config
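
For reviewers checking the last item, the pairwise check behind the confusability report can be sketched roughly as below. This is an illustration only: `confusable_pairs` and `cosine` are hypothetical names, the `>=` comparison is an assumption, and the sketch runs synchronously instead of offloading to a blocking thread as the PR does. Treating a threshold of `0.0` as "disabled" follows the default described in the summary.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns `(i, j, similarity)` for every pair `i < j` whose similarity is at
/// least `threshold`, most similar first. O(n^2) in the number of embeddings.
/// Hypothetical helper: a threshold of 0.0 (or below) means the report is disabled.
pub fn confusable_pairs(embeddings: &[Vec<f32>], threshold: f32) -> Vec<(usize, usize, f32)> {
    if threshold <= 0.0 {
        return Vec::new();
    }
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            let sim = cosine(&embeddings[i], &embeddings[j]);
            if sim >= threshold {
                pairs.push((i, j, sim));
            }
        }
    }
    // Sort descending by similarity so the most confusable pairs come first.
    pairs.sort_by(|a, b| b.2.total_cmp(&a.2));
    pairs
}
```

With three embeddings where only the first two point in nearly the same direction, a threshold of `0.8` would report the single pair `(0, 1)`, and a threshold of `0.0` would report nothing.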

…two-stage matching, confusability report (#2268)

- Add `category` field to SKILL.md frontmatter spec with strict validation (1-32 chars, lowercase alnum+hyphens)
- Categorize all 26 bundled skills across web/data/dev/system groups
- Implement `CategoryMatcher` for two-stage matching: coarse category embedding first, then fine-grained within-category
- Add `SkillMatcher::confusability_report()` — O(n²) pairwise cosine similarity with configurable threshold
- Add `/skills confusability` slash command surfacing pairs above threshold
- Group `/skills` output by category in BTreeMap alphabetical order
- Wire `two_stage_matching` and `confusability_threshold` config fields in `SkillsConfig`
- Fix three pre-existing clippy warnings in `bandit.rs` (binding shadowing, map_or, useless vec!)
- Add 20 new tests: validate_category boundaries, CategoryMatcher behavior, ConfusabilityReport display, two-stage correctness
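
The two-stage flow described above (pick the top-2 categories by centroid similarity, then rank only the skills inside them) can be sketched as follows. This is a simplified illustration under stated assumptions, not the `CategoryMatcher` from this PR: the `Skill` struct, `centroids`, and `two_stage_match` are hypothetical, every skill is assumed to have a category, and embeddings are plain `Vec<f32>` values of equal length.

```rust
use std::collections::BTreeMap;

/// Hypothetical skill record for this sketch.
pub struct Skill {
    pub name: String,
    pub category: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Mean embedding per category.
fn centroids(skills: &[Skill]) -> BTreeMap<&str, Vec<f32>> {
    let mut sums: BTreeMap<&str, (Vec<f32>, usize)> = BTreeMap::new();
    for s in skills {
        let entry = sums
            .entry(s.category.as_str())
            .or_insert_with(|| (vec![0.0; s.embedding.len()], 0));
        for (acc, x) in entry.0.iter_mut().zip(&s.embedding) {
            *acc += x;
        }
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(cat, (sum, n))| {
            (cat, sum.into_iter().map(|v| v / n as f32).collect::<Vec<f32>>())
        })
        .collect()
}

/// Stage 1 keeps the two categories whose centroids are closest to the query;
/// stage 2 ranks only the skills in those categories and returns up to `top_k`.
pub fn two_stage_match<'a>(skills: &'a [Skill], query: &[f32], top_k: usize) -> Vec<&'a Skill> {
    let mut cats: Vec<(&str, f32)> = centroids(skills)
        .iter()
        .map(|(cat, centroid)| (*cat, cosine(query, centroid)))
        .collect();
    cats.sort_by(|a, b| b.1.total_cmp(&a.1));
    let selected: Vec<&str> = cats.iter().take(2).map(|(cat, _)| *cat).collect();

    let mut scored: Vec<(&Skill, f32)> = skills
        .iter()
        .filter(|s| selected.contains(&s.category.as_str()))
        .map(|s| (s, cosine(query, &s.embedding)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(top_k).map(|(s, _)| s).collect()
}
```

The point of the design is that stage 2 never compares the query against skills in unrelated categories, so near-duplicates elsewhere in the library cannot crowd out the right candidates. The result is capped by both `top_k` and the size of the candidate pool.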
github-actions bot added labels on Mar 29, 2026:

  • documentation: Improvements or additions to documentation
  • llm: zeph-llm crate (Ollama, Claude)
  • skills: zeph-skills crate
  • rust: Rust code changes
  • core: zeph-core crate
  • enhancement: New feature or request
  • size/XL: Extra large PR (500+ lines)

bug-ops enabled auto-merge (squash) on March 29, 2026 23:54
bug-ops merged commit c68c5de into main on Mar 30, 2026
27 checks passed
bug-ops deleted the skill-confusability branch on March 30, 2026 00:01


Development

Successfully merging this pull request may close these issues.

research(skills): single-agent skill library phase transition — semantic confusability drives matcher degradation (arXiv:2601.04748)
