feat(skills): semantic confusability mitigation — category grouping, two-stage matching (#2268) by bug-ops · Pull Request #2402 · bug-ops/zeph

bug-ops · 2026-03-29T23:54:10Z

Summary

Adds category field to SKILL.md frontmatter (strict validation: 1-32 chars, [a-z0-9-], no leading/trailing/consecutive hyphens)
Categorizes all 26 bundled skills across web, data, dev, system
Implements CategoryMatcher for two-stage matching: Stage 1 selects top-2 categories by centroid similarity, Stage 2 does fine-grained matching within the candidate pool — mitigates phase-transition degradation as library grows
Adds SkillMatcher::confusability_report() — O(n²) pairwise cosine similarity offloaded to blocking thread, with configurable threshold
Adds /skills confusability slash command surfacing confusable pairs (disabled by default: confusability_threshold = 0.0)
Groups /skills output by category in alphabetical order via BTreeMap
Wires two_stage_matching and confusability_threshold config fields in [skills] TOML section
Fixes three pre-existing clippy warnings in bandit.rs (binding shadowing, map_or, useless vec!)
Adds 20 new tests: validate_category validation boundaries, CategoryMatcher build/is_useful/candidate_positions, ConfusabilityReport::Display excluded skills branch, two-stage correctness and result count invariant
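The category rules listed above (1-32 chars, `[a-z0-9-]`, no leading, trailing, or consecutive hyphens) can be illustrated with a small sketch. This is not the PR's actual `validate_category`: the signature, the `String` error type, and the messages are assumptions made for illustration.

```rust
/// Hypothetical sketch of the category rules from the summary.
/// The real `validate_category` in the PR may use a different signature
/// and error type; only the rules themselves come from the description.
fn validate_category(s: &str) -> Result<(), String> {
    // Rule 1: length must be between 1 and 32 characters inclusive.
    let len = s.chars().count();
    if !(1..=32).contains(&len) {
        return Err(format!("category must be 1-32 characters, got {len}"));
    }
    // Rule 2: only lowercase ASCII letters, ASCII digits, and hyphens.
    if !s
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
    {
        return Err("category may contain only [a-z0-9-]".to_string());
    }
    // Rule 3: no leading or trailing hyphen.
    if s.starts_with('-') || s.ends_with('-') {
        return Err("category must not start or end with a hyphen".to_string());
    }
    // Rule 4: no consecutive hyphens.
    if s.contains("--") {
        return Err("category must not contain consecutive hyphens".to_string());
    }
    Ok(())
}
```

Under these assumptions, `validate_category("web")` and `validate_category("data-science")` succeed, while `"-dev"`, `"a--b"`, `"Web"`, and the empty string are rejected.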

Motivation

arXiv:2601.04748 shows skill selection accuracy degrades sharply past a critical library size due to semantic confusability — not library size alone. This PR adds the three mitigations described in issue #2268: category field (SHORT), two-stage matching (MEDIUM), and confusability monitoring (MONITORING).
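The two-stage matching mitigation can be sketched as follows: compute one centroid embedding per category, keep the categories whose centroids are closest to the query, then rank only the skills inside those categories. This is a minimal sketch under stated assumptions, not the PR's `CategoryMatcher`. The `Skill` struct and the `cosine`, `centroids`, and `two_stage_match` functions are hypothetical names, and plain `Vec<f32>` embeddings stand in for whatever the embedding backend produces.

```rust
use std::collections::BTreeMap;

/// Hypothetical skill record; field names are assumptions for illustration.
struct Skill {
    name: &'static str,
    category: &'static str,
    embedding: Vec<f32>,
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Mean embedding per category (assumes all embeddings share one dimension).
fn centroids(skills: &[Skill]) -> BTreeMap<&'static str, Vec<f32>> {
    let mut sums: BTreeMap<&'static str, (Vec<f32>, usize)> = BTreeMap::new();
    for s in skills {
        let (sum, count) = sums
            .entry(s.category)
            .or_insert_with(|| (vec![0.0; s.embedding.len()], 0));
        for (acc, x) in sum.iter_mut().zip(&s.embedding) {
            *acc += *x;
        }
        *count += 1;
    }
    sums.into_iter()
        .map(|(cat, (sum, n))| (cat, sum.into_iter().map(|x| x / n as f32).collect()))
        .collect()
}

/// Stage 1 keeps the `top_categories` closest centroids; Stage 2 ranks
/// only the skills in those categories and returns the best `top_k`.
fn two_stage_match<'a>(
    skills: &'a [Skill],
    query: &[f32],
    top_categories: usize,
    top_k: usize,
) -> Vec<&'a Skill> {
    // Stage 1: coarse selection by centroid similarity.
    let mut cats: Vec<(&str, f32)> = centroids(skills)
        .into_iter()
        .map(|(cat, centroid)| (cat, cosine(&centroid, query)))
        .collect();
    cats.sort_by(|a, b| b.1.total_cmp(&a.1));
    let selected: Vec<&str> = cats.iter().take(top_categories).map(|(c, _)| *c).collect();

    // Stage 2: fine-grained ranking within the candidate pool.
    let mut pool: Vec<(&Skill, f32)> = skills
        .iter()
        .filter(|s| selected.contains(&s.category))
        .map(|s| (s, cosine(&s.embedding, query)))
        .collect();
    pool.sort_by(|a, b| b.1.total_cmp(&a.1));
    pool.into_iter().take(top_k).map(|(s, _)| s).collect()
}
```

Calling `two_stage_match(&skills, &query, 2, k)` corresponds to the top-2 category selection described in the summary. Restricting Stage 2 to the candidate pool means near-duplicate skills in unrelated categories never compete with each other.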

Test plan

cargo nextest run --workspace --lib --bins — 6705/6705 passed
cargo clippy --workspace --all-targets -- -D warnings — clean
cargo +nightly fmt --check — clean
Security audit: all 5 areas PASS (no injection, no prompt leakage, safe numerics)
Post-impl adversarial critique: minor verdict (no significant/critical gaps)
Validate /skills shows grouped output with category headers
Validate /skills confusability works when confusability_threshold > 0 in config

…two-stage matching, confusability report (#2268)

- Add `category` field to SKILL.md frontmatter spec with strict validation (1-32 chars, lowercase alnum+hyphens)
- Categorize all 26 bundled skills across web/data/dev/system groups
- Implement `CategoryMatcher` for two-stage matching: coarse category embedding first, then fine-grained within-category
- Add `SkillMatcher::confusability_report()` — O(n²) pairwise cosine similarity with configurable threshold
- Add `/skills confusability` slash command surfacing pairs above threshold
- Group `/skills` output by category in BTreeMap alphabetical order
- Wire `two_stage_matching` and `confusability_threshold` config fields in `SkillsConfig`
- Fix three pre-existing clippy warnings in `bandit.rs` (binding shadowing, map_or, useless vec!)
- Add 20 new tests: validate_category boundaries, CategoryMatcher behavior, ConfusabilityReport display, two-stage correctness
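The pairwise confusability check described in this commit message can be sketched as below. This is an illustration under assumptions, not the body of `SkillMatcher::confusability_report()`: the function names and the `(usize, usize, f32)` pair representation are hypothetical. Treating a threshold of `0.0` or less as "disabled" follows the PR description's default, and `std::thread::spawn` is only a stand-in for whatever blocking-thread mechanism the PR actually uses.

```rust
use std::thread;

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical O(n²) scan: returns every index pair (i, j) with i < j
/// whose similarity is at or above `threshold`, most similar first.
fn confusable_pairs(embeddings: &[Vec<f32>], threshold: f32) -> Vec<(usize, usize, f32)> {
    // Assumption: a non-positive threshold means the report is disabled.
    if threshold <= 0.0 {
        return Vec::new();
    }
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            let sim = cosine(&embeddings[i], &embeddings[j]);
            if sim >= threshold {
                pairs.push((i, j, sim));
            }
        }
    }
    pairs.sort_by(|a, b| b.2.total_cmp(&a.2));
    pairs
}

/// Runs the scan on a separate OS thread so the caller's thread is not
/// tied up by the quadratic loop (stand-in for the PR's offloading).
fn confusable_pairs_off_thread(
    embeddings: Vec<Vec<f32>>,
    threshold: f32,
) -> Vec<(usize, usize, f32)> {
    thread::spawn(move || confusable_pairs(&embeddings, threshold))
        .join()
        .expect("worker thread panicked")
}
```

The scan visits n(n-1)/2 pairs, which is why moving it off the calling thread becomes more useful as the skill library grows.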

github-actions bot added labels on Mar 29, 2026: documentation (improvements or additions to documentation); llm (zeph-llm crate: Ollama, Claude); skills (zeph-skills crate); rust (Rust code changes); core (zeph-core crate); enhancement (new feature or request); size/XL (extra large PR, 500+ lines)

bug-ops enabled auto-merge (squash) March 29, 2026 23:54

bug-ops linked an issue Mar 29, 2026 that may be closed by this pull request

research(skills): single-agent skill library phase transition — semantic confusability drives matcher degradation (arXiv:2601.04748) #2268

Closed

bug-ops merged commit c68c5de into main Mar 30, 2026
27 checks passed

bug-ops deleted the skill-confusability branch March 30, 2026 00:01

bug-ops merged 1 commit into main from skill-confusability