fix(providers): Chinese keywords for skill matcher#88
Conversation
There was a problem hiding this comment.
Findings
-
[Major]
prototype/原型mapped to mobile group causes cross-skill false positives — prompts like "landing page prototype" now matchmobile-mockeven when no mobile intent is present, becauseprototypeis in the mobile bucket and matching is group-intersection based. Evidencepackages/providers/src/skill-injector.ts:187,packages/providers/src/skill-injector.ts:193,packages/providers/src/skill-injector.ts:242
Suggested fix:// keep mobile-specific intent in mobile group // remove generic prototype terms from mobile bucket // 'prototype', '原型' // if needed, handle prototype in a non-mobile broad UI group ['ui', 'interface', 'screen', 'prototype', '界面', '原型图', '设计稿']
-
[Major] Generic CJK aliases with raw
includes()over-match unrelated prompts — terms like分析/演示/应用are broad and currently trigger by substring, which can inject unrelated skills (quality regression in prompt routing). Evidencepackages/providers/src/skill-injector.ts:162,packages/providers/src/skill-injector.ts:192,packages/providers/src/skill-injector.ts:208,packages/providers/src/skill-injector.ts:221
Suggested fix:// prefer specific aliases; remove highly generic ones // '分析', '应用', '演示' // and keep lexical matching stricter for Latin tokens function hasKeyword(text: string, kw: string): boolean { return /[a-z]/i.test(kw) ? new RegExp(`\\b${kw.replace(/[.*+?^${}()|[\\]\\]/g, '\\$&')}\\b`, 'i').test(text) : text.includes(kw); }
-
[Minor] Tests cover positive Chinese matches but miss negative regression cases introduced by new aliases — no guard for "prototype" desktop/web prompts incorrectly matching
mobile-mock, and no guard for generic CJK substrings over-triggering. Evidencepackages/providers/src/skill-injector.test.ts:390,packages/providers/src/skill-injector.test.ts:397,packages/providers/src/skill-injector.test.ts:403
Suggested fix:it('does not match mobile for non-mobile prototype prompt', () => { const matched = matchSkillsToPrompt(all, 'Design a landing page prototype'); expect(matched.map((s) => s.id)).not.toContain('mobile-mock'); }); it('does not trigger deck on generic Chinese "演示" wording', () => { const matched = matchSkillsToPrompt(all, '请演示一下页面交互'); expect(matched.map((s) => s.id)).not.toContain('pitch-deck'); });
Summary
- Review mode: initial
- Found 2 major matching-regression risks and 1 test-gap issue in modified lines.
- Constraint checks on this diff: no new dependencies, no direct provider SDK imports, no hardcoded UI token violations in touched files, no silent-fallback changes in touched files.
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs in this checkout.
Testing
- Not run (automation):
pnpmis unavailable in this environment.
open-codesign Bot
| 'ios', | ||
| 'iphone', | ||
| 'android', | ||
| 'prototype', |
There was a problem hiding this comment.
[Major] prototype in the mobile group will match non-mobile prompts (e.g., landing-page prototype) and incorrectly route mobile-mock.
Suggested fix:
// remove from mobile group
// 'prototype'
// optionally place in broad UI group instead
['ui', 'interface', 'screen', 'prototype', '界面', '原型图', '设计稿']| const group = SKILL_TRIGGER_GROUPS[i]; | ||
| if (!group) continue; | ||
| for (const kw of group) { | ||
| if (lower.includes(kw)) { |
There was a problem hiding this comment.
[Major] Raw includes() with broad aliases (分析/应用/演示) can over-trigger unrelated skills due to substring matches.
Suggested fix:
function hasKeyword(text: string, kw: string): boolean {
return /[a-z]/i.test(kw)
? new RegExp(`\\b${kw.replace(/[.*+?^${}()|[\\]\\]/g, '\\$&')}\\b`, 'i').test(text)
: text.includes(kw);
}| expect(ids).toContain('mobile-mock'); | ||
| }); | ||
|
|
||
| it('matches Chinese dashboard prompt — "做一个数据看板"', () => { |
There was a problem hiding this comment.
[Minor] Positive-case coverage is good, but this suite misses negative guards for new alias regressions (non-mobile prototype, generic Chinese substrings).
Suggested fix:
it('does not match mobile for non-mobile prototype prompt', () => {
const matched = matchSkillsToPrompt(all, 'Design a landing page prototype');
expect(matched.map((s) => s.id)).not.toContain('mobile-mock');
});Codex review on PR #88 [Major]: bucketing 'prototype' and '原型' under mobile false-fired mobile-mock for "landing page prototype" and "做一个产品落地页的原型". The words are generic UI vocabulary, not mobile-specific, and they cannot safely live in UI-broad either — mobile-mock's own description hits UI-broad via 'screen', so any cross-bucket placement still false-fires. Removed them from the mobile group entirely. mobile-mock still resolves via mobile/app/phone/ios/iphone/android/移动端/手机/移动应用/app设计/应用, so real mobile prompts keep matching. Adds 4 regression tests covering both the false-positive case (EN + 中文) and the true-positive case when a mobile-only token co-occurs (EN + 中文).
e678827 to
e3d1077
Compare
|
Addressed Codex [Major] finding on the mobile bucket false-positive. Root cause: Fix: removed both tokens from the mobile group and from any other group. Putting them in a UI-broad bucket would still false-fire mobile-mock, because mobile-mock's own description hits UI-broad via "screen" — the cross-bucket intersection re-introduces the same bug. The mobile-mock description still lands in the mobile bucket via mobile/app/phone, so real mobile prompts keep matching. Match table (after fix):
Added 4 regression tests covering both the false-positive case (EN + 中文) and the true-positive case where a mobile-only token co-occurs (EN + 中文). All 24 skill-injector tests + 288 desktop tests pass locally. Force-pushed. |
There was a problem hiding this comment.
Findings
-
[Major] Generic Chinese aliases still over-trigger unrelated skills via substring matching —
分析(dashboard group),应用(mobile group), and演示(deck group) are broad terms, so prompts with non-dashboard/non-mobile/non-deck intent can still route wrong skills. Evidencepackages/providers/src/skill-injector.ts:162,packages/providers/src/skill-injector.ts:197,packages/providers/src/skill-injector.ts:212,packages/providers/src/skill-injector.ts:225
Suggested fix:// keep intent-specific aliases only // remove: '分析', '应用', '演示' // examples of narrower replacements ['数据看板', '图表', '数据分析']; ['移动端', '移动应用', '手机', 'app设计']; ['演示文稿', '幻灯片', '路演', '提案'];
-
[Minor] New tests still miss negative guards for those broad aliases — the suite adds prototype regressions, but no assertions that generic
分析/应用/演示phrasing does not triggerdata-viz-recharts/mobile-mock/pitch-deck. Evidencepackages/providers/src/skill-injector.test.ts:390,packages/providers/src/skill-injector.test.ts:403,packages/providers/src/skill-injector.test.ts:434
Suggested fix:it('does not trigger data-viz on generic Chinese 分析 wording', () => { const ids = matchSkillsToPrompt(all, '帮我分析这个登录流程').map((s) => s.id); expect(ids).not.toContain('data-viz-recharts'); }); it('does not trigger mobile on generic Chinese 应用 wording', () => { const ids = matchSkillsToPrompt(all, '这个规范如何应用到网页').map((s) => s.id); expect(ids).not.toContain('mobile-mock'); }); it('does not trigger deck on generic Chinese 演示 wording', () => { const ids = matchSkillsToPrompt(all, '请演示一下按钮动效').map((s) => s.id); expect(ids).not.toContain('pitch-deck'); });
Summary
- Review mode: follow-up after new commits
- Found 1 major routing-regression risk and 1 related test-gap issue in modified lines.
- Constraint checks on this diff: no new dependencies, no direct provider SDK imports, no UI token hardcode changes, no silent-fallback behavior added in touched files.
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs (perCLAUDE.md, docs are gitignored internal materials).
Testing
- Not run (automation)
open-codesign Bot
| '数据看板', | ||
| '数据图', | ||
| '图表', | ||
| '分析', |
There was a problem hiding this comment.
[Major] This group still contains very broad aliases (分析, plus 应用 and 演示 in other groups) and matching is raw substring-based (includes). That can over-route skills for unrelated prompts.
Suggested fix:
// remove broad aliases
// '分析', '应用', '演示'
// keep narrower intent-specific terms
['数据看板', '图表', '数据分析'];
['移动端', '移动应用', '手机', 'app设计'];
['演示文稿', '幻灯片', '路演', '提案'];| expect(matched.map((s) => s.id)).toContain('mobile-mock'); | ||
| }); | ||
|
|
||
| it('still matches mobile-mock for Chinese mobile-only prompt — "做一个移动端原型"', () => { |
There was a problem hiding this comment.
[Minor] Prototype regressions are covered, but there are still no negative tests guarding broad Chinese aliases (分析/应用/演示) from false-positive routing.
Suggested fix:
it('does not trigger data-viz on generic Chinese 分析 wording', () => {
const ids = matchSkillsToPrompt(all, '帮我分析这个登录流程').map((s) => s.id);
expect(ids).not.toContain('data-viz-recharts');
});
it('does not trigger mobile on generic Chinese 应用 wording', () => {
const ids = matchSkillsToPrompt(all, '这个规范如何应用到网页').map((s) => s.id);
expect(ids).not.toContain('mobile-mock');
});
it('does not trigger deck on generic Chinese 演示 wording', () => {
const ids = matchSkillsToPrompt(all, '请演示一下按钮动效').map((s) => s.id);
expect(ids).not.toContain('pitch-deck');
});Bug: 中文 prompt 全部漏匹配 skill (dev log: skills:0). Refactor SKILL_TRIGGER_KEYWORDS into SKILL_TRIGGER_GROUPS so each concept (dashboard / landing / mobile / slides / UI) carries both English vocabulary from builtin skill descriptions and the Chinese aliases users actually type. Matching becomes group-id intersection, so '数据看板' resolves to the same bucket as 'dashboards, analytics'. Adds vitest cases covering Chinese prompts for dashboard, mobile, landing, and deck flows.
Codex review on PR #88 [Major]: bucketing 'prototype' and '原型' under mobile false-fired mobile-mock for "landing page prototype" and "做一个产品落地页的原型". The words are generic UI vocabulary, not mobile-specific, and they cannot safely live in UI-broad either — mobile-mock's own description hits UI-broad via 'screen', so any cross-bucket placement still false-fires. Removed them from the mobile group entirely. mobile-mock still resolves via mobile/app/phone/ios/iphone/android/移动端/手机/移动应用/app设计/应用, so real mobile prompts keep matching. Adds 4 regression tests covering both the false-positive case (EN + 中文) and the true-positive case when a mobile-only token co-occurs (EN + 中文).
…aliases Codex flagged that 分析 / 应用 / 演示 over-trigger via plain substring match — e.g. '分析这段文本' matches dashboard, 'Web 应用' matches mobile-mock, '演示一下功能' matches deck. Split each trigger group into strong + weak; broad aliases live in weak and never fire on their own. Adds regression tests for the 3 false positives plus 3 true-positive checks (数据看板 / iPhone app / PPT).
e3d1077 to
ff9b19d
Compare
|
Addressed Codex's [Major] finding on Refactored
|
There was a problem hiding this comment.
Findings
- No issues identified in added/modified lines with >=80% confidence.
Summary
- Review mode: follow-up after new commits
- No blocker/major/minor/nit issues found in the latest diff.
- Residual risk: skill routing remains lexical substring matching, so unseen multilingual phrasing may still produce edge-case false positives/negatives.
docs/VISION.mdanddocs/PRINCIPLES.md: Not found in repo/docs.
Testing
- Not run (automation)
open-codesign Bot
The previous keyword-intersection matcher computed an overlap between trigger keywords extracted from the user prompt and from each skill's description. Because all four builtin skills ship English-only descriptions, no Chinese prompt could ever produce an intersection — every non-English request landed with skills: 0 in the build_request log. PR #88 extended the keyword table with Chinese aliases, but the intersection still required matching tokens on both sides, so the bug persisted. Replace the matcher with progressive disclosure (level 1+2): every active skill body is unconditionally formatted into the system prompt under an "# Available Skills" header, and the model decides which one applies. This mirrors the pattern Claude Design / Claude Code use and removes the language gate entirely. Level-3 disclosure (lazy-load via SkillTool) is out of scope for v0.x. - Drop SKILL_TRIGGER_GROUPS, matchSkillsToPrompt, extractGroupIds. - Export formatSkillsForPrompt(skills) — no userPrompt argument. - Rename collectMatchedSkillBlobs -> collectAllSkillBlobs and add a step=load_skills.ok success log alongside the existing fail log. - composeSystemPrompt now wraps skill blobs with a model-friendly header. - Tests rewritten to assert language-independent loading and the new always-load contract.
The previous keyword-intersection matcher computed an overlap between trigger keywords extracted from the user prompt and from each skill's description. Because all four builtin skills ship English-only descriptions, no Chinese prompt could ever produce an intersection — every non-English request landed with skills: 0 in the build_request log. PR #88 extended the keyword table with Chinese aliases, but the intersection still required matching tokens on both sides, so the bug persisted. Replace the matcher with progressive disclosure (level 1+2): every active skill body is unconditionally formatted into the system prompt under an "# Available Skills" header, and the model decides which one applies. This mirrors the pattern Claude Design / Claude Code use and removes the language gate entirely. Level-3 disclosure (lazy-load via SkillTool) is out of scope for v0.x. - Drop SKILL_TRIGGER_GROUPS, matchSkillsToPrompt, extractGroupIds. - Export formatSkillsForPrompt(skills) — no userPrompt argument. - Rename collectMatchedSkillBlobs -> collectAllSkillBlobs and add a step=load_skills.ok success log alongside the existing fail log. - composeSystemPrompt now wraps skill blobs with a model-friendly header. - Tests rewritten to assert language-independent loading and the new always-load contract.
The previous keyword-intersection matcher computed an overlap between trigger keywords extracted from the user prompt and from each skill's description. Because all four builtin skills ship English-only descriptions, no Chinese prompt could ever produce an intersection — every non-English request landed with skills: 0 in the build_request log. PR #88 extended the keyword table with Chinese aliases, but the intersection still required matching tokens on both sides, so the bug persisted. Replace the matcher with progressive disclosure (level 1+2): every active skill body is unconditionally formatted into the system prompt under an "# Available Skills" header, and the model decides which one applies. This mirrors the pattern Claude Design / Claude Code use and removes the language gate entirely. Level-3 disclosure (lazy-load via SkillTool) is out of scope for v0.x. - Drop SKILL_TRIGGER_GROUPS, matchSkillsToPrompt, extractGroupIds. - Export formatSkillsForPrompt(skills) — no userPrompt argument. - Rename collectMatchedSkillBlobs -> collectAllSkillBlobs and add a step=load_skills.ok success log alongside the existing fail log. - composeSystemPrompt now wraps skill blobs with a model-friendly header. - Tests rewritten to assert language-independent loading and the new always-load contract.
The previous keyword-intersection matcher computed an overlap between trigger keywords extracted from the user prompt and from each skill's description. Because all four builtin skills ship English-only descriptions, no Chinese prompt could ever produce an intersection — every non-English request landed with skills: 0 in the build_request log. PR #88 extended the keyword table with Chinese aliases, but the intersection still required matching tokens on both sides, so the bug persisted. Replace the matcher with progressive disclosure (level 1+2): every active skill body is unconditionally formatted into the system prompt under an "# Available Skills" header, and the model decides which one applies. This mirrors the pattern Claude Design / Claude Code use and removes the language gate entirely. Level-3 disclosure (lazy-load via SkillTool) is out of scope for v0.x. - Drop SKILL_TRIGGER_GROUPS, matchSkillsToPrompt, extractGroupIds. - Export formatSkillsForPrompt(skills) — no userPrompt argument. - Rename collectMatchedSkillBlobs -> collectAllSkillBlobs and add a step=load_skills.ok success log alongside the existing fail log. - composeSystemPrompt now wraps skill blobs with a model-friendly header. - Tests rewritten to assert language-independent loading and the new always-load contract.
* feat(core): switch skills to progressive-disclosure model routing The previous keyword-intersection matcher computed an overlap between trigger keywords extracted from the user prompt and from each skill's description. Because all four builtin skills ship English-only descriptions, no Chinese prompt could ever produce an intersection — every non-English request landed with skills: 0 in the build_request log. PR #88 extended the keyword table with Chinese aliases, but the intersection still required matching tokens on both sides, so the bug persisted. Replace the matcher with progressive disclosure (level 1+2): every active skill body is unconditionally formatted into the system prompt under an "# Available Skills" header, and the model decides which one applies. This mirrors the pattern Claude Design / Claude Code use and removes the language gate entirely. Level-3 disclosure (lazy-load via SkillTool) is out of scope for v0.x. - Drop SKILL_TRIGGER_GROUPS, matchSkillsToPrompt, extractGroupIds. - Export formatSkillsForPrompt(skills) — no userPrompt argument. - Rename collectMatchedSkillBlobs -> collectAllSkillBlobs and add a step=load_skills.ok success log alongside the existing fail log. - composeSystemPrompt now wraps skill blobs with a model-friendly header. - Tests rewritten to assert language-independent loading and the new always-load contract. * fix(core): filter disabled and provider-restricted skills before injection Reuse filterActive() from packages/providers when collecting builtin skill blobs. Without this, skills with disable_model_invocation: true and skills restricted to other providers via trigger.providers were injected into the system prompt for every generate() call, violating the skill contract. Tests cover both gates against the anthropic default model. * test(providers): restore injectSkillsIntoMessages coverage for surviving behavior * chore(changeset): document breaking removal of matchSkillsToPrompt export
Bug: 中文 prompt 全部漏匹配 (log: skills:0)。加中文+遗漏英文。