fix(ui): prevent fuzzy match false positive in model chip after live Nous fetch #1192
nesquena-hermes wants to merge 2 commits into master from
Conversation
…alse positives (#1188)

Step 3 of _findModelInDropdown() used a truncated 'base' (target with the last version segment stripped) as the prefix to match against dropdown options. For 'gpt-5.5', target='gpt.5.5' and base='gpt.5', which incorrectly matched '@nous:openai/gpt-5.4-mini' (norm: 'gpt.5.4.mini') because it starts with 'gpt.5'. The chip would then show 'GPT-5.4 Mini (via Nous)' for a session that stores 'gpt-5.5'.

Fix: use the full target as the prefix when base has meaningful content (length > 4 and base !== target). Only fall back to the shorter base when it is a bare root word ('gpt', 'claude', etc.) where stripping the version segment would be a no-op.

'gpt-5.5' with prefixTarget='gpt.5.5': 'gpt.5.4.mini' does NOT start with 'gpt.5.5' → returns null (correct: no false match). 'gpt' with prefixTarget='gpt' (useBase=true): still finds 'gpt.5.4.mini' via the shorter base → prefix match for bare roots preserved.

Closes #1188
9 tests run the live _findModelInDropdown function via Node so the real regex/normalization rules are exercised (no Python mirror to drift). Two locked-bad cases (gpt-5.5 → gpt-5.4-mini, claude-opus-4.7 → claude-opus-4.6) reproduce on master and pass on the PR. Seven preserved-good cases (bare-root prefix match, exact match, unrelated) ensure the tighter check doesn't regress legit fuzzy lookups. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
nesquena
left a comment
Review — end-to-end ✅ (clean approve, regression test pushed)
What this ships
#1188 — _findModelInDropdown() step-3 fuzzy match was over-broad: stripping the trailing version segment from target (e.g. gpt-5.5 → base gpt.5) and matching against any option that startsWith(base) || includes(base). For gpt.5.5, the live Nous model @nous:openai/gpt-5.4-mini (norm: gpt.5.4.mini) starts with gpt.5 → false match. The chip showed "GPT-5.4 Mini (via Nous)" for a session storing gpt-5.5.
7-line fix in static/ui.js:103-110: use the FULL normalized target as the prefix when base.length > 4 and base !== target. Only fall back to the shorter base when it's a bare root (length ≤ 4) where stripping was effectively a no-op.
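The decision can be sketched as a small predicate. This is an illustrative reconstruction from the review's description, not the literal code in static/ui.js; the function name, variable names, and the version-stripping regex are assumptions.

```javascript
// Sketch of the step-3 prefix choice described above (assumed names/regex).
// Input is the already-normalized target, e.g. 'gpt-5.5' -> 'gpt.5.5'.
function pickPrefix(target) {
  // Strip a trailing version segment: a dot followed by a digit-led chunk.
  const base = target.replace(/\.\d[^.]*$/, '');
  // Fall back to the shorter base only for bare roots ('gpt') or when
  // stripping changed nothing (no version segment to remove).
  const useBase = base.length <= 4 || base === target;
  // Otherwise the full target is the (tight) prefix.
  return useBase ? base : target;
}
```

With this sketch, `pickPrefix('gpt.5.5')` stays `'gpt.5.5'` (tight), while `pickPrefix('gpt.5')` falls back to the bare root `'gpt'`.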
Traced against upstream hermes-agent
Pure WebUI display logic. Zero agent coupling — this is the chip-label resolver in the model picker.
End-to-end trace
The fix's decision matrix:
| Target | base (after strip) | useBase | prefixTarget | Behaviour |
|---|---|---|---|---|
| `gpt-5.5` → `gpt.5.5` | `gpt.5` (len 5) | false | `gpt.5.5` | Tight: `gpt.5.4.mini` doesn't start with `gpt.5.5` → no false match ✅ |
| `gpt-5` → `gpt.5` | `gpt` (len 3, ≤ 4) | true | `gpt` | Loose root match preserved ✅ |
| `gpt` → `gpt` | `gpt` (base === target) | true | `gpt` | Bare root match ✅ |
| `claude-opus-4.6` → `claude.opus.4.6` | `claude.opus.4` (len 13) | false | `claude.opus.4.6` | Tight match ✅ |
| `claude-opus` → `claude.opus` | `claude.opus` (no version, base === target) | true | `claude.opus` | `claude.opus.4.6` matches ✅ |
| `claude-opus-4.7` → `claude.opus.4.7` | `claude.opus.4` (len 13) | false | `claude.opus.4.7` | Won't match `claude.opus.4.6` → null instead of wrong version ✅ |
| `gpt-3.5` → `gpt.3.5` | `gpt.3` (len 5) | false | `gpt.3.5` | `gpt.3.5.turbo` matches; `gpt.4.0` doesn't ✅ |
The `includes` half of the original disjunction is gone. That was the only path that would match, e.g., a user-typed `mini` to `gpt-5.4-mini`; but that is a degenerate case users likely never hit, and dropping it tightens the match. ✅
Bug-confirmed harness — 9/9 PR, 7/9 master
Built a Node behavioural harness running the live _findModelInDropdown against fake <select> options:
PASS [gpt-5.5 should NOT match gpt-5.4-mini (issue #1188)] → null
PASS [gpt-5.5 finds @nous:openai/gpt-5.5 (exact post-norm prefix)]
PASS [gpt finds gpt-5.4-mini (bare root prefix)]
PASS [gpt-5 finds gpt-5.4-mini (base=gpt is bare root)]
PASS [claude-opus-4.7 should NOT match claude-opus-4.6]
PASS [claude finds claude-opus-4.6 (bare root)]
PASS [claude-opus finds claude-opus-4.6 (base===target since no version)]
PASS [exact match short-circuits]
PASS [unrelated target returns null]
On master, two fail: gpt-5.5 → @nous:openai/gpt-5.4-mini and claude-opus-4.7 → claude-opus-4.6. Exactly the issue's described over-match shape.
What I pushed — 6126552
The PR didn't add a regression test. I added tests/test_issue1188_fuzzy_match.py — 9 tests running the live function via Node so the real regex/normalization rules are exercised (no Python mirror to drift). 2 tests lock the over-match cases as None; 7 lock the preserved-good cases (bare-root prefix, exact short-circuit, unrelated). CI re-ran green on 6126552.
Edge-case trace
| Scenario | Pre-fix | Post-fix |
|---|---|---|
| `gpt-5.5` session, dropdown has `gpt-5.4-mini` only | shows wrong model | returns null (chip falls through to default) ✅ |
| `gpt-5.5` session, dropdown has `gpt-5.5` | exact match | exact match ✅ |
| `gpt` (legacy bare ID) | matches first `gpt-*` | matches first `gpt-*` (bare root preserved) ✅ |
| `claude-opus-4.7`, dropdown has `claude-opus-4.6` only | wrong sibling-version match | null (correct: not present) ✅ |
| `mistral-large`, dropdown has `mistral-medium` only | no version segment to strip, so base === target; `mistral.medium` never matched `mistral.large` | unchanged: null ✅ |
| Legacy `o3-2024-12-17` | `o3.2024.12.17` → base `o3.2024.12` (len 10) → useBase=false → tight | won't false-match `o3-2024-11-20` ✅ |
Tests
- My new `test_issue1188_fuzzy_match.py`: 9/9 pass.
- Local full suite: 2637 passed (+ my 9 new tests = 2646); the only failure is the pre-existing, unrelated macOS `test_sprint3` failure that PR #1186 fixes.
- CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
Other audit — confirmed correct
- JS syntax: `node --check` passes on `ui.js`.
- No agent coupling: pure UI display code.
- `useBase` threshold of 4 chars: covers the realistic bare-root names (`gpt`, `claude`, `gemini`, `llama`, `qwen`). At first glance `gemini` (6) and `llama` (5) wouldn't trigger useBase, but re-checking: `target='gemini'` → `base='gemini'` (no version) → base === target → useBase=true regardless of length. ✅ Same for `llama`. The `length <= 4` check is a separate path for "target with the version stripped down to a bare 1-3-letter root".
Minor observations (non-blocking)
- The threshold of 4 chars is arbitrary; bare 5-letter roots (`llama`) hit base === target (no version to strip), so they go through the useBase branch anyway. The threshold is mostly defensive against pathological short inputs.
- Step 3 is itself a fuzzy-match fallback; exact match (step 1) and provider-prefix match (step 2) handle the common cases. If users find this still over-matches in some other shape, dropping step 3 entirely would be the next move.
Recommendation
Approved. Tight 7-line fix with clear decision logic. The `useBase = base.length <= 4 || base === target` predicate correctly distinguishes "stripping changed nothing meaningful" (bare roots) from "stripping is now over-loosening the match" (versioned IDs). The behavioural harness directly confirms the 2 master failures match the bug shape and all 9 cases pass on the PR. The pushed regression test runs the live function via Node so the rules can't silently drift. CI green; no agent coupling. Parked at approval; ready for the release agent's merge/tag pipeline.
…, timestamp sync (#1198)

Batch release v0.50.232: 4 fixes.

## PRs included

| PR | Author | Fix |
|---|---|---|
| #1192 | @nesquena-hermes | Model chip fuzzy-match false positive (#1188) |
| #1193 | @nesquena-hermes | openai-codex not detected in model picker (#1189) |
| #1196 | @nesquena-hermes | Workspace files blank after second empty-session reload |
| #1197 | @bergeouss | Session timestamps wrong with server/client clock drift (#1144) |

All four PRs independently reviewed and approved by @nesquena.

## Integration fixes applied

**#1193:** Updated a misleading comment: `OPENAI_API_KEY` does NOT authenticate the default Codex OAuth endpoint (that uses `chatgpt.com/backend-api/codex` and requires a separate OAuth flow). The comment now accurately states the known limitation. Also replaced a fragile 400-char source-scan test with an isolation-safe unit test. Note: OAuth-authenticated users are already detected via `hermes_cli.auth`; this fix only addresses the env-var fallback path.

## Test results

**2764 passed, 2 skipped** (macOS-only workspace tests). Browser QA: **21/21**. `/api/sessions` confirmed returning `server_time` and `server_tz` fields.

Merged as v0.50.232 via #1198. Thank you @nesquena-hermes!
Summary

Fixes a false-positive fuzzy match in `_findModelInDropdown()` where a session model like `gpt-5.5` would resolve to `@nous:openai/gpt-5.4-mini` once live Nous models loaded, showing the wrong label in the model chip.

Root cause

Step 3 of `_findModelInDropdown()` strips the last version segment from the normalized target to produce a `base`. For `gpt-5.5`, `base = "gpt.5"`. The Nous live model `@nous:openai/gpt-5.4-mini` normalizes to `gpt.5.4.mini`, which starts with `"gpt.5"` → wrong match. The chip shows "GPT-5.4 Mini (via Nous)" for a session storing `gpt-5.5`.

Fix

Use the full `target` as the prefix when `base` has meaningful content (length > 4 and `base !== target`). Only fall back to the shorter `base` when it is a bare root like `"gpt"` or `"claude"` (length ≤ 4), where stripping the version was essentially a no-op anyway.

Testing

OLD: incorrectly matched `@nous:openai/gpt-5.4-mini` for session model `gpt-5.5`; NEW: correctly returns null.

Closes #1188