
fix(ui): prevent fuzzy match false positive in model chip after live Nous fetch#1192

Closed
nesquena-hermes wants to merge 2 commits into master from fix/1188-fuzzy-match-overmatch

Conversation

@nesquena-hermes
Collaborator

Summary

Fixes a false-positive fuzzy match in _findModelInDropdown() where a session model like gpt-5.5 would resolve to @nous:openai/gpt-5.4-mini once live Nous models loaded, showing the wrong label in the model chip.

Root cause

Step 3 of _findModelInDropdown() strips the last version segment from the normalized target to produce a base:

const base = target.replace(/\.\d+$/, '');  // gpt-5.5 → target=gpt.5.5 → base=gpt.5
const partial = opts.find(o => norm(o).startsWith(base) || norm(o).includes(base));

For gpt-5.5, base = "gpt.5". The Nous live model @nous:openai/gpt-5.4-mini normalizes to gpt.5.4.mini, which starts with "gpt.5" → wrong match. The chip shows "GPT-5.4 Mini (via Nous)" for a session storing gpt-5.5.
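The over-match is easy to reproduce in a few lines of Node. This is a minimal sketch, not the real ui.js code: the `norm()` helper here is an assumption (strip any provider prefix, lowercase, map separators to dots), chosen so the strings match the normalized forms quoted above.

```javascript
// Minimal repro of the pre-fix step-3 over-match.
// ASSUMPTION: norm() strips a "@provider:vendor/" prefix, lowercases,
// and maps separators to dots; the real ui.js helper may differ.
const norm = (id) =>
  id.split('/').pop().toLowerCase().replace(/[^a-z0-9]+/g, '.');

function oldStep3(sessionModel, options) {
  const target = norm(sessionModel);          // gpt-5.5 → gpt.5.5
  const base = target.replace(/\.\d+$/, ''); // gpt.5.5 → gpt.5 (version stripped)
  return options.find(o => norm(o).startsWith(base) || norm(o).includes(base)) ?? null;
}

const opts = ['@nous:openai/gpt-5.4-mini'];
console.log(oldStep3('gpt-5.5', opts)); // '@nous:openai/gpt-5.4-mini', the false positive
```

`gpt.5.4.mini` starts with `gpt.5`, so the stripped base matches the wrong sibling model.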

Fix

Use the full target as the prefix when base has meaningful content (length > 4 and base !== target). Only fall back to the shorter base when it is a bare root like "gpt" or "claude" (length ≤ 4) where stripping the version was essentially a no-op anyway.

gpt-5.5  →  target=gpt.5.5, base=gpt.5, useBase=false, prefixTarget=gpt.5.5
gpt.5.4.mini  startsWith("gpt.5.5") → false  ✓ no false match

gpt  →  target=gpt, base=gpt, useBase=true (length≤4), prefixTarget=gpt
gpt.5.4.mini  startsWith("gpt") → true  ✓ bare-root match preserved
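The same traces run against a sketch of the fixed logic (again with an assumed `norm()`; see ui.js for the actual code):

```javascript
// Sketch of the fixed step 3: fall back to the stripped base only when
// stripping was essentially a no-op (bare root of length ≤ 4, or nothing
// was stripped at all). ASSUMPTION: norm() as in the repro above.
const norm = (id) =>
  id.split('/').pop().toLowerCase().replace(/[^a-z0-9]+/g, '.');

function newStep3(sessionModel, options) {
  const target = norm(sessionModel);
  const base = target.replace(/\.\d+$/, '');
  const useBase = base.length <= 4 || base === target;
  const prefixTarget = useBase ? base : target; // full target when base is meaningful
  return options.find(o => norm(o).startsWith(prefixTarget)) ?? null;
}

const opts = ['@nous:openai/gpt-5.4-mini'];
console.log(newStep3('gpt-5.5', opts)); // null: no false match
console.log(newStep3('gpt', opts));     // '@nous:openai/gpt-5.4-mini': bare root preserved
```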

Testing

  • 2685 tests passing
  • Verified with inline Node.js test: OLD incorrectly returns @nous:openai/gpt-5.4-mini for session model gpt-5.5; NEW correctly returns null

Closes #1188

…alse positives (#1188)

Step 3 of _findModelInDropdown() used a truncated 'base' (target with
last version segment stripped) as the prefix to match against dropdown
options. For 'gpt-5.5', target='gpt.5.5' and base='gpt.5', which
incorrectly matched '@nous:openai/gpt-5.4-mini' (norm: 'gpt.5.4.mini')
because it starts with 'gpt.5'. The chip would then show 'GPT-5.4 Mini
(via Nous)' for a session that stores 'gpt-5.5'.

Fix: use the full target as the prefix when base has meaningful content
(length > 4 and base !== target). Only fall back to the shorter base
when it is a bare root word ('gpt', 'claude', etc.) where stripping the
version segment would be a no-op.

'gpt-5.5' with prefixTarget='gpt.5.5': 'gpt.5.4.mini' does NOT start
with 'gpt.5.5' → returns null (correct — no false match).
'gpt' with prefixTarget='gpt' (useBase=true): still finds 'gpt.5.4.mini'
via the shorter base → prefix match for bare roots preserved.

Closes #1188
9 tests run the live _findModelInDropdown function via Node so the real
regex/normalization rules are exercised (no Python mirror to drift).

Two locked-bad cases (gpt-5.5 → gpt-5.4-mini, claude-opus-4.7 →
claude-opus-4.6) reproduce on master and pass on the PR. Seven
preserved-good cases (bare-root prefix match, exact match, unrelated)
ensure the tighter check doesn't regress legit fuzzy lookups.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Owner

@nesquena left a comment


Review — end-to-end ✅ (clean approve, regression test pushed)

What this ships

#1188: the _findModelInDropdown() step-3 fuzzy match was over-broad: it stripped the trailing version segment from target (e.g. gpt-5.5 → base gpt.5) and matched any option that startsWith(base) || includes(base). For gpt.5.5, the live Nous model @nous:openai/gpt-5.4-mini (norm: gpt.5.4.mini) starts with gpt.5 → false match. The chip showed "GPT-5.4 Mini (via Nous)" for a session storing gpt-5.5.

7-line fix in static/ui.js:103-110: use the FULL normalized target as the prefix when base.length > 4 and base !== target. Only fall back to the shorter base when it's a bare root (length ≤ 4) where stripping was effectively a no-op.

Traced against upstream hermes-agent

Pure WebUI display logic. Zero agent coupling — this is the chip-label resolver in the model picker.

End-to-end trace

The fix's decision matrix:

| Target | Normalized | base (after strip) | useBase | prefixTarget | Behaviour |
|---|---|---|---|---|---|
| gpt-5.5 | gpt.5.5 | gpt.5 (len 5) | false | gpt.5.5 | Tight: gpt.5.4.mini doesn't start with gpt.5.5 → no false match ✅ |
| gpt-5 | gpt.5 | gpt (len 3, ≤4) | true | gpt | Loose root match preserved ✅ |
| gpt | gpt | gpt (base === target) | true | gpt | Bare root match ✅ |
| claude-opus-4.6 | claude.opus.4.6 | claude.opus.4 (len 13) | false | claude.opus.4.6 | Tight match ✅ |
| claude-opus | claude.opus | claude.opus (no version, base === target) | true | claude.opus | claude.opus.4.6 matches ✅ |
| claude-opus-4.7 | claude.opus.4.7 | claude.opus.4 (len 13) | false | claude.opus.4.7 | Won't match claude.opus.4.6 → null instead of wrong version ✅ |
| gpt-3.5 | gpt.3.5 | gpt.3 (len 5) | false | gpt.3.5 | gpt.3.5.turbo matches; gpt.4.0 doesn't ✅ |
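The matrix rows can be recomputed in a few lines of Node. This `norm()` is an assumption (lowercase, separators mapped to dots), sufficient for bare model IDs without a provider prefix:

```javascript
// Recompute base / useBase / prefixTarget for each matrix row.
// ASSUMPTION: norm() lowercases and maps separators to dots.
const norm = (id) => id.toLowerCase().replace(/[^a-z0-9]+/g, '.');

for (const t of ['gpt-5.5', 'gpt-5', 'gpt', 'claude-opus-4.6',
                 'claude-opus', 'claude-opus-4.7', 'gpt-3.5']) {
  const target = norm(t);
  const base = target.replace(/\.\d+$/, '');
  const useBase = base.length <= 4 || base === target;
  console.log(`${t}: base=${base} useBase=${useBase} prefix=${useBase ? base : target}`);
}
// first line: gpt-5.5: base=gpt.5 useBase=false prefix=gpt.5.5
```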

The includes half of the original disjunction is gone. That was the only path that could match, say, a user-typed mini to gpt-5.4-mini, but that is a degenerate case users are unlikely to hit, and dropping it tightens the match. ✅

Bug-confirmed harness — 9/9 PR, 7/9 master

Built a Node behavioural harness running the live _findModelInDropdown against fake <select> options:

PASS [gpt-5.5 should NOT match gpt-5.4-mini (issue #1188)] → null
PASS [gpt-5.5 finds @nous:openai/gpt-5.5 (exact post-norm prefix)]
PASS [gpt finds gpt-5.4-mini (bare root prefix)]
PASS [gpt-5 finds gpt-5.4-mini (base=gpt is bare root)]
PASS [claude-opus-4.7 should NOT match claude-opus-4.6]
PASS [claude finds claude-opus-4.6 (bare root)]
PASS [claude-opus finds claude-opus-4.6 (base===target since no version)]
PASS [exact match short-circuits]
PASS [unrelated target returns null]

On master, two fail: gpt-5.5 → @nous:openai/gpt-5.4-mini and claude-opus-4.7 → claude-opus-4.6. Exactly the issue's described over-match shape.

What I pushed — 6126552

The PR didn't add a regression test. I added tests/test_issue1188_fuzzy_match.py — 9 tests running the live function via Node so the real regex/normalization rules are exercised (no Python mirror to drift). 2 tests lock the over-match cases as None; 7 lock the preserved-good cases (bare-root prefix, exact short-circuit, unrelated). CI re-ran green on 6126552.

Edge-case trace

| Scenario | Pre-fix | Post-fix |
|---|---|---|
| gpt-5.5 session, dropdown has gpt-5.4-mini only | shows wrong model | shows null (chip falls through to default) ✅ |
| gpt-5.5 session, dropdown has gpt-5.5 | exact match | exact match ✅ |
| gpt (legacy bare ID) | matches first gpt-* | matches first gpt-* (bare root preserved) ✅ |
| claude-opus-4.7, dropdown has claude-opus-4.6 only | wrong sibling-version match | null (correct: not present) ✅ |
| mistral-large, dropdown has mistral-medium only | no match (no trailing version to strip, so base === target; mistral.medium neither starts with nor includes mistral.large) | no match ✅ |
| o3-2024-12-17 (legacy dated ID) | base o3.2024.12 (len 10) used as a loose prefix; could match any same-month sibling | useBase=false → tight prefix o3.2024.12.17; won't false-match o3-2024-11-20 ✅ |

Tests

  • My new test_issue1188_fuzzy_match.py: 9/9 pass.
  • Local full suite: 2637 passed (+ my 9 new tests = 2646), only the unrelated pre-existing macOS test_sprint3 failure that PR #1186 fixes.
  • CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).

Other audit — confirmed correct

  • JS syntax: node --check passes on ui.js.
  • No agent coupling: pure UI display code.
  • useBase threshold of 4 chars: covers the short bare roots (gpt, qwen). Longer bare roots (claude, gemini, llama) carry no version segment to strip, so base === target and useBase is true regardless of length. The length ≤ 4 check is a separate path for targets whose version strips down to a bare 1-3-letter root. ✅

Minor observations (non-blocking)

  • The threshold of 4 chars is arbitrary; bare 5-letter roots (llama) hit base===target (no version to strip) so they go through the useBase branch anyway. The threshold is mostly defensive against pathological short inputs.
  • Step 3's existence at all is a fuzzy-match fallback — exact match (step 1) and provider-prefix match (step 2) handle the common cases. If users find this still over-matches in some other shape, dropping step 3 entirely would be the next move.
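The three-step order described above can be sketched as a single resolver. This structure is assumed from the review, not copied from ui.js; `norm()` here strips a `@provider:vendor/` prefix before normalizing, so that step 2 can land a provider-prefixed option:

```javascript
// Skeleton of the resolution order: exact → provider-prefixed exact → fuzzy.
// ASSUMPTION: step boundaries and norm() are inferred from this review.
const norm = (id) =>
  id.replace(/^@[^/]*\//, '').toLowerCase().replace(/[^a-z0-9]+/g, '.');

function resolve(sessionModel, values) {
  // Step 1: exact raw value match.
  if (values.includes(sessionModel)) return sessionModel;
  // Step 2: exact match once provider prefixes and separators are normalized,
  // e.g. session "gpt-5.5" resolves to "@nous:openai/gpt-5.5".
  const target = norm(sessionModel);
  const exact = values.find(v => norm(v) === target);
  if (exact) return exact;
  // Step 3: tightened fuzzy fallback (this PR's fix).
  const base = target.replace(/\.\d+$/, '');
  const prefix = (base.length <= 4 || base === target) ? base : target;
  return values.find(v => norm(v).startsWith(prefix)) ?? null;
}

console.log(resolve('gpt-5.5', ['@nous:openai/gpt-5.5', '@nous:openai/gpt-5.4-mini']));
// → '@nous:openai/gpt-5.5' via step 2
```

If step 3 were dropped entirely, as the observation suggests, only the last `find` would go away; steps 1 and 2 already cover the common cases.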

Recommendation

Approved. Tight 7-line fix with clear decision logic. The useBase = base.length<=4 || base===target predicate correctly distinguishes "stripping changed nothing meaningful" (bare roots) from "stripping is now over-loosening the match" (versioned IDs). Behavioural harness directly confirms 2 master failures match the bug shape and all 9 cases pass on PR. Pushed regression test runs the live function via Node so the rules can't silently drift. CI green; no agent coupling. Parked at approval — ready for the release agent's merge/tag pipeline.

nesquena-hermes added a commit that referenced this pull request Apr 28, 2026
…, timestamp sync (#1198)

Batch release v0.50.232 — 4 fixes.

## PRs included

| PR | Author | Fix |
|---|---|---|
| #1192 | @nesquena-hermes | Model chip fuzzy-match false positive (#1188) |
| #1193 | @nesquena-hermes | openai-codex not detected in model picker (#1189) |
| #1196 | @nesquena-hermes | Workspace files blank after second empty-session reload |
| #1197 | @bergeouss | Session timestamps wrong with server/client clock drift (#1144) |

All four PRs independently reviewed and approved by @nesquena.

## Integration fixes applied

**#1193:** Updated misleading comment — `OPENAI_API_KEY` does NOT authenticate the default Codex OAuth endpoint (that uses `chatgpt.com/backend-api/codex` and requires a separate OAuth flow). The comment now accurately states the known limitation. Also replaced a fragile 400-char source-scan test with an isolation-safe unit test. Note: OAuth-authenticated users already get detected via `hermes_cli.auth` — this fix only addresses the env-var fallback path.

## Test results

**2764 passed, 2 skipped** (macOS-only workspace tests). Browser QA: **21/21**. `/api/sessions` confirmed returning `server_time` and `server_tz` fields.
@nesquena-hermes
Collaborator Author

Merged as v0.50.232 via #1198. Thank you @nesquena-hermes!

@nesquena-hermes nesquena-hermes deleted the fix/1188-fuzzy-match-overmatch branch April 28, 2026 01:40
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request Apr 29, 2026
…, timestamp sync (nesquena#1198)


Development

Successfully merging this pull request may close these issues.

bug(ui): model chip shows wrong model after live Nous fetch — _findModelInDropdown over-broad fuzzy match
