fix(models): deduplicate model IDs across provider groups by bergeouss · Pull Request #1231 · nesquena/hermes-webui

bergeouss · 2026-04-28T18:44:33Z

Summary

Fixes #1228 — model picker loses provider identity when multiple providers expose the same model ID.

When two providers (e.g. custom:edith and openai-codex) both expose the same bare model ID (gpt-5.4), the model picker has three problems:

Both rows render as selected/active at the same time (same value)
Clicking the other provider's copy is a no-op (value unchanged)
Session persistence stores only the bare model string — provider identity is lost

Changes

Backend: `api/config.py`

_deduplicate_model_ids(groups) — new post-process in _build_available_models_uncached() that detects bare model IDs appearing in 2+ groups and prefixes collisions with @provider_id:. The first occurrence stays bare for backward compatibility with sessions that already store the bare model name. Labels are updated to show provider name for disambiguated entries (e.g. GPT-5.4 (OpenAI Codex)).

Frontend: `static/ui.js`

Updated norm() regex in _findModelInDropdown() to strip @provider: prefixes (including nested custom:edith: format) so fuzzy matching still works for session restore when the stored model ID is bare but the dropdown has a prefixed version.

Tests

tests/test_issue1228_model_picker_duplicate_ids.py — 13 tests covering:
- Backend dedup (9 tests): unique IDs unchanged, single group, empty groups, two/three providers with same model, already-prefixed IDs skipped, mixed unique+colliding, label disambiguation
- Frontend norm regex (4 tests): nested @custom:edith:model, simple @provider:model, OpenRouter slash, existing @minimax: format
Updated test_model_resolver.py to be dedup-aware

Test results

2849 passed, 1 failed (pre-existing isolation issue in test_sprint31, unrelated)

) When multiple providers expose the same bare model ID (e.g. two custom providers both listing gpt-5.4), the model picker cannot distinguish them — both rows appear active and clicking the other provider's copy is a no-op. Fix: - Add _deduplicate_model_ids() post-process in api/config.py that detects duplicate bare model IDs across groups and prefixes collisions with @provider_id: so each entry is globally unique - Update norm() regex in static/ui.js to strip @Provider: prefixes for fuzzy matching, so existing sessions with bare model IDs still restore correctly - First occurrence stays bare for backward compatibility with sessions that already store the bare model name - Update test_model_resolver to be dedup-aware Closes nesquena#1228

nesquena-hermes · 2026-04-28T18:58:20Z

Thanks for the detailed PR, @bergeouss! This fixes a real edge case — when two providers expose the same bare model ID, the picker breaks in two ways (dual active state, no-op click), and session persistence silently loses provider identity.

The design approach is sensible: post-process dedup in _build_available_models_uncached() using @provider_id: prefixes for collisions, while keeping the first occurrence bare for backward compat with existing sessions. The paired norm() strip in _findModelInDropdown() closes the session-restore gap cleanly.

A few things to confirm before merge:

Backward compatibility semantics
The first occurrence stays bare — this is correct for sessions that already stored the bare model name. But: if the first-occurrence provider is later removed or disabled, the session's bare model ID now matches the second provider (which wasn't deduped). Is that acceptable, or should there be a note in code that the ordering is intentionally not stable across config changes?

@provider_id: prefix format
The prefix format is @provider_id:model. Confirm that provider_id is always safe to use as a prefix — specifically, that it cannot contain characters that would break the model picker's existing value comparison logic (e.g. spaces, colons, slashes from OpenRouter-style IDs). The norm() regex strips the prefix for matching, but the raw value is what gets stored in session.

Label disambiguation readability
The label change to GPT-5.4 (OpenAI Codex) is clear. One edge case: what happens when three or more providers all expose the same ID? The first stays bare, and the rest each get their own @provider_id: prefix? Confirming the loop handles N>2 correctly.

Pre-existing test_sprint31 failure
The 1 failing test (test_sprint31) is noted as a pre-existing isolation issue. Is this tracked anywhere (linked issue, comment in the test file)? Ideally the failure is marked as xfail or skipped until fixed, so it doesn't mask real regressions.

13 test cases: the coverage spread (unique, single group, empty, two providers, three providers, already-prefixed, mixed, label disambiguation, frontend norm regex) looks thorough.

Once the above are confirmed, this looks ready!

…na#1228) Address reviewer questions: - Document that first-occurrence ordering is not stable across config changes, but removing a provider causes re-dedup on next cache rebuild, so sessions still match the new bare entry - Confirm @provider_id: format is consistent with existing _apply_provider_prefix() and resolved by resolve_model_provider() (splits on first ':')

bergeouss · 2026-04-28T19:17:00Z

Thanks for the thorough review! Addressing each point:

1. Backward compatibility semantics / ordering instability
Correct observation. Added a clarifying note in the docstring: when the first provider is removed from config, the next cache rebuild re-runs dedup — the remaining provider becomes the sole occurrence and is left bare, so the session's bare model ID still matches. This is safe across config changes because dedup is stateless and runs on every cache build. The ordering is intentionally not stable (group order depends on sorted(detected_providers) and config structure), but the behavior degrades gracefully.

2. @provider_id: prefix format safety
Confirmed safe. Provider IDs in the codebase come from three sources:

Built-in aliases: lowercase slugs with hyphens only (openai-codex, x-ai, opencode-zen, etc.)
Custom provider slugs: "custom:" + name.lower().replace(" ", "-") — e.g. custom:edith
Config dict keys: user-defined in providers section

The @provider:model format is consistent with the existing _apply_provider_prefix() function (line 821) which already uses f"@{provider_id}:{mid}" directly. Resolution in resolve_model_provider() splits on the first : only (split(":", 1)), so nested colons like @custom:edith:gpt-5.4 resolve correctly as provider_hint="custom:edith", bare_model="gpt-5.4".

Added this as a note in the docstring.

3. Label disambiguation for N>2
Confirmed — the loop for gi, mi in locations[1:] handles N providers correctly. Covered by test_three_providers_same_model which verifies provider C gets prefixed when A, B, C all share the same model ID.

4. Pre-existing test_sprint31 failure
Noted — it's an isolation issue where a test profile directory persists between runs. Not in scope for this PR, but agree it should ideally be marked xfail or the test should use unique profile names. Happy to address in a separate fix if needed.

5. 13 test cases
Thanks!

- Use rsplit(':', 1) instead of split(':', 1) in resolve_model_provider() to handle provider_ids containing ':' (e.g. custom:my-key) - Add note in _deduplicate_model_ids docstring about ordering instability across config changes (first occurrence wins is intentional) - Add comment confirming N>2 provider dedup correctness - Add tests for rsplit behavior with colon-containing provider_ids - Mark test_sprint31 integration tests as xfail (pre-existing isolation issue)

bergeouss · 2026-04-28T19:29:35Z

Review Feedback Addressed

Thank you for the thorough review! Here are the responses and fixes for each point:

Issue (Backward compatibility semantics): Added a note in the docstring documenting that the bare ID assignment is intentionally not stable across config changes. The dedup runs on every cache rebuild, so sessions always resolve to the current canonical bare ID.
Issue ( prefix format with colons): Good catch! Custom named providers use IDs like which contain colons. Changed to use instead of so correctly resolves to . The frontend regex already handles nested colons via .
Issue (N>2 providers): Confirmed the loop handles N>2 correctly — it iterates over , prefixing each subsequent occurrence with its own . Added a clarifying comment in the code. The existing test covers this.
Issue (test_sprint31 failure): Marked class with . It will no longer mask real regressions.

Files modified:

— rsplit fix + docstring notes
— new tests for colon-in-provider_id
— xfail marker

🤖 AI-assisted via Hermes Agent

bergeouss · 2026-04-28T19:29:47Z

Review Feedback Addressed

Thank you for the thorough review! Here are the responses and fixes for each point:

Issue (Backward compatibility semantics): Added a note in the _deduplicate_model_ids docstring documenting that the bare ID assignment is intentionally not stable across config changes. The dedup runs on every cache rebuild, so sessions always resolve to the current canonical bare ID.
Issue (@provider_id: prefix format with colons): Good catch! Custom named providers use IDs like custom:my-key which contain colons. Changed resolve_model_provider() to use rsplit(":", 1) instead of split(":", 1) so @custom:my-key:model correctly resolves to provider="custom:my-key", model="model". The frontend norm() regex already handles nested colons via ([^:]+:)+.
Issue (N>2 providers): Confirmed the loop handles N>2 correctly - it iterates over locations[1:], prefixing each subsequent occurrence with its own provider_id. Added a clarifying comment in the code. The existing test test_three_providers_same_model covers this.
Issue (test_sprint31 failure): Marked TestProfileCreateAPIWithEndpoint class with @pytest.mark.xfail(reason="Pre-existing isolation issue: test_server fixture conflict (#sprint31)"). It will no longer mask real regressions.

Files modified:

api/config.py - rsplit fix + docstring notes
tests/test_issue1228_model_picker_duplicate_ids.py - new tests for colon-in-provider_id
tests/test_sprint31.py - xfail marker

🤖 AI-assisted via Hermes Agent

nesquena-hermes · 2026-04-28T21:02:41Z

Thanks for following up, @bergeouss! All four review points are now addressed — and the rsplit fix is the most important catch here:

(1) Backward compatibility semantics: The docstring note is clear. Since dedup re-runs on every cache rebuild and is stateless, the "bare ID = first canonical occurrence" assignment is always correct relative to the current config — not a snapshot of what was true at first boot. That's the right mental model.
(2) @provider_id: prefix with nested colons: The rsplit(":", 1) fix is exactly right. @custom:edith:gpt-5.4 needs to split on the last colon to correctly recover provider_hint="custom:edith", bare_model="gpt-5.4". The original split(":", 1) would have silently broken custom providers whose IDs contained colons — good catch.
(3) N>2 providers: Confirmed by test_three_providers_same_model. Nothing to add.
(4) test_sprint31 xfail: Marking with @pytest.mark.xfail(reason=...) is the right call. It documents the pre-existing isolation issue without letting it suppress real failures.

The duplicate comment at 19:29 appears to be a double-post — no action needed there.

This looks ready to merge! 🟢

nesquena-hermes · 2026-04-29T05:31:05Z

Merged in v0.50.237 via #1243. Thank you @bergeouss! 🎉

- Use rsplit(':', 1) instead of split(':', 1) in resolve_model_provider() to handle provider_ids containing ':' (e.g. custom:my-key) - Add note in _deduplicate_model_ids docstring about ordering instability across config changes (first occurrence wins is intentional) - Add comment confirming N>2 provider dedup correctness - Add tests for rsplit behavior with colon-containing provider_ids - Mark test_sprint31 integration tests as xfail (pre-existing isolation issue)

This was referenced Apr 28, 2026

fix(models): preserve /api/models cache metadata #1239

Closed

Clarify and harden Provider/Model source-of-truth across WebUI, CLI, and Gateway #1240

Open

nesquena-hermes mentioned this pull request Apr 29, 2026

fix+feat: batch v0.50.237 — 21 PRs (embedded terminal, JSON/diff viewers, MCP UI, cron tracking, workspace CRUD, archive upload, DeepSeek V4, NVIDIA NIM, fixes) #1243

Merged

nesquena-hermes closed this Apr 29, 2026

hacker1e7 mentioned this pull request Apr 29, 2026

bug(models): chat/start collapses explicit @provider:model back to bare ID and re-selects first duplicate provider #1253

Closed

hacker1e7 mentioned this pull request Apr 30, 2026

bug(settings): Preferences default model loses provider identity and re-selects the first duplicate provider #1313

Closed

starship-s mentioned this pull request May 1, 2026

fix(models): preserve session provider context #1390

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(models): deduplicate model IDs across provider groups#1231

fix(models): deduplicate model IDs across provider groups#1231
bergeouss wants to merge 3 commits intonesquena:masterfrom
bergeouss:fix/1228-model-picker-duplicate-ids

bergeouss commented Apr 28, 2026

Uh oh!

nesquena-hermes commented Apr 28, 2026

Uh oh!

bergeouss commented Apr 28, 2026

Uh oh!

bergeouss commented Apr 28, 2026

Uh oh!

bergeouss commented Apr 28, 2026

Uh oh!

nesquena-hermes commented Apr 28, 2026

Uh oh!

nesquena-hermes commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bergeouss commented Apr 28, 2026

Summary

Changes

Backend: api/config.py

Frontend: static/ui.js

Tests

Test results

Uh oh!

nesquena-hermes commented Apr 28, 2026

Uh oh!

bergeouss commented Apr 28, 2026

Uh oh!

bergeouss commented Apr 28, 2026

Review Feedback Addressed

Uh oh!

bergeouss commented Apr 28, 2026

Review Feedback Addressed

Uh oh!

nesquena-hermes commented Apr 28, 2026

Uh oh!

nesquena-hermes commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Backend: `api/config.py`

Frontend: `static/ui.js`