fix(#1384): heal 'provider: local' mid-conversation crash for local-model users #1388
nesquena-hermes wants to merge 1 commit into master
When model.base_url pointed at an OpenAI-compatible loopback endpoint
that didn't match the ollama/localhost/lmstudio keyword classifier
(192.168.x.y:8080, llama.cpp, vLLM, TabbyAPI, custom proxies), the
WebUI auto-detected provider="local" and persisted it to config.yaml.
Inference worked initially because the main agent has its own direct
path that uses explicit base_url + api_key. But once the conversation
hit the auto-compression threshold — or when vision / web extraction
fired — the agent's auxiliary client routed through
resolve_provider_client("local", ...), fell through every branch (since
"local" is not registered in PROVIDER_REGISTRY), and raised:
Provider 'local' is set in config.yaml but no API key was found.
Set the LOCAL_API_KEY environment variable, or switch to a different
provider with `hermes model`.
Three-layer fix so users already in the broken state heal automatically:
1. Auto-detect block at api/config.py:1748 now writes provider="custom"
for unknown loopback hosts. "custom" is the canonical OpenAI-compat
fall-through and the agent's auxiliary client takes the
"no-key-required" path for it.
2. resolve_model_provider() rewrites legacy "local" → "custom" at read
time so existing broken configs heal automatically without the user
having to edit config.yaml by hand.
3. set_hermes_default_model() refuses to persist "local" going forward,
and _PROVIDER_ALIASES gets a "local" → "custom" entry for any
consumer that normalises through the alias table.
9 regression tests in tests/test_issue1384_local_provider.py covering
the source-code invariant, both YAML migration cases (lowercase + mixed
case), pass-through for unrelated providers, the round-trip save
invariant, and the alias table entry.
Verified end-to-end with the agent's actual resolve_runtime_provider()
and resolve_provider_client() — the "custom" path returns a working
OpenAI client with api_key="no-key-required" against a 127.0.0.1
endpoint.
Closes #1384
nesquena left a comment
Review — end-to-end ✅ (clean approve, three-layer fix verified against upstream)
Self-authored fix for #1384 — the "Provider 'local' is set in config.yaml but no API key was found" mid-conversation crash for local-model users. Implements the issue's recommended Fix C (WebUI defensive) plus two heal-existing-state layers. +33 LOC in api/config.py, 9 regression tests.
Single commit
c221d33 fix(#1384): heal 'provider: local' mid-conversation crash for local-model users
What this ships — three-layer fix
The bug: WebUI's auto-detect block at `api/config.py:1745-1782` writes `provider = "local"` to `config.yaml` for unknown loopback hosts (e.g. `192.168.1.10:8080`, llama.cpp on `127.0.0.1:8080`, vLLM, TabbyAPI, custom proxies). But `"local"` is not registered in `hermes_cli.auth.PROVIDER_REGISTRY`. The main agent works because it reads `model.base_url` + `model.api_key` directly. But once auto-compression / vision / web-extract fires and routes through the auxiliary client, `/tmp/hermes-agent-fresh/agent/auxiliary_client.py:3336-3342` raises:
```python
_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in ("auto", "openrouter", "custom"):
    raise RuntimeError(
        f"Provider '{_explicit}' is set in config.yaml but no API key was found. "
        f"Set the {_explicit.upper()}_API_KEY environment variable, ..."
    )
```

`"local"` is not in the safe-pass list (`"auto"`, `"openrouter"`, `"custom"`) → raise → the user perceives "the chat suddenly broke after a few messages."
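The failure mode reduces to this safe-pass check — a runnable toy reproduction of the quoted snippet (not the actual auxiliary-client code), showing that "local" raises while "custom" passes through:

```python
# Toy reproduction of the safe-pass check: "local" raises,
# "custom" (and "auto"/"openrouter"/empty) pass through.

def check_provider(resolved_provider):
    _explicit = (resolved_provider or "").strip().lower()
    if _explicit and _explicit not in ("auto", "openrouter", "custom"):
        raise RuntimeError(
            f"Provider '{_explicit}' is set in config.yaml but no API key was found."
        )
    return _explicit
```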
The fix routes through "custom" instead, which:
- IS in the safe-pass list at three sites: `agent/auxiliary_client.py:3337`, `auxiliary_client.py:3647`, `run_agent.py:1466`.
- Has an explicit handler at `agent/auxiliary_client.py:2115-2147` that takes the `"no-key-required"` OpenAI-compat path for an explicit `base_url`. Verified the handler's exact behavior at lines 2118-2122: `custom_key = explicit_api_key or OPENAI_API_KEY env or "no-key-required"`.
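The handler's key resolution, as described, collapses to one expression (sketch only; the real code at auxiliary_client.py:2118-2122 may differ in detail):

```python
import os

# Sketch of the described fall-through: explicit key, then the
# OPENAI_API_KEY environment variable, then the "no-key-required" sentinel.
def resolve_custom_key(explicit_api_key=None):
    return explicit_api_key or os.environ.get("OPENAI_API_KEY") or "no-key-required"
```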
Three layers so users already in the broken state heal automatically:
Layer 1 — Stop creating new broken state at `api/config.py:1758,1770-1780`: the auto-detect block now starts with `provider = "custom"` as the default at line 1758, AND the explicit else branch at line 1780 reaffirms `provider = "custom"`. The else is technically a no-op (since the default is already "custom"), but it's explicit for documentation purposes — it clearly says "unknown loopback host → custom".
Layer 2 — Heal existing broken configs at read time at `api/config.py:980-988`:

```python
# Heal legacy ``provider: local`` entries (written by WebUI < v0.50.252)
# at read time. ``local`` is not a registered provider...
if isinstance(config_provider, str) and config_provider.strip().lower() == "local":
    config_provider = "custom"
```

This fires on every `resolve_model_provider()` call, used by `streaming.py:1704` on every chat turn. Existing broken configs heal without the user editing `config.yaml` by hand. Case-insensitive (`Local`, `LOCAL` both healed).
Layer 3 — Refuse to persist `"local"` on save at `api/config.py:1209-1210`:

```python
if persisted_provider.lower() == "local":
    persisted_provider = "custom"
```

Backstop that fires when `previous_provider == "local"` falls through (because `resolved_provider` returned `None`). Plus an alias entry at `api/config.py:622`: `"local": "custom"` in `_PROVIDER_ALIASES`.
Traced against upstream hermes-agent
Verified:
- Custom branch IS the no-key-required path: `agent/auxiliary_client.py:2115-2147` — `custom_key = ... or "no-key-required"` at line 2121.
- All three LOCAL_API_KEY raise sites whitelist `"custom"`: lines 3337, 3647 in `auxiliary_client.py`, line 1466 in `run_agent.py`.
- Upstream `_PROVIDER_ALIASES` has no `"local"` entry: `/tmp/hermes-agent-fresh/hermes_cli/models.py:813`. The webui's local entry is the only thing that maps `local → custom` for the `_resolve_provider_alias` function. Verified the lookup chain at `api/config.py:640-645`: it tries the agent's table first (no hit for `local`), then falls through to the webui's local table (`local → custom`). ✅
End-to-end trace — both directions
Forward (preventing new state):
- User sets `model.base_url: http://192.168.1.10:8080/v1` (custom non-loopback proxy or unfamiliar local URL).
- `_build_configured_model_badges()` runs → `urlparse` → checks if `addr.is_private`/loopback/link_local.
- Hostname matching: `"ollama"` not in host, `"127.0.0.1"` not in host, `"localhost"` not in host, `"lmstudio"` not in host → falls to else.
- `provider = "custom"` (line 1780). Persisted. ✅
Reverse (healing existing state):
- User has a pre-existing `config.yaml` with `provider: local`.
- User sends a chat → `streaming.py:1704` calls `resolve_model_provider(model)`.
- Lines 987-988: `config_provider = "local"` → healed to `"custom"`.
- Returns `(model, "custom", base_url)`.
- Line 1711: `resolve_runtime_provider(requested="custom")` — the agent's resolver takes the safe path.
- Line 1771: `AIAgent(provider="custom", ...)` — the main agent uses the explicit base_url.
- Auto-compression fires at some point → auxiliary client → `resolve_provider_client("custom", ...)` → no-key-required path. ✅
Save round-trip:
- User picks a model in the dropdown → `set_hermes_default_model("qwen")`.
- Reads YAML directly: `previous_provider = "local"`.
- Calls `resolve_model_provider("qwen")` → returns `resolved_provider = "custom"` (Layer 2 heal).
- `persisted_provider = "custom"` (resolved_provider takes precedence).
- Layer 3 backstop at lines 1209-1210 fires only if `persisted_provider` somehow ended up as `"local"` (defense in depth).
- Writes `provider: custom` to YAML. ✅
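The round-trip can be modelled as a toy (function names echo the real ones, but the bodies are illustrative, not the actual save path):

```python
# Toy model of the save path: read-time heal (Layer 2) plus the
# write-time backstop (Layer 3). Not the real set_hermes_default_model.

def resolve_provider_toy(previous_provider):
    if isinstance(previous_provider, str) and previous_provider.strip().lower() == "local":
        return "custom"
    return previous_provider or None


def save_default_model_toy(previous_provider):
    persisted = resolve_provider_toy(previous_provider) or previous_provider
    # Layer 3 backstop: never write "local", even if resolution fell through
    if isinstance(persisted, str) and persisted.lower() == "local":
        persisted = "custom"
    return persisted
```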
Cross-tool consistency
- ✅ `provider: custom` is honored by both webui and agent CLI — it's a canonical value in `PROVIDER_REGISTRY` and has explicit handlers.
- ✅ `base_url` and `api_key` round-trip cleanly — the webui preserves them; the agent reads them via `resolve_runtime_provider`.
- ✅ Frontend dropdown still works — `_resolve_provider_alias("local")` returns `"custom"`, so any badge/dropdown logic that normalizes through the alias correctly maps to the custom group.
- ✅ CLI users with `provider: local` in their YAML — the issue notes the WebUI fix doesn't address the underlying agent gap (Fix A/B). CLI users still hit the bug, but this PR is correctly scoped to webui-only relief. A CLI fix belongs in a separate hermes-agent PR.
Security audit
- ✅ No new endpoints, no new env vars, no new file-serving surface.
- ✅ No SSRF surface added — the auto-detect block already handled URL parsing; this PR only changes the result string.
- ✅ Path traversal: `"local"` and `"custom"` are static strings; no user input flows through.
- ✅ Whitespace + case handling — `config_provider.strip().lower() == "local"` correctly tolerates `"Local"`, `"LOCAL"`, `"local "`, etc.
- ✅ `isinstance(config_provider, str)` guard at line 987 prevents an AttributeError when `provider` is `None`/int/etc.
Edge-case matrix
| Scenario | Pre-fix | Post-fix |
|---|---|---|
| Auto-detect with `127.0.0.1:11434` (Ollama-style) | `provider = "ollama"` | Same — host classifier still fires ✅ |
| Auto-detect with `lm-studio.local:1234` | `provider = "lmstudio"` | Same ✅ |
| Auto-detect with `192.168.1.10:8080` (unknown loopback) | `provider = "local"` → BREAKS mid-conversation | `provider = "custom"` → no-key-required path ✅ |
| Auto-detect with non-IP hostname (e.g. `myserver.lan:8080`) | `provider = "custom"` (try/except ValueError) | Same — provider stays `"custom"` from default at line 1758 ✅ |
| Auto-detect with public IP (e.g. `203.0.113.5:8080`) | `provider = "custom"` (not loopback) | Same ✅ |
| Existing config.yaml with `provider: local` (lowercase) | Mid-conversation crash | Healed at read time → `"custom"` ✅ |
| Existing config.yaml with `provider: Local` (mixed case) | Mid-conversation crash | Healed (case-insensitive) ✅ |
| Existing config.yaml with `provider: anthropic` | Pass through | Pass through (no false positive) ✅ |
| Existing config.yaml with `provider: ""` (empty) | None returned | None returned (isinstance check passes; `.strip()` == `""`, not `"local"`) ✅ |
| Existing config.yaml with `provider: None` | None returned | None returned (isinstance check fails) ✅ |
| `set_hermes_default_model` when previous is `"local"` | Persists `"local"` → user re-broken | Persists `"custom"` via L2 heal + L3 backstop ✅ |
| `_resolve_provider_alias("local")` | Pass through unchanged | Returns `"custom"` ✅ |
| Frontend dropdown badge for healed config | Showed "local" badge | Shows "custom" badge (via alias) ✅ |
Tests
- `test_issue1384_local_provider.py` — 9/9 pass:
  - 2 source-code invariants (no `provider = "local"` literal; auto-detect else assigns `"custom"`).
  - 3 read-time healing (lowercase, mixed-case, pass-through for non-`local` providers).
  - 1 round-trip save invariant (`set_hermes_default_model` with `previous_provider="local"` persists `"custom"`).
  - 3 alias-table tests (resolution, case-insensitive, raw entry presence).
- Full suite: 3443 passed, 54 skipped, 3 xpassed, 0 failed in 16.82s on c221d33.
- CI: No checks attached (branch was just pushed). Mirrors what the PR description claims (3495 passing on the bot's machine — counting drift consistent with prior PRs).
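The source-code-invariant style can be sketched like this (illustrative; the real assertions live in `tests/test_issue1384_local_provider.py` and scan the actual `api/config.py`):

```python
import re

# Scan source text for a reintroduced provider = "local" literal —
# the regression the review calls most likely.
def has_local_literal(source_text):
    return re.search(r"""provider\s*=\s*['"]local['"]""", source_text) is not None
```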
Other audit — confirmed correct
- ✅ Backwards compat — any existing `provider: custom` configs still work exactly as before.
- ✅ Forward compat — if the upstream agent ever adds `"local"` as a real provider with its own credentials, the webui's alias-rewrite would mask it. Worth removing the alias if/when that happens. Not a current concern.
- ✅ No mutation of the `cfg` global — `resolve_model_provider` mutates only the local `config_provider` variable, not the in-memory `_cfg_cache`. So subsequent calls re-heal on every read; no torn state.
- ✅ `set_hermes_default_model` reads YAML directly at line 1184 (`_load_yaml_config_file(config_path)`) — bypasses `_cfg_cache`. So even if the cache had a stale "local", the save path sees fresh disk content. Then the Layer 3 backstop covers the edge case.
Minor observations (non-blocking)
- Layer 1's `else: provider = "custom"` is a no-op redundancy — the default at line 1758 already sets `provider = "custom"`. The else branch is purely documentation. Worth keeping for future readers; not worth removing.
- The auto-detect block isn't directly tested behaviorally — only via source-level regex. A test that constructs a config with `base_url: http://192.168.1.10:8080/v1`, runs `_build_configured_model_badges`, and asserts the resulting badge has `provider: "custom"` would lock the auto-detect path end-to-end. Out of scope; the source-level regex catches the most likely regression (someone reverting to `provider = "local"`).
- Upstream PR for Fix A/B — the issue notes that CLI users (without the WebUI in front) still hit the bug. A complementary hermes-agent PR adding `local`/`lmstudio`/`ollama` handlers in `resolve_provider_client()` would close the underlying gap. Out of scope here; flagged in the PR description.
- The `_PROVIDER_ALIASES` table grows — the local copy now has 30+ entries. Most are 1:1 cosmetic aliases that should ideally live upstream. Future cleanup, not blocking.
- The comment refers to "WebUI < v0.50.252" at line 980 — it assumes the next release will be 0.50.252. The release agent will pick the actual number. Worth updating after merge if it lands as something else.
Recommendation
✅ Approved. Three-layer fix correctly heals both new and existing broken state. Cross-checked against upstream that "custom" is the canonical safe-pass value in all three LOCAL_API_KEY raise sites and takes the no-key-required OpenAI-compat path in resolve_provider_client. The auto-detect block now never produces "local", the read path heals it, the save path refuses to persist it, and the alias table catches any other normalizer.
Parked at approval — ready for the release agent's merge/tag pipeline.
Released as part of v0.50.253 — thanks self-built (nesquena-hermes)! This PR was merged into the v0.50.253 release batch via #1391 alongside two other contributor fixes (#1342 by @bergeouss and #1381 by @starship-s). Full CHANGELOG entry: https://github.com/nesquena/hermes-webui/blob/master/CHANGELOG.md. Pre-release verification:
Closing this PR — the change is live on master and tagged.
… + nesquena#1381 + 2 Opus follow-ups)
What
Fully resolves #1384 — `Provider 'local' is set in config.yaml but no API key was found` mid-conversation.

The bug: a user pointed `model.base_url` at an OpenAI-compatible local endpoint that didn't match the WebUI's `ollama`/`localhost`/`lmstudio` keyword classifier. The auto-detect block at `api/config.py:1748` wrote `provider: "local"` to `config.yaml`. The first few inferences worked because the main agent has its own direct path that uses the explicit `base_url + api_key`. Once context compression / vision / web extraction fired, the auxiliary client routed through `resolve_provider_client("local", …)`, fell through every branch (because `"local"` is not in `hermes_cli.auth.PROVIDER_REGISTRY`), and raised the LOCAL_API_KEY error. The user perceives this as "the chat suddenly broke after a few messages."

Why a three-layer fix
The issue's suggested Fix C (just stop writing `"local"`) is sufficient for new users, but a user who already hit this bug has `provider: local` persisted in their `config.yaml` and would still be broken on next startup. So this PR heals existing state too:

1. Stop creating new broken state — `api/config.py:1748` now writes `provider = "custom"` instead of `"local"`. `custom` is the canonical OpenAI-compat fall-through and the agent's auxiliary client takes the `no-key-required` path for it (verified — see "Verification" below).
2. Heal existing broken configs at read time — `resolve_model_provider()` rewrites `"local"` → `"custom"` so users who already have `provider: local` get fixed automatically on the next request, without having to edit `config.yaml` by hand.
3. Refuse to persist `"local"` on save — `set_hermes_default_model()` rewrites `"local"` → `"custom"` before writing config.yaml, plus a `_PROVIDER_ALIASES["local"] = "custom"` entry for any consumer that normalises through the alias table.

What this does NOT do
This is a WebUI-only fix that bypasses the underlying agent gap (the issue's "Fix A" / "Fix B" — adding `local`/`lmstudio`/`ollama` branches to `resolve_provider_client()` in hermes-agent). Those are still worth doing for the CLI users who can hit the same gap, but they belong in a separate hermes-agent PR. This change makes hermes-webui users whole regardless of whether the agent gap ever closes.

Verification
Tested empirically against `~/.hermes/hermes-agent`:

- `resolve_runtime_provider(requested="custom")` with a loopback `base_url` returns the correct dict including `api_key='no-key-required'` — the agent's auxiliary client takes the working path.
- `resolve_provider_client("custom", ...)` returns a working `OpenAI` client against `127.0.0.1:11434`.
- The `provider="local"` path returns `(None, None)` from `resolve_provider_client` (the bug).

Tests
`tests/test_issue1384_local_provider.py` — 9 new tests:

- `TestAutoDetectWritesCustom` (×2) — source-code invariant (`provider = "local"` literal must not appear in `api/config.py`) + auto-detect branch structural check
- `TestResolveModelProviderHealsLegacyLocal` (×3) — covers lowercase, mixed-case (`Local`/`LOCAL`), and pass-through for unrelated providers (`anthropic` etc. must not be touched)
- `TestSetHermesDefaultModelNeverPersistsLocal` (×1) — round-trip save invariant: writing a new default with `previous_provider="local"` persists `"custom"` in `config.yaml`
- `TestAliasTableHasLocalEntry` (×3) — alias resolution, case-insensitive, raw table entry

Full suite: 3495 passed, 2 skipped, 3 xpassed in 74.7s.
Files changed
- `api/config.py` — auto-detect branch (line 1748), `resolve_model_provider()` migration, `set_hermes_default_model()` save guard, `_PROVIDER_ALIASES` entry
- `tests/test_issue1384_local_provider.py` — new
- `CHANGELOG.md` — `[Unreleased] → Fixed` entry

Risk
Low. The change is surgical, fully backward compatible (any existing `provider: custom` in user configs still works exactly as before), and the only behaviour change is that a previously-fatal value is now silently rewritten to a working one. No UI changes, no API contract changes.