
fix(#1384): heal 'provider: local' mid-conversation crash for local-model users #1388

Closed

nesquena-hermes wants to merge 1 commit into master from fix/1384-local-provider

Conversation

@nesquena-hermes
Collaborator

What

Fully resolves #1384: "Provider 'local' is set in config.yaml but no API key was found" mid-conversation.

The bug: a user pointed model.base_url at an OpenAI-compatible local endpoint that didn't match the WebUI's ollama/localhost/lmstudio keyword classifier. The auto-detect block at api/config.py:1748 wrote provider: "local" to config.yaml. The first few inferences worked because the main agent has its own direct path that uses the explicit base_url + api_key. Once context compression / vision / web extraction fired, the auxiliary client routed through resolve_provider_client("local", …), fell through every branch (because "local" is not in hermes_cli.auth.PROVIDER_REGISTRY), and raised the LOCAL_API_KEY error. The user perceives this as "the chat suddenly broke after a few messages."
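The fall-through can be reduced to a sketch like this (the function name mirrors the real hermes-agent dispatcher, but the body is a hypothetical condensation, not the actual implementation):

```python
# Hypothetical reduction of the dispatch described above; the real
# resolve_provider_client in hermes-agent has many more branches.
PROVIDER_REGISTRY = {"openai", "anthropic", "openrouter"}  # "local" is absent

def resolve_provider_client(provider: str):
    if provider in ("auto", "openrouter", "custom"):
        return "openai-compat-client"  # explicit base_url path, no key required
    if provider in PROVIDER_REGISTRY:
        return "registered-client"     # registered provider with its own API key
    # Fall-through: unregistered provider with no key -> the crash users saw.
    raise RuntimeError(
        f"Provider '{provider}' is set in config.yaml but no API key was found. "
        f"Set the {provider.upper()}_API_KEY environment variable."
    )
```

With this shape, "custom" takes the first branch while "local" falls all the way through and raises the LOCAL_API_KEY error.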

Why a three-layer fix

The issue's suggested Fix C (just stop writing "local") is sufficient for new users, but a user who already hit this bug has provider: local persisted in their config.yaml and would still be broken on next startup. So this PR heals existing state too:

  1. Stop creating new broken state: api/config.py:1748 now writes provider = "custom" instead of "local". custom is the canonical OpenAI-compat fall-through and the agent's auxiliary client takes the no-key-required path for it (verified — see "Verification" below).

  2. Heal existing broken configs at read time: resolve_model_provider() rewrites "local" → "custom" so users who already have provider: local get fixed automatically on next request, without having to edit config.yaml by hand.

  3. Refuse to persist "local" on save: set_hermes_default_model() rewrites "local" → "custom" before writing config.yaml, plus a _PROVIDER_ALIASES["local"] = "custom" entry for any consumer that normalises through the alias table.

What this does NOT do

This is a WebUI-only fix that bypasses the underlying agent gap (the issue's "Fix A" / "Fix B" — adding local/lmstudio/ollama branches to resolve_provider_client() in hermes-agent). Those are still worth doing for the CLI users who can hit the same gap, but they belong in a separate hermes-agent PR. This change makes hermes-webui users whole regardless of whether the agent gap ever closes.

Verification

Tested empirically against ~/.hermes/hermes-agent:

  • resolve_runtime_provider(requested="custom") with a loopback base_url returns the correct dict including api_key='no-key-required' — the agent's auxiliary client takes the working path.
  • resolve_provider_client("custom", ...) returns a working OpenAI client against 127.0.0.1:11434.
  • Confirmed the broken provider="local" path returns (None, None) from resolve_provider_client (the bug).
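The api_key='no-key-required' result observed above follows the usual OpenAI-compat key fallback; a hypothetical mirror of that resolution order (the real logic lives in the agent's auxiliary client and may differ in detail):

```python
import os

def resolve_custom_key(explicit_api_key=None):
    # Explicit key wins, then the OPENAI_API_KEY env var, then the sentinel
    # that OpenAI-compatible local servers accept as a dummy credential.
    return explicit_api_key or os.environ.get("OPENAI_API_KEY") or "no-key-required"
```

Local servers such as Ollama or llama.cpp ignore the credential entirely, so the sentinel is only there to satisfy the client constructor.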

Tests

tests/test_issue1384_local_provider.py — 9 new tests:

  • TestAutoDetectWritesCustom (×2) — source-code invariant (provider = "local" literal must not appear in api/config.py) + auto-detect branch structural check
  • TestResolveModelProviderHealsLegacyLocal (×3) — covers lowercase, mixed-case (Local/LOCAL), and pass-through for unrelated providers (anthropic etc. must not be touched)
  • TestSetHermesDefaultModelNeverPersistsLocal (×1) — round-trip save invariant: writing a new default with previous_provider="local" persists "custom" in config.yaml
  • TestAliasTableHasLocalEntry (×3) — alias resolution, case-insensitive, raw table entry

Full suite: 3495 passed, 2 skipped, 3 xpassed in 74.7s.

Files changed

  • api/config.py — auto-detect branch (line 1748), resolve_model_provider() migration, set_hermes_default_model() save guard, _PROVIDER_ALIASES entry
  • tests/test_issue1384_local_provider.py — new
  • CHANGELOG.md — new Fixed entry under [Unreleased]

Risk

Low. The change is surgical, fully backward compatible (any existing provider: custom in user configs still works exactly as before), and the only behaviour change is that a previously-fatal value is now silently rewritten to a working one. No UI changes, no API contract changes.

fix(#1384): heal 'provider: local' mid-conversation crash for local-model users

When model.base_url pointed at an OpenAI-compatible loopback endpoint
that didn't match the ollama/localhost/lmstudio keyword classifier
(192.168.x.y:8080, llama.cpp, vLLM, TabbyAPI, custom proxies), the
WebUI auto-detected provider="local" and persisted it to config.yaml.

Inference worked initially because the main agent has its own direct
path that uses explicit base_url + api_key. But once the conversation
hit the auto-compression threshold — or when vision / web extraction
fired — the agent's auxiliary client routed through
resolve_provider_client("local", ...), fell through every branch (since
"local" is not registered in PROVIDER_REGISTRY), and raised:

  Provider 'local' is set in config.yaml but no API key was found.
  Set the LOCAL_API_KEY environment variable, or switch to a different
  provider with `hermes model`.

Three-layer fix so users already in the broken state heal automatically:

1. Auto-detect block at api/config.py:1748 now writes provider="custom"
   for unknown loopback hosts. "custom" is the canonical OpenAI-compat
   fall-through and the agent's auxiliary client takes the
   "no-key-required" path for it.
2. resolve_model_provider() rewrites legacy "local" → "custom" at read
   time so existing broken configs heal automatically without the user
   having to edit config.yaml by hand.
3. set_hermes_default_model() refuses to persist "local" going forward,
   and _PROVIDER_ALIASES gets a "local" → "custom" entry for any
   consumer that normalises through the alias table.

9 regression tests in tests/test_issue1384_local_provider.py covering
the source-code invariant, both YAML migration cases (lowercase + mixed
case), pass-through for unrelated providers, the round-trip save
invariant, and the alias table entry.

Verified end-to-end with the agent's actual resolve_runtime_provider()
and resolve_provider_client() — the "custom" path returns a working
OpenAI client with api_key="no-key-required" against a 127.0.0.1
endpoint.

Closes #1384
Owner

@nesquena nesquena left a comment


Review — end-to-end ✅ (clean approve, three-layer fix verified against upstream)

Self-authored fix for #1384 — the "Provider 'local' is set in config.yaml but no API key was found" mid-conversation crash for local-model users. Implements the issue's recommended Fix C (WebUI defensive) plus two heal-existing-state layers. +33 LOC in api/config.py, 9 regression tests.

Single commit

c221d33  fix(#1384): heal 'provider: local' mid-conversation crash for local-model users

What this ships — three-layer fix

The bug: WebUI's auto-detect block at api/config.py:1745-1782 writes provider = "local" to config.yaml for unknown loopback hosts (e.g. 192.168.1.10:8080, llama.cpp on 127.0.0.1:8080, vLLM, TabbyAPI, custom proxies). But "local" is not registered in hermes_cli.auth.PROVIDER_REGISTRY. The main agent works because it reads model.base_url + model.api_key directly. But once auto-compression / vision / web-extract fires and routes through the auxiliary client, /tmp/hermes-agent-fresh/agent/auxiliary_client.py:3336-3342 raises:

_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in ("auto", "openrouter", "custom"):
    raise RuntimeError(
        f"Provider '{_explicit}' is set in config.yaml but no API key was found. "
        f"Set the {_explicit.upper()}_API_KEY environment variable, ..."
    )

"local" is not in the safe-pass list ("auto", "openrouter", "custom") → raise → user perceives "the chat suddenly broke after a few messages."

The fix routes through "custom" instead, which:

  1. IS in the safe-pass list at three sites: agent/auxiliary_client.py:3337, auxiliary_client.py:3647, run_agent.py:1466.
  2. Has an explicit handler at agent/auxiliary_client.py:2115-2147 that takes the "no-key-required" OpenAI-compat path for explicit base_url. Verified the handler's exact behavior at lines 2118-2122: custom_key = explicit_api_key or OPENAI_API_KEY env or "no-key-required".

Three layers so users already in the broken state heal automatically:

Layer 1 — Stop creating new broken state at api/config.py:1758,1770-1780: the auto-detect block now starts with provider = "custom" as the default at line 1758, AND the explicit else branch at line 1780 reaffirms provider = "custom". The else is technically a no-op (since the default is already "custom"), but it's explicit for documentation purposes — clearly says "unknown loopback host → custom".
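Structurally, Layer 1 looks roughly like this (a hypothetical condensation of the auto-detect branch; host keywords and branch order are illustrative, not the exact api/config.py code):

```python
def autodetect_provider(host: str) -> str:
    provider = "custom"        # default: canonical OpenAI-compat fall-through
    if "ollama" in host:
        provider = "ollama"
    elif "lmstudio" in host:
        provider = "lmstudio"
    else:
        provider = "custom"    # no-op vs the default, kept as explicit documentation
    return provider
```

The redundant else is the "documentation purposes" branch the review describes: it makes "unknown loopback host → custom" visible at the decision point.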

Layer 2 — Heal existing broken configs at read time at api/config.py:980-988:

# Heal legacy ``provider: local`` entries (written by WebUI < v0.50.252)
# at read time. ``local`` is not a registered provider...
if isinstance(config_provider, str) and config_provider.strip().lower() == "local":
    config_provider = "custom"

This fires on every resolve_model_provider() call, used by streaming.py:1704 on every chat turn. Existing broken configs heal without the user editing config.yaml. Case-insensitive (Local, LOCAL both healed).

Layer 3 — Refuse to persist "local" on save at api/config.py:1209-1210:

if persisted_provider.lower() == "local":
    persisted_provider = "custom"

Backstop that fires when previous_provider == "local" falls through (because resolved_provider returned None). Plus an alias entry at api/config.py:622: "local": "custom" in _PROVIDER_ALIASES.

Traced against upstream hermes-agent

Verified:

  • Custom branch IS the no-key-required path: agent/auxiliary_client.py:2115-2147, with custom_key = ... or "no-key-required" at line 2121.
  • All three LOCAL_API_KEY raise sites whitelist "custom": lines 3337, 3647 in auxiliary_client.py, line 1466 in run_agent.py.
  • Upstream _PROVIDER_ALIASES has no "local" entry: /tmp/hermes-agent-fresh/hermes_cli/models.py:813. The webui's local entry is the only thing that maps local → custom for the _resolve_provider_alias function. Verified the lookup chain at api/config.py:640-645: tries agent's table first (no hit for local), then falls through to webui's local table (local → custom). ✅
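The two-table lookup chain described in the last bullet can be sketched as follows (table contents are hypothetical beyond the "local" entry this PR adds):

```python
AGENT_ALIASES = {}                   # upstream hermes-agent table: no "local" entry
WEBUI_ALIASES = {"local": "custom"}  # webui-side table extended by this PR

def resolve_provider_alias(name: str) -> str:
    key = name.strip().lower()
    # The agent's table is consulted first; the webui table is the fallback,
    # and an unknown name passes through unchanged.
    return AGENT_ALIASES.get(key) or WEBUI_ALIASES.get(key) or key
```

Because the upstream table has no "local" entry, the webui fallback is the only place the rewrite can happen.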

End-to-end trace — both directions

Forward (preventing new state):

  1. User sets model.base_url: http://192.168.1.10:8080/v1 (custom non-loopback proxy or unfamiliar local URL).
  2. _build_configured_model_badges() runs → urlparse → checks if addr.is_private/loopback/link_local.
  3. Hostname matching: "ollama" not in host, "127.0.0.1" not in host, "localhost" not in host, "lmstudio" not in host → falls to else.
  4. provider = "custom" (line 1780). Persisted. ✅

Reverse (healing existing state):

  1. User has pre-existing config.yaml with provider: local.
  2. User sends a chat → streaming.py:1704 calls resolve_model_provider(model).
  3. Line 987-988: config_provider = "local" → healed to "custom".
  4. Returns (model, "custom", base_url).
  5. Line 1711: resolve_runtime_provider(requested="custom") — agent's resolver takes the safe path.
  6. Line 1771: AIAgent(provider="custom", ...) — main agent uses explicit base_url.
  7. Auto-compression fires at some point → auxiliary client → resolve_provider_client("custom", ...) → no-key-required path. ✅

Save round-trip:

  1. User picks a model in dropdown → set_hermes_default_model("qwen").
  2. Reads YAML directly: previous_provider = "local".
  3. Calls resolve_model_provider("qwen") → returns resolved_provider = "custom" (Layer 2 heal).
  4. persisted_provider = "custom" (resolved_provider takes precedence).
  5. Layer 3 backstop at line 1209-1210 fires only if persisted_provider somehow ended up as "local" (defense in depth).
  6. Writes provider: custom to YAML. ✅
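The round-trip above reduces to a precedence rule plus the backstop; a hypothetical condensation (not the actual set_hermes_default_model body):

```python
def choose_persisted_provider(previous_provider, resolved_provider):
    # The resolved value (already healed by Layer 2) wins over the raw YAML value.
    persisted = resolved_provider or previous_provider
    # Layer 3 backstop: never write "local" back to config.yaml.
    if isinstance(persisted, str) and persisted.lower() == "local":
        persisted = "custom"
    return persisted
```

The backstop only fires when resolution falls through to the raw YAML value, which is the defense-in-depth case the review calls out.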

Cross-tool consistency

  • provider: custom is honored by both webui and agent CLI — it's a canonical value in PROVIDER_REGISTRY and has explicit handlers.
  • base_url and api_key round-trip cleanly — webui preserves them; agent reads them via resolve_runtime_provider.
  • Frontend dropdown still works: _resolve_provider_alias("local") returns "custom", so any badge/dropdown logic that normalizes through the alias correctly maps to the custom group.
  • CLI users with provider: local in their YAML — the issue notes the WebUI fix doesn't address the underlying agent gap (Fix A/B). CLI users still hit the bug, but this PR is correctly scoped to webui-only relief. CLI fix belongs in a separate hermes-agent PR.

Security audit

  • No new endpoints, no new env vars, no new file-serving surface.
  • No SSRF surface added — the auto-detect block already handled URL parsing; this PR only changes the result string.
  • Path traversal: "local" and "custom" are static strings; no user input flows through.
  • Whitespace + case handling: config_provider.strip().lower() == "local" correctly tolerates "Local", "LOCAL", "local ", etc.
  • isinstance(config_provider, str) guard at line 987 prevents AttributeError when provider is None/int/etc.

Edge-case matrix

| Scenario | Pre-fix | Post-fix |
| --- | --- | --- |
| Auto-detect with 127.0.0.1:11434 (Ollama-style) | provider = "ollama" | Same — host classifier still fires ✅ |
| Auto-detect with lm-studio.local:1234 | provider = "lmstudio" | Same ✅ |
| Auto-detect with 192.168.1.10:8080 (unknown loopback) | provider = "local" → BREAKS mid-conversation | provider = "custom" → no-key-required path ✅ |
| Auto-detect with non-IP hostname (e.g. myserver.lan:8080) | provider = "custom" (try/except ValueError) | Same — provider stays "custom" from default at line 1758 ✅ |
| Auto-detect with public IP (e.g. 203.0.113.5:8080) | provider = "custom" (not loopback) | Same ✅ |
| Existing config.yaml with provider: local (lowercase) | Mid-conversation crash | Healed at read time → "custom" ✅ |
| Existing config.yaml with provider: Local (mixed case) | Mid-conversation crash | Healed (case-insensitive) ✅ |
| Existing config.yaml with provider: anthropic | Pass through | Pass through (no false positive) ✅ |
| Existing config.yaml with provider: "" (empty) | None returned | None returned (isinstance check passes; .strip() yields "", not "local") ✅ |
| Existing config.yaml with provider: None | None returned | None returned (isinstance check fails) ✅ |
| set_hermes_default_model when previous is "local" | Persists "local" → user re-broken | Persists "custom" via L2 heal + L3 backstop ✅ |
| _resolve_provider_alias("local") | Pass through unchanged | Returns "custom" ✅ |
| Frontend dropdown badge for healed config | Showed "local" badge | Shows "custom" badge (via alias) ✅ |
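The healing rows of the matrix can be locked down with a tiny table-driven check; heal() here is a hypothetical stand-in mirroring the read-time rewrite, not the real api/config.py function:

```python
def heal(value):
    # Read-time rewrite: legacy "local" (any case/whitespace) becomes "custom".
    if isinstance(value, str) and value.strip().lower() == "local":
        return "custom"
    return value

CASES = [
    ("local", "custom"),         # lowercase legacy value
    ("Local", "custom"),         # mixed case
    ("LOCAL ", "custom"),        # trailing whitespace
    ("anthropic", "anthropic"),  # unrelated provider passes through
    ("", ""),                    # empty string is not "local"
    (None, None),                # non-string guard
]
for raw, expected in CASES:
    assert heal(raw) == expected
```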

Tests

  • test_issue1384_local_provider.py — 9/9 pass:
    • 2 source-code invariants (no provider = "local" literal; auto-detect else assigns "custom").
    • 3 read-time healing (lowercase, mixed-case, pass-through for non-local providers).
    • 1 round-trip save invariant (set_hermes_default_model with previous_provider="local" persists "custom").
    • 3 alias-table tests (resolution, case-insensitive, raw entry presence).
  • Full suite: 3443 passed, 54 skipped, 3 xpassed, 0 failed in 16.82s on c221d33.
  • CI: No checks attached (branch was just pushed). Mirrors what the PR description claims (3495 passing on the bot's machine — counting drift consistent with prior PRs).

Other audit — confirmed correct

  • Backwards compat — any existing provider: custom configs still work exactly as before.
  • Forward compat — if upstream agent ever adds "local" as a real provider with its own credentials, the webui's alias-rewrite would mask it. Worth removing the alias if/when that happens. Not a current concern.
  • No mutation of the cfg global: resolve_model_provider mutates only the local config_provider variable, not the in-memory _cfg_cache. So subsequent calls re-heal on every read; no torn state.
  • set_hermes_default_model reads YAML directly at line 1184 (_load_yaml_config_file(config_path)) — bypasses _cfg_cache. So even if cache had stale "local", the save path sees fresh disk content. Then layer 3 backstop covers the edge case.

Minor observations (non-blocking)

  • Layer 1's else: provider = "custom" is a no-op redundancy — the default at line 1758 already sets provider = "custom". The else branch is purely documentation. Worth keeping for future readers; not worth removing.
  • The auto-detect block isn't directly tested behaviorally — only via source-level regex. A test that constructs a config with base_url: http://192.168.1.10:8080/v1, runs _build_configured_model_badges, and asserts the resulting badge has provider: "custom" would lock the auto-detect path end-to-end. Out of scope; the source-level regex catches the most likely regression (someone reverting to provider = "local").
  • Upstream PR for Fix A/B — the issue notes that CLI users (without the WebUI in front) still hit the bug. A complementary hermes-agent PR adding local/lmstudio/ollama handlers in resolve_provider_client() would close the underlying gap. Out of scope here; flagged in the PR description.
  • _PROVIDER_ALIASES table grows — the local copy now has 30+ entries. Most are 1:1 cosmetic aliases that should ideally live upstream. Future cleanup, not blocking.
  • Comment refers to "WebUI < v0.50.252" at line 980 — assumes the next release will be 0.50.252. The release agent will pick the actual number. Worth updating after merge if it lands as something else.

Recommendation

Approved. Three-layer fix correctly heals both new and existing broken state. Cross-checked against upstream that "custom" is the canonical safe-pass value in all three LOCAL_API_KEY raise sites and takes the no-key-required OpenAI-compat path in resolve_provider_client. The auto-detect block now never produces "local", the read path heals it, the save path refuses to persist it, and the alias table catches any other normalizer.

Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes
Collaborator Author

Released as part of v0.50.253 — thanks self-built (nesquena-hermes)!

This PR was merged into the v0.50.253 release batch via #1391 alongside two other contributor fixes (#1342 by @bergeouss and #1381 by @starship-s). Full CHANGELOG entry: https://github.com/nesquena/hermes-webui/blob/master/CHANGELOG.md.

Pre-release verification:

  • pytest: 3558 passing (up from 3507 baseline)
  • run-browser-tests.sh + webui_qa_agent.sh: all green
  • Comprehensive E2E browser walk (desktop + mobile) — every interactive surface verified
  • Telegram screenshot approval gate: passed
  • Opus Advisor pre-release review: APPROVED with 1 NEEDS-FIX (resolved) and 2 follow-ups (both applied in same release)
  • Independent review by @nesquena: APPROVED with end-to-end behavioral harness verification

Closing this PR — the change is live on master and tagged.

GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request May 1, 2026


Development

Successfully merging this pull request may close these issues.

bug(local-model): 'Provider local set but no API key' fires after a few turns — auxiliary client (compression/vision) has no 'local' branch

2 participants