fix(api): resolve model provider from config to prevent misrouting #4
Conversation
When the model dropdown sends a prefixed ID like "anthropic/claude-xxx", AIAgent interprets the provider/model format as an OpenRouter path and routes through OpenRouter instead of the direct Anthropic API. Fix: read the configured provider from config.yaml model section. If the model ID starts with the configured provider name followed by "/", strip that prefix and pass the provider explicitly to AIAgent. This ensures direct API providers (Anthropic, OpenAI, etc.) are used when configured, regardless of the model ID format from the dropdown.
nesquena
left a comment
Review: PR #4 — model provider routing fix
Problem Statement
Valid concern — when the dropdown sends a prefixed ID like anthropic/claude-sonnet-4.6, it could be misinterpreted as an OpenRouter model path. This is a real issue worth fixing.
Issues Found
1. provider parameter may not exist on AIAgent (Critical)
Per ARCHITECTURE.md (lines 566-574), AIAgent's documented constructor parameters are: model, platform, quiet_mode, enabled_toolsets, session_id, stream_delta_callback, tool_progress_callback. There is no provider parameter. Without access to the hermes-agent source to verify, this could either silently pass (if AIAgent uses **kwargs) or raise a TypeError at runtime.
Can you confirm this parameter exists in your hermes-agent version?
2. Wrong provider for cross-provider model selection (Logic Bug)
The fix always passes the globally configured provider to AIAgent, even when the user selects a model from a different provider. Example:
- Config has `provider: anthropic`
- User selects `openai/gpt-5.4-mini` from the dropdown
- The prefix doesn't match, so the model stays as `openai/gpt-5.4-mini`
- But `provider='anthropic'` is still passed to AIAgent
- This would try to route an OpenAI model through the Anthropic API
The provider should be inferred from the model ID being selected, not from the global config.
3. Duplicated logic
The same provider-resolution block is copy-pasted in both routes.py (sync path) and streaming.py (async path). This should be a shared helper in api/config.py.
4. Variable naming style
Using underscore-prefixed names (_model, _prov, _mc, _hcfg) for local variables is unconventional — in Python, underscore prefix typically means module-private or 'unused'. Regular names like resolved_model and resolved_provider would be clearer.
Suggested Approach
A cleaner fix would be to add a helper in api/config.py:
```python
def resolve_model_and_provider(model_id):
    """Strip provider prefix from model ID and return (model, provider)."""
    model_cfg = cfg.get('model', {})
    configured_provider = model_cfg.get('provider') if isinstance(model_cfg, dict) else None
    if configured_provider and '/' in model_id and model_id.startswith(configured_provider + '/'):
        return model_id.split('/', 1)[1], configured_provider
    return model_id, None
```
Then both routes.py and streaming.py can call it. But the cross-provider issue (point 2) still needs to be addressed — likely by parsing the provider from the model ID prefix itself, not from the global config.
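For reference, the snippet above runs as-is once `cfg` is stubbed; this self-contained sketch (the stub dict is illustrative, standing in for config.yaml) also shows why the cross-provider case from point 2 remains unresolved:

```python
# Stub standing in for the parsed config.yaml; shape assumed from the snippet.
cfg = {"model": {"provider": "anthropic"}}

def resolve_model_and_provider(model_id):
    """Strip the configured provider's prefix and return (model, provider)."""
    model_cfg = cfg.get("model", {})
    configured = model_cfg.get("provider") if isinstance(model_cfg, dict) else None
    if configured and "/" in model_id and model_id.startswith(configured + "/"):
        return model_id.split("/", 1)[1], configured
    return model_id, None

print(resolve_model_and_provider("anthropic/claude-sonnet-4.6"))
# ('claude-sonnet-4.6', 'anthropic')  -- prefix stripped, provider explicit
print(resolve_model_and_provider("openai/gpt-5.4-mini"))
# ('openai/gpt-5.4-mini', None)  -- prefix kept; caller must not force 'anthropic'
```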
Tests
Ran full test suite: 201 passed, 23 failed — all 23 failures are pre-existing. No regressions.
Verdict
The concept is right but the implementation has a logic bug with cross-provider model selection and a potential runtime error with the provider parameter. Requesting changes before merge.
nesquena
left a comment
Update on point #1 from my review: I verified against the upstream hermes-agent source — AIAgent.__init__ does accept provider as a keyword argument (line 469 of run_agent.py). So that concern is resolved. Apologies for the incorrect call.
The remaining issues still stand:
…outing

Replace duplicated inline provider resolution in routes.py and streaming.py with a shared resolve_model_provider() helper in config.py.

Improvements over original:
- If model ID has a prefix matching any known direct-API provider (not just the config provider), strip it and route correctly. This handles edge cases like localStorage restoring a model from a different provider group.
- Single source of truth for the resolution logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
nesquena
left a comment
Thanks @deboste — you identified a real and important routing issue here. I pushed a follow-up commit that:
- Extracts the resolution logic into a shared `resolve_model_provider()` helper in `config.py` (eliminates the duplication between routes.py and streaming.py)
- Extends the prefix matching to handle any known direct-API provider, not just the config provider — this covers edge cases like localStorage restoring a model ID from a different provider group
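A minimal sketch of the extended matching described in the commit (the helper name follows the commit text, but the provider set and exact implementation are assumptions for illustration):

```python
# Sketch: strip the prefix if it matches ANY known direct-API provider,
# not only the one configured in config.yaml. Provider set is illustrative.
DIRECT_API_PROVIDERS = {"anthropic", "openai", "google", "mistral"}

def resolve_model_provider(model_id, configured_provider=None):
    """Return (model, provider), stripping any known provider prefix."""
    if "/" in model_id:
        prefix, rest = model_id.split("/", 1)
        if prefix in DIRECT_API_PROVIDERS:
            return rest, prefix
    # No recognized prefix: fall back to whatever the config says.
    return model_id, configured_provider

# Covers the localStorage edge case: a model from a different provider group
# than the globally configured one still routes to its own direct API.
print(resolve_model_provider("openai/gpt-5.4-mini", "anthropic"))
# ('gpt-5.4-mini', 'openai')
```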
With those improvements in place, this is ready to merge. Thanks again for the solid bug find!
fix(api): resolve model provider from config to prevent misrouting
- Title/badge/source: 2025 → 2026, sources updated to Chatbot Arena/BenchLM
- Overall: Opus 4.6 (#1, 1504 Arena Elo), Gemini 3.1 Pro (#2), GPT-5.4 (#3), Sonnet 4.6 (#4), DeepSeek V3.2 (#5, replaces DeepSeek R1)
- Coding: Opus 4.6 (#1, powers Claude Code/Cursor), GPT-5.4 (#2, SWE-Pro 57.7%), Sonnet 4.6 (#3), Gemini 3.1 Pro (#4), DeepSeek V3.2 (#5)
  Removed: Sonnet 4.5, Gemini 2.5 Pro, GPT-5.3 Codex
- Writing: Opus 4.6 (#1, Mazur 8.53), Gemini 3.1 Pro (#2, Arena CW 1487), Sonnet 4.6 (#3, EQ-Bench CW 1936), GPT-5.4 (#4), Meta Muse Spark (#5)
  Removed: Llama 4 Maverick (replaced by Meta Muse Spark)
- Search: Gemini 3.1 Pro (#1), Grok 4 (#2, live X data), Sonnet 4.6+tool (#3), GPT-5.4 (#4), Gemini 3 Flash Thinking (#5)
  Added Grok 4 for real-time social/X data
- Reasoning: Gemini 3.1 Pro (#1, GPQA 95.45%), GPT-5.4 (#2), Opus 4.6 (#3), Gemini 3 Flash Thinking (#4, best value), DeepSeek V3.2 (#5)
  Removed: Kimi K2, DeepSeek R1 standalone
- Quick picker: all model names updated to 2026 versions
- Setup boxes: OpenAI gpt-5-4 names, Google gemini-3-1-pro-preview, self-hosted section updated to DeepSeek V3.2 + Muse Spark
Coding section:
- Gemini 3.1 Pro: add Terminal-Bench 78.4% (highest of any frontier model on CLI/DevOps)
  Score badge updated to show Terminal-Bench rather than SWE-bench Verified
- GPT-5.4: note Terminal-Bench 75.1% in description, consolidate pill text
- #5 DeepSeek V3.2 → Qwen 3.6-Plus: leads Terminal-Bench at 61.6%, 88.2% GPQA, 1M context, available now on Alibaba Cloud + OpenRouter

Writing section (reordered based on EQ-Bench CW scores):
- #1 Claude Sonnet 4.6 (1936 EQ-Bench CW — highest, best voice consistency)
- #2 Claude Opus 4.6 (Mazur 8.53, IF Arena #1, 1M context for literary depth)
- #3 Gemini 3.1 Pro (Arena CW #1 1487, AI-tell avoidance, 2M context)
- #4 GPT-5.4 (noted as ~9th on Arena CW, better for structured/commercial writing)
- #5 Meta Muse Spark → Kimi K2.5 ($0.60/$0.50, ~1700 EQ-Bench CW, live API)
  Muse Spark removed — no commercial API available yet

Reasoning section:
- Gemini 3.1 Pro GPQA: 95.45% → 94.1% (more conservative/recent figure, consistent with both agents' data)
- Added ARC-AGI-2 77.1% for Gemini 3.1 Pro (#1 on visual reasoning too)
- Opus 4.6: added note that Sonnet leads GDPval-AA (1633 Elo #1) for throughput
- #5 DeepSeek V3.2 → Qwen 3.6-Plus (88.2% GPQA, 1M context, same model as coding)

Quick picker:
- Creative writing: Opus → Sonnet 4.6 (EQ-Bench #1, 85% cheaper)
- Hard reasoning: 95.45% → 94.1%, add ARC-AGI-2 mention
- Budget pick: DeepSeek V3.2 → Gemini 3 Flash Thinking ($0.50/1M, 89.8% GPQA)

Setup boxes:
- Self-hosted: Muse Spark → Qwen 3.6-Plus + Gemma 4 26B MoE (Apache 2.0, 82.3% GPQA with 3.8B active params, best edge/self-hosted reasoning)

Overall section: unchanged (top 5 still correct per both agents)
Search section: unchanged (no new data from either agent)
index.html:
- Hero badge: 56k+ → 77k+ GitHub stars (NousResearch/hermes-agent: 76,811)
community/index.html:
- All 24 project star counts updated to live GitHub values (April 13 2026)
Notable bumps: gbrain 4.8k→7.4k, hermes-webui 1.3k→1.8k, hermes-agent 56k+→77k,
hermes-workspace 1.1k→1.3k, hudui 571→827, self-evolution 893→1.6k
- Removed awizermann/scarf (repo no longer exists on GitHub)
- Added 3 new high-value projects:
- vectorize-io/hindsight (9.1k ⭐) → Memory Providers: Official Hermes memory plugin,
biomimetic 3-tier memory, topped LongMemEval Jan 2026
- rohitg00/agentmemory (1.3k ⭐) → Memory Providers: #1 persistent memory for agents,
explicit Hermes integration path, 43 MCP tools
- builderz-labs/mission-control (4.1k ⭐) → Tools & Orchestration: self-hosted agent
platform with Hermes gateway integration
- Updated project count: 25 → 27
models/index.html:
- Gemini 3.1 Pro: bumped from #4 Arena (1492 Elo) to #1 (1505 Elo)
- Claude Opus 4.6: updated from #1 (1504) to #2 (1503 Elo) — still top tier
- Updated datestamp to April 13, 2026
fix(api): resolve model provider from config to prevent misrouting
…, CLI busy

Re-applied #4 + nesquena#6 against upstream v0.50.217:
- Add /api/gateway-status endpoint returning instance statuses
- Add status dot CSS (.status-dot variants + .stream-cursor + accent rule)
- Add gatewayDot inside composer-model-chip (replaces removed topbar modelChip)
- Append per-session .status-dot in renderSessionListFromCache with data-session-id, data-cli-session, data-updated-at metadata
- Per-conversation accent via _hashHue() → --conv-accent CSS var
- Append pollGatewayStatus() (10s) + updateSessionDots() (3s) + setBusy hook to static/ui.js — recognizes both composerModelChip and modelChip ids
- get_gateway_instance_statuses() helper in api/gateway_provider.py

Streaming-cursor injection in messages.js was skipped because upstream replaced the simple Thinking placeholder with smd-based incremental rendering; upstream's session-state-indicator already covers that role.

Co-authored-by: Copilot <[email protected]>
Three changes from the pre-merge Opus review:

**MUST-FIX** — XPC_SERVICE_NAME false-positive on macOS Terminal

macOS launchd sets `XPC_SERVICE_NAME` in EVERY Terminal-spawned shell, not just real services. Typical noise values: `"0"` (truthy in Python!) and `"application.com.apple.Terminal.<UUID>"`. A bare `os.environ.get(name)` existence check would auto-promote interactive `./start.sh` runs to foreground mode on every Mac dev machine — silently breaking the most common installation path (no /health probe, no browser open, no log file, hanging shell).

Fix: new `_is_real_supervisor_value()` helper that filters noise. For `XPC_SERVICE_NAME` specifically, reject `"0"` and any `"application.*"` prefix. Real launchd plists use reverse-DNS Label form (`com.<rdns>.<svc>`) which still triggers correctly.

7 new tests in `TestXPCServiceNameNoiseFilter`:
- 4 noise values (`0`, Terminal.app, iTerm2, VSCode) → no detection
- 3 real Label forms → correct detection
- Mixed env with XPC noise + real INVOCATION_ID → falls through to systemd

**SHOULD-FIX 1** — Test env leakage

The original `clean_env` fixture stripped supervisor-detection env vars but not the resolved bootstrap vars (HERMES_WEBUI_HOST/PORT/AGENT_DIR) that `main()` mutates onto `os.environ`. After `test_foreground_exports_resolved_env_vars` ran, later tests would import bootstrap with polluted defaults (DEFAULT_HOST="0.0.0.0" instead of "127.0.0.1"). Existing assertions still passed (tautological vs DEFAULT_*), but it was a footgun for future tests.

Fix: extend `clean_env` to also `delenv` the three resolved vars before each test.

**SHOULD-FIX 2** — Pre-execv executability guard

If `discover_launcher_python` returns a path that doesn't exist or isn't executable, `os.execv` raises OSError → wrapper catches → SystemExit(1) → supervisor restarts → loop forever. That's exactly the failure mode this PR is supposed to eliminate.

Fix: `os.access(python_exe, os.X_OK)` check before execv. Converts the infinite supervisor loop into a single visible RuntimeError. 1 new test in `TestForegroundExecutabilityGuard` pinning that the guard fires before execv when the python path is non-executable.

**Docs** — supervisor.md updates
- New section explaining the XPC_SERVICE_NAME noise filter and what values trigger / don't trigger detection
- New section listing supervisors that are NOT auto-detected (runit, daemontools, PM2, Foreman/Honcho, custom shell-script supervisors) with explicit recommendation to set HERMES_WEBUI_FOREGROUND=1

Verification
- 3820 tests pass (+9 from this commit's new tests vs the original PR push of 3811)
- Filter manually verified end-to-end with the live os.environ: XPC=0 → None, XPC=application.* → None, XPC=com.example.foo → triggers
- run-browser-tests.sh ALL CHECKS PASSED on the worktree

Items deferred from the Opus review
- #4 chdir target may not exist: REPO_ROOT comes from __file__.resolve() so it's stable; not a real concern in practice
- #6 two startup messages in foreground mode: cosmetic, useful for diagnostics
- #7 stricter explicit-only mode: leaves the user the override of just not passing --foreground (current behavior)
- #8 test stub return value: trivial, can fix later if it becomes a regression surface
- #9 argparse positional-after-option ordering: test reads fine

These can be follow-up issues if anyone hits them.
…n guard)

CHANGELOG, ROADMAP, TESTING bumped (3925 → 3929 tests collected).

Opus SHOULD-FIX absorbed in-release: tests #1-3 documented the dedup contract via direct construction but did not invoke get_models_grouped(). Test #4 (test_get_models_grouped_unconfigured_providers_get_independent_dicts) inspects the live source for the literal copy.deepcopy(auto_detected_models) call AND runs an end-to-end smoke of the fixed assignment loop. A future refactor that removes the deepcopy at api/config.py:2078 will fail this test immediately.
SHOULD-FIX #1 (renamed-root client cross-alias): drop strict-equality client filter at static/sessions.js:1853. Server-side _profiles_match cross-aliases 'default'-tagged rows to a renamed root 'kinni'; the strict-equality client filter would reject them, dropping every legacy session for renamed-root users. The server is now solely authoritative for profile scoping.

SHOULD-FIX #2 (messaging-source dedupe ordering): _keep_latest_messaging_session_per_source now runs AFTER the profile filter at api/routes.py:2078. Before, it ran on the merged cross-profile list with profile-blind keys, discarding the older profile's row across profiles before the scope filter — leaving zero rows for any messaging identity the active profile shared with another profile.

NIT #3: _projects_migrated flag now set only AFTER successful save_projects.
NIT #4: cleaned dead test code in test_is_root_profile_invalidation_drops_stale.
NIT #5: _create_profile_fallback's clone_from=='default' literal now routes through _is_root_profile() for parity with the 5 other callsites.

+2 regression tests pin the SHOULD-FIX shapes:
- test_keep_latest_messaging_runs_after_profile_filter (source-string ordering)
- test_static_sessions_js_trusts_server_profile_scoping (no client re-filter)

4173 → 4175 tests pass. 0 regressions.
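The dedupe-ordering contract pinned by SHOULD-FIX #2 can be shown with a toy version (data shapes and function names here are illustrative, not the repo's actual model):

```python
# Two profiles share one messaging identity; the 'work' row is newer.
sessions = [
    {"profile": "work", "source": "telegram:alice", "updated": 2},
    {"profile": "home", "source": "telegram:alice", "updated": 1},
]

def keep_latest_per_source(rows):
    """Keep only the newest row per messaging source (profile-blind keys)."""
    latest = {}
    for row in rows:
        key = row["source"]
        if key not in latest or row["updated"] > latest[key]["updated"]:
            latest[key] = row
    return list(latest.values())

def filter_profile(rows, profile):
    """Scope rows to the active profile."""
    return [r for r in rows if r["profile"] == profile]

buggy = filter_profile(keep_latest_per_source(sessions), "home")
fixed = keep_latest_per_source(filter_profile(sessions, "home"))
print(buggy)  # [] : 'home' row already discarded by the profile-blind dedupe
print(fixed)  # [{'profile': 'home', 'source': 'telegram:alice', 'updated': 1}]
```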