
fix(auxiliary): consolidate auxiliary client UX hardening (#7605) #7647

Merged

teknium1 merged 6 commits into main from hermes/hermes-d0d52697 on Apr 11, 2026

Conversation

@teknium1
Contributor

Summary

This PR consolidates the salvage of 5 PRs from tracking issue #7605: auxiliary client UX hardening for non-OpenRouter providers. All bugs were verified present on current main, and all fixes were cherry-picked with contributor authorship preserved.

Fixes included

1. Honor api_mode in auxiliary client (PR #7630, @kshitijk4poor)

2. Harden fallback behavior for non-OpenRouter users (PR #7594, @kshitijk4poor)

3. Drop incompatible model slugs on cache hit (PR #5804, @eddieran)

4. Validate response shape in call_llm/async_call_llm (PR #7631, @kshitijk4poor)

5. Warn and clear stale OPENAI_BASE_URL on provider switch (PR #7601, @kshitijk4poor)
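
Fix 5 has no detailed commit message below, so here is a minimal sketch of what the provider-switch cleanup in hermes_cli/main.py could look like; the helper name, the condition, and the call site are assumptions, not the actual implementation.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical sketch (names assumed) of clearing a stale OPENAI_BASE_URL
# when the user switches away from a provider that set it.
def _clear_stale_base_url(new_provider: str) -> None:
    """Warn about and drop a leftover OPENAI_BASE_URL on provider switch."""
    stale = os.environ.get("OPENAI_BASE_URL")
    if stale and new_provider != "openai":
        logger.warning(
            "OPENAI_BASE_URL is still set to %s; clearing it so the %s "
            "provider does not route requests to the old endpoint",
            stale, new_provider,
        )
        os.environ.pop("OPENAI_BASE_URL", None)
```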

Test results

  • 102 passed in test_auxiliary_client.py (up from 80 on baseline — 22 new tests)
  • 4 passed in test_clear_stale_base_url.py (new file)
  • 3 pre-existing failures unchanged (OAuth flag, vision client import)
  • All 7 E2E verification tests pass

Files changed

  • agent/auxiliary_client.py — core fixes
  • hermes_cli/main.py — OPENAI_BASE_URL cleanup on provider switch
  • tests/agent/test_auxiliary_client.py — 22 new tests + mock updates
  • tests/hermes_cli/test_clear_stale_base_url.py — new test file (4 tests)

Attribution

Cherry-picked with original authorship preserved:

Part of #7605

kshitijk4poor and others added 6 commits April 11, 2026 01:35
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800
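
As a rough illustration of the wrapping logic described above: CodexAuxiliaryClient is named in this commit message, but the helper signatures, the placeholder class body, and the model pattern below are assumptions, not the real code.

```python
import re

class CodexAuxiliaryClient:
    """Placeholder: wraps a plain OpenAI client and routes calls to /v1/responses."""
    def __init__(self, inner_client):
        self.inner_client = inner_client

# Assumed pattern for codex-family model names (e.g. gpt-5.3-codex).
_CODEX_MODEL_RE = re.compile(r"codex", re.IGNORECASE)

def _needs_codex_wrap(api_mode, base_url, model):
    """True when calls must go through the Responses API instead of chat.completions."""
    if api_mode == "codex_responses":
        return True
    # Auto-detection path: direct OpenAI endpoint plus a codex-family model name.
    return bool(
        base_url
        and "api.openai.com" in base_url
        and _CODEX_MODEL_RE.search(model or "")
    )

def _wrap_if_needed(client, api_mode, base_url, model):
    """Wrap a plain OpenAI client when the codex Responses API is required."""
    if _needs_codex_wrap(api_mode, base_url, model):
        return CodexAuxiliaryClient(client)
    return client
```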
Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512
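
A hedged sketch of how fixes 1 and 4 interact; the real _try_payment_fallback() signature, parameter names, and call sites may differ.

```python
import logging

logger = logging.getLogger(__name__)

def _try_payment_fallback(explicit_provider, error, reason="payment error"):
    """Fall back through auto-detection, unless the user pinned a provider."""
    if explicit_provider not in (None, "auto"):
        # Hard constraint: a local-only setup (Ollama, vLLM) must never be
        # silently billed through a cloud provider on connection/402 errors.
        raise error
    # Log the actual failure reason instead of always saying "payment error".
    logger.warning(
        "Auxiliary call failed (%s); retrying via the auto-detection chain", reason
    )
    # ...continue down the normal provider auto-detection chain here...
```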
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
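
A minimal sketch of the `_compat_model()` helper described above; the exact signature, and what it falls back to on an incompatible slug, are assumptions based on this message.

```python
def _compat_model(model, cached_default, provider):
    """Drop OpenRouter-style 'vendor/model' slugs for non-OpenRouter providers."""
    resolved = model or cached_default
    if resolved and "/" in resolved and provider != "openrouter":
        # Mirrors the check in resolve_provider_client(): a slug such as
        # 'google/gemini-3-flash-preview' is meaningless to, e.g., the Codex
        # Responses API, so fall back to the client's cached default model.
        return cached_default
    return resolved
```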
…7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264
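
A sketch of the response-shape check described above; the error strings mirror the wording quoted in later commits, everything else is assumed.

```python
def _validate_llm_response(response, task):
    """Fail fast when a provider or adapter shim returns something that is
    not an OpenAI-style completion with .choices[0].message."""
    if response is None:
        raise RuntimeError(f"LLM returned None response for task '{task}'")
    try:
        _ = response.choices[0].message
    except (AttributeError, IndexError, TypeError):
        preview = repr(response)[:200]
        raise RuntimeError(
            f"LLM returned invalid response for task '{task}': "
            f"type={type(response).__name__}, preview={preview}"
        ) from None
    return response
```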
teknium1 merged commit 424b62a into main on Apr 11, 2026
5 of 6 checks passed
luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 13, 2026
…ow retries

The blanket `except RuntimeError: return None` in `_summarize_session()`
treated every RuntimeError as non-recoverable, immediately giving up
without retrying. After PR NousResearch#7647 added `_validate_llm_response()`, two
new transient RuntimeErrors ("LLM returned None response" and "LLM
returned invalid response") started being caught by this clause —
causing session_search to fall back to "[Raw preview — summarization
unavailable]" even when a retry would have succeeded.

Invert the logic: only retry on known transient errors from
`_validate_llm_response()`; treat all other RuntimeErrors (no provider,
missing API key, etc.) as non-recoverable and fail fast.

Fixes NousResearch#8045

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
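
An illustrative sketch of the inverted handling described above; the helper name and retry loop are assumptions, since the real change lives inside `_summarize_session()`.

```python
# Only these RuntimeError messages (from _validate_llm_response()) are
# treated as transient and worth retrying.
_TRANSIENT_MARKERS = ("LLM returned None response", "LLM returned invalid response")

def _summarize_with_retries(do_summarize, max_attempts=3):
    for _ in range(max_attempts):
        try:
            return do_summarize()
        except RuntimeError as exc:
            if any(marker in str(exc) for marker in _TRANSIENT_MARKERS):
                continue  # transient shape/None failure: retry
            return None  # non-recoverable (no provider, missing API key, ...): fail fast
    return None  # retries exhausted; caller falls back to the raw preview
```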
luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 24, 2026

luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 28, 2026

luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 30, 2026