fix(auxiliary): consolidate auxiliary client UX hardening (#7605) #7647
Merged
Conversation
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or when
auto-detection finds the api.openai.com + codex model pattern (a sketch
follows the config example below)
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
auxiliary:
  compression:
    model: gpt-5.3-codex
    base_url: https://api.openai.com/v1
    api_mode: codex_responses
Closes #6800
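For illustration, a minimal Python sketch of the wrapping logic described above: the helper names and `CodexAuxiliaryClient` are from this change, but the bodies, the codex-name pattern, and the wrapper constructor are assumptions.

```python
import re

# Assumed pattern for codex-family model names (e.g. gpt-5.3-codex).
_CODEX_MODEL_RE = re.compile(r"codex", re.IGNORECASE)

class CodexAuxiliaryClient:
    """Stand-in for the real wrapper that speaks /v1/responses."""
    def __init__(self, inner):
        self.inner = inner

def _needs_codex_wrap(api_mode, base_url, model):
    """Decide whether a plain chat client must be routed to /v1/responses."""
    if api_mode == "codex_responses":
        return True   # explicit opt-in via config or AUXILIARY_{TASK}_API_MODE
    if api_mode:
        return False  # any other explicit mode disables auto-detection
    # Auto-detect: direct OpenAI endpoint plus a codex-family model name.
    return "api.openai.com" in (base_url or "") and bool(
        model and _CODEX_MODEL_RE.search(model)
    )

def _wrap_if_needed(client, api_mode, base_url, model):
    if _needs_codex_wrap(api_mode, base_url, model):
        return CodexAuxiliaryClient(client)
    return client
```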
Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'), connection/payment errors no longer silently fall back to cloud providers. Local-only users (Ollama, vLLM) will no longer get unexpected OpenRouter billing from auxiliary tasks. (Fixes 1 and 4 are sketched after this list.)

2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends the literal 'default' as a model name to APIs. Providers without a known aux model in _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for payment (402) and connection errors. Previously, async consumers (session_search, web_tools, vision) got hard failures with no recovery. Also fixes the hardcoded 'openrouter' fallback to use the full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses it in log messages. Connection timeouts are no longer misleadingly logged as 'payment error'.

Closes #7559
Closes #7512
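A minimal sketch of fixes 1 and 4 combined: `_try_payment_fallback()` and its new `reason` parameter are from the commit above, while the exact signature, the `configured_provider` argument, and the `_resolve_auto` body are assumptions.

```python
import logging

logger = logging.getLogger(__name__)

def _resolve_auto(task):
    """Stand-in for the full auto-detection chain (name from this PR, body assumed)."""
    return None

def _try_payment_fallback(task, configured_provider, reason):
    """Return a fallback client, or None when falling back is not allowed."""
    if configured_provider not in (None, "", "auto"):
        # Fix 1: the provider was pinned explicitly; never reroute to a cloud
        # provider behind the user's back (avoids surprise OpenRouter billing).
        logger.error(
            "Auxiliary task %r failed (%s); provider %r is explicit, not falling back",
            task, reason, configured_provider,
        )
        return None
    # Fix 4: name the actual failure ('connection error', 'payment error (402)')
    # instead of logging every fallback as a payment error.
    logger.warning("Auxiliary task %r hit a %s; trying the fallback chain", task, reason)
    return _resolve_auto(task)
```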
`resolve_provider_client()` already drops OpenRouter-format model slugs (containing "/") when the resolved provider is not OpenRouter (line 1097). However, `_get_cached_client()` returns `model or cached_default` directly on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1 of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary calls for different tasks (e.g. compression with `summary_model: google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-format model slug straight to the Codex Responses API, which does not understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself fails through the same path)

Add a `_compat_model()` helper that mirrors the "/" check from `resolve_provider_client()` and call it on the cache-hit return path (sketched below).
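A short sketch of the guard, assuming the cache records the resolved provider alongside its default model; the helper name and the "/" check are from this commit, the signature is not.

```python
def _compat_model(model, cached_default, provider):
    """Mirror resolve_provider_client()'s slug check on the cache-hit path."""
    candidate = model or cached_default
    if candidate and "/" in candidate and provider != "openrouter":
        # OpenRouter-format slug (e.g. google/gemini-3-flash-preview) that this
        # cached client cannot serve; use the client's default model instead.
        return cached_default
    return candidate
```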
…7264)

async_call_llm (and call_llm) can return non-OpenAI objects from custom providers or adapter shims, crashing downstream consumers with a misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response(), which checks that the response has the expected .choices[0].message shape before returning. It wraps all return paths in call_llm, async_call_llm, and the fallback paths, and fails fast with a clear RuntimeError identifying the task, the response type, and a preview of the malformed payload (sketched below).

Closes #7264
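A hedged sketch of the validator: the name and both failure modes are from this change (the message prefixes are quoted verbatim in a follow-up commit below), while the exact wording and the 200-character preview are assumptions.

```python
def _validate_llm_response(response, task):
    """Fail fast when a provider or adapter shim returns a malformed response."""
    if response is None:
        raise RuntimeError(f"LLM returned None response for task {task!r}")
    try:
        response.choices[0].message  # the shape every downstream consumer expects
    except (AttributeError, IndexError, TypeError):
        preview = repr(response)[:200]  # assumed preview length
        raise RuntimeError(
            f"LLM returned invalid response for task {task!r}: "
            f"{type(response).__name__}: {preview}"
        ) from None
    return response
```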
This was referenced Apr 11, 2026
luyao618 added a commit to luyao618/hermes-agent that referenced this pull request on Apr 13, 2026:
…ow retries

The blanket `except RuntimeError: return None` in `_summarize_session()` treated every RuntimeError as non-recoverable, immediately giving up without retrying. After PR NousResearch#7647 added `_validate_llm_response()`, two new transient RuntimeErrors ("LLM returned None response" and "LLM returned invalid response") started being caught by this clause, causing session_search to fall back to "[Raw preview — summarization unavailable]" even when a retry would have succeeded.

Invert the logic: only retry on known transient errors from `_validate_llm_response()`; treat all other RuntimeErrors (no provider, missing API key, etc.) as non-recoverable and fail fast.

Fixes NousResearch#8045
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
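A sketch of the inverted logic: the two message prefixes are quoted from `_validate_llm_response()`, and everything else (the loop shape, the `_call_with_retries` name) is assumed.

```python
# Only the transient errors raised by _validate_llm_response() warrant a retry;
# their message prefixes are quoted from that helper.
_TRANSIENT_PREFIXES = (
    "LLM returned None response",
    "LLM returned invalid response",
)

def _call_with_retries(call, max_attempts=3):
    """Retry transient validation failures; fail fast on everything else."""
    for _ in range(max_attempts):
        try:
            return call()
        except RuntimeError as exc:
            if not str(exc).startswith(_TRANSIENT_PREFIXES):
                return None  # no provider, missing API key, ...: non-recoverable
            # Transient shape failure: the next attempt may return a valid response.
    return None
```

Matching on message prefixes is brittle by design here; a dedicated exception subclass would be the sturdier long-term fix.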
Summary
Consolidated salvage of 5 PRs from tracking issue #7605: auxiliary client UX hardening for non-OpenRouter providers. All bugs were verified present on current main, and all fixes were cherry-picked with contributor authorship preserved.
Fixes included
1. Honor api_mode in auxiliary client (PR #7630, @kshitijk4poor)
- Expand `_resolve_task_provider_model` from a 4-tuple to a 5-tuple to include `api_mode`
- Thread the `api_mode` parameter through `resolve_provider_client` and `_get_cached_client`
- Add `_needs_codex_wrap`/`_wrap_if_needed` helpers for Responses API routing
- Support `auxiliary.{task}.api_mode: codex_responses` and the env var `AUXILIARY_{TASK}_API_MODE`

2. Harden fallback behavior for non-OpenRouter users (PR #7594, @kshitijk4poor)
- When `auxiliary.{task}.provider` is explicitly set (not `auto`), payment/connection errors no longer silently fall back to cloud providers. Local-only users (Ollama, vLLM) will no longer get unexpected OpenRouter billing.
- Eliminate the `model="default"` sentinel: providers not in `_API_KEY_PROVIDER_AUX_MODELS` are skipped instead of sending the literal `"default"` to APIs.
- `async_call_llm` now mirrors `call_llm`'s payment/connection fallback chain (it was completely missing).
- `async_call_llm` now uses the full auto-detection chain instead of a hardcoded `"openrouter"`.
- `_try_payment_fallback()` accepts a `reason` parameter; connection timeouts are no longer logged as "payment error".

3. Drop incompatible model slugs on cache hit (PR #5804, @eddieran)
- Add a `_compat_model()` helper that mirrors the `/` slug check from `resolve_provider_client()`
- Call it on the cache-hit return path in `_get_cached_client()`

4. Validate response shape in call_llm/async_call_llm (PR #7631, @kshitijk4poor)
- Add `_validate_llm_response()`, wrapping all return paths
- Raise a clear `RuntimeError` instead of letting a misleading `AttributeError: 'str' object has no attribute 'choices'` surface downstream

5. Warn and clear stale OPENAI_BASE_URL on provider switch (PR #7601, @kshitijk4poor)
- Warn in `_resolve_auto()` when `OPENAI_BASE_URL` conflicts with a named provider
- Clear the stale variable in `select_provider_and_model()` after a provider switch (sketched below)
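A rough sketch of that cleanup, assuming the conflict test is "base URL set while a non-OpenAI provider is selected"; only the `OPENAI_BASE_URL` variable and the placement in the provider-switch flow are from this summary, the rest is illustrative.

```python
import logging
import os

logger = logging.getLogger(__name__)

def _clear_stale_base_url(selected_provider):
    """Warn about and drop an OPENAI_BASE_URL left over from another provider."""
    base_url = os.environ.get("OPENAI_BASE_URL")
    if base_url and selected_provider != "openai":
        logger.warning(
            "OPENAI_BASE_URL=%s conflicts with provider %r; clearing it",
            base_url, selected_provider,
        )
        os.environ.pop("OPENAI_BASE_URL", None)
```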
Test results
- All tests pass in `test_auxiliary_client.py` (up from 80 on baseline; 22 new tests)
- All tests pass in `test_clear_stale_base_url.py` (new file)
Files changed
- `agent/auxiliary_client.py`: core fixes
- `hermes_cli/main.py`: OPENAI_BASE_URL cleanup on provider switch
- `tests/agent/test_auxiliary_client.py`: 22 new tests + mock updates
- `tests/hermes_cli/test_clear_stale_base_url.py`: new test file (4 tests)

Attribution
Cherry-picked with original authorship preserved: @kshitijk4poor (PRs #7630, #7594, #7631, #7601) and @eddieran (PR #5804).
Part of #7605