
fix(auxiliary): consolidate auxiliary client UX hardening (#7605) #7647

Merged

teknium1 merged 6 commits into main from hermes/hermes-d0d52697 on Apr 11, 2026

Conversation

@teknium1
Contributor

Summary

This PR consolidates the salvage of 5 PRs from tracking issue #7605: auxiliary client UX hardening for non-OpenRouter providers. All bugs were verified present on current main, and all fixes were cherry-picked with contributor authorship preserved.

Fixes included

1. Honor api_mode in auxiliary client (PR #7630, @kshitijk4poor)

2. Harden fallback behavior for non-OpenRouter users (PR #7594, @kshitijk4poor)

3. Drop incompatible model slugs on cache hit (PR #5804, @eddieran)

4. Validate response shape in call_llm/async_call_llm (PR #7631, @kshitijk4poor)

5. Warn and clear stale OPENAI_BASE_URL on provider switch (PR #7601, @kshitijk4poor)
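
Fix 5 has no detailed commit message below, so here is a minimal sketch of what the provider-switch cleanup in hermes_cli/main.py could look like; the helper name, the condition, and the call site are assumptions, not the actual implementation.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical sketch (names assumed) of clearing a stale OPENAI_BASE_URL
# when the user switches away from a provider that set it.
def _clear_stale_base_url(new_provider: str) -> None:
    """Warn about and drop a leftover OPENAI_BASE_URL on provider switch."""
    stale = os.environ.get("OPENAI_BASE_URL")
    if stale and new_provider != "openai":
        logger.warning(
            "OPENAI_BASE_URL is still set to %s; clearing it so the %s "
            "provider does not route requests to the old endpoint",
            stale, new_provider,
        )
        os.environ.pop("OPENAI_BASE_URL", None)
```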

Test results

  • 102 passed in test_auxiliary_client.py (up from 80 on baseline — 22 new tests)
  • 4 passed in test_clear_stale_base_url.py (new file)
  • 3 pre-existing failures unchanged (OAuth flag, vision client import)
  • All 7 E2E verification tests pass

Files changed

  • agent/auxiliary_client.py — core fixes
  • hermes_cli/main.py — OPENAI_BASE_URL cleanup on provider switch
  • tests/agent/test_auxiliary_client.py — 22 new tests + mock updates
  • tests/hermes_cli/test_clear_stale_base_url.py — new test file (4 tests)

Attribution

Cherry-picked with original authorship preserved:

Part of #7605

kshitijk4poor and others added 6 commits April 11, 2026 01:35
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800
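
As a rough illustration of the wrapping logic described above: CodexAuxiliaryClient is named in this commit message, but the helper signatures, the placeholder class body, and the model pattern below are assumptions, not the real code.

```python
import re

class CodexAuxiliaryClient:
    """Placeholder: wraps a plain OpenAI client and routes calls to /v1/responses."""
    def __init__(self, inner_client):
        self.inner_client = inner_client

# Assumed pattern for codex-family model names (e.g. gpt-5.3-codex).
_CODEX_MODEL_RE = re.compile(r"codex", re.IGNORECASE)

def _needs_codex_wrap(api_mode, base_url, model):
    """True when calls must go through the Responses API instead of chat.completions."""
    if api_mode == "codex_responses":
        return True
    # Auto-detection path: direct OpenAI endpoint plus a codex-family model name.
    return bool(
        base_url
        and "api.openai.com" in base_url
        and _CODEX_MODEL_RE.search(model or "")
    )

def _wrap_if_needed(client, api_mode, base_url, model):
    """Wrap a plain OpenAI client when the codex Responses API is required."""
    if _needs_codex_wrap(api_mode, base_url, model):
        return CodexAuxiliaryClient(client)
    return client
```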
Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512
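
A hedged sketch of how fixes 1 and 4 interact; the real _try_payment_fallback() signature, parameter names, and call sites may differ.

```python
import logging

logger = logging.getLogger(__name__)

def _try_payment_fallback(explicit_provider, error, reason="payment error"):
    """Fall back through auto-detection, unless the user pinned a provider."""
    if explicit_provider not in (None, "auto"):
        # Hard constraint: a local-only setup (Ollama, vLLM) must never be
        # silently billed through a cloud provider on connection/402 errors.
        raise error
    # Log the actual failure reason instead of always saying "payment error".
    logger.warning(
        "Auxiliary call failed (%s); retrying via the auto-detection chain", reason
    )
    # ...continue down the normal provider auto-detection chain here...
```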
`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.
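
A minimal sketch of the `_compat_model()` helper described above; the exact signature, and what it falls back to on an incompatible slug, are assumptions based on this message.

```python
def _compat_model(model, cached_default, provider):
    """Drop OpenRouter-style 'vendor/model' slugs for non-OpenRouter providers."""
    resolved = model or cached_default
    if resolved and "/" in resolved and provider != "openrouter":
        # Mirrors the check in resolve_provider_client(): a slug such as
        # 'google/gemini-3-flash-preview' is meaningless to, e.g., the Codex
        # Responses API, so fall back to the client's cached default model.
        return cached_default
    return resolved
```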
…7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264
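
A sketch of the response-shape check described above; the error strings mirror the wording quoted in later commits, everything else is assumed.

```python
def _validate_llm_response(response, task):
    """Fail fast when a provider or adapter shim returns something that is
    not an OpenAI-style completion with .choices[0].message."""
    if response is None:
        raise RuntimeError(f"LLM returned None response for task '{task}'")
    try:
        _ = response.choices[0].message
    except (AttributeError, IndexError, TypeError):
        preview = repr(response)[:200]
        raise RuntimeError(
            f"LLM returned invalid response for task '{task}': "
            f"type={type(response).__name__}, preview={preview}"
        ) from None
    return response
```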
teknium1 merged commit 424b62a into main on Apr 11, 2026
5 of 6 checks passed
luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 13, 2026
…ow retries

The blanket `except RuntimeError: return None` in `_summarize_session()`
treated every RuntimeError as non-recoverable, immediately giving up
without retrying. After PR NousResearch#7647 added `_validate_llm_response()`, two
new transient RuntimeErrors ("LLM returned None response" and "LLM
returned invalid response") started being caught by this clause —
causing session_search to fall back to "[Raw preview — summarization
unavailable]" even when a retry would have succeeded.

Invert the logic: only retry on known transient errors from
`_validate_llm_response()`; treat all other RuntimeErrors (no provider,
missing API key, etc.) as non-recoverable and fail fast.

Fixes NousResearch#8045

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
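
An illustrative sketch of the inverted handling described above; the helper name and retry loop are assumptions, since the real change lives inside `_summarize_session()`.

```python
# Only these RuntimeError messages (from _validate_llm_response()) are
# treated as transient and worth retrying.
_TRANSIENT_MARKERS = ("LLM returned None response", "LLM returned invalid response")

def _summarize_with_retries(do_summarize, max_attempts=3):
    for _ in range(max_attempts):
        try:
            return do_summarize()
        except RuntimeError as exc:
            if any(marker in str(exc) for marker in _TRANSIENT_MARKERS):
                continue  # transient shape/None failure: retry
            return None  # non-recoverable (no provider, missing API key, ...): fail fast
    return None  # retries exhausted; caller falls back to the raw preview
```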
luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 24, 2026

luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 28, 2026

luyao618 added a commit to luyao618/hermes-agent that referenced this pull request Apr 30, 2026