[Bug]: auxiliary async model path can return invalid payloads and crash session_search with misleading AttributeError #7264

@woaim65

Description

Bug Description

session_search / auxiliary async model calls can silently accept a non-OpenAI response object, then fail later with a misleading 'str' object has no attribute 'choices' traceback.

In the same custom-provider setup, auxiliary failures also surface as HTTP 400: No models provided and region-mismatch errors, even though the configured main model/provider resolve correctly.

This looks like an auxiliary-client contract / fallback bug, not a user config bug.

Steps to Reproduce

  1. Configure Hermes with a working custom main model in ~/.hermes/config.yaml, for example:
model:
  provider: custom
  default: gpt-5
  base_url: https://httpsgood.abrdns.com/v1
  api_key: ...
  2. Trigger a session_search path (or directly call async_call_llm(task="session_search", ...)).

  3. Force the async auxiliary client path to return a non-OpenAI payload (for example, a bare string from a mocked adapter / malformed fallback path).

  4. Observe that async_call_llm() returns the invalid payload unchanged, and tools/session_search_tool.py crashes later when it assumes a normal OpenAI response object.

Minimal repro used locally:

import asyncio
from agent import auxiliary_client as ac
from tools import session_search_tool as sst

class FakeAsyncCompletions:
    async def create(self, **kwargs):
        # Simulate a malformed adapter/fallback payload.
        return 'not-a-response-object'

class FakeAsyncClient:
    def __init__(self):
        self.chat = type('Chat', (), {'completions': FakeAsyncCompletions()})()
        self.base_url = 'https://example.invalid/v1'

# Monkeypatch auxiliary resolution/caching so the fake client is used.
orig_get = ac._get_cached_client
orig_resolve = ac._resolve_task_provider_model
ac._resolve_task_provider_model = lambda *a, **k: ('custom', 'gpt-5', 'https://example.invalid/v1', 'x')
ac._get_cached_client = lambda *a, **k: (FakeAsyncClient(), 'gpt-5')

async def main():
    out = await sst._summarize_session('hello world', 'hello', {'source': 'test', 'started_at': 'now'})
    print(repr(out))

try:
    asyncio.run(main())
finally:
    # Restore the patched functions so the process is left clean.
    ac._get_cached_client = orig_get
    ac._resolve_task_provider_model = orig_resolve

Observed traceback:

WARNING:root:Session summarization failed after 3 attempts: 'str' object has no attribute 'choices'
Traceback (most recent call last):
  File "/home/oz/hermes-agent/tools/session_search_tool.py", line 164, in _summarize_session
    content = extract_content_or_reasoning(response)
  File "/home/oz/hermes-agent/agent/auxiliary_client.py", line 2114, in extract_content_or_reasoning
    msg = response.choices[0].message
AttributeError: 'str' object has no attribute 'choices'

Expected Behavior

  • async_call_llm() should enforce a response contract and reject malformed payloads immediately with a clear error.
  • Auxiliary task failures should not degrade into misleading downstream AttributeErrors.
  • A valid model.provider=custom + model.default=gpt-5 config should not surface as HTTP 400: No models provided in auxiliary paths when the main config resolves correctly.

Actual Behavior

  • Main config resolves correctly.
  • Auxiliary async path can propagate an invalid payload unchanged.
  • session_search then fails later in extract_content_or_reasoning() with 'str' object has no attribute 'choices'.
  • In real gateway logs, related auxiliary failures showed up as:
    • HTTP 400: No models provided
    • Session summarization failed after 3 attempts
    • 403 This model is not available in your region.

Evidence Collected

I verified directly in the current codebase that the main config resolves correctly:

from hermes_cli.config import load_config
from agent.auxiliary_client import _read_main_model, _read_main_provider, get_text_auxiliary_client, resolve_provider_client

print(load_config().get('model'))
print(_read_main_model())      # 'gpt-5'
print(_read_main_provider())   # 'custom'
print(get_text_auxiliary_client('session_search'))
print(resolve_provider_client('custom'))

Observed locally:

CONFIG_MODEL {'provider': 'custom', 'default': 'gpt-5', 'base_url': 'https://httpsgood.abrdns.com/v1', ...}
MAIN_MODEL 'gpt-5'
MAIN_PROVIDER 'custom'
AUX session_search MODEL 'gpt-5' CLIENT OpenAI BASE https://httpsgood.abrdns.com/v1/
RESOLVE_CUSTOM 'gpt-5' OpenAI https://httpsgood.abrdns.com/v1/

So the primary config path is fine; the failure happens later in the async auxiliary execution path.

Suspected Root Cause

agent/auxiliary_client.py does not validate the return shape of client.chat.completions.create(...) in async_call_llm() before passing the result upward.

Relevant flow:

  • tools/session_search_tool.py:155 calls await async_call_llm(task="session_search", ...)
  • agent/auxiliary_client.py:2234 returns await client.chat.completions.create(**kwargs) with no contract check
  • tools/session_search_tool.py:164 calls extract_content_or_reasoning(response)
  • agent/auxiliary_client.py:2114 assumes response.choices[0].message

If any async adapter / fallback / malformed client returns a bare string or some other non-OpenAI object, the real failure is delayed and misreported.

Separately, the No models provided log line is likely a sibling symptom from auxiliary fallback / config-failure handling, not proof that model.default is missing in the user's config.

Suggested Fix

  1. Add response-shape validation in both call_llm() and async_call_llm() before returning.
    • Fail fast with a clear TypeError / RuntimeError if the payload lacks choices[0].message.
  2. Add a regression test covering malformed async auxiliary responses.
  3. Audit auxiliary fallback/config-error paths that can produce HTTP 400: No models provided despite a valid custom main model.

Affected Component

  • agent/auxiliary_client.py
  • tools/session_search_tool.py
  • auxiliary async task routing / fallback paths

Messaging Platform (if gateway-related)

Telegram (symptom observed there), but root cause appears platform-agnostic.

Operating System

Linux

Python Version

3.13

Hermes Version

main branch as of 2026-04-10 local checkout

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
