[Bug] Auxiliary model silently falls back to Gemini Flash on OpenRouter even when user configured local-only #7559

@SHL0MS

Problem

Users who configure a local model as their main provider (Ollama, vLLM, llama.cpp) and have not intentionally set up any cloud provider can still get billed on OpenRouter. The auxiliary model system (compression, vision, memory flush) hardcodes google/gemini-3-flash-preview as the fallback model at auxiliary_client.py:127-128:

_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"

If the user's local model is slow to respond for auxiliary tasks, the system falls through to this hardcoded OpenRouter model. This happens even if the user has not configured an OpenRouter key for the current setup, because the system picks up a leftover key from a previous setup in .env.
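As a stopgap, affected users can check whether a leftover key is still present. The following is a minimal sketch, not part of the project: the variable name OPENROUTER_API_KEY and the .env location in the working directory are assumptions, since this issue only states that a key was found in .env.

```python
import re
from pathlib import Path

# Assumed variable name; the issue only says an OpenRouter key was found in .env.
SUSPECT_KEYS = ("OPENROUTER_API_KEY",)


def find_leftover_keys(env_text: str, suspects=SUSPECT_KEYS) -> list[str]:
    """Return suspect variable names that have a non-empty value in .env text."""
    found = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Accept an optional "export " prefix, which some .env files use.
        match = re.match(r"(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$", line)
        if not match:
            continue
        name = match.group(1)
        value = match.group(2).strip().strip("'\"")
        if name in suspects and value:
            found.append(name)
    return found


if __name__ == "__main__":
    env_path = Path(".env")  # hypothetical location; adjust to your install
    if env_path.exists():
        for name in find_leftover_keys(env_path.read_text()):
            print(f"{name} is set in {env_path}; auxiliary tasks may use it")
```

Removing or commenting out the reported line should stop the unintended billing described here, at the cost of losing any cloud fallback.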

User reports

From Reddit: "I woke up this morning with a three digits hole of intense gemini flash calls on open router. With a local model configured for compression in the yaml, but in JIT. Hermes don't like it — it fallback on Gemini Flash if it's not very fast, even if you populate the yaml for a local compression."

Another user built a skill that patches the hardcoded values on every update:

sed -i 's/_OPENROUTER_MODEL = "google\/gemini-3-flash-preview"/_OPENROUTER_MODEL = "minimax\/minimax-m2.5"/' agent/auxiliary_client.py

Expected behavior

If auxiliary.compression.provider: custom is set in config.yaml with a base_url, the system should use ONLY that endpoint, with no silent fallback to OpenRouter. If the local model is slow, wait for it. If it fails, raise an error rather than silently billing a cloud provider the user did not authorize.

Current behavior

The fallback chain at auxiliary_client.py:754-776 (_resolve_auto):

OpenRouter → Nous Portal → Custom endpoint → Codex → API key provider

Even with explicit auxiliary.compression.provider: custom in config, if the custom endpoint is slow or times out, the system falls through to the next provider in the chain.
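To make the report concrete, here is a toy model of the behavior described above. It is not the code in auxiliary_client.py: the function name, the provider labels, and the exception types are all hypothetical, and the assumption that the configured provider is tried first is mine.

```python
# Illustrative model of the reported behavior, NOT the project's actual code.
# Provider labels mirror the chain listed in this issue.
CHAIN = ["openrouter", "nous", "custom", "codex", "api_key"]


def resolve_with_fallthrough(configured, attempt, chain=CHAIN):
    """Try the configured provider first, then walk the chain on any failure."""
    order = [configured] + [p for p in chain if p != configured]
    for provider in order:
        try:
            return provider, attempt(provider)
        except (TimeoutError, ConnectionError):
            continue  # the failure is swallowed and the next provider is tried
    raise RuntimeError("no auxiliary provider available")


def slow_local_endpoint(provider):
    """Hypothetical stand-in for a request: only the local endpoint times out."""
    if provider == "custom":
        raise TimeoutError("local model too slow")
    return "ok"
```

With these stand-ins, resolve_with_fallthrough("custom", slow_local_endpoint) returns ("openrouter", "ok"): the timeout never reaches the user, and the request lands on a paid provider.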

Suggested fix

When a user explicitly configures auxiliary.{task}.provider: custom, respect it as a hard constraint — don't fall back to cloud providers. The fallback chain should only apply when provider: auto (the default).
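A minimal sketch of what this could look like, assuming resolution can be split into "which providers may be tried" and "run the task". All names here (FALLBACK_CHAIN, providers_to_try, run_auxiliary, AuxiliaryProviderError) are hypothetical and do not come from auxiliary_client.py. The sketch treats any explicit provider as pinned, not only custom, which is one reading of "the fallback chain should only apply when provider: auto".

```python
# Hypothetical sketch of the suggested fix; names are illustrative only.
FALLBACK_CHAIN = ("openrouter", "nous", "custom", "codex", "api_key")


class AuxiliaryProviderError(RuntimeError):
    """Raised when no permitted provider could complete an auxiliary task."""


def providers_to_try(configured: str = "auto") -> tuple[str, ...]:
    """Return the providers that may be used for a task.

    Only provider "auto" may walk the fallback chain; an explicit
    provider is a hard constraint and is the sole candidate.
    """
    if configured == "auto":
        return FALLBACK_CHAIN
    return (configured,)


def run_auxiliary(task: str, configured: str, call):
    """Run `call(provider, task)` on each permitted provider in order.

    `call` is a caller-supplied function (an assumption of this sketch).
    Failures are collected and surfaced instead of silently rerouted.
    """
    errors = []
    for provider in providers_to_try(configured):
        try:
            return call(provider, task)
        except Exception as exc:  # broad on purpose: report every failure
            errors.append(f"{provider}: {exc}")
    raise AuxiliaryProviderError(
        f"auxiliary task {task!r} failed; tried " + "; ".join(errors)
    )
```

With provider: custom, a timeout on the local endpoint surfaces as an AuxiliaryProviderError and no cloud provider is contacted. Waiting longer for a slow local model would be a matter of the timeout used inside call, which this sketch leaves to the caller.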

Labels: type/bug
