Problem
Users who configure a local model as their main provider (Ollama, vLLM, llama.cpp) and have no cloud API keys still get billed on OpenRouter. The auxiliary model system (compression, vision, memory flush) hardcodes `google/gemini-3-flash-preview` as the fallback at `auxiliary_client.py:127-128`:
```python
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"
```
If the user's local model is slow to respond for auxiliary tasks, the system falls through to this hardcoded OpenRouter model, even when the user never configured an OpenRouter key for this setup (the client picks up a leftover key from a previous setup in `.env`).
User reports
From Reddit: "I woke up this morning with a three digits hole of intense gemini flash calls on open router. With a local model configured for compression in the yaml, but in JIT. Hermes don't like it — it fallback on Gemini Flash if it's not very fast, even if you populate the yaml for a local compression."
Another user built a skill that patches the hardcoded values on every update:
```bash
sed -i 's/_OPENROUTER_MODEL = "google\/gemini-3-flash-preview"/_OPENROUTER_MODEL = "minimax\/minimax-m2.5"/' agent/auxiliary_client.py
```
Expected behavior
If `auxiliary.compression.provider: custom` is set in `config.yaml` with a `base_url`, the system should use only that endpoint, with no silent fallback to OpenRouter. If the local model is slow, wait for it. If it fails, raise an error; don't silently bill a cloud provider the user didn't authorize. An illustrative config is sketched below.
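For concreteness, this is the shape of the configuration at issue, shown as the dict a resolver would see after loading `config.yaml`. The `provider` and `base_url` keys come from this report; the concrete values are placeholders:

```python
# Illustrative only: key names follow the report; values are placeholders.
config = {
    "auxiliary": {
        "compression": {
            "provider": "custom",                     # explicit user choice
            "base_url": "http://localhost:11434/v1",  # e.g. a local Ollama endpoint
        },
    },
}
```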
Current behavior
The fallback chain at `auxiliary_client.py:754-776` (`_resolve_auto`):
OpenRouter → Nous Portal → Custom endpoint → Codex → API key provider
Even with an explicit `auxiliary.compression.provider: custom` in config, if the custom endpoint is slow or times out, the system falls through to the next provider in the chain, as the sketch below illustrates.
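A minimal, self-contained sketch of the failure mode. The names are hypothetical stand-ins for the real `_resolve_auto` internals, not the actual code:

```python
# Hypothetical stand-in for the chain at auxiliary_client.py:754-776.
_CHAIN = ["openrouter", "nous", "custom", "codex", "api_key"]

class ProviderError(Exception):
    """Stands in for timeouts / HTTP errors from a provider."""

def try_provider(provider: str, task_cfg: dict) -> str:
    # Placeholder for the real request logic; assume a slow local
    # endpoint surfaces here as ProviderError.
    raise ProviderError(provider)

def resolve(task_cfg: dict) -> str:
    """Current behavior, simplified: an explicit provider choice only
    changes where the walk starts, not whether it keeps walking."""
    chosen = task_cfg.get("provider", "auto")
    start = _CHAIN.index(chosen) if chosen in _CHAIN else 0
    for provider in _CHAIN[start:] + _CHAIN[:start]:
        try:
            return try_provider(provider, task_cfg)
        except ProviderError:
            continue  # a slow "custom" silently hands off to a paid provider
    raise RuntimeError("no auxiliary provider available")
```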
Suggested fix
When a user explicitly configures `auxiliary.{task}.provider: custom`, respect it as a hard constraint and don't fall back to cloud providers. The fallback chain should only apply when `provider: auto` (the default), as in the sketch below.
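One way to express that constraint, continuing the hypothetical sketch above (a shape for the fix, not the actual patch to `_resolve_auto`):

```python
def resolve_fixed(task_cfg: dict) -> str:
    """Suggested behavior: an explicit provider is a hard constraint;
    the fallback chain only applies to provider: auto."""
    chosen = task_cfg.get("provider", "auto")
    if chosen != "auto":
        # Explicit choice: use it or fail loudly. No cloud fallback,
        # so a slow local endpoint can never become a surprise bill.
        return try_provider(chosen, task_cfg)
    for provider in _CHAIN:
        try:
            return try_provider(provider, task_cfg)
        except ProviderError:
            continue
    raise RuntimeError("no auxiliary provider available")
```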
Related