bug(streaming): OpenAI Codex OAuth usage exhaustion produces no response and no error — quota detector substring-list misses Codex's 'plan limit reached' / 'usage_limit_exceeded' shapes

## Bug Description

Reporter: Cygnus (Discord; please ping in replies) via the WebUI testers thread.

When her **OpenAI Codex OAuth** account runs out of usage, the WebUI/Mac app produces **no response and no error** — the message appears to be sent but nothing comes back, with no inline error, no toast, no spinner-stop, no quota banner. Combined with the slow Codex first-token latency, this is doubly frustrating because the user has no signal whether it's still thinking or has silently failed.

> Also, I seem to get nothing and just no response when my OAuth Codex is out of usage. Which is a bit extra-frustrating because Codex takes a terribly long time to start replying (not sure if that's a Codex-side thing, but figured I'd let you know)
> *(May 7 2026, 00:13 UTC)*

This is a known failure shape we've seen before — see #1452 (the credential-pool rotation parity work, May 02 2026). #1452 documented the **rotation** half of the OAuth-exhaustion problem (WebUI now respects the credential pool's auto-rotate-on-exhaustion). What Cygnus is hitting now is the **single-credential** half: when there's no fallback to rotate to, the user must at minimum see a clear "out of usage" error — and currently they get silence.

## Likely root cause — quota-detection string-match misses OpenAI OAuth shape

`api/streaming.py:2461-2492` already has a silent-failure detector that fires when the agent returns no assistant content and no tokens were ever streamed. Its quota classifier is a plain substring check:

```python
_is_quota = (
    'insufficient credit' in _err_lower
    or 'credit balance' in _err_lower
    or 'credits exhausted' in _err_lower
    or 'more credits' in _err_lower
    or 'can only afford' in _err_lower
    or 'fewer max_tokens' in _err_lower
    or 'quota_exceeded' in _err_lower
    or 'quota exceeded' in _err_lower
    or 'exceeded your current quota' in _err_lower
)
```

These strings are **OpenRouter / Anthropic / OpenAI billing API** wording. **OpenAI Codex OAuth** uses a different error shape on usage exhaustion. The actual response body from the Codex OAuth API on usage exhaustion contains phrasing like:

- `"Plan limit reached"` (Plus / Team plan ceiling)
- `"You've reached the limit of messages per <window>"`
- `"You've used up your usage"` (legacy)
- `"usage_limit_exceeded"` (new error code)
- HTTP 429 with body shape `{"error": {"type": "usage_limit_exceeded", "message": "..."}}`

None of these phrasings contains any substring in the existing `_is_quota` list (it is a substring check, not a regex). So when Codex OAuth runs out of usage, `_is_quota = False` and `_is_auth = False` (no `401` / `unauthorized` / `invalid api key` substring), and the silent-failure path falls through to a generic apperror. Worse, it may emit nothing at all if `_assistant_added` looks like it succeeded with an empty assistant message because the agent caught and swallowed the error.
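To make the miss concrete, here is a minimal sketch that runs the existing substring list against a body built from the shape described above. The helper name `matches_existing_quota_list` and the sample message text are assumptions for illustration; the real check is inline in `api/streaming.py`.

```python
import json

# Substrings copied from the existing _is_quota check quoted in this issue.
EXISTING_QUOTA_SUBSTRINGS = (
    'insufficient credit',
    'credit balance',
    'credits exhausted',
    'more credits',
    'can only afford',
    'fewer max_tokens',
    'quota_exceeded',
    'quota exceeded',
    'exceeded your current quota',
)


def matches_existing_quota_list(err_text: str) -> bool:
    """Hypothetical stand-in for the inline _is_quota expression."""
    err_lower = err_text.lower()
    return any(s in err_lower for s in EXISTING_QUOTA_SUBSTRINGS)


# Illustrative body following the reported shape; the message text is assumed.
codex_body = json.dumps(
    {"error": {"type": "usage_limit_exceeded", "message": "Plan limit reached"}}
)

print(matches_existing_quota_list(codex_body))                         # False
print(matches_existing_quota_list("You exceeded your current quota"))  # True
```

The first call returns `False` because `usage_limit_exceeded` contains neither `quota_exceeded` nor any of the billing-style phrases.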

## What we should ship

### 1. Add OpenAI Codex OAuth usage-exhaustion patterns to `_is_quota`

Extend the substring list in `api/streaming.py:2461-2492` (and the parallel block at `:2936-2978`):

```python
_is_quota = (
    # ... existing strings ...
    or 'plan limit reached' in _err_lower
    or 'usage_limit_exceeded' in _err_lower
    or 'usage limit exceeded' in _err_lower
    or 'reached the limit of messages' in _err_lower
    or 'used up your usage' in _err_lower
    # Looser catch for reworded Codex Plus/Team plan-limit bodies. This is
    # broad: any message containing all three words will match.
    or ('plan' in _err_lower and 'limit' in _err_lower and 'reached' in _err_lower)
)
```

A version-pinned list is fine for now; longer-term the right fix is to switch from substring matching to checking the agent's error type (`agent.PlanLimitReached`, `agent.UsageLimitExceeded`, etc.) when the agent layer surfaces them. For now, add the strings + a regression test that pins the exact response body.
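A regression test along these lines could pin the new strings. This is a sketch under assumptions: `is_codex_quota_error` is a hypothetical helper (the real check is an inline expression), and the sample bodies are assembled from the phrasings listed in this issue rather than captured from a live response.

```python
# Hypothetical helper; the real check lives inline in api/streaming.py.
CODEX_QUOTA_SUBSTRINGS = (
    'plan limit reached',
    'usage_limit_exceeded',
    'usage limit exceeded',
    'reached the limit of messages',
    'used up your usage',
)


def is_codex_quota_error(err_text: str) -> bool:
    """Return True if err_text contains a known Codex usage-exhaustion phrase."""
    err_lower = err_text.lower()
    return any(s in err_lower for s in CODEX_QUOTA_SUBSTRINGS)


# Illustrative bodies only. Replace with a captured response body before
# merging so the test pins what the provider actually sends.
SAMPLE_EXHAUSTION_BODIES = [
    '{"error": {"type": "usage_limit_exceeded", "message": "Plan limit reached"}}',
    "You've reached the limit of messages per 3 hours",
    "You've used up your usage",
    'Usage limit exceeded',
]


def test_codex_exhaustion_bodies_are_classified_as_quota():
    for body in SAMPLE_EXHAUSTION_BODIES:
        assert is_codex_quota_error(body), body


def test_unrelated_errors_are_not_classified_as_quota():
    assert not is_codex_quota_error('Connection reset by peer')
    assert not is_codex_quota_error('401 Unauthorized')


if __name__ == '__main__':
    test_codex_exhaustion_bodies_are_classified_as_quota()
    test_unrelated_errors_are_not_classified_as_quota()
    print('ok')
```

The substrings deliberately avoid the apostrophe in "You've", so the match should not depend on whether the provider sends a straight or curly quote.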

### 2. Make the silent-failure guard catch the no-error-no-content case

Currently the guard at `api/streaming.py:2456-2460` checks `_assistant_added` (any non-empty assistant content) and `_token_sent` (any streamed token). The guard correctly fires when both are false. But the apperror it emits depends on `_is_quota` / `_is_auth` correctly classifying the cause. When neither matches (the actual current state for Codex OAuth out-of-usage), the user sees a generic error — or worse, the agent's swallow-and-return-empty path produces an apparently-successful empty response.

Add a default catch-all when `not _assistant_added and not _token_sent` and neither classifier fires:

```python
else:
    _err_label = 'No response from provider'
    _err_type = 'silent_failure'
    _err_hint = (
        'The provider returned no content and no error. This often means a usage/rate '
        'limit was hit silently. Check provider status, switch providers via /provider, '
        'or try again in a moment.'
    )
```

This guarantees the user always sees *something* even when the error-shape classification fails.
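The intended control flow can be sketched as a standalone function. The function name, the `quota_exhausted` and `auth_error` type strings, their labels, and the shortened substring tuples are all assumptions for illustration; only the `silent_failure` fallback text comes from this issue.

```python
from typing import Optional

# Shortened, illustrative lists; the real ones are inline in api/streaming.py.
QUOTA_SUBSTRINGS = ('quota exceeded', 'plan limit reached', 'usage_limit_exceeded')
AUTH_SUBSTRINGS = ('401', 'unauthorized', 'invalid api key')


def classify_silent_failure(
    err_text: Optional[str], assistant_added: bool, token_sent: bool
) -> Optional[dict]:
    """Return an error descriptor, or None if the turn produced output."""
    if assistant_added or token_sent:
        return None  # something reached the user, so the guard does not fire

    err_lower = (err_text or '').lower()
    if any(s in err_lower for s in QUOTA_SUBSTRINGS):
        return {'type': 'quota_exhausted', 'label': 'Out of usage'}  # assumed names
    if any(s in err_lower for s in AUTH_SUBSTRINGS):
        return {'type': 'auth_error', 'label': 'Authentication failed'}  # assumed names

    # Catch-all: neither classifier matched, possibly because no error text exists.
    return {
        'type': 'silent_failure',
        'label': 'No response from provider',
        'hint': (
            'The provider returned no content and no error. This often means a '
            'usage/rate limit was hit silently. Check provider status, switch '
            'providers via /provider, or try again in a moment.'
        ),
    }


print(classify_silent_failure(None, False, False)['type'])  # silent_failure
```

With this shape, an empty or missing error string still yields a descriptor, so the guard cannot end without emitting an event.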

### 3. Surface the raw error string in the toast

The current `apperror` event includes `_err_label` + `_err_hint`, but not the raw provider message. When the user reports "no response", the only way we can debug what the provider actually said is by asking them to check the agent log. Including the raw error in a collapsible "details" section of the apperror event lets them paste useful diagnostic data into bug reports without opening a log file.
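One possible payload shape, as a sketch: the `label`, `hint`, and `details` field names, the `build_apperror_payload` helper, and the 2000-character cap are assumptions, not the current event schema.

```python
import json
from typing import Optional

MAX_DETAILS_CHARS = 2000  # assumed cap so a huge provider body cannot bloat the event


def build_apperror_payload(
    label: str, hint: str, raw_error: Optional[str], max_chars: int = MAX_DETAILS_CHARS
) -> dict:
    """Build an apperror payload, attaching the raw provider message if present."""
    payload = {'label': label, 'hint': hint}
    raw = (raw_error or '').strip()
    if raw:
        if len(raw) > max_chars:
            raw = raw[:max_chars] + '... [truncated]'
        payload['details'] = raw  # the UI would render this in a collapsed section
    return payload


example = build_apperror_payload(
    'Out of usage',
    'Your provider account has no usage left.',
    '{"error": {"type": "usage_limit_exceeded", "message": "..."}}',
)
print(json.dumps(example, indent=2))
```

Omitting `details` when there is no raw message keeps the event unchanged for existing consumers that only read the label and hint.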

## Severity

M2 — silent data-loss / silent-failure UX. Not a crash, but the absence of any signal makes the app feel broken. Codex first-token latency (3–10s under load on Plus/Team) compounds this — the user waits 10s, sees nothing, retries, waits 10s again, and only after the 3rd retry suspects the API is exhausted.

## Reporter

@cygnusignis via WebUI Discord testers thread, May 7 2026.

## Related

- #1452 (closed, May 02 2026) — credential pool rotation parity. Fixes the multi-credential rotation case but not the single-credential exhaustion case Cygnus is hitting now.
- #1671 (closed, May 04 2026) — `/api/provider/quota` endpoint. OpenRouter-only currently; tracking issue #706 for OpenAI/Anthropic header capture.
- The companion enhancement to surface usage display ambiently (parallel issue being filed for Cygnus's "Terminal Hermes shows usage, hope Mac app gets it too" ask).

