bug(streaming): OpenAI Codex OAuth usage exhaustion produces no response and no error — quota detector substring-list misses Codex's 'plan limit reached' / 'usage_limit_exceeded' shapes #1765

@nesquena-hermes

Bug Description

Reporter: Cygnus (Discord; please ping in replies) via the WebUI testers thread.

When her OpenAI Codex OAuth account runs out of usage, the WebUI/Mac app produces no response and no error — the message appears to be sent but nothing comes back, with no inline error, no toast, no spinner-stop, no quota banner. Combined with the slow Codex first-token latency, this is doubly frustrating because the user has no signal whether it's still thinking or has silently failed.

Cygnus's report (May 7 2026, 00:13 UTC):

"Also, I seem to get nothing and just no response when my OAuth Codex is out of usage. Which is a bit extra-frustrating because Codex takes a terribly long time to start replying (not sure if that's a Codex-side thing, but figured I'd let you know)"

This is a known failure shape we've seen before — see #1452 (the credential-pool rotation parity work, May 02 2026). #1452 documented the rotation half of the OAuth-exhaustion problem (WebUI now respects the credential pool's auto-rotate-on-exhaustion). What Cygnus is hitting now is the single-credential half: when there's no fallback to rotate to, the user must at minimum see a clear "out of usage" error — and currently they get silence.

Likely root cause — quota-detection string-match misses OpenAI OAuth shape

api/streaming.py:2461-2492 already has a silent-failure detector that fires when the agent returns no assistant content and no tokens were ever streamed. It classifies the cause with a substring list:

_is_quota = (
    'insufficient credit' in _err_lower
    or 'credit balance' in _err_lower
    or 'credits exhausted' in _err_lower
    or 'more credits' in _err_lower
    or 'can only afford' in _err_lower
    or 'fewer max_tokens' in _err_lower
    or 'quota_exceeded' in _err_lower
    or 'quota exceeded' in _err_lower
    or 'exceeded your current quota' in _err_lower
)

These strings are OpenRouter / Anthropic / OpenAI billing API wording. OpenAI Codex OAuth uses a different error shape. On usage exhaustion, the response body from the Codex OAuth API contains phrasing like:

  • "Plan limit reached" (Plus / Team plan ceiling)
  • "You've reached the limit of messages per <window>"
  • "You've used up your usage" (legacy)
  • "usage_limit_exceeded" (new error code)
  • HTTP 429 with body shape {"error": {"type": "usage_limit_exceeded", "message": "..."}}

None of those phrasings match the existing _is_quota substring list (it is a chain of `in` checks, not a regex). So when Codex OAuth runs out of usage, _is_quota = False and _is_auth = False (no 401 / unauthorized / invalid api key substring), and the silent-failure path falls through to a generic apperror. Worse, if the agent caught and swallowed the error and returned an empty assistant message, _assistant_added can look like a success and the user sees nothing at all.
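The miss can be reproduced in isolation. The sketch below copies the existing substrings into a hypothetical helper, matches_existing_quota_list, and runs it against a body of the HTTP 429 shape listed above. The helper name and the message text are assumptions for illustration, not code or output captured from the project.

```python
import json

# Substrings copied from the existing _is_quota check in api/streaming.py.
EXISTING_QUOTA_SUBSTRINGS = (
    "insufficient credit",
    "credit balance",
    "credits exhausted",
    "more credits",
    "can only afford",
    "fewer max_tokens",
    "quota_exceeded",
    "quota exceeded",
    "exceeded your current quota",
)


def matches_existing_quota_list(err_text: str) -> bool:
    """Hypothetical stand-in for the current classifier: case-insensitive substring match."""
    err_lower = err_text.lower()
    return any(s in err_lower for s in EXISTING_QUOTA_SUBSTRINGS)


# Illustrative body in the 429 shape from this issue; the message text is assumed.
codex_body = json.dumps(
    {"error": {"type": "usage_limit_exceeded", "message": "Plan limit reached"}}
)

print(matches_existing_quota_list(codex_body))                         # False
print(matches_existing_quota_list("You exceeded your current quota"))  # True
```

Note that "usage_limit_exceeded" does not contain "quota_exceeded", so even the closest existing entry does not fire.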

What we should ship

1. Add OpenAI Codex OAuth usage-exhaustion patterns to _is_quota

Extend the substring list in api/streaming.py:2461-2492 (and the parallel block at :2936-2978):

_is_quota = (
    # ... existing strings ...
    or 'plan limit reached' in _err_lower
    or 'usage_limit_exceeded' in _err_lower
    or 'usage limit exceeded' in _err_lower
    or 'reached the limit of messages' in _err_lower
    or 'used up your usage' in _err_lower
    # Codex Plus/Team plan-window-reached HTTP body shape
    or ('plan' in _err_lower and 'limit' in _err_lower and 'reached' in _err_lower)
)

A version-pinned substring list is fine for now: add the strings plus a regression test that pins the exact response body. Longer-term, the right fix is to switch from substring matching to checking the agent's error type (agent.PlanLimitReached, agent.UsageLimitExceeded, etc.) once the agent layer surfaces them.
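A regression test along these lines could pin the shapes listed in this issue. This is a minimal sketch: is_codex_quota_error is a hypothetical helper standing in for the extended _is_quota expression, and the pinned bodies are the phrasings quoted above rather than captured responses (the "..." message is left elided as in the issue). A real test should pin a body recorded from the Codex OAuth API.

```python
import json

# The new substrings proposed in this issue (without the broad plan/limit/reached check).
CODEX_QUOTA_SUBSTRINGS = (
    "plan limit reached",
    "usage_limit_exceeded",
    "usage limit exceeded",
    "reached the limit of messages",
    "used up your usage",
)


def is_codex_quota_error(err_text: str) -> bool:
    """Hypothetical helper mirroring the proposed additions to _is_quota."""
    err_lower = err_text.lower()
    return any(s in err_lower for s in CODEX_QUOTA_SUBSTRINGS)


# Shapes quoted in this issue; replace with recorded bodies when available.
PINNED_BODIES = [
    json.dumps({"error": {"type": "usage_limit_exceeded", "message": "..."}}),
    "Plan limit reached",
    "You've reached the limit of messages per <window>",
    "You've used up your usage",
]


def test_codex_exhaustion_bodies_classified_as_quota():
    for body in PINNED_BODIES:
        assert is_codex_quota_error(body), body


def test_unrelated_errors_not_classified_as_quota():
    assert not is_codex_quota_error("401 Unauthorized: invalid api key")
    assert not is_codex_quota_error("connection reset by peer")
```

Both functions follow pytest's test_ naming convention, so they would be collected without extra configuration.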

2. Make the silent-failure guard catch the no-error-no-content case

Currently the guard at api/streaming.py:2456-2460 checks _assistant_added (any non-empty assistant content) and _token_sent (any streamed token). The guard correctly fires when both are false. But the apperror it emits depends on _is_quota / _is_auth correctly classifying the cause. When neither matches (the actual current state for Codex OAuth out-of-usage), the user sees a generic error — or worse, the agent's swallow-and-return-empty path produces an apparently-successful empty response.

Add a default catch-all when not _assistant_added and not _token_sent and neither classifier fires:

else:
    _err_label = 'No response from provider'
    _err_type = 'silent_failure'
    _err_hint = (
        'The provider returned no content and no error. This often means a usage/rate '
        'limit was hit silently. Check provider status, switch providers via /provider, '
        'or try again in a moment.'
    )

This guarantees the user always sees something even when the error-shape classification fails.
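The decision logic for the catch-all can be sketched as a pure function. The function name, its parameters, and the dict keys are assumptions for illustration; in api/streaming.py the same values would be assigned to _err_label, _err_type, and _err_hint inside the existing guard.

```python
from typing import Optional


def silent_failure_fallback(
    assistant_added: bool,
    token_sent: bool,
    is_quota: bool,
    is_auth: bool,
) -> Optional[dict]:
    """Return the catch-all error fields, or None when the catch-all does not apply."""
    if assistant_added or token_sent:
        return None  # the turn produced output, so the silent-failure guard does not fire
    if is_quota or is_auth:
        return None  # a specific classifier already chose the label and hint
    return {
        "label": "No response from provider",
        "type": "silent_failure",
        "hint": (
            "The provider returned no content and no error. This often means a usage/rate "
            "limit was hit silently. Check provider status, switch providers via /provider, "
            "or try again in a moment."
        ),
    }


# Codex OAuth out-of-usage today: nothing streamed and neither classifier matched.
print(silent_failure_fallback(False, False, False, False)["type"])  # silent_failure
```

Keeping the decision in one small function makes it straightforward to unit-test each combination of the four flags.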

3. Surface the raw error string in the toast

The current apperror event includes _err_label + _err_hint, but not the raw provider message. When the user reports "no response", the only way we can debug what the provider actually said is by asking them to check the agent log. Including the raw error in a collapsible "details" section of the apperror event lets them paste useful diagnostic data into bug reports without opening a log file.
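One possible shape for the extended event payload is sketched below. The function name, the "details" key, and the length cap are all assumptions; the actual apperror schema is not shown in this issue and may differ.

```python
from typing import Optional

MAX_DETAILS_CHARS = 2000  # assumed cap so a very large provider body cannot bloat the event


def build_apperror_payload(label: str, hint: str, raw_error: Optional[str]) -> dict:
    """Hypothetical builder: label and hint as today, plus the raw provider message."""
    details = (raw_error or "").strip()
    if len(details) > MAX_DETAILS_CHARS:
        details = details[:MAX_DETAILS_CHARS] + "... [truncated]"
    payload = {"label": label, "hint": hint}
    if details:
        # The UI would render this inside a collapsible "details" section.
        payload["details"] = details
    return payload


payload = build_apperror_payload(
    "No response from provider",
    "Check provider status or try again in a moment.",
    '{"error": {"type": "usage_limit_exceeded", "message": "..."}}',
)
print(sorted(payload))  # ['details', 'hint', 'label']
```

Omitting the key when there is no raw message keeps the payload unchanged for existing error paths. If raw provider messages can ever carry credentials, they should be scrubbed before being attached.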

Severity

M2 — silent data-loss / silent-failure UX. Not a crash, but the absence of any signal makes the app feel broken. Codex first-token latency (3–10s under load on Plus/Team) compounds this — the user waits 10s, sees nothing, retries, waits 10s again, and only after the 3rd retry suspects the API is exhausted.

Reporter

@CygnusIgnis via WebUI Discord testers thread, May 7 2026.
