Gateway hangs indefinitely on model API billing error instead of failing gracefully #24622

@evaldo-westy

Issue

When a model API key runs out of credits/quota, the gateway hangs indefinitely instead of failing gracefully or recovering. The process becomes unresponsive and requires manual SIGTERM to restart.

Steps to Reproduce

  1. Use a model with an API key that has exhausted credits (in this case Google Gemini)
  2. Trigger a model request
  3. API returns billing error: ⚠️ API provider returned a billing error — your API key has run out of credits or has an insufficient balance. Check your provider's billing dashboard and top up or switch to a different API key.
  4. Gateway hangs and stops responding to all requests

Expected Behavior

Gateway should:

  • Catch the billing error gracefully
  • Return error to user immediately
  • Remain responsive to new requests
  • Allow model switching or other operations to continue
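A minimal sketch of the expected behavior, assuming the gateway can wrap each model call at a single point. The names `classifyModelError`, `handleRequest`, and `UserFacingError` are hypothetical and are not taken from the OpenClaw codebase. The message patterns are guesses based on the billing text quoted above.

```typescript
// Hypothetical error shape; the gateway's real types are not shown in this issue.
type ModelErrorKind = "billing" | "network" | "unknown";

interface UserFacingError {
  kind: ModelErrorKind;
  message: string;
  retryable: boolean;
}

// Map a thrown value to something that can be shown to the user right away.
function classifyModelError(err: unknown): UserFacingError {
  const text = err instanceof Error ? err.message : String(err);
  // Assumption: billing/quota failures can be recognized from the message text.
  if (/billing|quota|credits|insufficient balance/i.test(text)) {
    return { kind: "billing", message: "Provider billing error: top up or switch API key.", retryable: false };
  }
  // Node's built-in fetch reports network-level failures as TypeError("fetch failed").
  if (err instanceof TypeError && /fetch failed/i.test(text)) {
    return { kind: "network", message: "Could not reach the model provider.", retryable: true };
  }
  return { kind: "unknown", message: text, retryable: false };
}

// Every failure is caught here, so the caller always gets an answer
// and the gateway stays free to serve the next request.
async function handleRequest(
  callModel: () => Promise<string>,
): Promise<string | UserFacingError> {
  try {
    return await callModel();
  } catch (err) {
    return classifyModelError(err);
  }
}
```

With this shape, a billing failure becomes an ordinary return value, so commands such as model switching can still be processed afterwards.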

Actual Behavior

Gateway enters hung state:

  • All processing stops
  • Typing indicator times out after 2 minutes
  • Gateway logs show TypeError: fetch failed as "non-fatal unhandled rejection"
  • Process remains running but unresponsive for hours
  • Model switching commands don't help (can't be processed due to hung state)
  • Manual restart (SIGTERM) required

Timeline from Logs

13:13:12 - Typing indicator timeout during active processing
13:13-15:36 - Near-complete silence (about 2.5 hour hang), apart from the 14:53:23 entry below
14:53:23 - Non-fatal unhandled rejection: TypeError: fetch failed
15:36+ - Gateway logging resumed but still unresponsive to user
16:13:58 - Manual SIGTERM sent
16:14:05 - Gateway restarted successfully

Root Cause

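Likely an unhandled Promise rejection during the model API call. A rejected Promise does not by itself block the Node.js event loop, and the gateway did resume logging at 15:36, so the more probable mechanism is that the failed call never settles or never releases the request it was serving, leaving later requests stuck behind it. The "non-fatal unhandled rejection" log line suggests the error reaches the global rejection handler instead of the error handler for that call.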
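One hypothetical way a failed call can leave a process running but unresponsive is a per-session "busy" flag that is never cleared when the call rejects. The sketch below illustrates that pattern only. `SessionGate` is an invented name, and nothing in this issue confirms that the gateway works this way.

```typescript
// Hypothetical per-session gate; the gateway's real queueing code is not shown in this issue.
class SessionGate {
  private busy = false;

  get isBusy(): boolean {
    return this.busy;
  }

  // Leaky variant: if task() rejects, the line that clears the flag never runs,
  // so the session looks busy forever even though the process is alive.
  async runLeaky<T>(task: () => Promise<T>): Promise<T> {
    this.busy = true;
    const result = await task();
    this.busy = false;
    return result;
  }

  // Safe variant: finally clears the flag on success and on failure.
  async runSafe<T>(task: () => Promise<T>): Promise<T> {
    this.busy = true;
    try {
      return await task();
    } finally {
      this.busy = false;
    }
  }
}
```

If something like the leaky variant exists on the request path, a single `fetch failed` would be enough to strand every later message for that session.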

Environment

  • OpenClaw version: 2026.2.9 (33c75cb)
  • Model: google/gemini-3-pro-preview (default)
  • OS: macOS (Darwin 25.3.0 arm64)
  • Node: v25.6.0

Recommendation

Implement proper timeout and error handling for model API calls:

  • Wrap API calls in try-catch with timeouts
  • Handle billing/quota errors specifically
  • Ensure a rejected or never-settling Promise can't stall request processing
  • Consider circuit breaker pattern for failing providers
  • Return errors to user immediately rather than hanging
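The timeout and circuit breaker items above could look roughly like this. It is a sketch under assumptions, not the gateway's implementation: `withTimeout` and `CircuitBreaker` are hypothetical names, and the threshold and cooldown values are left to the caller.

```typescript
// Reject if the task has not settled within `ms`, and abort it via the signal.
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // lets a fetch-based task cancel its request
      reject(new Error(`model call timed out after ${ms} ms`));
    }, ms);
  });
  try {
    return await Promise.race([task(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Minimal circuit breaker: after `threshold` consecutive failures, reject
// calls immediately until `cooldownMs` has passed, then allow a trial call.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  isOpen(): boolean {
    return this.failures >= this.threshold && this.now() - this.openedAt < this.cooldownMs;
  }

  async call<T>(task: () => Promise<T>): Promise<T> {
    if (this.isOpen()) {
      throw new Error("circuit open: provider temporarily disabled");
    }
    try {
      const result = await task();
      this.failures = 0; // any success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

A call site might combine the two as `breaker.call(() => withTimeout((signal) => fetch(url, { signal }), 30_000))`, where `url` and the 30 second budget are placeholders. Once the breaker opens, a provider with exhausted credits fails fast instead of tying up each new request.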

Log Excerpt

2026-02-23T14:53:23.177Z [openclaw] Non-fatal unhandled rejection (continuing): TypeError: fetch failed
    at node:internal/deps/undici/undici:16480:13
    at processTicksAndRejections (node:internal/process/task_queues:104:5)
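The excerpt shows only the outer `TypeError: fetch failed`. Node's built-in fetch normally attaches the underlying failure as the error's `cause`, so logging the cause chain may reveal what actually failed. A small sketch follows; `describeFetchFailure` is a hypothetical helper, not an existing OpenClaw function.

```typescript
// Flatten an error and its `cause` chain into one log-friendly line.
function describeFetchFailure(err: unknown): string {
  if (!(err instanceof Error)) return String(err);
  const parts = [`${err.name}: ${err.message}`];
  let cause: unknown = (err as { cause?: unknown }).cause;
  let depth = 0;
  // Cap the depth so a cyclic cause chain cannot loop forever.
  while (cause instanceof Error && depth < 5) {
    const code = (cause as { code?: unknown }).code;
    const suffix = typeof code === "string" ? ` (${code})` : "";
    parts.push(`caused by ${cause.name}: ${cause.message}${suffix}`);
    cause = (cause as { cause?: unknown }).cause;
    depth += 1;
  }
  return parts.join(" <- ");
}
```

Calling this from the rejection handler would show whether the underlying failure was a connection reset, a DNS error, or a timeout, all of which currently appear as the same `fetch failed`.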

Metadata

Labels: stale (marked as stale due to inactivity)