500/503 errors misclassified as rate_limit, triggering unnecessary cooldowns #22294
Closed as not planned
Labels
stale: Marked as stale due to inactivity
Description
Bug Description
OpenClaw gateway classifies Gemini 500 (InternalServerError) and 503 (ServiceUnavailable) responses as rate_limit errors, which triggers the exponential cooldown mechanism (1min → 5min → 25min → 60min cap). This effectively takes the agent offline even when API usage is well below rate limits.
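The cooldown progression quoted above (1 → 5 → 25 → 60 minutes) is consistent with a multiply-by-five backoff capped at one hour. The following is a minimal sketch of that arithmetic only; the function name `cooldown_minutes` and its parameters are hypothetical and inferred from the sequence in this report, not taken from the gateway's source.

```python
def cooldown_minutes(consecutive_failures: int,
                     base: int = 1,
                     factor: int = 5,
                     cap: int = 60) -> int:
    """Cooldown length in minutes after N consecutive failures (1-based).

    Hypothetical model: base * factor**(N - 1), capped at `cap` minutes.
    The factor of 5 is inferred from the 1 -> 5 -> 25 -> 60 sequence above.
    """
    if consecutive_failures < 1:
        return 0
    return min(base * factor ** (consecutive_failures - 1), cap)


# Cooldowns for the first five consecutive failures.
schedule = [cooldown_minutes(n) for n in range(1, 6)]
print(schedule)  # [1, 5, 25, 60, 60]
```

Under this model, four misclassified 500s in a row are enough to reach the 60-minute cap.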
Evidence
- Gemini API dashboard shows usage at 5/25 RPM and 20/250 RPD (Paid Tier 1) — nowhere near limits
- The actual errors in Google's dashboard are 500 InternalServerError, NOT 429 TooManyRequests
- Both auth profiles (google-gemini-cli and anthropic fallback) entered cooldown simultaneously, leaving the agent with no working model
- auth-profiles.json showed cooldownUntil set with failureCounts incrementing under the rate_limit category
Expected Behavior
- 429 errors → trigger rate limit cooldown (correct)
- 500/503 errors → retry with backoff but do NOT enter rate_limit cooldown state
- Transient server errors should not disable the agent for extended periods
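The expected behavior above could be sketched roughly as follows. This is an illustration of the requested classification, not the gateway's actual code: `FailureKind`, `classify_status`, and `should_enter_cooldown` are hypothetical names, and the real classifier may look at more than the HTTP status code.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical failure categories for this sketch."""
    RATE_LIMIT = "rate_limit"
    TRANSIENT_SERVER = "transient_server"
    OTHER = "other"


def classify_status(status: int) -> FailureKind:
    """Map an HTTP status code to a failure category."""
    if status == 429:
        # Too Many Requests: the only status treated as a rate limit here.
        return FailureKind.RATE_LIMIT
    if status in (500, 503):
        # Server-side failures: retry with backoff, no rate_limit cooldown.
        return FailureKind.TRANSIENT_SERVER
    return FailureKind.OTHER


def should_enter_cooldown(status: int) -> bool:
    """Only genuine rate limits put an auth profile into cooldown."""
    return classify_status(status) is FailureKind.RATE_LIMIT


for code in (429, 500, 503):
    print(code, classify_status(code).value, should_enter_cooldown(code))
```

Keeping the two categories separate lets 500/503 responses be retried on their own schedule without incrementing the rate_limit failure count.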
Actual Behavior
- 500/503 errors → classified as rate_limit → exponential cooldown activated
- Agent goes offline for up to 60 minutes due to server-side errors it has no control over
- Both primary and fallback models can be simultaneously disabled
Impact
- Agent becomes completely unresponsive during Gemini outages
- Even brief Gemini instability (a few 500s) can trigger multi-minute cooldowns
- Fallback model (Claude Sonnet) may also be in cooldown, leaving zero working models
Reproduction
- Configure gateway with google-gemini-cli as primary model
- Wait for Gemini to return a 500 or 503 error (happens periodically)
- Observe auth-profiles.json: failureCounts.rate_limit increments and cooldownUntil is set
- Agent stops making requests even though rate limits are not exceeded
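To make the "observe" step easier to check, here is a small sketch that lists profiles currently carrying a cooldown. It assumes the same layout the workaround script relies on (a top-level `profiles` list whose entries hold `cooldownUntil` and `failureCounts`); the helper name `profiles_in_cooldown` and the sample values are made up for illustration, and the real file may be structured differently.

```python
def profiles_in_cooldown(data: dict) -> list[dict]:
    """Return cooldown details for every profile that has cooldownUntil set.

    Assumed layout: {"profiles": [{"cooldownUntil": ..., "failureCounts": {...}}, ...]}
    """
    flagged = []
    for index, profile in enumerate(data.get("profiles", [])):
        if "cooldownUntil" in profile:
            flagged.append({
                "index": index,
                "cooldownUntil": profile["cooldownUntil"],
                "rate_limit_failures": profile.get("failureCounts", {}).get("rate_limit", 0),
            })
    return flagged


# Made-up sample data; load the real file with json.load() instead.
sample = {
    "profiles": [
        {"cooldownUntil": 1234567890, "failureCounts": {"rate_limit": 3}},
        {},  # a profile with no cooldown recorded
    ]
}
print(profiles_in_cooldown(sample))
```

A non-empty result while the provider dashboard shows only 500/503 errors would match the misclassification described in this report.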
Workaround
Manually clear cooldowns in auth-profiles.json:
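python3 -c "
import json

# Load the auth profile state written by the gateway.
with open('auth-profiles.json') as f:
    data = json.load(f)

# Drop the cooldown and failure-tracking fields from every profile.
# This assumes 'profiles' is a list of profile objects.
for p in data.get('profiles', []):
    for key in ['cooldownUntil', 'errorCount', 'failureCounts', 'lastFailureAt']:
        if key in p:
            del p[key]

# Write the cleaned state back in place.
with open('auth-profiles.json', 'w') as f:
    json.dump(data, f, indent=2)
"
Environment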
- OpenClaw Gateway v2026.2.19
- Google Gemini Paid Tier 1
- Model: gemini-3-pro-preview