Description
When the Anthropic OAuth refresh endpoint (console.anthropic.com/v1/oauth/token) itself returns HTTP 429, the agent enters an unrecoverable loop with no backoff, pause, or cooldown mechanism. Every poll cycle (default 2 min) repeats the same failing pattern indefinitely.
Root Cause
The fix in #16 (commit 0ab1009) added OAuth token refresh as a 429 bypass for the usage API. However, the refresh endpoint can also get rate-limited. When it does:
1. credsRefresh() reads expired/rate-limited credentials
2. FetchQuotas() -> 429 -> triggers RefreshAnthropicToken()
3. OAuth endpoint returns 429 -> ErrOAuthRefreshFailed (generic, no 429 distinction)
4. Agent returns, no backoff applied
5. Next poll cycle (2 min) -> repeat from step 1
Contrast with Auth Error Handling
Auth errors (401/403) have proper protection (anthropic_agent.go lines 262-289):
- Retry counter (authFailCount)
- Pause mechanism (authPaused = true after maxAuthFailures)
- Stops polling until credentials change
Rate limit errors (429) have none of this (anthropic_agent.go lines 208-254).
Impact
- Hammers the OAuth endpoint every 2 minutes indefinitely
- Burns through refresh tokens (one-time use with OAuth rotation)
- Can permanently invalidate credentials if rotation fails mid-429
- No self-recovery - requires manual intervention (daemon restart + fresh token)
Suggested Fix
Add backoff/pause logic for 429 errors similar to the existing auth error handling:
- Distinguish 429 from other ErrOAuthRefreshFailed cases in anthropic_oauth.go
- Add exponential backoff or a pause counter for rate limit errors in anthropic_agent.go
- Consider a cooldown period before retrying OAuth refresh (e.g., 5-10 minutes on first 429, doubling each time)
Environment