fix(agents): broaden 402 temporary-limit detection and allow billing cooldown probe#38533

Merged
altaywtf merged 7 commits into openclaw:main from xialonglee:fix38497 on Mar 8, 2026

Conversation

@xialonglee
Contributor

fix #38465

Summary

  • Problem: OpenClaw misclassifies HTTP 402 responses from Anthropic (and other providers) as persistent "billing" issues, even when the error is a temporary usage/spend limit (e.g. monthly spend limit, organization usage limit). Users with active credits and working direct API calls see "Provider anthropic has billing issue (skipping all models)" indefinitely.
  • Why it matters: Misclassification leads to 5+ hour cooldowns with no recovery path. Single-provider setups are fully locked out; multi-provider setups skip the primary until manual intervention.
  • What changed: (1) Broaden 402 classification to treat periodic usage limits and org/workspace spend limits as rate_limit; (2) Treat billing as semi-persistent in model-fallback, allowing probes when no fallbacks exist (30s throttle) or when fallbacks exist (near cooldown expiry).
  • What did NOT change: Auth/auth_permanent remain persistent (no probes). Explicit billing signals (credit balance, insufficient credits) still classify as billing. Cooldown durations and auth profile storage are unchanged.

Change Type

  • Bug fix

Linked Issue/PR

User-visible / Behavior Changes

  • Classification: 402 responses with periodic usage limits (daily/weekly/monthly) or org/workspace spend limits are now classified as rate_limit instead of billing, enabling fallback and probe behavior.
  • Recovery: Billing-cooldowned providers can be probed again: every 30s when no fallbacks exist, or near cooldown expiry when fallbacks exist. Previously they were skipped indefinitely.

Repro + Verification

Steps

  1. Configure Anthropic as the only provider with active credits.
  2. Trigger a 402 from Anthropic (e.g. workspace spend limit, or transient 402).
  3. Observe classification and cooldown behavior.

Expected

  • Temporary 402s (periodic limits, org spend limits) → rate_limit → fallback/probe allowed.
  • Explicit billing (credit balance, insufficient credits) → billing → cooldown with probe recovery.

Actual (before fix)

  • All 402s → billing → indefinite skip, no probes.
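
As a rough illustration of the expected classification above, here is a minimal sketch. It is not the code from this PR: the function name `sketchClassify402` and all three regular expressions are hypothetical, chosen only to show explicit billing signals taking priority over temporary-limit signals.

```typescript
// Illustrative sketch only; names and patterns are assumptions, not the PR's code.
type FailoverReason = "rate_limit" | "billing";

// Hypothetical pattern for the explicit billing signals named in the description.
const EXPLICIT_BILLING = /credit balance|insufficient credits/i;
// Hypothetical pattern for periodic usage limits (daily/weekly/monthly).
const PERIODIC_LIMIT = /\b(daily|weekly|monthly)\b.*\blimit\b/i;
// Hypothetical pattern for organization/workspace spend or usage limits.
const ORG_SPEND_LIMIT = /\b(organization|org|workspace)\b.*\b(spend|usage)\b.*\blimit\b/i;

function sketchClassify402(message: string): FailoverReason {
  // Explicit billing signals always win, so they are never reclassified.
  if (EXPLICIT_BILLING.test(message)) return "billing";
  // Temporary limits that reset on their own are treated like rate limits.
  if (PERIODIC_LIMIT.test(message) || ORG_SPEND_LIMIT.test(message)) {
    return "rate_limit";
  }
  // Any other 402 keeps the previous default.
  return "billing";
}

console.log(sketchClassify402("You have reached your monthly spend limit"));
console.log(sketchClassify402("Your credit balance is too low"));
```

Checking the billing pattern first is the important design choice here: a message that mentions both a credit problem and a monthly limit still ends up as billing.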


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f7a4dfc75

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@greptile-apps
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes a misclassification bug where HTTP 402 responses were treated as persistent billing failures, causing indefinite provider lockout even for temporary usage/spend limit errors. The fix broadens the 402 classifier to detect:

  1. Periodic usage limits (daily/weekly/monthly): Classified as rate_limit rather than billing, enabling fallback and probe behavior.
  2. Organization/workspace spend limits: Reclassified as rate_limit since these are typically admin-set caps that reset, not credit-balance failures.

Key implementation details:

  • classifyFailoverReasonFromHttpStatus(): New detection branches for periodic limits and org/workspace limits, both guarded by !isBillingErrorMessage(message) to prevent explicit billing signals from being reclassified.
  • resolveCooldownDecision(): Billing is now semi-persistent — a 30s throttled probe is allowed when no fallback candidates exist (guaranteed failure otherwise), and the existing probe schedule is used when fallbacks are present but the cooldown is near expiry.
  • runWithModelFallback(): allowTransientCooldownProbe: true is now passed for billing-reason probes (lines 536-542).

The changes maintain existing behavior for explicit billing signals (credit balance, insufficient credits) and auth/auth_permanent errors. Test isolation is correctly handled via beforeEach cleanup, and all new behavioral branches are covered with comprehensive test cases.
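
The semi-persistent billing behavior summarized above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual `resolveCooldownDecision()` code: the function `sketchShouldProbe`, the `ProbeInput` shape, and the two-minute near-expiry window are all hypothetical. Only the 30s throttle and the "no probes for auth" rule come from the PR description.

```typescript
// Illustrative sketch only; types, names, and the near-expiry margin are assumptions.
type PersistentReason = "billing" | "auth" | "auth_permanent";

interface ProbeInput {
  reason: PersistentReason;
  hasFallbackCandidates: boolean;
  nowMs: number;
  cooldownExpiresAtMs: number;
  lastProbeAtMs?: number;
}

const MIN_PROBE_INTERVAL_MS = 30_000; // 30s throttle, as stated in the PR description
const NEAR_EXPIRY_WINDOW_MS = 2 * 60_000; // assumed margin; the PR does not state one

function sketchShouldProbe(input: ProbeInput): boolean {
  // Auth failures stay fully persistent: never probe.
  if (input.reason !== "billing") return false;

  if (!input.hasFallbackCandidates) {
    // With no fallback the request would fail anyway, so allow a throttled probe.
    const last = input.lastProbeAtMs;
    return last === undefined || input.nowMs - last >= MIN_PROBE_INTERVAL_MS;
  }

  // With fallbacks available, only probe when the cooldown is about to expire.
  return input.cooldownExpiresAtMs - input.nowMs <= NEAR_EXPIRY_WINDOW_MS;
}

const now = Date.now();
console.log(
  sketchShouldProbe({
    reason: "billing",
    hasFallbackCandidates: false,
    nowMs: now,
    cooldownExpiresAtMs: now + 5 * 60 * 60 * 1000,
  }),
);
```

In the real code the fallback case reportedly reuses an existing probe schedule, which may apply its own throttling. The sketch leaves that out for brevity.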

Confidence Score: 4/5

  • Safe to merge; the fix improves recovery from misclassified billing errors without weakening protection for genuine auth or billing failures.
  • The logic in both the classifier and the cooldown decision function is sound. The billing guard (!isBillingErrorMessage) prevents explicit billing signals from being reclassified. Test isolation is correctly handled via beforeEach cleanup. The hasSpendOrOrgLimitSignal pattern is somewhat broad but is constrained by the HTTP 402 context and the billing-signal guard. Minor deduction because the org/workspace pattern could theoretically match edge-case messages, though this is mitigated by the overall guard logic.
  • No files require special attention.

Last reviewed commit: 7f7a4df

@xialonglee
Contributor Author

The CI failure is unrelated to this PR.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42cdef24ed

@altaywtf altaywtf self-assigned this Mar 7, 2026
@altaywtf altaywtf force-pushed the fix38497 branch 6 times, most recently from 083a81b to 75960a8 Compare March 7, 2026 23:00

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75960a8af1

xialonglee and others added 6 commits March 8, 2026 09:59
…cooldown probes

- Broaden classifyFailoverReasonFromHttpStatus to treat periodic usage limits
  and org/workspace spend limits as rate_limit instead of billing
- Treat billing as semi-persistent in model-fallback: allow probes when
  no fallbacks exist (30s throttle) or when fallbacks exist (near expiry)
- Add tests for new 402 classification and billing probe behavior

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38dc21271e


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 282b9186c6

Comment on lines +534 to +537
decision.reason === "rate_limit" ||
decision.reason === "overloaded" ||
decision.reason === "billing"
) {

P1: Avoid escalating billing backoff on failed cooldown probes

This change now enables billing probes (allowTransientCooldownProbe) for cooldowned providers, but failed probe attempts still flow into maybeMarkAuthProfileFailure and increment failureCounts.billing even while disabledUntil is active (computeNextProfileUsageStats keeps the active window but still bumps counts). In a fallback-enabled setup, repeated requests during the near-expiry probe window can quickly raise billing count so the next disable jumps from 5h to 10h/20h/24h, which makes lockouts longer after a probe sequence instead of aiding recovery. Fresh evidence in this diff is the new near-expiry billing probe path exercised in src/agents/model-fallback.probe.test.ts, which makes this escalation reachable in normal traffic.
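
To make the escalation concern concrete, here is a small arithmetic sketch. The doubling-with-cap formula is an assumption inferred only from the 5h → 10h/20h/24h figures quoted above, and both function names are hypothetical. `sketchNextBillingCount` shows one possible mitigation (not counting failures that occur inside an already active disable window); it is not something this PR is known to implement.

```typescript
// Illustrative sketch only; the formula is inferred from the 5h/10h/20h/24h figures.
const HOUR_MS = 60 * 60 * 1000;
const BASE_DISABLE_MS = 5 * HOUR_MS; // first billing disable window
const MAX_DISABLE_MS = 24 * HOUR_MS; // assumed cap

// Assumed backoff: doubles with each recorded billing failure, capped at 24h.
function sketchBillingDisableMs(failureCount: number): number {
  const exponent = Math.max(0, failureCount - 1);
  return Math.min(MAX_DISABLE_MS, BASE_DISABLE_MS * 2 ** exponent);
}

// Hypothetical mitigation: a failed probe inside an active window leaves the count alone.
function sketchNextBillingCount(
  currentCount: number,
  disabledUntilMs: number | undefined,
  nowMs: number,
): number {
  const windowActive = disabledUntilMs !== undefined && disabledUntilMs > nowMs;
  return windowActive ? currentCount : currentCount + 1;
}

for (const count of [1, 2, 3, 4]) {
  console.log(count, sketchBillingDisableMs(count) / HOUR_MS);
}
```

Under these assumptions, three failed probes during one cooldown window would move the next disable from 5h to the 24h cap, which is the escalation the review comment warns about.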

@altaywtf altaywtf merged commit 92648f9 into openclaw:main Mar 8, 2026
27 of 28 checks passed
@altaywtf
Member

altaywtf commented Mar 8, 2026

Merged via squash.

Thanks @xialonglee!

Get-windy pushed a commit to Get-windy/JieZi-ai-PS that referenced this pull request Mar 8, 2026
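Main upstream updates (5d22bd029 → 92648f9):
- fix(agents): broaden detection of temporary 402 rate-limit errors and allow billing cooldown probes (openclaw#38533)
- fix(ui): fix the console device-auth token signing issue
- CLI: avoid falsely reporting update restart failures when no listener attribution is available (openclaw#39508)
- fix: merge community-contributed PR openclaw#39516 (@Imhermes1)
- build: bump version to 2026.3.8 and publish the v2026.3.7 tag

Accumulated local updates (since the last sync):
- feat: automatic OAuth token renewal: pre-check the token before cron triggers and refresh it automatically
- fix: oauth-refresh-daemon shouldRefresh fix (expired tokens are no longer skipped)
- fix: reset the counter after the daemon fails 3 times, so refreshing does not stop permanently
- feat: smart-routing model account selection; comprehensive model management improvements
- feat: aggregated conversation history view for Feishu/DingTalk channels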
MPavleski pushed a commit to MPavleski/openclaw that referenced this pull request Mar 8, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
Saitop pushed a commit to NomiciAI/openclaw that referenced this pull request Mar 8, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
GordonSH-oss pushed a commit to GordonSH-oss/openclaw that referenced this pull request Mar 9, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
sauerdaniel pushed a commit to sauerdaniel/openclaw that referenced this pull request Mar 11, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
dhoman pushed a commit to dhoman/chrono-claw that referenced this pull request Mar 11, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 14, 2026
…cooldown probe (openclaw#38533)

Cherry-picked from openclaw/openclaw@92648f9ba (AUTO-PARTIAL).
Discarded gutted files: CHANGELOG.md, model-fallback.ts, model-fallback.probe.test.ts,
failover-error.test.ts, pi-embedded-helpers.isbillingerrormessage.test.ts,
pi-embedded-runner/run.ts. Kept classifyFailoverReasonFrom402Text addition
in classifyFailoverReason; discarded classifyFailoverReasonFromHttpStatus
(gutted model-fallback layer) and isPeriodicUsageLimitErrorMessage references
(function does not exist in fork).

Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
alexey-pelykh added a commit to remoteclaw/remoteclaw that referenced this pull request Mar 14, 2026
…cooldown probe (openclaw#38533) (#1252)

Cherry-picked from openclaw/openclaw@92648f9ba (AUTO-PARTIAL).
Discarded gutted files: CHANGELOG.md, model-fallback.ts, model-fallback.probe.test.ts,
failover-error.test.ts, pi-embedded-helpers.isbillingerrormessage.test.ts,
pi-embedded-runner/run.ts. Kept classifyFailoverReasonFrom402Text addition
in classifyFailoverReason; discarded classifyFailoverReasonFromHttpStatus
(gutted model-fallback layer) and isPeriodicUsageLimitErrorMessage references
(function does not exist in fork).

Co-authored-by: Peter Lee <[email protected]>
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
senw-developers pushed a commit to senw-developers/va-openclaw that referenced this pull request Mar 17, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
…cooldown probe (openclaw#38533)

Merged via squash.

Prepared head SHA: 282b918
Co-authored-by: xialonglee <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

Labels

agents Agent runtime and tooling size: M

Development

Successfully merging this pull request may close these issues.

[Bug]: Anthropic API works via direct /v1/messages call, but OpenClaw 2026.3.2 classifies provider as billing issue

2 participants