fix(runtime-fallback): extract status code from nested AI SDK errors #2618
Merged
acamq merged 2 commits into code-yeongyu:dev from Mar 16, 2026
AI SDK wraps HTTP status codes inside error.error.statusCode (e.g., AI_APICallError). The current extractStatusCode only checks the top level, missing these nested codes. This caused runtime-fallback to skip retryable errors like 400, 500, 504 because it couldn't find the status code. Fixes code-yeongyu#2617
1 issue found across 1 file
Confidence score: 3/5
- There is a concrete regression risk in src/hooks/runtime-fallback/error-classifier.ts: the nullish-coalescing chain may stop at a non-numeric status, so deeper numeric statusCode values are skipped and errors can be misclassified.
- Given the moderate severity (6/10) with high confidence (9/10), this is more than a cosmetic issue and could affect runtime fallback behavior for some error shapes.
- Pay close attention to src/hooks/runtime-fallback/error-classifier.ts: ensure status extraction prefers valid numeric values even when earlier fields are present but non-numeric.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/hooks/runtime-fallback/error-classifier.ts">
<violation number="1" location="src/hooks/runtime-fallback/error-classifier.ts:36">
P2: Nullish-coalescing chain can stop on non-numeric `status`, preventing deeper nested numeric `statusCode` values from being considered.</violation>
</file>
…n extraction chain
The nullish-coalescing chain could stop at a non-numeric value (e.g. status: "error"), preventing deeper nested numeric statusCode values from being reached. Switch to Array.find() with a type guard to always select the first numeric value.
Adds 11 tests for extractStatusCode covering: top-level, nested (data/error/cause), non-numeric skip, fallback to regex, and precedence.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
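The difference between the two approaches described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the `ErrorLike` type, both function names, and the order of the candidate fields are assumptions; only the field names (`statusCode`, `status`, `data`, `error`, `cause`) come from the discussion here.

```typescript
// Hypothetical error shape built from the field names mentioned in this PR.
interface ErrorLike {
  statusCode?: unknown;
  status?: unknown;
  data?: { statusCode?: unknown };
  error?: { statusCode?: unknown };
  cause?: { statusCode?: unknown };
}

// Sketch of the old approach: `??` only skips null and undefined,
// so a string such as status: "error" ends the chain early.
function extractWithNullish(err: ErrorLike): unknown {
  return err.statusCode ?? err.status ?? err.data?.statusCode ?? err.error?.statusCode;
}

// Sketch of the new approach: collect every candidate, then pick the
// first one that is actually a number (candidate order is an assumption).
function extractWithFind(err: ErrorLike): number | undefined {
  const candidates: unknown[] = [
    err.statusCode,
    err.status,
    err.data?.statusCode,
    err.error?.statusCode,
    err.cause?.statusCode,
  ];
  return candidates.find((c): c is number => typeof c === "number");
}
```

For an input such as `{ status: "error", error: { statusCode: 504 } }`, the nullish chain yields the string `"error"`, while the `find` version skips it and reaches the nested `504`.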
Collaborator
LGTM! Thank you!
What's broken
When a model call fails, extractStatusCode() tries to find the HTTP status code in the error object so it can decide whether to retry with a fallback model.
The problem: the AI SDK (@ai-sdk/provider-utils) wraps errors in a nested structure. The status code ends up at error.error.statusCode, but extractStatusCode() only looks at the top level (error.statusCode, error.status, error.data.statusCode).
So when Claude returns a 400 or OpenAI returns a 500, the fallback system can't see the status code and treats the error as non-retryable. The user gets an error with no fallback attempt, even though their retry_on_errors config covers those codes.
What this fixes
Adds two more places to look for the status code:
Which errors were affected
How I found this
I was debugging why runtime-fallback never fired for a 400 error. A real session log showed Atlas getting a 400 from Claude through CLIProxyAPI:
{"error":{"name":"AI_APICallError","statusCode":400,"url":"http://127.0.0.1:8317/v1/messages"}}
The statusCode: 400 was right there, just one level deeper than where the code was looking. The text fallback (regex matching "400" in the error message) also missed it because the message was just "Bad Request" without the number.
After patching this locally, all 11 configured retry_on_errors codes are properly detected.
Closes #2617
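To make the failure concrete, the sketch below replays the logged payload against three lookups: top-level fields only, a message regex, and a lookup that also checks error.statusCode. Everything here is illustrative and assumed rather than taken from the repository: the function names, the regex pattern, the placement of "Bad Request" in a top-level `message` field, and the three-entry `retryOnErrors` list (the PR mentions 11 configured codes but does not list them).

```typescript
// Assumed error shape: only the fields named in this PR.
interface ErrorLike {
  message?: string;
  statusCode?: unknown;
  status?: unknown;
  data?: { statusCode?: unknown };
  error?: { statusCode?: unknown };
}

// Hypothetical subset of retry_on_errors; the real configured list is not shown.
const retryOnErrors: number[] = [400, 500, 504];

// Payload from the session log; the top-level message field is an assumption.
const logged = {
  message: "Bad Request",
  error: { name: "AI_APICallError", statusCode: 400, url: "http://127.0.0.1:8317/v1/messages" },
};

const isNumber = (v: unknown): v is number => typeof v === "number";

// Top-level lookup only, as described for the pre-fix behaviour.
function lookupTopLevel(err: ErrorLike): number | undefined {
  return [err.statusCode, err.status, err.data?.statusCode].find(isNumber);
}

// Same lookup extended with the nested error.statusCode field.
function lookupNested(err: ErrorLike): number | undefined {
  return [err.statusCode, err.status, err.data?.statusCode, err.error?.statusCode].find(isNumber);
}

// Assumed text fallback: a standalone 4xx/5xx number in the message.
function statusFromMessage(message: string): number | undefined {
  const match = /\b([45]\d{2})\b/.exec(message);
  return match ? Number(match[1]) : undefined;
}

// A fallback attempt is only made when a code was found and is configured.
function shouldFallback(code: number | undefined): boolean {
  return code !== undefined && retryOnErrors.includes(code);
}

const before = lookupTopLevel(logged) ?? statusFromMessage(logged.message);
const after = lookupNested(logged);
console.log(shouldFallback(before), shouldFallback(after));
```

With this payload, neither the top-level lookup nor the message regex finds a code, so no fallback is attempted; the nested lookup finds 400, which is in the assumed retry list.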
Summary by cubic
Fixes runtime fallback by extracting HTTP status codes from nested SDK errors and ignoring non-numeric fields. Ensures retry_on_errors reliably triggers on 4xx/5xx (including 400/500/504); closes #2617.
- Updates extractStatusCode() to check nested fields used by @ai-sdk/provider-utils (e.g., AI_APICallError) and select the first numeric code (skips strings like status: "error").
- Adds tests covering top-level and nested (data/error/cause) extraction, non-numeric skip, regex message fallback, and precedence.
Written for commit de66f1f. Summary will update on new commits.