feat: add TPM throttling error handling with 1-minute retry delay #1791
Merged
pomelo-nwu merged 14 commits into QwenLM:main on Feb 13, 2026
Add support for detecting and handling TPM (Tokens Per Minute) throttling errors. When a TPM throttling error is detected (e.g., 'Throttling: TPM(10680324/10000000)'), the system now waits 1 minute before retrying instead of using exponential backoff.

Changes:
- Add isTPMThrottlingError() function to detect TPM throttling errors
- Modify retryWithBackoff() to use fixed 1-minute delay for TPM errors
- Add unit tests for TPM throttling detection and retry behavior

Co-authored-by: Qwen-Coder <[email protected]>
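For readers following the thread, here is a minimal sketch of the behaviour this commit describes. It is not the code from the PR: the regular expression is inferred from the single example message above, `nextDelayMs` is a hypothetical helper, and the backoff parameters (1 s initial delay, 30 s cap, no jitter) are assumptions made for illustration.

```typescript
// Fixed wait for TPM throttling: one minute, as stated in the description.
const TPM_RETRY_DELAY_MS = 60_000;

// Assumed message shape, inferred from 'Throttling: TPM(10680324/10000000)'.
const TPM_PATTERN = /Throttling:\s*TPM\(\d+\/\d+\)/;

// Pull a message out of an unknown error value without throwing.
function getMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === 'string') return error;
  return '';
}

function isTPMThrottlingError(error: unknown): boolean {
  return TPM_PATTERN.test(getMessage(error));
}

// Hypothetical helper: choose the delay before the next attempt.
// `attempt` is 1-based and counts the attempt that just failed.
function nextDelayMs(
  error: unknown,
  attempt: number,
  initialDelayMs = 1_000,
  maxDelayMs = 30_000,
): number {
  if (isTPMThrottlingError(error)) {
    // The quota window is per minute, so backing off exponentially gains nothing.
    return TPM_RETRY_DELAY_MS;
  }
  // Plain exponential backoff, capped at maxDelayMs.
  return Math.min(initialDelayMs * 2 ** (attempt - 1), maxDelayMs);
}
```

Keeping the delay choice in a pure function like this makes the TPM branch easy to unit test without timers.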
yiliang114 reviewed on Feb 11, 2026
Co-authored-by: 易良 <[email protected]>
- Remove redundant error checking logic in isTPMThrottlingError function
- Reuse isStructuredError and isApiError utilities from quotaErrorDetection module
- Clean up duplicate import statements
- Move TPM throttling check before shouldRetryOnError to ensure TPM errors without standard HTTP status codes are still retried
- Add comprehensive unit tests for edge cases:
  - TPM error without status property
  - Nested TPM error object without top-level status
  - Consecutive TPM throttling errors
  - Max attempts exhaustion for TPM errors
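The ordering change in this commit can be sketched as a small decision function. Everything here is illustrative: `decideRetry`, `hasTPMMessage`, `retryOnStatus`, and the `RetryDecision` labels are hypothetical names, and the nested-error shape (`{ error: { message } }`) is an assumption about what "nested TPM error object" means.

```typescript
type RetryDecision = 'retry-after-fixed-delay' | 'retry-with-backoff' | 'give-up';

// Look for the TPM marker in a string, an Error, or a nested { error: ... } object.
function hasTPMMessage(value: unknown): boolean {
  if (typeof value === 'string') return value.includes('Throttling: TPM(');
  if (typeof value === 'object' && value !== null) {
    const record = value as Record<string, unknown>;
    return hasTPMMessage(record['message']) || hasTPMMessage(record['error']);
  }
  return false;
}

function decideRetry(
  error: unknown,
  shouldRetryOnError: (e: unknown) => boolean,
): RetryDecision {
  // TPM check comes first: such errors may carry no HTTP status at all,
  // so a status-based predicate would otherwise reject them.
  if (hasTPMMessage(error)) return 'retry-after-fixed-delay';
  return shouldRetryOnError(error) ? 'retry-with-backoff' : 'give-up';
}

// Example status-based predicate (assumed: retry on 429 and 5xx).
const retryOnStatus = (e: unknown): boolean => {
  const status =
    typeof e === 'object' && e !== null ? (e as { status?: unknown }).status : undefined;
  return status === 429 || (typeof status === 'number' && status >= 500);
};
```

With the checks in the opposite order, a TPM error without a `status` property would fall through to `'give-up'`, which is the case the new edge-case tests cover.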
- Change 'as' to 'as unknown as' for proper type casting
yiliang114 approved these changes on Feb 12, 2026
…PM throttling test

Add a .catch() handler to the promise before advancing timers to prevent Node.js from reporting an unhandled rejection when maxAttempts is exhausted during the TPM throttling retry test.
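The pattern this commit describes can be shown without a test framework. The sketch below uses a hand-written `ManualClock` as a stand-in for fake timers and a hypothetical `retryAlwaysFailing` in place of the real retry utility; neither is from the PR. The point is only the ordering: the rejection handler is attached before the clock is advanced.

```typescript
// Minimal stand-in for a test framework's fake timers (hypothetical).
class ManualClock {
  private pending: Array<() => void> = [];

  // A sleep that only resolves when the test releases it.
  sleep(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.pending.push(resolve);
    });
  }

  // Release sleepers repeatedly until no code is waiting on the clock.
  async runAll(): Promise<void> {
    while (this.pending.length > 0) {
      const batch = this.pending.splice(0);
      batch.forEach((resolve) => resolve());
      // Yield so the released code can run and possibly sleep again.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
    }
  }
}

// Hypothetical operation that keeps "failing" until maxAttempts is exhausted.
async function retryAlwaysFailing(clock: ManualClock, maxAttempts: number): Promise<never> {
  let attempts = 0;
  for (;;) {
    attempts++;
    if (attempts >= maxAttempts) {
      throw new Error(`gave up after ${attempts} attempts`);
    }
    await clock.sleep();
  }
}

async function demo(): Promise<string> {
  const clock = new ManualClock();
  const promise = retryAlwaysFailing(clock, 3);
  // Attach the handler BEFORE advancing time, so the rejection is never unhandled.
  const caught = promise.catch((e: unknown) => e as Error);
  await clock.runAll();
  const error = await caught;
  return error.message;
}
```

If `caught` were created only after `runAll()`, the promise would already have rejected with no handler attached, which is what triggers the unhandled-rejection report.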
f8d914b to 1c38455
This reverts commit 9b882b4.
pomelo-nwu (Collaborator) reviewed on Feb 12, 2026
- Refactor retry utility to support GLM rate limit errors (code 1302) and TPM throttling
- Add getRateLimitRetryInfo() for unified rate-limit error detection
- Add exponential backoff for non-TPM rate limit errors
- Extend StreamEventType.RETRY with RetryInfo payload for UI feedback
- Add RetryCountdownMessage component for visual retry countdown
- Update useGeminiStream hook to handle retry events with countdown timer
- Add i18n support for rate limit messages (en/zh)
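As a rough illustration of how a RetryInfo payload could drive a countdown in the UI, consider the sketch below. The fields of `RetryInfo` (`reason`, `attempt`, `maxRetries`, `delayMs`) and both functions are assumptions; the PR's actual payload shape and the RetryCountdownMessage component are not shown in this thread.

```typescript
// Assumed payload shape; the real RetryInfo fields are not shown here.
interface RetryInfo {
  reason: string;
  attempt: number;
  maxRetries: number;
  delayMs: number;
}

// Whole seconds left before the retry fires, never negative.
function remainingSeconds(info: RetryInfo, elapsedMs: number): number {
  return Math.max(0, Math.ceil((info.delayMs - elapsedMs) / 1000));
}

// Text a countdown component might render on each tick.
function formatCountdown(info: RetryInfo, elapsedMs: number): string {
  const seconds = remainingSeconds(info, elapsedMs);
  return `${info.reason}. Retrying in ${seconds}s (attempt ${info.attempt}/${info.maxRetries})`;
}
```

A hook would typically recompute `elapsedMs` on a one-second interval and re-render until `remainingSeconds` reaches 0.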
- Use fixed 60s delay matching DashScope per-minute quota window
- Increase max retries from 3 to 10 to align with Claude Code behavior
- Remove unused isTPMThrottlingError, isGLMRateLimitError, isRateLimitThrottlingError functions
- Simplify getRateLimitRetryInfo to only extract reason, delay is now caller's responsibility

Co-authored-by: Qwen-Coder <[email protected]>
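The split of responsibilities after this commit (detection returns only a reason, the caller owns the delay) might look roughly like the following. This is a sketch under assumptions: the return shape of `getRateLimitRetryInfo`, the detection rule inside it, the `retryOnRateLimit` wrapper, and the reading of "max retries" as retries after the first attempt are all illustrative, not taken from the PR.

```typescript
const RATE_LIMIT_DELAY_MS = 60_000; // fixed delay, matching a per-minute quota window
const MAX_RATE_LIMIT_RETRIES = 10; // assumed to count retries after the first attempt

// Assumed shape: detection reports only why the request was limited.
function getRateLimitRetryInfo(error: unknown): { reason: string } | null {
  const message = error instanceof Error ? error.message : String(error);
  return message.includes('Throttling: TPM(') ? { reason: message } : null;
}

// Hypothetical wrapper; `sleep` is injected so tests need no real timers.
async function retryOnRateLimit<T>(
  fn: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let retries = 0;
  for (;;) {
    try {
      return await fn();
    } catch (error) {
      const info = getRateLimitRetryInfo(error);
      // Not a rate limit, or budget exhausted: surface the original error.
      if (info === null || retries >= MAX_RATE_LIMIT_RETRIES) throw error;
      retries++;
      // The caller, not the detector, decides how long to wait.
      await sleep(RATE_LIMIT_DELAY_MS);
    }
  }
}
```

Injecting `sleep` keeps the wrapper independent of the timer implementation, which also sidesteps the fake-timer ordering issue fixed earlier in this PR.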
pomelo-nwu reviewed on Feb 12, 2026
packages/core/src/utils/retry.ts (Outdated)

    }

    // Try to extract code from JSON embedded in error message string
    const message = getErrorMessage(error);
Collaborator
Avoid overhandling errorMessage here.
There should be a dedicated component in the CLI package to display messages for different error types — this part should only handle retry logic.
- Extract rate-limit detection into dedicated rateLimit.ts module
- Support detection from ApiError, StructuredError, HttpError, and JSON strings
- Handle common rate-limit codes: 429, 503, 1302 (GLM)
- Simplify retry.ts by removing duplicated detection logic
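A dedicated detection module along these lines could be sketched as below. The object shapes it probes (`status`, `code`, nested `error`) are assumptions about what ApiError, StructuredError, and HttpError look like, and `detectRateLimitCode` is a hypothetical name; only the three codes come from the commit message.

```typescript
// Codes named in the commit message: 429, 503, and GLM's 1302.
const RATE_LIMIT_CODES = new Set<number>([429, 503, 1302]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Accept numeric codes or numeric strings such as "1302".
function toRateLimitCode(value: unknown): number | null {
  const n = typeof value === 'string' ? Number(value) : value;
  return typeof n === 'number' && RATE_LIMIT_CODES.has(n) ? n : null;
}

function detectRateLimitCode(error: unknown): number | null {
  if (typeof error === 'string') {
    // JSON may be embedded after a prefix, so parse from the first '{'.
    const start = error.indexOf('{');
    if (start === -1) return null;
    try {
      return detectRateLimitCode(JSON.parse(error.slice(start)));
    } catch {
      return null; // not valid JSON: treat as "not a rate limit"
    }
  }
  if (!isRecord(error)) return null;
  // Assumed shapes: { status } for HTTP-style errors, { code } for API errors.
  const direct = toRateLimitCode(error['status']) ?? toRateLimitCode(error['code']);
  if (direct !== null) return direct;
  // Assumed wrapper shape: { error: { code | status } }.
  return isRecord(error['error']) ? detectRateLimitCode(error['error']) : null;
}
```

With detection isolated like this, retry.ts only needs to ask "is this a rate limit?" and can leave message formatting to the CLI package, in line with the review comment above.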
pomelo-nwu reviewed on Feb 13, 2026
      export type StreamEvent =
        | { type: StreamEventType.CHUNK; value: GenerateContentResponse }
    -   | { type: StreamEventType.RETRY };
    +   | { type: StreamEventType.RETRY; retryInfo?: RetryInfo };
Collaborator
This should be `retryInfo: RetryInfo` (required, not optional).
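To illustrate why a required field helps, here is a sketch of a consumer narrowing on the event type. It is not the project's code: `StreamEventType` is modelled as a const object standing in for the real enum, the `RetryInfo` fields are assumed, and `describeEvent` is hypothetical.

```typescript
// Const-object stand-in for the project's StreamEventType enum (values assumed).
const StreamEventType = { CHUNK: 'chunk', RETRY: 'retry' } as const;

// Assumed payload fields, for illustration only.
interface RetryInfo {
  reason: string;
  delayMs: number;
}

type StreamEvent<T> =
  | { type: typeof StreamEventType.CHUNK; value: T }
  | { type: typeof StreamEventType.RETRY; retryInfo: RetryInfo };

function describeEvent(event: StreamEvent<string>): string {
  switch (event.type) {
    case StreamEventType.CHUNK:
      return `chunk: ${event.value}`;
    case StreamEventType.RETRY:
      // retryInfo is required, so no undefined check is needed after narrowing.
      return `retrying in ${event.retryInfo.delayMs / 1000}s (${event.retryInfo.reason})`;
  }
}
```

With `retryInfo?: RetryInfo`, every consumer of a RETRY event would need an extra `undefined` branch even though the producer always has the information.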
Collaborator
@wenshao @yiliang114 Thanks for your contribution!