feat: add TPM throttling error handling with 1-minute retry delay #1791

Merged
pomelo-nwu merged 14 commits into QwenLM:main from wenshao:feat/tpm-throttling-retry on Feb 13, 2026
wenshao (Contributor) commented on Feb 10, 2026

Add support for detecting and handling TPM (Tokens Per Minute) throttling errors. When a TPM throttling error is detected (e.g., 'Throttling: TPM(10680324/10000000)'), the system now waits 1 minute before retrying instead of using exponential backoff.

Changes:

  • Add isTPMThrottlingError() function to detect TPM throttling errors
  • Modify retryWithBackoff() to use fixed 1-minute delay for TPM errors
  • Add unit tests for TPM throttling detection and retry behavior
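
The detection and delay selection described above can be sketched as follows. This is a minimal illustration, not the PR's code: only the name isTPMThrottlingError and the 1-minute delay come from the description, while the regex, getMessage, nextDelayMs, and the backoff defaults are assumptions.

```typescript
// Hypothetical sketch: the regex, helpers, and backoff defaults are assumed
// for illustration; only isTPMThrottlingError and the 1-minute delay are
// named in the PR description.
const TPM_PATTERN = /Throttling:\s*TPM\(\d+\/\d+\)/;
const TPM_RETRY_DELAY_MS = 60000; // fixed 1-minute wait for TPM throttling

function getMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  return typeof error === 'string' ? error : '';
}

function isTPMThrottlingError(error: unknown): boolean {
  return TPM_PATTERN.test(getMessage(error));
}

// attempt is zero-based; non-TPM errors keep a capped exponential backoff.
function nextDelayMs(
  error: unknown,
  attempt: number,
  initialDelayMs = 1000,
  maxDelayMs = 30000,
): number {
  if (isTPMThrottlingError(error)) return TPM_RETRY_DELAY_MS;
  return Math.min(initialDelayMs * 2 ** attempt, maxDelayMs);
}
```

A fixed wait fits here because a tokens-per-minute quota resets on a one-minute window, so shorter exponential steps would likely hit the same limit again.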

TLDR

Dive Deeper

Reviewer Test Plan

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Linked issues / bugs

Add support for detecting and handling TPM (Tokens Per Minute) throttling errors.
When a TPM throttling error is detected (e.g., 'Throttling: TPM(10680324/10000000)'),
the system now waits 1 minute before retrying instead of using exponential backoff.

Changes:
- Add isTPMThrottlingError() function to detect TPM throttling errors
- Modify retryWithBackoff() to use fixed 1-minute delay for TPM errors
- Add unit tests for TPM throttling detection and retry behavior

Co-authored-by: Qwen-Coder <[email protected]>
wenshao and others added 5 commits February 11, 2026 16:56
- Remove redundant error checking logic in isTPMThrottlingError function
- Reuse isStructuredError and isApiError utilities from quotaErrorDetection module
- Clean up duplicate import statements
- Move TPM throttling check before shouldRetryOnError to ensure TPM errors
  without standard HTTP status codes are still retried
- Add comprehensive unit tests for edge cases:
  - TPM error without status property
  - Nested TPM error object without top-level status
  - Consecutive TPM throttling errors
  - Max attempts exhaustion for TPM errors
- Change 'as' to 'as unknown as' for proper type casting
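
Why the ordering matters can be shown with a small sketch. It assumes a status-based shouldRetryOnError and an error shape with an optional nested error.message; decideRetry, extractMessage, getStatus, and the backoff values are hypothetical names and numbers, not taken from the repository.

```typescript
// Hypothetical sketch; helper names, error shapes, and backoff values are assumed.
type RetryDecision = { retry: false } | { retry: true; delayMs: number };

// Reads a message at the top level or one level down in an `error` field.
function extractMessage(error: unknown): string {
  if (typeof error === 'string') return error;
  if (typeof error !== 'object' || error === null) return '';
  const e = error as { message?: unknown; error?: unknown };
  if (typeof e.message === 'string') return e.message;
  const nested = e.error as { message?: unknown } | null | undefined;
  if (typeof nested === 'object' && nested !== null && typeof nested.message === 'string') {
    return nested.message;
  }
  return '';
}

function getStatus(error: unknown): number | undefined {
  if (typeof error !== 'object' || error === null) return undefined;
  const status = (error as { status?: unknown }).status;
  return typeof status === 'number' ? status : undefined;
}

// Stand-in for the status-based predicate named in the commit message.
function shouldRetryOnError(error: unknown): boolean {
  const status = getStatus(error);
  return status === 429 || (status !== undefined && status >= 500 && status < 600);
}

function decideRetry(error: unknown, attempt: number): RetryDecision {
  // TPM check first: these errors may carry no HTTP status at all.
  if (/Throttling:\s*TPM\(\d+\/\d+\)/.test(extractMessage(error))) {
    return { retry: true, delayMs: 60000 };
  }
  if (!shouldRetryOnError(error)) return { retry: false };
  return { retry: true, delayMs: Math.min(1000 * 2 ** attempt, 30000) };
}
```

If the two checks were swapped, a nested TPM error without a top-level status would fail the status predicate and never reach the TPM branch.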
…PM throttling test

Add a .catch() handler to the promise before advancing timers to prevent
Node.js from reporting an unhandled rejection when maxAttempts is exhausted
during the TPM throttling retry test.
yiliang114 force-pushed the feat/tpm-throttling-retry branch from f8d914b to 1c38455 on February 12, 2026 05:10
This reverts commit 9b882b4.
yiliang114 (Collaborator) commented:

During local simulation of a throttling event (TPM 12231856/10000000, HTTP 429), the error is gracefully handled in the background. End users in the TUI will not experience immediate disruption or error notifications. With debug logging enabled, these throttling events are recorded in the log files for operational visibility and diagnostics.

The left side shows the local proxy tool simulating a TPM throttling error. The right side shows the output from the local Qwen Code CLI. Below is the debug log file.

[Two screenshots attached: proxy simulation and CLI output side by side, and the debug log file]

yiliang114 and others added 4 commits February 12, 2026 16:21
- Refactor retry utility to support GLM rate limit errors (code 1302) and TPM throttling
- Add getRateLimitRetryInfo() for unified rate-limit error detection
- Add exponential backoff for non-TPM rate limit errors
- Extend StreamEventType.RETRY with RetryInfo payload for UI feedback
- Add RetryCountdownMessage component for visual retry countdown
- Update useGeminiStream hook to handle retry events with countdown timer
- Add i18n support for rate limit messages (en/zh)
- Use fixed 60s delay matching DashScope per-minute quota window
- Increase max retries from 3 to 10 to align with Claude Code behavior
- Remove unused isTPMThrottlingError, isGLMRateLimitError, isRateLimitThrottlingError functions
- Simplify getRateLimitRetryInfo to only extract reason, delay is now caller's responsibility

Co-authored-by: Qwen-Coder <[email protected]>
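
The retry payload and countdown described in these commits might look roughly like the sketch below. The 60-second delay and the limit of 10 retries come from the commit messages; the RetryInfo fields, buildRetryInfo, countdownSeconds, and formatCountdown are hypothetical and only illustrate how a caller could own the delay while the UI renders a countdown.

```typescript
// Hypothetical sketch: field and function names are assumed, not the PR's API.
interface RetryInfo {
  reason: string; // extracted from the error by the detection step
  attempt: number; // 1-based retry attempt
  maxRetries: number;
  delayMs: number; // chosen by the caller, not by the detection step
}

const RATE_LIMIT_DELAY_MS = 60000; // fixed 60s, matching a per-minute quota window
const MAX_RETRIES = 10;

// Returns undefined once the retry budget is exhausted.
function buildRetryInfo(reason: string, attempt: number): RetryInfo | undefined {
  if (attempt > MAX_RETRIES) return undefined;
  return { reason, attempt, maxRetries: MAX_RETRIES, delayMs: RATE_LIMIT_DELAY_MS };
}

// Whole seconds left to wait, never negative.
function countdownSeconds(info: RetryInfo, elapsedMs: number): number {
  return Math.max(0, Math.ceil((info.delayMs - elapsedMs) / 1000));
}

function formatCountdown(info: RetryInfo, elapsedMs: number): string {
  const seconds = countdownSeconds(info, elapsedMs);
  return `Rate limited (${info.reason}). Retrying in ${seconds}s (attempt ${info.attempt}/${info.maxRetries})`;
}
```

A UI component could call formatCountdown once per second with the elapsed time since the retry event was received.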
}

// Try to extract code from JSON embedded in error message string
const message = getErrorMessage(error);
Collaborator

Avoid over-handling errorMessage here.
The CLI package should have a dedicated component that displays messages for the different error types; this part should only handle retry logic.

- Extract rate-limit detection into dedicated rateLimit.ts module
- Support detection from ApiError, StructuredError, HttpError, and JSON strings
- Handle common rate-limit codes: 429, 503, 1302 (GLM)
- Simplify retry.ts by removing duplicated detection logic
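
A detection module along these lines could be sketched as below. The codes 429, 503, and 1302 come from the commit message; the error shapes, the JSON layout with a nested error.code, and every function name are assumptions for illustration rather than the contents of rateLimit.ts.

```typescript
// Hypothetical sketch: shapes and helper names are assumed, not the PR's code.
const RATE_LIMIT_CODES = new Set([429, 503, 1302]);

// Accepts numeric codes and all-digit strings such as "1302".
function toCode(value: unknown): number | undefined {
  if (typeof value === 'number') return value;
  if (typeof value === 'string' && /^\d+$/.test(value)) return Number(value);
  return undefined;
}

// Parses the outermost {...} span of a string and reads error.code or code.
function extractCodeFromJson(text: string): number | undefined {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return undefined;
  try {
    const parsed = JSON.parse(text.slice(start, end + 1)) as {
      code?: unknown;
      error?: { code?: unknown };
    };
    return toCode(parsed.error?.code) ?? toCode(parsed.code);
  } catch {
    return undefined; // not valid JSON, so no code can be recovered
  }
}

function extractCode(error: unknown): number | undefined {
  if (typeof error === 'string') return extractCodeFromJson(error);
  if (typeof error !== 'object' || error === null) return undefined;
  const e = error as {
    status?: unknown;
    code?: unknown;
    error?: { code?: unknown };
    message?: unknown;
  };
  const direct = toCode(e.status) ?? toCode(e.error?.code) ?? toCode(e.code);
  if (direct !== undefined) return direct;
  return typeof e.message === 'string' ? extractCodeFromJson(e.message) : undefined;
}

function isRateLimitError(error: unknown): boolean {
  const code = extractCode(error);
  return code !== undefined && RATE_LIMIT_CODES.has(code);
}
```

Keeping detection in one function that returns a plain boolean lets the retry loop stay free of provider-specific parsing.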
  export type StreamEvent =
    | { type: StreamEventType.CHUNK; value: GenerateContentResponse }
-   | { type: StreamEventType.RETRY };
+   | { type: StreamEventType.RETRY; retryInfo?: RetryInfo };
Collaborator

This should be retryInfo: RetryInfo (required rather than optional).
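
One way to read this suggestion: with a required field, consumers of the RETRY variant need no undefined check after narrowing. The sketch below uses simplified stand-ins; the const object replacing the enum, the generic parameter, the RetryInfo fields, and describeEvent are all hypothetical.

```typescript
// Hypothetical sketch with simplified stand-ins for the real types.
const StreamEventType = { CHUNK: 'chunk', RETRY: 'retry' } as const;

interface RetryInfo {
  reason: string;
  delayMs: number;
}

type StreamEvent<T> =
  | { type: typeof StreamEventType.CHUNK; value: T }
  | { type: typeof StreamEventType.RETRY; retryInfo: RetryInfo };

function describeEvent(event: StreamEvent<string>): string {
  if (event.type === StreamEventType.CHUNK) {
    return `chunk: ${event.value}`;
  }
  // Narrowed to the RETRY variant; retryInfo is guaranteed to be present.
  return `retrying in ${event.retryInfo.delayMs / 1000}s (${event.retryInfo.reason})`;
}
```

With retryInfo?: RetryInfo, the last line would need optional chaining or a fallback, and every consumer would have to decide what a retry event without details means.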

Collaborator

@pomelo-nwu pomelo-nwu left a comment

LGTM!

@pomelo-nwu
Collaborator

@wenshao @yiliang114 Thanks for your contribution!

@pomelo-nwu pomelo-nwu merged commit 001d010 into QwenLM:main Feb 13, 2026
24 of 25 checks passed
