
feat(agents): retry empty-stream once before fallback#13820

Closed
Louise-Qiuqiu wants to merge 1 commit into openclaw:main from
Louise-Qiuqiu:fix/empty-stream-retry-fallback-v2


Louise-Qiuqiu commented Feb 11, 2026

This PR improves resilience to transient empty-stream failures ("request ended without sending any chunks") by retrying once on the same model before proceeding through the fallback models.

Changes:

  • classify empty-stream patterns as timeout failover reasons
  • add one-time in-model retry (300-800ms jitter, feature-flag controlled)
  • keep robust fallback behavior for remaining retryable errors
  • add unit tests for classification, retry success, retry->fallback, and feature-flag off path
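The retry-then-fallback flow listed above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the PR's code: runWithFallback, RetryOptions, jitterDelay and isEmptyStreamError are hypothetical names, the feature flag is passed in as a plain boolean instead of being read from the environment, and, as a simplification, any error other than an empty stream moves straight on to the next model, whereas the real code presumably only falls back on classified failover reasons.

```typescript
// Hypothetical sketch: names and option shapes are illustrative, not openclaw's API.

type ModelCall<T> = (model: string) => Promise<T>;

interface RetryOptions {
  emptyStreamRetry: boolean; // assumed to mirror the PR's feature flag
  minDelayMs: number; // 300 in the PR description
  maxDelayMs: number; // 800 in the PR description
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  random?: () => number; // injectable; defaults to Math.random
}

// Empty-stream messages quoted in the PR; exact matching rules are assumed.
const EMPTY_STREAM_PATTERNS: RegExp[] = [
  /request ended without sending any chunks/i,
  /stream ended before first chunk/i,
];

function isEmptyStreamError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return EMPTY_STREAM_PATTERNS.some((re) => re.test(message));
}

// Uniformly distributed delay in [minMs, maxMs]: plain jitter, no exponential backoff.
function jitterDelay(minMs: number, maxMs: number, random: () => number): number {
  return Math.round(minMs + random() * (maxMs - minMs));
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(() => resolve(), ms));

async function runWithFallback<T>(
  models: string[],
  call: ModelCall<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  const random = opts.random ?? Math.random;
  let lastError: unknown = new Error("no models configured");

  for (const model of models) {
    let retried = false; // guard flag: at most one in-model retry per model
    for (;;) {
      try {
        return await call(model);
      } catch (err) {
        lastError = err;
        if (opts.emptyStreamRetry && !retried && isEmptyStreamError(err)) {
          retried = true;
          await sleep(jitterDelay(opts.minDelayMs, opts.maxDelayMs, random));
          continue; // retry the same model exactly once
        }
        break; // simplification: every other failure falls back to the next model
      }
    }
  }
  throw lastError; // all models exhausted
}
```

Injecting sleep and random keeps the test paths (retry success, retry then fallback, flag off) deterministic, and the per-model retried flag is what bounds the loop to a single extra attempt.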

Greptile Overview

Greptile Summary

This PR enhances resilience to transient empty-stream failures by adding a one-time in-model retry before proceeding to fallback models. The implementation classifies empty-stream errors ("request ended without sending any chunks", "stream ended before first chunk") as timeout failover reasons and adds retry logic with configurable jitter (300-800ms by default), controlled by the OPENCLAW_EMPTY_STREAM_RETRY feature flag.

Key changes:

  • Added empty-stream error patterns to timeout classification in error pattern matching
  • Implemented a one-time in-model retry with randomized jitter (300-800ms) before fallback
  • Added comprehensive test coverage for retry success, retry-then-fallback, and feature-flag disable scenarios
  • Maintained backward compatibility through feature flags and existing fallback behavior

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-designed, with proper error classification, test coverage of all code paths (retry success, retry-failure fallback, feature flag disabled), and backward-compatible feature flags. The retry logic is isolated by a guard flag that prevents infinite loops, and the randomized delay spreads out retries to avoid thundering-herd effects.
  • No files require special attention


openclaw-barnacle bot added the agents (Agent runtime and tooling) label Feb 11, 2026
Louise-Qiuqiu (Author) commented Feb 11, 2026

Hey @gumadeiras 👋 Friendly ping: this PR adds a single empty-stream retry before falling back to the next model, which addresses the "request ended without sending any chunks" error we've been hitting in production. Some providers occasionally drop the stream without sending any data, which triggers an immediate fallback even though a simple retry on the same model would succeed. CI is green and the change is minimal. Would appreciate a review when you get a chance, thanks!

Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch from 521f702 to bb7fc6b on February 12, 2026 06:37
Louise-Qiuqiu (Author) commented

Quick update on the latest commits:

bb7fc6b — Classify Chinese overload message for fallback

Some providers return 500 new_api_error with a Chinese message "负载已经达到上限" (capacity reached). This commit adds that pattern to isOverloadedErrorMessage() so it's correctly classified as rate_limit and triggers fallback instead of being treated as an unrecoverable error. Unit test included.
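A pattern-based overload check of the kind described above might look like the sketch below. It is an illustration only: the real isOverloadedErrorMessage() in openclaw may have a different signature and pattern list, and the English /overloaded/i entry is an assumed placeholder; only the Chinese phrase comes from this PR.

```typescript
// Hypothetical sketch: pattern list and signature are assumptions, not the
// actual openclaw isOverloadedErrorMessage() implementation.
const OVERLOAD_MESSAGE_PATTERNS: RegExp[] = [
  /overloaded/i, // illustrative English pattern (assumed)
  /负载已经达到上限/, // "capacity reached", seen with HTTP 500 new_api_error per the PR
];

// True when the provider's error text matches any known overload phrase.
function isOverloadedErrorMessage(message: string): boolean {
  return OVERLOAD_MESSAGE_PATTERNS.some((re) => re.test(message));
}
```

With this sketch, a body such as `500 new_api_error: 负载已经达到上限` matches, while a generic `500 internal server error` does not; whether that match becomes rate_limit still depends on the order of checks in the classifier.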

d237a9b — Prioritize overload before transient HTTP timeout

The new overload test case exposed a priority issue in classifyFailoverReason: the isTransientHttpError branch (matching any 5xx) was evaluated before the overload/rate-limit branches, so a 500 + overload message was classified as timeout instead of rate_limit. This commit reorders the checks so overload and rate-limit patterns are evaluated first. Unit test updated.
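The priority fix can be illustrated with a small classifier in which message-based checks run before the broad 5xx branch. This is a sketch under assumptions: ProviderError, the pattern lists and the exact set of branches are hypothetical; only the ordering (overload/rate-limit first, then empty-stream as timeout, then any 5xx as timeout) reflects the PR description.

```typescript
// Hypothetical sketch: types, patterns and branches are illustrative; only the
// check ordering follows the PR description.
type FailoverReason = "rate_limit" | "timeout";

interface ProviderError {
  status?: number; // HTTP status, if the provider returned one
  message: string;
}

const OVERLOAD_OR_RATE_LIMIT: RegExp[] = [/overloaded/i, /rate limit/i, /负载已经达到上限/];
const EMPTY_STREAM: RegExp[] = [
  /request ended without sending any chunks/i,
  /stream ended before first chunk/i,
];

const matchesAny = (patterns: RegExp[], text: string): boolean =>
  patterns.some((re) => re.test(text));

// Matches any 5xx, as the PR describes isTransientHttpError.
function isTransientHttpError(err: ProviderError): boolean {
  return err.status !== undefined && err.status >= 500 && err.status <= 599;
}

function classifyFailoverReason(err: ProviderError): FailoverReason | null {
  // 1. Overload / rate-limit text first, so a 500 carrying an overload
  //    message is classified as rate_limit rather than timeout.
  if (matchesAny(OVERLOAD_OR_RATE_LIMIT, err.message)) return "rate_limit";
  // 2. Empty-stream failures are treated as timeouts.
  if (matchesAny(EMPTY_STREAM, err.message)) return "timeout";
  // 3. Only then the broad 5xx check.
  if (isTransientHttpError(err)) return "timeout";
  return null; // not a failover candidate
}
```

Moving step 3 above step 1 reproduces the original bug: a 500 with an overload message would hit the 5xx branch first and be reported as timeout.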


CI is fully green ✅ (all checks pass on d237a9b). The changes are backward-compatible — only adds new error patterns and fixes classification priority.

Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch 4 times, most recently from 70b00f0 to 279f306 on February 14, 2026 01:08
sauerdaniel added a commit to sauerdaniel/openclaw that referenced this pull request Feb 14, 2026
Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch from 279f306 to 4c5cd17 on February 15, 2026 09:44
Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch 3 times, most recently from 217ea2c to 025fbf8 on February 18, 2026 18:35
openclaw-barnacle bot commented

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle bot added the stale (Marked as stale due to inactivity) label Mar 10, 2026
Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch from 05b7897 to 4c004de on March 10, 2026 11:38
openclaw-barnacle bot added the size: S label and removed the size: M and stale labels Mar 10, 2026
Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch from 4d66837 to 1ef1641 on March 14, 2026 10:25
openclaw-barnacle bot commented

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle bot added the stale (Marked as stale due to inactivity) label Mar 20, 2026
Louise-Qiuqiu force-pushed the fix/empty-stream-retry-fallback-v2 branch from 1ef1641 to 02c8308 on March 20, 2026 15:49
openclaw-barnacle bot removed the stale label Mar 21, 2026
openclaw-barnacle bot commented

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle bot added the stale (Marked as stale due to inactivity) label Mar 30, 2026
openclaw-barnacle bot commented

Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.

openclaw-barnacle bot closed this Apr 3, 2026

Labels

agents (Agent runtime and tooling), size: S, stale (Marked as stale due to inactivity)
