fix(discord): retry bot sends on rate limit and server errors during chunk delivery by AnCoSONG · Pull Request #33054 · openclaw/openclaw

AnCoSONG · 2026-03-03T13:34:49Z

Summary

Problem: When a long agent response is split into multiple Discord message chunks (>2000 chars), if any chunk hits a 429 rate-limit or 5xx error mid-sequence, the error propagates up and all remaining chunks are silently dropped.
Why it matters: Users see partial message delivery — the first N chunks arrive but the rest never show up. Common with long responses that get split into 5+ chunks.
What changed: Added sendWithRetry() wrapper around the bot sender path in sendDiscordChunkWithFallback and sendAdditionalDiscordMedia. Retries up to 2 times for 429/5xx with backoff, respects retry-after header, and immediately throws for non-retryable 4xx errors.
What did NOT change (scope boundary): Webhook delivery path already has its own fallback. No changes to chunking logic or Discord API client.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes fix(discord): chunked message sends silently drop on rate-limit mid-sequence #32887

User-visible / Behavior Changes

Discord message chunks that encounter transient 429/5xx errors are now retried (up to 2 times) instead of being silently dropped.
Non-retryable errors (4xx) still fail immediately.

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No — same Discord API calls, just retried on failure)
Command/tool execution surface changed? (No)
Data access scope changed? (No)

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 22 + Vitest
Integration/channel: Discord

Steps

Configure an agent that produces long responses (>10,000 chars)
The response gets chunked into 5+ Discord messages
If Discord rate-limits any chunk mid-sequence, remaining chunks are now retried

Expected

All chunks are delivered even when transient errors occur mid-sequence.

Actual

All 13 tests pass, including 4 new tests for retry on 429, retry on 500, no retry on 400, and exhausted retries.

Evidence

Failing test/log before + passing after

Human Verification (required)

Verified scenarios: pnpm test src/discord/monitor/reply-delivery.test.ts — 13/13 pass
Edge cases checked: 429 with retry-after, 500 server error, 400 client error (no retry), exhausted retries (3 total attempts)
What you did not verify: live Discord delivery with real rate limits

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)

Failure Recovery (if this breaks)

How to disable/revert this change quickly: Revert this PR commit.
Files/config to restore: src/discord/monitor/reply-delivery.ts
Known bad symptoms reviewers should watch for: Delayed message delivery due to retry backoff; unlikely since max delay is ~3s total.

Risks and Mitigations

Risk: Retry delay could add latency to message delivery when Discord is rate-limiting aggressively.
- Mitigation: Max 2 retries with short backoff (1-2s). Total worst-case delay is ~3s, which is acceptable vs losing all remaining chunks.

Made with Cursor

…chunk delivery When a long reply is split into multiple Discord message chunks, a 429 rate-limit or 5xx error on any chunk causes all remaining chunks to be silently dropped. Add sendWithRetry() that retries up to 2 times with backoff for retryable errors (429/5xx) and respects retry-after headers. Closes openclaw#32887 Made-with: Cursor

greptile-apps · 2026-03-03T13:38:27Z

Greptile Summary

This PR adds a sendWithRetry helper to the Discord bot sender path so that transient 429/5xx errors during chunk delivery trigger up to 2 retries with backoff instead of silently dropping remaining message chunks.

The retry logic is sound — isRetryableDiscordError correctly distinguishes retryable (429, 5xx) from non-retryable (4xx) errors and honours the retry-after header. However, the fix is incomplete: a third sendMessageDiscord call at line 296 (the non-voice, media-with-text payload path) is not wrapped in sendWithRetry, so that path still fails immediately on 429/5xx errors.

Additionally, the four new retry tests use real setTimeout delays instead of vi.useFakeTimers(), adding ~5 seconds of wall-clock sleep to each test run. The tests should adopt the mocking pattern already shown in the "touches bound-thread activity" test.

Confidence Score: 2/5

The retry logic is correct, but the fix is incomplete — one of three bot sender paths still lacks retry protection.
The core retry mechanism works correctly and all tests pass, but the implementation doesn't fully address the stated problem. The PR claims to fix silent chunk dropping on transient 429/5xx errors, yet one of the three sendMessageDiscord call sites (the media-with-text payload path at line 296) remains unwrapped. This means the bug still exists for that specific payload type. Test efficiency can also be improved without changing behavior. These are fixable issues, but they indicate incomplete implementation.
src/discord/monitor/reply-delivery.ts requires wrapping the media-with-text call (line 296) in sendWithRetry. src/discord/monitor/reply-delivery.test.ts should use vi.useFakeTimers() in retry tests.

_{Last reviewed commit: 88009d4}

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-03-03T13:38:31Z

src/discord/monitor/reply-delivery.test.ts

+  it("retries bot send on 429 rate limit then succeeds", async () => {
+    const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 });
+    sendMessageDiscordMock
+      .mockRejectedValueOnce(rateLimitErr)
+      .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" });
+
+    await deliverDiscordReply({
+      replies: [{ text: "retry me" }],
+      target: "channel:123",
+      token: "token",
+      runtime,
+      textLimit: 2000,
+    });
+
+    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
+  });
+
+  it("retries bot send on 500 server error then succeeds", async () => {
+    const serverErr = Object.assign(new Error("internal"), { status: 500 });
+    sendMessageDiscordMock
+      .mockRejectedValueOnce(serverErr)
+      .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" });
+
+    await deliverDiscordReply({
+      replies: [{ text: "retry me" }],
+      target: "channel:123",
+      token: "token",
+      runtime,
+      textLimit: 2000,
+    });
+
+    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
+  });
+
+  it("does not retry on 4xx client errors", async () => {
+    const clientErr = Object.assign(new Error("bad request"), { status: 400 });
+    sendMessageDiscordMock.mockRejectedValueOnce(clientErr);
+
+    await expect(
+      deliverDiscordReply({
+        replies: [{ text: "fail" }],
+        target: "channel:123",
+        token: "token",
+        runtime,
+        textLimit: 2000,
+      }),
+    ).rejects.toThrow("bad request");
+
+    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(1);
+  });
+
+  it("throws after exhausting retry attempts", async () => {
+    const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 });
+    sendMessageDiscordMock.mockRejectedValue(rateLimitErr);
+
+    await expect(
+      deliverDiscordReply({
+        replies: [{ text: "persistent failure" }],
+        target: "channel:123",
+        token: "token",
+        runtime,
+        textLimit: 2000,
+      }),
+    ).rejects.toThrow("rate limited");
+
+    // 1 initial + 2 retries = 3 attempts
+    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(3);
+  });


Retry tests run real timers instead of mocked timers, which adds ~5 seconds of wall-clock sleep to the test suite:

"retries bot send on 429…" — sleeps 1000 ms

"retries bot send on 500…" — sleeps 1000 ms

"throws after exhausting retry attempts" — sleeps 1000 + 2000 = 3000 ms

The "touches bound-thread activity" test (line 214) demonstrates the correct pattern with vi.useFakeTimers(). Consider wrapping each retry test similarly:

it("retries bot send on 429 rate limit then succeeds", async () => { vi.useFakeTimers(); try { const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 }); sendMessageDiscordMock .mockRejectedValueOnce(rateLimitErr) .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" }); const promise = deliverDiscordReply({ ... }); await vi.runAllTimersAsync(); await promise; expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2); } finally { vi.useRealTimers(); } });

This eliminates the real delays while still verifying retry logic.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/discord/monitor/reply-delivery.test.ts Line: 262-329 Comment: Retry tests run real timers instead of mocked timers, which adds ~5 seconds of wall-clock sleep to the test suite: - "retries bot send on 429…" — sleeps 1000 ms - "retries bot send on 500…" — sleeps 1000 ms - "throws after exhausting retry attempts" — sleeps 1000 + 2000 = 3000 ms The "touches bound-thread activity" test (line 214) demonstrates the correct pattern with `vi.useFakeTimers()`. Consider wrapping each retry test similarly: ```ts it("retries bot send on 429 rate limit then succeeds", async () => { vi.useFakeTimers(); try { const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 }); sendMessageDiscordMock .mockRejectedValueOnce(rateLimitErr) .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" }); const promise = deliverDiscordReply({ ... }); await vi.runAllTimersAsync(); await promise; expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2); } finally { vi.useRealTimers(); } }); ``` This eliminates the real delays while still verifying retry logic. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-03T13:38:36Z

Additional Comments (1)

src/discord/monitor/reply-delivery.ts
This sendMessageDiscord call (non-voice, media-with-text path) is not wrapped in sendWithRetry, leaving it vulnerable to the same 429/5xx failures that the PR claims to fix.

Compare with the identical call pattern in sendDiscordChunkWithFallback (line 143-150) and sendAdditionalDiscordMedia (line 163-171), which both wrap sendMessageDiscord in sendWithRetry. This path should also be wrapped with the same retry pattern to maintain consistency and ensure users sending text + media payloads receive the same transient-error resilience as other delivery paths.

Without this wrapper, users sending text + media payloads that encounter transient errors will still experience silent delivery failure.

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/discord/monitor/reply-delivery.ts
Line: 295-302

Comment:
This `sendMessageDiscord` call (non-voice, media-with-text path) is not wrapped in `sendWithRetry`, leaving it vulnerable to the same 429/5xx failures that the PR claims to fix.

Compare with the identical call pattern in `sendDiscordChunkWithFallback` (line 143-150) and `sendAdditionalDiscordMedia` (line 163-171), which both wrap `sendMessageDiscord` in `sendWithRetry`. This path should also be wrapped with the same retry pattern to maintain consistency and ensure users sending text + media payloads receive the same transient-error resilience as other delivery paths.

Without this wrapper, users sending text + media payloads that encounter transient errors will still experience silent delivery failure.

How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88009d4a65

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-03T13:39:47Z

src/discord/monitor/reply-delivery.ts

+const RETRY_BASE_DELAY_MS = 1000;
+
+function isRetryableDiscordError(err: unknown): { retryable: boolean; retryAfterMs: number } {
+  const status = (err as { status?: number }).status ?? (err as { statusCode?: number }).statusCode;


Parse nested Discord status before deciding retryability

The new retry classifier only checks err.status and err.statusCode, but Discord failures in this repo are also handled with err.response.status (see extractDiscordErrorStatus in src/discord/monitor/thread-bindings.discord-api.ts). If sendMessageDiscord throws that shape for a 429 or 5xx, this function marks it non-retryable and the remaining chunks are still dropped after the first failed send, so the fix does not cover those real error objects.

Useful? React with 👍 / 👎.

thewilloftheshadow · 2026-03-03T16:05:40Z

Superseded by #33226 (combined Discord chunk ordering + retry fixes).

openclaw-barnacle bot added channel: discord Channel integration: discord size: S labels Mar 3, 2026

greptile-apps bot reviewed Mar 3, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 3, 2026

View reviewed changes

thewilloftheshadow mentioned this pull request Mar 3, 2026

fix(discord): retry chunk delivery #33226

Merged

18 tasks

thewilloftheshadow closed this Mar 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(discord): retry bot sends on rate limit and server errors during chunk delivery#33054

fix(discord): retry bot sends on rate limit and server errors during chunk delivery#33054
AnCoSONG wants to merge 1 commit intoopenclaw:mainfrom
AnCoSONG:fix/discord-chunk-rate-limit-retry

AnCoSONG commented Mar 3, 2026

Uh oh!

greptile-apps bot commented Mar 3, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Mar 3, 2026

Uh oh!

greptile-apps bot commented Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Uh oh!

thewilloftheshadow commented Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

AnCoSONG commented Mar 3, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

greptile-apps bot commented Mar 3, 2026

Greptile Summary

Confidence Score: 2/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Mar 3, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

thewilloftheshadow commented Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants