fix(discord): retry chunked message sends on rate-limit mid-sequence #32904
Sid-Qin wants to merge 1 commit into openclaw:main
Conversation
The bot sender path in `sendDiscordChunkWithFallback` had no error handling, so a 429 or 5xx error on any chunk would drop all remaining chunks. Add a `sendWithRetry` wrapper with up to 2 retries and `retry-after` header support. Closes openclaw#32887. Made-with: Cursor
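For context, a minimal sketch of what such a retry wrapper could look like. This is an illustration under stated assumptions, not the PR's actual code: the names `SendFn`, `HttpishError`, and `isRetryable`, and the exact backoff constants, are hypothetical; the real implementation lives in `reply-delivery.ts` and computes delays differently (see the review comment below).

```typescript
// Hypothetical sketch of a sendWithRetry-style wrapper: retry 429/5xx up to
// MAX_RETRIES times, honoring a "retry-after" header (seconds) when present,
// otherwise backing off exponentially. Other 4xx errors are rethrown at once.
type SendFn<T> = () => Promise<T>;

interface HttpishError extends Error {
  status?: number;
  headers?: Record<string, string>;
}

const MAX_RETRIES = 2;

function isRetryable(err: HttpishError): boolean {
  // Retry on rate limits (429) and server errors (5xx) only.
  return err.status === 429 || (err.status !== undefined && err.status >= 500);
}

async function sendWithRetry<T>(send: SendFn<T>): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await send();
    } catch (err) {
      const e = err as HttpishError;
      if (!isRetryable(e) || attempt >= MAX_RETRIES) throw e;
      // Prefer the server-provided retry-after (seconds); otherwise back off
      // exponentially: 1s, 2s, ...
      const retryAfterSec = Number(e.headers?.["retry-after"]);
      const delayMs = Number.isFinite(retryAfterSec)
        ? retryAfterSec * 1000
        : 1000 * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}
```

The key design choice, mirrored by the tests below, is that only 429 and 5xx are treated as transient; a 403 fails fast on the first attempt.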
🤖 We're reviewing this PR with Aisle. We're running a security check on the changes in this PR now; this usually takes a few minutes. ⌛
Latest run failed. Keeping previous successful results.
Trace ID:
Last updated on: 2026-03-03T10:40:42Z
Greptile Summary
This PR fixes a real bug where mid-sequence Discord rate-limit or server errors silently dropped all remaining message chunks. The solution — a `sendWithRetry()` wrapper — looks sound. One non-blocking issue is worth addressing:
Confidence Score: 4/5
Last reviewed commit: e9c287e
```ts
it("retries bot sender on 429 rate-limit and succeeds (#32887)", async () => {
  const rateLimitErr = Object.assign(new Error("rate limited"), {
    status: 429,
    headers: { "retry-after": "0.01" },
  });
  sendMessageDiscordMock
    .mockRejectedValueOnce(rateLimitErr)
    .mockResolvedValueOnce({ messageId: "msg-retry" });

  await deliverDiscordReply({
    replies: [{ text: "hello" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("retries bot sender on 500 server error and succeeds (#32887)", async () => {
  const serverErr = Object.assign(new Error("internal"), { status: 500 });
  sendMessageDiscordMock
    .mockRejectedValueOnce(serverErr)
    .mockResolvedValueOnce({ messageId: "msg-ok" });

  await deliverDiscordReply({
    replies: [{ text: "hello" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("throws immediately on non-retryable 4xx errors (#32887)", async () => {
  const clientErr = Object.assign(new Error("forbidden"), { status: 403 });
  sendMessageDiscordMock.mockRejectedValueOnce(clientErr);

  await expect(
    deliverDiscordReply({
      replies: [{ text: "hello" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    }),
  ).rejects.toThrow("forbidden");

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(1);
});

it("delivers remaining chunks after a mid-sequence retry (#32887)", async () => {
  sendMessageDiscordMock
    .mockResolvedValueOnce({ messageId: "c1" })
    .mockRejectedValueOnce(Object.assign(new Error("rate limited"), { status: 429 }))
    .mockResolvedValueOnce({ messageId: "c2-retry" })
    .mockResolvedValueOnce({ messageId: "c3" });

  await deliverDiscordReply({
    replies: [{ text: "A".repeat(6) }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(4);
});
```
Three of the four new retry tests invoke real `setTimeout` delays, each adding ~1 second of wall-clock time to the test suite.

Tests 1, 2, and 4 trigger the retry logic in `sendWithRetry()` (line 141 of reply-delivery.ts), which calls `setTimeout(resolve, delayMs)`. The computed delay is:

- **429 test** (`retry-after: "0.01"`): `delayMs = Math.max(10, 1_000 * 1) = 1_000 ms`
- **500 test** (no header): `delayMs = Math.max(0, 1_000) = 1_000 ms`
- **Mid-sequence test** (no header): `delayMs = Math.max(0, 1_000) = 1_000 ms`
The existing "touches bound-thread activity" test in this file already demonstrates the pattern: wrap timer-sensitive code in vi.useFakeTimers() / vi.useRealTimers(). Applying the same pattern to the three affected tests eliminates the ~3s slowdown without changing test coverage:
```suggestion
it("retries bot sender on 429 rate-limit and succeeds (#32887)", async () => {
  vi.useFakeTimers();
  try {
    const rateLimitErr = Object.assign(new Error("rate limited"), {
      status: 429,
      headers: { "retry-after": "0.01" },
    });
    sendMessageDiscordMock
      .mockRejectedValueOnce(rateLimitErr)
      .mockResolvedValueOnce({ messageId: "msg-retry" });
    await deliverDiscordReply({
      replies: [{ text: "hello" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    });
    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
  } finally {
    vi.useRealTimers();
  }
});

it("retries bot sender on 500 server error and succeeds (#32887)", async () => {
  vi.useFakeTimers();
  try {
    const serverErr = Object.assign(new Error("internal"), { status: 500 });
    sendMessageDiscordMock
      .mockRejectedValueOnce(serverErr)
      .mockResolvedValueOnce({ messageId: "msg-ok" });
    await deliverDiscordReply({
      replies: [{ text: "hello" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    });
    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
  } finally {
    vi.useRealTimers();
  }
});

it("delivers remaining chunks after a mid-sequence retry (#32887)", async () => {
  vi.useFakeTimers();
  try {
    sendMessageDiscordMock
      .mockResolvedValueOnce({ messageId: "c1" })
      .mockRejectedValueOnce(Object.assign(new Error("rate limited"), { status: 429 }))
      .mockResolvedValueOnce({ messageId: "c2-retry" })
      .mockResolvedValueOnce({ messageId: "c3" });
    await deliverDiscordReply({
      replies: [{ text: "A".repeat(6) }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2,
    });
    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(4);
  } finally {
    vi.useRealTimers();
  }
});
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/discord/monitor/reply-delivery.test.ts
Line: 262-332
How can I resolve this? If you propose a fix, please make it concise.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9c287ed01
```ts
await sendWithRetry(() =>
  sendMessageDiscord(params.target, text, {
    token: params.token,
```
Retry per physical send instead of whole sendMessageDiscord
Retrying the entire sendMessageDiscord call here is not idempotent and can duplicate output when that function has already sent part of the payload before failing. sendMessageDiscord can emit multiple Discord messages in one call (for example via sendDiscordText chunk loops when maxLinesPerMessage or forum-thread follow-up chunks apply), so if a later internal POST gets a 429/5xx, this wrapper replays earlier successful chunks on the next attempt and users receive duplicate content.
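To make the concern concrete, here is a minimal sketch (not the PR's code; `sendChunks`, `postChunk`, and `withRetry` are assumed names) of scoping the retry to one physical POST per chunk, so a failure on chunk N never replays chunks that already went out:

```typescript
// Hypothetical per-chunk retry: the retry wrapper is applied to each physical
// Discord POST individually, rather than to a whole multi-chunk send. A 429 on
// chunk N replays only chunk N; earlier chunks are never duplicated.
async function sendChunks(
  chunks: string[],
  postChunk: (text: string) => Promise<string>, // one physical Discord POST
  withRetry: <T>(fn: () => Promise<T>) => Promise<T>,
): Promise<string[]> {
  const messageIds: string[] = [];
  for (const chunk of chunks) {
    // Retry scope = a single chunk, which keeps the operation idempotent
    // from the user's point of view.
    messageIds.push(await withRetry(() => postChunk(chunk)));
  }
  return messageIds;
}
```

Compared to wrapping the whole `sendMessageDiscord` call, this keeps partial progress: successfully delivered chunks stay delivered, and only the failing POST is re-attempted.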
Superseded by #33226 (combined Discord chunk ordering + retry fixes).
Summary
- A mid-sequence rate-limit or server error aborts `sendMessageDiscord` and all remaining chunks are silently dropped.
- Adds a `sendWithRetry()` wrapper around the bot sender path in `sendDiscordChunkWithFallback` with up to 2 retry attempts, exponential backoff, and `retry-after` header support. Non-retryable errors (4xx except 429) are thrown immediately.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Long agent responses split into multiple Discord messages will no longer silently drop remaining chunks when a mid-sequence rate-limit or server error occurs. Chunks are retried up to 2 times with backoff.
Security Impact (required)
Repro + Verification
Environment
Steps
Expected
Actual
Evidence
4 new vitest unit tests:
- retries bot sender on 429 rate-limit and succeeds
- retries bot sender on 500 server error and succeeds
- throws immediately on non-retryable 4xx errors
- delivers remaining chunks after a mid-sequence retry

Human Verification (required)
Compatibility / Migration
Failure Recovery (if this breaks)
sendMessageDiscordis restoredsrc/discord/monitor/reply-delivery.tsRisks and Mitigations