
fix(discord): retry chunked message sends on rate-limit mid-sequence#32904

Closed
Sid-Qin wants to merge 1 commit into openclaw:main from Sid-Qin:fix/discord-chunk-retry-ratelimit-32887

Conversation


@Sid-Qin (Contributor) commented Mar 3, 2026

Summary

  • Problem: When a long agent response is split into multiple Discord message chunks (>2000 chars), if any chunk hits a 429 rate-limit or 5xx error mid-sequence, the error propagates from sendMessageDiscord and all remaining chunks are silently dropped.
  • Why it matters: Users see partial message delivery — the first N chunks appear but the rest never arrive. This is common with long responses that produce 5+ chunks in rapid succession.
  • What changed: Added a sendWithRetry() wrapper around the bot sender path in sendDiscordChunkWithFallback with up to 2 retry attempts, exponential backoff, and retry-after header support. Non-retryable errors (4xx except 429) are thrown immediately.
  • What did NOT change (scope boundary): Webhook delivery path already has its own fallback (catch → bot sender). Voice and media paths are unchanged.
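
For readers skimming the PR, the wrapper described above can be sketched as follows. This is a hedged reconstruction from the PR description and the review comments (which quote the `Math.max(retryAfterMs, RETRY_BASE_DELAY_MS * (attempt + 1))` delay formula), not the actual `reply-delivery.ts` code; `MAX_RETRIES`, `RETRY_BASE_DELAY_MS`, `SendError`, and `isRetryable` are assumed names:

```typescript
// Hypothetical sketch of the sendWithRetry() wrapper; constant and type names
// are assumptions, not the exact implementation in reply-delivery.ts.
const MAX_RETRIES = 2;
const RETRY_BASE_DELAY_MS = 1_000;

interface SendError extends Error {
  status?: number;
  headers?: Record<string, string>;
}

function isRetryable(err: SendError): boolean {
  // 429 and all 5xx are retryable; other 4xx errors are not.
  return err.status === 429 || (err.status !== undefined && err.status >= 500);
}

async function sendWithRetry<T>(send: () => Promise<T>): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const e = err as SendError;
      if (attempt >= MAX_RETRIES || !isRetryable(e)) throw e;
      // Honor Discord's retry-after header (seconds) when present, but never
      // wait less than the exponential base delay for this attempt.
      const retryAfterMs = Number(e.headers?.["retry-after"]) * 1_000;
      const delayMs = Math.max(
        Number.isNaN(retryAfterMs) ? 0 : retryAfterMs,
        RETRY_BASE_DELAY_MS * (attempt + 1),
      );
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A caller wraps each send in the helper, e.g. `await sendWithRetry(() => sendMessageDiscord(...))`, so a transient 429/5xx on one chunk no longer aborts the remaining chunks.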

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Long agent responses split into multiple Discord messages will no longer silently drop remaining chunks when a mid-sequence rate-limit or server error occurs. Chunks are retried up to 2 times with backoff.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No — same Discord API calls, just retried on failure
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Any
  • Runtime: Node.js 22+

Steps

  1. Configure an agent producing long responses (>10,000 chars)
  2. Response gets chunked into 5+ Discord messages
  3. Simulate Discord rate-limit on chunk 3

Expected

  • All chunks are delivered (with retry delay on chunk 3)

Actual

  • Before: Chunks 4+ silently dropped
  • After: Chunk 3 retries with backoff, chunks 4+ delivered normally

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

4 new vitest unit tests:

  • retries bot sender on 429 rate-limit and succeeds
  • retries bot sender on 500 server error and succeeds
  • throws immediately on non-retryable 4xx errors
  • delivers remaining chunks after a mid-sequence retry

Human Verification (required)

  • Verified scenarios: 429 rate-limit retry, 500 server error retry, 403 immediate throw, multi-chunk delivery with mid-sequence failure
  • Edge cases checked: retry-after header parsing (NaN, 0, valid), all retries exhausted (throws), non-retryable status codes
  • What you did not verify: Live Discord API rate-limit behavior (tested with mocks only)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert: Revert this commit; the original direct call to sendMessageDiscord is restored
  • Files/config to restore: src/discord/monitor/reply-delivery.ts

Risks and Mitigations

  • Risk: Retry delay (up to 2s per attempt) could slow down chunk delivery when Discord is heavily rate-limiting
    • Mitigation: Maximum 2 retries with bounded delay; non-retryable errors still throw immediately. The alternative (dropping chunks) is strictly worse.

The bot sender path in sendDiscordChunkWithFallback had no error handling,
so a 429 or 5xx error on any chunk would drop all remaining chunks. Add
sendWithRetry wrapper with up to 2 retries and retry-after header support.

Closes openclaw#32887

Made-with: Cursor

aisle-research-bot bot commented Mar 3, 2026

🤖 We're reviewing this PR with Aisle

We're running a security check on the changes in this PR now. This usually takes a few minutes. ⌛
We'll post the results here as soon as they're ready.

Progress:

  • Analysis
  • Triage
  • Finalization

Latest run failed. Keeping previous successful results. Trace ID: 019cb348ff30e11fe1c1862d3bbb051a.

Last updated on: 2026-03-03T10:40:42Z


greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR fixes a real bug where mid-sequence Discord rate-limit or server errors silently dropped all remaining message chunks. The solution — a sendWithRetry() wrapper with exponential backoff and retry-after header support — is well-scoped and correctly handles the retryable vs. non-retryable distinction.

One non-blocking issue worth addressing:

  • Test timers: The three new retry tests do not use vi.useFakeTimers(), so each invokes the real setTimeout with a 1,000 ms base delay, adding ~3 s of wall-clock time to every CI run. The "retry-after": "0.01" header on the 429 test was presumably intended to keep the test fast, but Math.max(retryAfterMs, RETRY_BASE_DELAY_MS * (attempt + 1)) overrides it with the base delay. The existing "touches bound-thread activity" test in this file shows the correct pattern using vi.useFakeTimers().

Confidence Score: 4/5

  • Safe to merge; the core retry logic is correct and the test timing issue is a non-blocking CI performance concern.
  • The retry logic itself is sound: loop bounds, retryable-status detection, and throw-on-non-retryable behavior are all correct. Tests cover the main scenarios (429, 500, 403, multi-chunk). The flagged issue (slow tests from missing fake timers) is a style/performance concern for the test suite, not a functional bug in the implementation.
  • reply-delivery.test.ts — three new retry tests need vi.useFakeTimers() wrappers to eliminate CI slowdown.

Last reviewed commit: e9c287e


@greptile-apps bot left a comment


2 files reviewed, 1 comment


Comment on lines +262 to +332
```typescript
it("retries bot sender on 429 rate-limit and succeeds (#32887)", async () => {
  const rateLimitErr = Object.assign(new Error("rate limited"), {
    status: 429,
    headers: { "retry-after": "0.01" },
  });
  sendMessageDiscordMock
    .mockRejectedValueOnce(rateLimitErr)
    .mockResolvedValueOnce({ messageId: "msg-retry" });

  await deliverDiscordReply({
    replies: [{ text: "hello" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("retries bot sender on 500 server error and succeeds (#32887)", async () => {
  const serverErr = Object.assign(new Error("internal"), { status: 500 });
  sendMessageDiscordMock
    .mockRejectedValueOnce(serverErr)
    .mockResolvedValueOnce({ messageId: "msg-ok" });

  await deliverDiscordReply({
    replies: [{ text: "hello" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("throws immediately on non-retryable 4xx errors (#32887)", async () => {
  const clientErr = Object.assign(new Error("forbidden"), { status: 403 });
  sendMessageDiscordMock.mockRejectedValueOnce(clientErr);

  await expect(
    deliverDiscordReply({
      replies: [{ text: "hello" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    }),
  ).rejects.toThrow("forbidden");

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(1);
});

it("delivers remaining chunks after a mid-sequence retry (#32887)", async () => {
  sendMessageDiscordMock
    .mockResolvedValueOnce({ messageId: "c1" })
    .mockRejectedValueOnce(Object.assign(new Error("rate limited"), { status: 429 }))
    .mockResolvedValueOnce({ messageId: "c2-retry" })
    .mockResolvedValueOnce({ messageId: "c3" });

  await deliverDiscordReply({
    replies: [{ text: "A".repeat(6) }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(4);
});
```

Three of the four new retry tests invoke real setTimeout delays, each adding ~1 second of wall-clock time to the test suite.

Tests 1, 2, and 4 trigger the retry logic in sendWithRetry() (line 141 of reply-delivery.ts), which calls setTimeout(resolve, delayMs). The computed delay is:

  • 429 test (retry-after: "0.01"): delayMs = Math.max(10, 1_000 * 1) = 1_000 ms
  • 500 test (no header): delayMs = Math.max(0, 1_000) = 1_000 ms
  • Mid-sequence test (no header): delayMs = Math.max(0, 1_000) = 1_000 ms

The existing "touches bound-thread activity" test in this file already demonstrates the pattern: wrap timer-sensitive code in vi.useFakeTimers() / vi.useRealTimers(). Applying the same pattern to the three affected tests eliminates the ~3s slowdown without changing test coverage:
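
The override is easy to reproduce in isolation. This sketch mirrors the delay formula quoted above; `computeDelayMs` is a hypothetical standalone name, not a function in the PR:

```typescript
// Mirrors the delay computation the review quotes from sendWithRetry();
// computeDelayMs is a hypothetical name used only for this illustration.
const RETRY_BASE_DELAY_MS = 1_000;

function computeDelayMs(retryAfterHeader: string | undefined, attempt: number): number {
  // Discord's retry-after header is in seconds; convert to milliseconds.
  const retryAfterMs = Number(retryAfterHeader) * 1_000;
  return Math.max(
    Number.isNaN(retryAfterMs) ? 0 : retryAfterMs,
    RETRY_BASE_DELAY_MS * (attempt + 1),
  );
}

// "retry-after": "0.01" yields 10 ms, but Math.max lets the base delay win:
// computeDelayMs("0.01", 0) === 1_000
```

So the tiny `retry-after` value in the test never shortens the wait; only fake timers avoid the real 1 s delay.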

Suggested change
```typescript
  it("retries bot sender on 429 rate-limit and succeeds (#32887)", async () => {
    vi.useFakeTimers();
    try {
      const rateLimitErr = Object.assign(new Error("rate limited"), {
        status: 429,
        headers: { "retry-after": "0.01" },
      });
      sendMessageDiscordMock
        .mockRejectedValueOnce(rateLimitErr)
        .mockResolvedValueOnce({ messageId: "msg-retry" });

      await deliverDiscordReply({
        replies: [{ text: "hello" }],
        target: "channel:123",
        token: "token",
        runtime,
        textLimit: 2000,
      });

      expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
    } finally {
      vi.useRealTimers();
    }
  });

  it("retries bot sender on 500 server error and succeeds (#32887)", async () => {
    vi.useFakeTimers();
    try {
      const serverErr = Object.assign(new Error("internal"), { status: 500 });
      sendMessageDiscordMock
        .mockRejectedValueOnce(serverErr)
        .mockResolvedValueOnce({ messageId: "msg-ok" });

      await deliverDiscordReply({
        replies: [{ text: "hello" }],
        target: "channel:123",
        token: "token",
        runtime,
        textLimit: 2000,
      });

      expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
    } finally {
      vi.useRealTimers();
    }
  });

  it("delivers remaining chunks after a mid-sequence retry (#32887)", async () => {
    vi.useFakeTimers();
    try {
      sendMessageDiscordMock
        .mockResolvedValueOnce({ messageId: "c1" })
        .mockRejectedValueOnce(Object.assign(new Error("rate limited"), { status: 429 }))
        .mockResolvedValueOnce({ messageId: "c2-retry" })
        .mockResolvedValueOnce({ messageId: "c3" });

      await deliverDiscordReply({
        replies: [{ text: "A".repeat(6) }],
        target: "channel:123",
        token: "token",
        runtime,
        textLimit: 2,
      });

      expect(sendMessageDiscordMock).toHaveBeenCalledTimes(4);
    } finally {
      vi.useRealTimers();
    }
  });
```


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9c287ed01


Comment on lines +108 to +110
```typescript
await sendWithRetry(() =>
  sendMessageDiscord(params.target, text, {
    token: params.token,
```

P2: Retry per physical send instead of the whole sendMessageDiscord call

Retrying the entire sendMessageDiscord call here is not idempotent and can duplicate output when that function has already sent part of the payload before failing. sendMessageDiscord can emit multiple Discord messages in one call (for example via sendDiscordText chunk loops when maxLinesPerMessage or forum-thread follow-up chunks apply), so if a later internal POST gets a 429/5xx, this wrapper replays earlier successful chunks on the next attempt and users receive duplicate content.
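
The duplication hazard can be illustrated with a minimal sketch (all names hypothetical; `sendAll` stands in for sendMessageDiscord's internal chunk loop, `post` for one physical Discord POST):

```typescript
// Illustrates the non-idempotency flagged above: retrying a whole multi-send
// call replays chunks that were already delivered before the failure.
async function sendAll(
  chunks: string[],
  post: (chunk: string) => Promise<void>,
): Promise<void> {
  for (const chunk of chunks) await post(chunk);
}

async function demo(): Promise<string[]> {
  const delivered: string[] = [];
  let failedOnce = false;
  const post = async (chunk: string) => {
    // Simulate a 429 on the third physical send, first attempt only.
    if (chunk === "c" && !failedOnce) {
      failedOnce = true;
      throw Object.assign(new Error("rate limited"), { status: 429 });
    }
    delivered.push(chunk);
  };

  try {
    await sendAll(["a", "b", "c"], post);
  } catch {
    // Naive whole-call retry: "a" and "b" are sent again.
    await sendAll(["a", "b", "c"], post);
  }
  return delivered; // ["a", "b", "a", "b", "c"] — duplicates
}
```

Retrying inside the loop, around each `post(chunk)` call, would resend only the failed chunk and avoid the duplicates.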


@thewilloftheshadow (Member) commented:

Superseded by #33226 (combined Discord chunk ordering + retry fixes).


Labels

channel: discord · size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(discord): chunked message sends silently drop on rate-limit mid-sequence

2 participants