
fix(discord): retry bot sends on rate limit and server errors during chunk delivery#33054

Closed
AnCoSONG wants to merge 1 commit into openclaw:main from AnCoSONG:fix/discord-chunk-rate-limit-retry

Conversation

AnCoSONG commented Mar 3, 2026

Summary

  • Problem: When a long agent response is split into multiple Discord message chunks (>2000 chars), if any chunk hits a 429 rate-limit or 5xx error mid-sequence, the error propagates up and all remaining chunks are silently dropped.
  • Why it matters: Users see partial message delivery — the first N chunks arrive but the rest never show up. Common with long responses that get split into 5+ chunks.
  • What changed: Added sendWithRetry() wrapper around the bot sender path in sendDiscordChunkWithFallback and sendAdditionalDiscordMedia. Retries up to 2 times for 429/5xx with backoff, respects retry-after header, and immediately throws for non-retryable 4xx errors.
  • What did NOT change (scope boundary): Webhook delivery path already has its own fallback. No changes to chunking logic or Discord API client.
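
For context, the retry wrapper described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual implementation: the real `sendWithRetry` and `isRetryableDiscordError` in `reply-delivery.ts` may differ in signatures, header handling, and backoff constants.

```ts
// Sketch of the described behavior: up to 2 retries on 429/5xx, honoring a
// retry-after header when present, otherwise exponential backoff from 1s.
// All names and constants here are assumptions, not the repo's actual code.
const MAX_RETRIES = 2;
const RETRY_BASE_DELAY_MS = 1000;

interface RetryDecision {
  retryable: boolean;
  retryAfterMs: number;
}

function isRetryableDiscordError(err: unknown, attempt: number): RetryDecision {
  const e = err as { status?: number; headers?: Record<string, string> };
  const status = e.status;
  if (status === 429 || (status !== undefined && status >= 500)) {
    // Prefer the server-provided retry-after (in seconds) when available.
    const retryAfter = Number(e.headers?.["retry-after"]);
    const retryAfterMs = Number.isFinite(retryAfter)
      ? retryAfter * 1000
      : RETRY_BASE_DELAY_MS * 2 ** attempt;
    return { retryable: true, retryAfterMs };
  }
  // Non-retryable 4xx (and anything without a status) fails immediately.
  return { retryable: false, retryAfterMs: 0 };
}

async function sendWithRetry<T>(send: () => Promise<T>): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      const decision = isRetryableDiscordError(err, attempt);
      if (!decision.retryable || attempt >= MAX_RETRIES) throw err;
      await new Promise((resolve) => setTimeout(resolve, decision.retryAfterMs));
    }
  }
}
```

The key design point is that the wrapper rethrows immediately for client errors, so a genuinely malformed request never burns retry budget or delays subsequent chunks.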

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Discord message chunks that encounter transient 429/5xx errors are now retried (up to 2 times) instead of being silently dropped.
  • Non-retryable errors (4xx) still fail immediately.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No — same Discord API calls, just retried on failure)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 + Vitest
  • Integration/channel: Discord

Steps

  1. Configure an agent that produces long responses (>10,000 chars)
  2. The response gets chunked into 5+ Discord messages
  3. If Discord rate-limits any chunk mid-sequence, remaining chunks are now retried

Expected

  • All chunks are delivered even when transient errors occur mid-sequence.

Actual

  • All 13 tests pass, including 4 new tests for retry on 429, retry on 500, no retry on 400, and exhausted retries.

Evidence

  • Failing test/log before + passing after

Human Verification (required)

  • Verified scenarios: pnpm test src/discord/monitor/reply-delivery.test.ts — 13/13 pass
  • Edge cases checked: 429 with retry-after, 500 server error, 400 client error (no retry), exhausted retries (3 total attempts)
  • What you did not verify: live Discord delivery with real rate limits

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert this PR commit.
  • Files/config to restore: src/discord/monitor/reply-delivery.ts
  • Known bad symptoms reviewers should watch for: Delayed message delivery due to retry backoff; unlikely since max delay is ~3s total.

Risks and Mitigations

  • Risk: Retry delay could add latency to message delivery when Discord is rate-limiting aggressively.
    • Mitigation: Max 2 retries with short backoff (1-2s). Total worst-case delay is ~3s, which is acceptable vs losing all remaining chunks.
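
The "~3s total" worst-case figure can be checked with a quick arithmetic sketch. The 1s base and per-attempt doubling are assumptions inferred from the 1000 ms + 2000 ms sleeps mentioned in the review comments, not verified against the code:

```ts
// Illustrative arithmetic only: assumes delay = base * 2^attempt.
const RETRY_BASE_DELAY_MS = 1000;
const MAX_RETRIES = 2;

const delays = Array.from(
  { length: MAX_RETRIES },
  (_, attempt) => RETRY_BASE_DELAY_MS * 2 ** attempt,
);
const worstCaseMs = delays.reduce((sum, d) => sum + d, 0);
// delays: [1000, 2000] -> worstCaseMs: 3000 (~3s)
```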

Made with Cursor

…chunk delivery

When a long reply is split into multiple Discord message chunks, a 429
rate-limit or 5xx error on any chunk causes all remaining chunks to be
silently dropped. Add sendWithRetry() that retries up to 2 times with
backoff for retryable errors (429/5xx) and respects retry-after headers.

Closes openclaw#32887

Made-with: Cursor
openclaw-barnacle bot added the channel: discord and size: S labels Mar 3, 2026

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR adds a sendWithRetry helper to the Discord bot sender path so that transient 429/5xx errors during chunk delivery trigger up to 2 retries with backoff instead of silently dropping remaining message chunks.

The retry logic is sound — isRetryableDiscordError correctly distinguishes retryable (429, 5xx) from non-retryable (4xx) errors and honours the retry-after header. However, the fix is incomplete: a third sendMessageDiscord call at line 296 (the non-voice, media-with-text payload path) is not wrapped in sendWithRetry, so that path still fails immediately on 429/5xx errors.

Additionally, the four new retry tests use real setTimeout delays instead of vi.useFakeTimers(), adding ~5 seconds of wall-clock sleep to each test run. The tests should adopt the mocking pattern already shown in the "touches bound-thread activity" test.

Confidence Score: 2/5

  • The retry logic is correct, but the fix is incomplete — one of three bot sender paths still lacks retry protection.
  • The core retry mechanism works correctly and all tests pass, but the implementation doesn't fully address the stated problem. The PR claims to fix silent chunk dropping on transient 429/5xx errors, yet one of the three sendMessageDiscord call sites (the media-with-text payload path at line 296) remains unwrapped. This means the bug still exists for that specific payload type. Test efficiency can also be improved without changing behavior. These are fixable issues, but they indicate incomplete implementation.
  • src/discord/monitor/reply-delivery.ts requires wrapping the media-with-text call (line 296) in sendWithRetry. src/discord/monitor/reply-delivery.test.ts should use vi.useFakeTimers() in retry tests.

Last reviewed commit: 88009d4


greptile-apps bot left a comment


2 files reviewed, 2 comments


Comment on lines +262 to +329
```ts
it("retries bot send on 429 rate limit then succeeds", async () => {
  const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 });
  sendMessageDiscordMock
    .mockRejectedValueOnce(rateLimitErr)
    .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" });

  await deliverDiscordReply({
    replies: [{ text: "retry me" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("retries bot send on 500 server error then succeeds", async () => {
  const serverErr = Object.assign(new Error("internal"), { status: 500 });
  sendMessageDiscordMock
    .mockRejectedValueOnce(serverErr)
    .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" });

  await deliverDiscordReply({
    replies: [{ text: "retry me" }],
    target: "channel:123",
    token: "token",
    runtime,
    textLimit: 2000,
  });

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
});

it("does not retry on 4xx client errors", async () => {
  const clientErr = Object.assign(new Error("bad request"), { status: 400 });
  sendMessageDiscordMock.mockRejectedValueOnce(clientErr);

  await expect(
    deliverDiscordReply({
      replies: [{ text: "fail" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    }),
  ).rejects.toThrow("bad request");

  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(1);
});

it("throws after exhausting retry attempts", async () => {
  const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 });
  sendMessageDiscordMock.mockRejectedValue(rateLimitErr);

  await expect(
    deliverDiscordReply({
      replies: [{ text: "persistent failure" }],
      target: "channel:123",
      token: "token",
      runtime,
      textLimit: 2000,
    }),
  ).rejects.toThrow("rate limited");

  // 1 initial + 2 retries = 3 attempts
  expect(sendMessageDiscordMock).toHaveBeenCalledTimes(3);
});
```


Retry tests run real timers instead of mocked timers, which adds ~5 seconds of wall-clock sleep to the test suite:

  • "retries bot send on 429…" — sleeps 1000 ms
  • "retries bot send on 500…" — sleeps 1000 ms
  • "throws after exhausting retry attempts" — sleeps 1000 + 2000 = 3000 ms

The "touches bound-thread activity" test (line 214) demonstrates the correct pattern with vi.useFakeTimers(). Consider wrapping each retry test similarly:

```ts
it("retries bot send on 429 rate limit then succeeds", async () => {
  vi.useFakeTimers();
  try {
    const rateLimitErr = Object.assign(new Error("rate limited"), { status: 429 });
    sendMessageDiscordMock
      .mockRejectedValueOnce(rateLimitErr)
      .mockResolvedValueOnce({ messageId: "msg-1", channelId: "channel-1" });

    const promise = deliverDiscordReply({ ... });
    await vi.runAllTimersAsync();
    await promise;

    expect(sendMessageDiscordMock).toHaveBeenCalledTimes(2);
  } finally {
    vi.useRealTimers();
  }
});
```

This eliminates the real delays while still verifying retry logic.



greptile-apps bot commented Mar 3, 2026

Additional Comments (1)

src/discord/monitor/reply-delivery.ts
This sendMessageDiscord call (non-voice, media-with-text path) is not wrapped in sendWithRetry, leaving it vulnerable to the same 429/5xx failures that the PR claims to fix.

Compare with the identical call pattern in sendDiscordChunkWithFallback (line 143-150) and sendAdditionalDiscordMedia (line 163-171), which both wrap sendMessageDiscord in sendWithRetry. This path should also be wrapped with the same retry pattern to maintain consistency and ensure users sending text + media payloads receive the same transient-error resilience as other delivery paths.

Without this wrapper, users sending text + media payloads that encounter transient errors will still experience silent delivery failure.



chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88009d4a65


```ts
const RETRY_BASE_DELAY_MS = 1000;

function isRetryableDiscordError(err: unknown): { retryable: boolean; retryAfterMs: number } {
  const status = (err as { status?: number }).status ?? (err as { statusCode?: number }).statusCode;
```


P1: Parse nested Discord status before deciding retryability

The new retry classifier only checks err.status and err.statusCode, but Discord failures in this repo are also handled with err.response.status (see extractDiscordErrorStatus in src/discord/monitor/thread-bindings.discord-api.ts). If sendMessageDiscord throws that shape for a 429 or 5xx, this function marks it non-retryable and the remaining chunks are still dropped after the first failed send, so the fix does not cover those real error objects.
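
One minimal way to address this finding is to fold the nested shape into the status lookup. This is a sketch only: `extractStatus` below is a hypothetical stand-in for the repo's `extractDiscordErrorStatus`, whose exact shape is assumed here, not verified.

```ts
// Sketch: check the nested response.status shape alongside the flat ones.
// extractStatus is a hypothetical stand-in, not the repo's actual helper.
function extractStatus(err: unknown): number | undefined {
  const e = err as {
    status?: number;
    statusCode?: number;
    response?: { status?: number };
  };
  return e.status ?? e.statusCode ?? e.response?.status;
}

function isRetryable(err: unknown): boolean {
  const status = extractStatus(err);
  return status === 429 || (status !== undefined && status >= 500);
}
```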


@thewilloftheshadow
Member

Superseded by #33226 (combined Discord chunk ordering + retry fixes).



Development

Successfully merging this pull request may close these issues.

fix(discord): chunked message sends silently drop on rate-limit mid-sequence
