fix(discord): chunked message sends silently drop on rate-limit mid-sequence #32887

@proto-genesys-x

Bug

When a long agent response is split into multiple Discord message chunks (>2000 chars), if any chunk hits a 429 rate-limit or 5xx error mid-sequence, the error propagates up from sendMessageDiscord and all remaining chunks are silently dropped.

The webhook path in sendDiscordChunkWithFallback has a try/catch with fallback, but the bot sender path (sendMessageDiscord) has no error handling or retry logic.

Impact

Partial message delivery: users see the first N chunks but the rest never arrive. The agent session transcript shows the full response was generated, but it never reached Discord. This is particularly common with long responses that get split into 5+ chunks firing in rapid succession.

Steps to Reproduce

  1. Configure an agent that produces long responses (>10,000 chars)
  2. The response gets chunked into 5+ Discord messages
  3. If Discord rate-limits any chunk mid-sequence, remaining chunks are lost
  4. No error is surfaced to the user or logged
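
The failure mode can be reproduced without Discord using a fake sender. This is only a sketch: `chunkText`, `deliverAll`, and `makeFlakySender` are hypothetical stand-ins rather than functions from the codebase, and the error shape (an `Error` carrying a numeric `status`) is an assumption.

```typescript
// Hypothetical stand-ins for illustration; none of these names exist in the repo.
type SendChunk = (chunk: string) => Promise<void>;

// Splits text into pieces of at most `limit` characters (Discord's limit is 2000).
function chunkText(text: string, limit = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push(text.slice(i, i + limit));
  }
  return chunks;
}

// Sends chunks in order with no error handling. The first rejection aborts
// the loop, so every later chunk is never attempted.
async function deliverAll(chunks: string[], send: SendChunk): Promise<void> {
  for (const chunk of chunks) {
    await send(chunk);
  }
}

// Fake sender that records deliveries and rejects with a 429 on one call.
function makeFlakySender(failOnCall: number): { send: SendChunk; delivered: string[] } {
  const delivered: string[] = [];
  let calls = 0;
  const send: SendChunk = async (chunk) => {
    calls += 1;
    if (calls === failOnCall) {
      // Assumed error shape: an Error with a numeric `status` property.
      throw Object.assign(new Error("HTTP 429"), { status: 429 });
    }
    delivered.push(chunk);
  };
  return { send, delivered };
}
```

With a 10,000-character response (five chunks) and a sender that fails on the third call, `deliverAll` rejects after two chunks have been delivered, and chunks four and five are never sent.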

Proposed Fix

Add sendWithRetry() wrapper around the bot sender path with:

  • Up to 2 retry attempts for 429 and 5xx errors
  • Linear backoff (1 s, then 2 s), extended to the retry-after header value when that is longer
  • Immediate throw for non-retryable errors (4xx other than 429)
--- a/src/discord/monitor/reply-delivery.ts
+++ b/src/discord/monitor/reply-delivery.ts
@@ -105,12 +105,44 @@ async function sendDiscordChunkWithFallback(params: {
       // Fall through to the standard bot sender path.
     }
   }
-  await sendMessageDiscord(params.target, text, {
-    token: params.token,
-    rest: params.rest,
-    accountId: params.accountId,
-    replyTo: params.replyTo,
-  });
+  await sendWithRetry(() =>
+    sendMessageDiscord(params.target, text, {
+      token: params.token,
+      rest: params.rest,
+      accountId: params.accountId,
+      replyTo: params.replyTo,
+    }),
+  );
+}
+
+const RETRY_ATTEMPTS = 2;
+const RETRY_BASE_DELAY_MS = 1000;
+
+async function sendWithRetry(fn: () => Promise<unknown>): Promise<void> {
+  for (let attempt = 0; attempt <= RETRY_ATTEMPTS; attempt++) {
+    try {
+      await fn();
+      return;
+    } catch (err: unknown) {
+      const isLast = attempt === RETRY_ATTEMPTS;
+      if (isLast) {
+        throw err;
+      }
+      const status =
+        (err as { status?: number }).status ?? (err as { statusCode?: number }).statusCode;
+      if (status === 429 || (status !== undefined && status >= 500)) {
+        const retryAfterMs =
+          Number((err as { headers?: Record<string, string> }).headers?.["retry-after"]) * 1000 || 0;
+        const delayMs = Math.max(retryAfterMs, RETRY_BASE_DELAY_MS * (attempt + 1));
+        await new Promise((resolve) => setTimeout(resolve, delayMs));
+        continue;
+      }
+      throw err;
+    }
+  }
 }

File: src/discord/monitor/reply-delivery.ts
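
To check the retry behavior without waiting on real timers, the same logic can be written with an injectable sleep function. This is a sketch rather than the patch itself: the `sleep` parameter and the `isRetryable` helper are additions for testability, and it keeps the patch's assumptions that errors expose `status` or `statusCode` and that `headers` is a plain object holding `retry-after` in seconds.

```typescript
type Sleep = (ms: number) => Promise<void>;

const RETRY_ATTEMPTS = 2;
const RETRY_BASE_DELAY_MS = 1000;

// Default sleep backed by a real timer; tests can pass a fake instead.
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: 429 and 5xx are retried, everything else is not.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500);
}

async function sendWithRetry(
  fn: () => Promise<unknown>,
  sleep: Sleep = realSleep,
): Promise<void> {
  for (let attempt = 0; attempt <= RETRY_ATTEMPTS; attempt++) {
    try {
      await fn();
      return;
    } catch (err: unknown) {
      // Assumed error shape, matching the casts used in the patch.
      const e = err as {
        status?: number;
        statusCode?: number;
        headers?: Record<string, string>;
      };
      const status = e.status ?? e.statusCode;
      // Give up on the last attempt or on any non-retryable status.
      if (attempt === RETRY_ATTEMPTS || !isRetryable(status)) {
        throw err;
      }
      // retry-after is read as seconds; a missing or invalid header yields 0.
      const retryAfterMs = Number(e.headers?.["retry-after"]) * 1000 || 0;
      await sleep(Math.max(retryAfterMs, RETRY_BASE_DELAY_MS * (attempt + 1)));
    }
  }
}
```

Passing a fake `sleep` that records its argument makes the delays observable. For example, a sender that fails with a 429 carrying `retry-after: 3`, then with a 503, then succeeds should be called three times with recorded delays of 3000 ms and 2000 ms.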

Happy to open a PR if the approach looks good. Our fork currently cannot push branches because of OAuth workflow scope limitations.
