fix(discord): prevent WebSocket death spiral + fix numeric channel ID…#21463

Closed
akropp wants to merge 1 commit into openclaw:main from akropp:fix/discord-websocket-death-spiral
@akropp akropp commented Feb 20, 2026

… resolution

Two bugs:

  1. Message handler awaited processDiscordMessage inline, blocking the Discord event listener. Slow agent responses (30-150s) prevented WebSocket heartbeat servicing, causing code 1005/1006 disconnects and reconnect loops. Changed to fire-and-forget with error catching.

  2. Channel resolver compared numeric channel IDs against channel names when config used guildId/channelId format (e.g. '123/456'). The second segment was treated as a name and slug-matched, which never matched numeric IDs. Now matches by ID when the channel query is numeric.
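
The first fix can be illustrated with a minimal sketch. The real handler in message-handler.ts is not shown in this PR page, so the `DiscordMessage` type, the simulated `processDiscordMessage` body, and both listener functions below are hypothetical stand-ins; only the `void` + `.catch()` shape comes from the description above.

```typescript
// Hypothetical message shape; the real type in the codebase is not shown here.
type DiscordMessage = { id: string; content: string };

const handled: string[] = [];
const failures: string[] = [];

// Stand-in for the real processDiscordMessage: simulates a slow agent turn.
async function processDiscordMessage(msg: DiscordMessage): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  if (msg.content === "boom") {
    throw new Error(`failed on ${msg.id}`);
  }
  handled.push(msg.id);
}

// Before: the listener does not return until the agent has finished,
// so anything queued behind it has to wait.
async function onMessageBlocking(msg: DiscordMessage): Promise<void> {
  await processDiscordMessage(msg);
}

// After: start the work, attach an error handler, and return immediately.
// `void` marks the promise as intentionally not awaited.
function onMessageFireAndForget(msg: DiscordMessage): void {
  void processDiscordMessage(msg).catch((err: unknown) => {
    // A real handler would log here; this sketch records the message instead.
    failures.push(err instanceof Error ? err.message : String(err));
  });
}
```

One consequence of this pattern is that messages are no longer processed strictly one at a time, so any ordering guarantees have to come from somewhere other than the listener itself.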

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
  • Files/config to restore:
  • Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

Greptile Summary

This PR fixes two critical Discord integration bugs:

Changes:

  • WebSocket heartbeat fix (message-handler.ts): Changed processDiscordMessage from blocking (await) to fire-and-forget (void + .catch()) to prevent slow agent responses (30-150s) from blocking the Discord event listener and causing WebSocket heartbeat failures (code 1005/1006 disconnects)
  • Numeric channel ID resolution (resolve-channels.ts): Added numeric ID detection (/^\d+$/) to match channels by ID when using guildId/channelId format (e.g., 123/456), instead of incorrectly treating the numeric channel ID as a name and attempting slug-based matching

Impact:
These are well-targeted fixes that address production stability issues. The fire-and-forget pattern is correct for this use case: errors are still caught and logged, but the event loop remains responsive. The numeric ID fix resolves a logic error where numeric channel IDs in the second segment of guildId/channelId patterns were being compared against channel names instead of channel IDs.
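
The numeric ID fix can be sketched as follows. This is an illustration under assumptions, not the code in resolve-channels.ts: the `GuildChannel` shape and the `slugify`, `parseEntry`, and `findChannel` helpers are hypothetical names; only the `/^\d+$/` check and the guildId/channelId format come from the summary above.

```typescript
// Hypothetical channel shape; Discord IDs are handled as strings.
interface GuildChannel {
  id: string;
  name: string;
}

// Hypothetical slug helper: lowercase, non-alphanumerics collapsed to "-".
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Split a config entry such as "123/456" into guild and channel segments.
function parseEntry(entry: string): { guild: string; channel?: string } {
  const slash = entry.indexOf("/");
  if (slash === -1) {
    return { guild: entry.trim() };
  }
  return {
    guild: entry.slice(0, slash).trim(),
    channel: entry.slice(slash + 1).trim(),
  };
}

// Match by ID when the query is all digits, otherwise by slugged name.
function findChannel(
  channels: GuildChannel[],
  query: string,
): GuildChannel | undefined {
  const q = query.trim();
  if (/^\d+$/.test(q)) {
    return channels.find((c) => c.id === q);
  }
  const slug = slugify(q);
  return channels.find((c) => slugify(c.name) === slug);
}
```

In this sketch a purely numeric query is never tried as a name, so a channel literally named with digits would only match by its ID; whether the actual resolver falls back to name matching is not stated in the PR.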

Confidence Score: 4/5

  • This PR is safe to merge with low risk
  • Both fixes are well-scoped and address clear bugs with minimal surface area. The fire-and-forget pattern correctly prevents blocking while maintaining error logging, and the numeric ID matching fix is a straightforward logic correction. Score is 4 (not 5) because the fire-and-forget change alters concurrency behavior in a production-critical path, though the change is sound and necessary.
  • No files require special attention

Last reviewed commit: c396ee3

@openclaw-barnacle
This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@thewilloftheshadow
Member

Superseded by #33142 which consolidates the fixes from this PR. Closing this in favor of that PR.

Labels

channel: discord (Channel integration: discord), size: XS, stale (Marked as stale due to inactivity)
