
BUG: Discord channel fetch failure crashes gateway (unhandled rejection in GatewayPlugin.registerClient) #37375

@kAIborg24

Description


Bug type

Regression (worked before, now fails)

Summary

When the Discord API is unreachable during provider startup, @buape/carbon's GatewayPlugin.registerClient() throws an error that becomes an unhandled promise rejection, crashing the gateway. Combined with systemd auto-restart, this creates an infinite crash loop that persists until the network recovers or Discord is manually disabled. On our system, this caused 76 crashes in a single day (94% of all restarts).

Steps to reproduce

  1. Configure OpenClaw with Discord channel enabled
  2. Run on a system with intermittent network connectivity (WSL2 is a reliable reproducer due to known DNS/TCP instability)
  3. Wait for health-monitor to detect a stale socket (every 300s) during a network blip
  4. Health-monitor restarts the Discord provider
  5. Discord provider initialization calls GatewayPlugin.registerClient(), which fetches https://discord.com/api/v10/gateway/bot
  6. Fetch fails → unhandled promise rejection → process.exit(1)
  7. systemd restarts the gateway → Discord provider starts again → same fetch fails → crash loop

Minimal trigger: Any network interruption lasting >10 seconds during Discord provider startup.

Expected behavior

• Discord channel fetch failure should be treated as a transient network error
• Gateway should log a warning, skip or retry Discord connection, and continue running with other channels (Telegram, etc.)
• A single channel's connectivity issues should never crash the entire gateway

Actual behavior

Gateway crashes with:

[openclaw] Unhandled promise rejection: Error: Failed to get gateway information from Discord: fetch failed
    at GatewayPlugin.registerClient (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/dist/src/plugins/gateway/GatewayPlugin.js:73:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)

Followed by: openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE

Root Cause Analysis

Three bugs interact to create this crash:

Bug 1 (High) — @buape/carbon Client constructor doesn't await async plugin registration

// @buape/carbon Client.js line 122
for (const plugin of plugins) {
  // No await: the constructor can't be async, so the Promise returned by
  // registerClient is dropped and nothing handles its rejection
  plugin.registerClient?.(this);
  plugin.registerRoutes?.(this);
  this.plugins.push({ id: plugin.id, plugin });
}

registerClient is async (it returns a Promise), but the constructor calls it without awaiting it or attaching a .catch(). The returned Promise is simply dropped, so any error thrown inside registerClient is guaranteed to surface as an unhandled rejection.
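The failure mode can be reproduced outside Carbon, and one possible shape of a fix that keeps the constructor synchronous is sketched below. `SketchClient` and its `ready` property are hypothetical names, not Carbon's API: the idea is to keep the Promise returned by `registerClient` and attach a rejection handler immediately, so a failed registration is reported instead of escaping as an unhandled rejection.

```javascript
// Hypothetical stand-in for a client whose constructor must stay synchronous.
// SketchClient and `ready` are illustrative names, not @buape/carbon's API.
class SketchClient {
  constructor(plugins) {
    this.plugins = [];
    const pending = [];
    for (const plugin of plugins) {
      // registerClient may return a Promise: keep it instead of dropping it.
      const result = plugin.registerClient?.(this);
      if (result && typeof result.then === "function") {
        // Attach both handlers right away, in the same tick as the call,
        // so a rejection is never left without a handler.
        pending.push(result.then(
          () => ({ id: plugin.id, ok: true }),
          (error) => ({ id: plugin.id, ok: false, error }),
        ));
      }
      plugin.registerRoutes?.(this);
      this.plugins.push({ id: plugin.id, plugin });
    }
    // Never rejects: each entry reports success or the captured error.
    this.ready = Promise.all(pending);
  }
}

// A plugin whose async registration fails, like GatewayPlugin during an outage.
const failingPlugin = {
  id: "gateway",
  async registerClient() {
    throw new Error("Failed to get gateway information from Discord: fetch failed");
  },
};

const client = new SketchClient([failingPlugin]);
client.ready.then((results) => {
  for (const r of results) {
    if (!r.ok) console.warn(`plugin ${r.id} failed to register: ${r.error.message}`);
  }
});
```

The caller (in OpenClaw's case, the Discord provider) can then await `ready` and decide whether to retry or skip Discord, rather than relying on the process-wide handler.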

Bug 2 (Medium) — @buape/carbon GatewayPlugin strips error cause chain

// @buape/carbon GatewayPlugin.js lines 72-73
catch (error) {
  throw new Error(
    `Failed to get gateway information from Discord: ${error instanceof Error ? error.message : String(error)}`,
  ); // Missing: { cause: error }
}

The original TypeError("fetch failed") is discarded. OpenClaw's isTransientNetworkError() walks the error graph via .cause, but there's nothing to find.

Note: OpenClaw's own ProxyGatewayPlugin (for proxy mode) correctly includes { cause: error } — this only affects the non-proxy code path.
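To show why the missing `cause` matters, the sketch below contrasts Carbon's current wrapping with the one-argument change proposed as Fix A, then walks `.cause` the way `isTransientNetworkError()` is described as doing. `findInCauseChain` and the wrapper functions are hypothetical helpers, not OpenClaw's or Carbon's code. The `{ cause }` option of the `Error` constructor is standard in Node.js 16.9 and later.

```javascript
// Simulates the error fetch() throws when the network is down.
function originalFetchError() {
  return new TypeError("fetch failed");
}

// Current behaviour: only the message text survives the rethrow.
function wrapWithoutCause(error) {
  return new Error(
    `Failed to get gateway information from Discord: ${error instanceof Error ? error.message : String(error)}`,
  );
}

// Fix A: same message, but the original error is kept as `cause`.
function wrapWithCause(error) {
  return new Error(
    `Failed to get gateway information from Discord: ${error instanceof Error ? error.message : String(error)}`,
    { cause: error },
  );
}

// Hypothetical helper: visit err, err.cause, err.cause.cause, ... and return
// the first one matching the predicate. The Set guards against cause cycles.
function findInCauseChain(err, predicate) {
  const seen = new Set();
  let current = err;
  while (current && !seen.has(current)) {
    if (predicate(current)) return current;
    seen.add(current);
    current = current.cause;
  }
  return undefined;
}

const isFetchFailed = (e) => e instanceof TypeError && e.message === "fetch failed";

console.log(findInCauseChain(wrapWithoutCause(originalFetchError()), isFetchFailed)); // undefined
console.log(findInCauseChain(wrapWithCause(originalFetchError()), isFetchFailed) instanceof TypeError); // true
```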

Bug 3 (Medium) — OpenClaw's transient error detection misses wrapped "fetch failed" messages

isTransientNetworkError() checks:

• candidate instanceof TypeError && candidate.message === "fetch failed" → ❌ The wrapped error is Error, not TypeError
• message === "fetch failed" (exact match) → ❌ Actual message is "Failed to get gateway information from Discord: fetch failed"
• TRANSIENT_NETWORK_MESSAGE_SNIPPETS.some(s => message.includes(s)) → ❌ No snippet matches

The error falls through to the default process.exit(1) handler.
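The three checks can be modeled in a few lines to confirm the miss. This is a reconstruction from the description above, not OpenClaw's source: `isTransientSketch` is a hypothetical name, and the entries in `SNIPPETS` are placeholders because the real `TRANSIENT_NETWORK_MESSAGE_SNIPPETS` list is not quoted in this report.

```javascript
// Placeholder entries (assumption): the real TRANSIENT_NETWORK_MESSAGE_SNIPPETS
// list is not quoted in this report.
const SNIPPETS = ["socket hang up", "network timeout"];

// Hypothetical reconstruction of the three checks, applied to the error and to
// each error reachable through .cause (depth-capped to avoid cycles).
function isTransientSketch(err) {
  for (let candidate = err, depth = 0; candidate && depth < 10; candidate = candidate.cause, depth++) {
    const message = String(candidate.message ?? "");
    if (candidate instanceof TypeError && message === "fetch failed") return true; // check 1
    if (message === "fetch failed") return true;                                   // check 2: exact match
    if (SNIPPETS.some((s) => message.includes(s))) return true;                  // check 3: substring
  }
  return false;
}

const wrapped = new Error("Failed to get gateway information from Discord: fetch failed");
console.log(isTransientSketch(wrapped));                       // false: falls through to process.exit(1)
console.log(isTransientSketch(new TypeError("fetch failed"))); // true: the unwrapped error is recognized
```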

The Full Kill Chain

Network blip (e.g. WSL2 DNS/TCP instability)
→ health-monitor detects stale-socket (every 300s)
→ restarts Discord provider
→ channel resolve fails (gracefully handled — just a warning)
→ Carbon Client constructor calls registerClient() without await (Bug 1)
→ registerClient fetches discord.com/api/v10/gateway/bot → fails
→ throws Error without { cause } (Bug 2) → unhandled rejection
→ isTransientNetworkError misses wrapped message (Bug 3)
→ process.exit(1) → systemd restart → repeat

Timing-Dependent Behavior

The crash only occurs when the network is still down when registerClient's fetch fires (~15-25s after provider start). Of 99 channel resolve failures observed, only 76 led to crashes — the other 23 times, the network recovered before registerClient needed it.

During crash loops, systemd restarts the gateway in ~10s, and the Discord provider immediately calls registerClient again. If the outage outlasts the restart window, every restart hits the same failure. One cluster produced 7 consecutive crashes in 3 minutes.

Proposed Fixes

Any one of the following fixes on its own would break the crash chain:

• Fix A (@buape/carbon, GatewayPlugin.js): add { cause: error } to the rethrown Error, preserving the error chain for isTransientNetworkError
• Fix B (OpenClaw, isTransientNetworkError): add "fetch failed" to TRANSIENT_NETWORK_MESSAGE_SNIPPETS so wrapped messages are caught
• Fix C (OpenClaw, isTransientNetworkError): change message === "fetch failed" to message.includes("fetch failed"), a more targeted variant of B
• Fix D (OpenClaw, Discord provider): wrap the registerClient call in try/catch with retry/backoff, or catch the unhandled rejection from the Client constructor, so it never reaches the global handler
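The claim that a single fix suffices can be checked by running the wrapped error through a checker with and without each change. In the sketch below, `checkTransient` is a hypothetical function modeled on the exact-match check from Bug 3. Its `substringMatch` flag models Fix C, and Fix B has the same effect on this particular message because the snippet check also uses `includes()`. Fix A is modeled by attaching `{ cause }` to the wrapped error.

```javascript
const PREFIX = "Failed to get gateway information from Discord: ";

// Hypothetical checker: walks the .cause chain (depth-capped).
// substringMatch: false models today's exact match; true models Fix C.
function checkTransient(err, { substringMatch = false } = {}) {
  for (let c = err, depth = 0; c && depth < 10; c = c.cause, depth++) {
    const message = String(c.message ?? "");
    if (substringMatch ? message.includes("fetch failed") : message === "fetch failed") return true;
  }
  return false;
}

const fetchError = new TypeError("fetch failed");
const today = new Error(PREFIX + fetchError.message);                              // current Carbon
const withFixA = new Error(PREFIX + fetchError.message, { cause: fetchError });    // Fix A

console.log(checkTransient(today));                           // false: crash
console.log(checkTransient(withFixA));                        // true: Fix A alone is enough
console.log(checkTransient(today, { substringMatch: true })); // true: Fix C alone is enough
```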

Recommendation: Fix D is the most robust, since it handles the unawaited async call at the source. Fix B or C provides a safety net for any future wrapped transient errors. Fix A, along with Bug 1 itself (the unawaited registerClient call in the Client constructor), could be reported upstream to @buape/carbon.
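The retry half of Fix D could look roughly like the sketch below. `withRetry` and `startDiscordGateway` are hypothetical names. The sketch assumes the provider can get hold of the registration Promise, which Carbon currently drops inside the Client constructor, so how that Promise is obtained is left open here. The key property is that the Promise is awaited inside try/catch, so a failure becomes a logged warning instead of an unhandled rejection.

```javascript
// Hypothetical helper (not part of OpenClaw or Carbon): retry an async
// operation with capped exponential backoff, rethrowing the last error.
async function withRetry(operation, { attempts = 5, baseDelayMs = 1000, maxDelayMs = 30000, onRetry } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === attempts) break;
      // 1x, 2x, 4x, ... the base delay, never more than maxDelayMs.
      const delayMs = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      onRetry?.(error, attempt, delayMs);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Hypothetical provider startup: registration is awaited inside try/catch, so
// a persistent failure is logged and Discord is skipped while other channels run.
async function startDiscordGateway(registerClient, log, retryOptions = {}) {
  try {
    await withRetry(registerClient, {
      ...retryOptions,
      onRetry: (err, attempt, delayMs) =>
        log(`discord: gateway registration failed (attempt ${attempt}): ${err.message}; retrying in ${delayMs}ms`),
    });
    return true;
  } catch (err) {
    log(`discord: giving up for now, other channels keep running: ${err.message}`);
    return false;
  }
}
```

With the defaults, a persistent outage costs 1 + 2 + 4 + 8 = 15 seconds of backoff before the provider gives up. That is comparable to the ~15-25s window described above, and it ends with a warning rather than a process exit.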

OpenClaw version

2026.3.2

Operating system

Linux (WSL2, x64)

Install method

pnpm (global)

Logs, screenshots, and evidence

Crash Statistics (single day)

• 81 total gateway restarts
• 76 caused by Discord fetch failure (94%)
• 6 caused by TLS/setSession bug (separate issue: #36585)
• 62 triggered by health-monitor stale-socket detection
• Uptime between crashes during loop: ~25-40 seconds

Representative crash sequence (one of 76):

[health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
[discord] [default] starting provider
[discord] channel resolve failed; using config entries. fetch failed
[discord] failed to deploy native commands: fetch failed
[openclaw] Unhandled promise rejection: Error: Failed to get gateway information from Discord: fetch failed
    at GatewayPlugin.registerClient (file:///usr/lib/node_modules/openclaw/node_modules/@buape/carbon/dist/src/plugins/gateway/GatewayPlugin.js:73:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
openclaw-gateway.service: Scheduled restart job, restart counter is at 61.

Note: Both channel resolve and command deploy fail gracefully (caught and logged). Only registerClient is unhandled.

Workaround

Disable the Discord channel by setting both channels.discord.enabled: false and plugins.entries.discord.enabled: false. This eliminated 94% of crashes immediately.

Related Issues

• #36585: TLS setSession null dereference crash (different bug, same process.exit(1) handler behavior)
• #36588: PR for the TLS fix (adds isTlsSocketNullDeref detection)

Impact and severity

Severity: Critical — causes gateway crash loop (complete service outage)

Impact:

• Any user with Discord enabled on an unstable network (WSL2, VPN, mobile hotspot, flaky ISP) is vulnerable
• A single network blip >10 seconds can trigger an infinite crash loop requiring manual intervention (disable Discord or wait for network recovery)
• All other channels (Telegram, WhatsApp, etc.) go down with Discord — one channel's failure takes out the entire gateway
• 76 crashes in a single day observed; worst cluster was 7 crashes in 3 minutes
• No user-facing warning before the loop starts — it just dies

Frequency: Reliable reproducer on WSL2; likely affects any environment with intermittent connectivity

Additional information

Node.js: v22.22.0
@buape/carbon: 0.0.0-beta-20260216184201 (bundled with OpenClaw)
Gateway service: systemd (user), Restart=always

• The earlyGatewayErrorGuard mechanism in the Discord provider catches "error" events on the gateway emitter, but registerClient does not emit anything: it rejects its returned Promise with a regular Error. That rejection travels a different path (to the process-level unhandled-rejection handler), so the guard doesn't help here
• Telegram (Grammy-based) handles connection failures internally with retries — the failure never escapes as an unhandled rejection, which is why Telegram never crashes from network blips
• The crash is probabilistic: it depends on whether the network is still down when registerClient's fetch fires (~15-25s after provider start). Sustained outages (>10s) reliably trigger the loop; brief blips (<15s) that clear before the fetch fires may not
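The two propagation paths can be compared directly. In this sketch a plain Node.js `EventEmitter` stands in for the gateway emitter, and `registerClientSketch` stands in for Carbon's method. Both are hypothetical stand-ins, not the real objects. The `error` listener only sees errors that are emitted on it. The rejection from the unawaited async call bypasses the emitter and is delivered to the process-level `unhandledRejection` event, which is the path that, per the logs above, ends in OpenClaw's "Unhandled promise rejection" handler.

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for the gateway emitter that earlyGatewayErrorGuard listens on.
const gateway = new EventEmitter();
const seenByGuard = [];
gateway.on("error", (err) => seenByGuard.push(err.message));

// Path 1: an emitted "error" event reaches the guard's listener.
gateway.emit("error", new Error("socket closed"));

// Path 2: a rejected Promise never touches the emitter; Node reports it
// through the process-level "unhandledRejection" event instead.
const seenByProcess = [];
process.on("unhandledRejection", (reason) => seenByProcess.push(reason.message));

async function registerClientSketch() {
  throw new Error("Failed to get gateway information from Discord: fetch failed");
}
registerClientSketch(); // not awaited, like the call in the Client constructor

setTimeout(() => {
  console.log("seen by emitter guard:", seenByGuard);
  console.log("seen by process handler:", seenByProcess);
}, 10);
```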

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working); regression (Behavior that previously worked and now fails)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
