
[Bug]: Discord plugin does not reconnect after network outage during initial startup #51370

@bhooker-github

Description


Bug type

Behavior bug (incorrect output/state without crash)

Summary

When the Discord plugin's initial startup occurs while the network is unavailable (e.g., during an ISP outage), all of its startup API calls (fetchDiscordGatewayInfo, deployDiscordCommands, fetchUser("@me")) fail with "fetch failed". The plugin nonetheless logs "logged in to discord" without ever establishing a WebSocket connection, then stays in a permanently dead state with no reconnection attempts. Because the service process stays alive, systemd's Restart=always never triggers recovery, and a manual restart is required after the network recovers.

Steps to reproduce

  1. Run OpenClaw gateway via systemd user service with Discord channel configured
  2. Lose upstream internet connectivity (ISP outage; local network and host remain up on UPS)
  3. The gateway service keeps running throughout; it never crashes or restarts
  4. Internet connectivity recovers
  5. Observe that Discord remains disconnected indefinitely: no messages are sent or received
  6. Confirm via journalctl --user -u openclaw-gateway that the Discord plugin logged startup errors but never retried

Expected behavior

After the initial connection fails due to a transient network outage, the Discord plugin should retry the initial startup sequence with exponential backoff until it successfully connects. Alternatively, it should exit/crash on startup failure so that systemd's Restart=always handles recovery.
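A minimal sketch of the retry behavior described above, in TypeScript since the Discord extension is written in it. `retryWithBackoff`, `BackoffOptions`, and the `sleep` hook are hypothetical names for illustration; they are not the API of extensions/discord/src/retry.ts. The idea is that the entire startup sequence is wrapped, so any thrown error schedules another attempt after a delay that doubles up to a cap.

```typescript
// Hypothetical helper; NOT the API of extensions/discord/src/retry.ts.
type Sleep = (ms: number) => Promise<void>;

interface BackoffOptions {
  initialDelayMs: number; // delay before the second attempt
  maxDelayMs: number; // cap so a long outage still gets regular retries
  factor: number; // multiplier applied after each failed attempt
  maxAttempts: number; // Infinity = keep retrying until success
  sleep?: Sleep; // injectable so tests do not actually wait
  onRetry?: (attempt: number, delayMs: number, err: unknown) => void;
}

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn until it resolves, waiting initialDelayMs, then initialDelayMs * factor,
// and so on (capped at maxDelayMs) between failed attempts.
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions): Promise<T> {
  const sleep = opts.sleep ?? realSleep;
  let delayMs = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Out of attempts: rethrow so the caller can exit and let systemd restart it.
      if (attempt >= opts.maxAttempts) throw err;
      opts.onRetry?.(attempt, delayMs, err);
      await sleep(delayMs);
      delayMs = Math.min(delayMs * opts.factor, opts.maxDelayMs);
    }
  }
}
```

For example, the plugin could call `retryWithBackoff(() => startOnce(), { initialDelayMs: 1000, maxDelayMs: 60000, factor: 2, maxAttempts: Infinity })`, where `startOnce` is a hypothetical function that throws whenever a required startup call fails. With a finite `maxAttempts`, the final rethrow covers the second option (exit so that Restart=always takes over). Adding random jitter to each delay is a common refinement when many clients reconnect at once.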

Actual behavior

The plugin logs several "fetch failed" errors during startup, then logs "logged in to discord" (with no bot identity and no WebSocket connection) and stays in a dead state permanently. The startup log lines report reconnectAttempts=0 and gatewayConnected=false; no retry ever occurs. The gateway lifecycle runner (runDiscordGatewayLifecycle) only handles reconnection after a successful initial connection, so the gap is in the initial startup sequence.

OpenClaw version

2026.3.13 (61d171a)

Operating system

Debian 12 (Bookworm) aarch64 — Raspberry Pi 5, kernel 6.12.75+rpt-rpi-2712

Install method

npm global (Homebrew node v25.8.1 via linuxbrew)

Model

anthropic/claude-opus-4-6

Provider / routing chain

openclaw -> anthropic (direct)

Additional provider/model setup details

NOT_ENOUGH_INFO: the provider/model is not relevant to this bug. The issue is in the Discord channel plugin startup, not in model routing.

Logs, screenshots, and evidence

Full journald output from the failed startup (PID 769) through manual restart (PID 85563):

# Failed startup at 20:16 CDT — internet was down (ISP outage, host on UPS)
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.177-05:00 [discord] startup [default] deploy-commands:start 2624ms native=on commandCount=65 gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.181-05:00 [discord] startup [default] deploy-rest:put:start 2628ms path=/applications/REDACTED/commands commands=65 bytes=19920
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.187-05:00 [openclaw] Non-fatal unhandled rejection (continuing): Error: Failed to get gateway information from Discord: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]:     at createGatewayMetadataError (file:///...openclaw/dist/reply-Bm8VrLQh.js:133733:31)
Mar 20 20:16:28 <hostname> node[PID1]:     at fetchDiscordGatewayInfo (file:///...openclaw/dist/reply-Bm8VrLQh.js:133747:9)
Mar 20 20:16:28 <hostname> node[PID1]:     at SafeGatewayPlugin.registerClient (file:///...openclaw/dist/reply-Bm8VrLQh.js:133789:46)
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.194-05:00 [discord] startup [default] deploy-rest:put:error 2641ms path=/applications/REDACTED/commands requestMs=13 error=fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.197-05:00 [discord] failed to deploy native commands: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.203-05:00 [discord] startup [default] deploy-commands:done 2648ms gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.206-05:00 [discord] startup [default] fetch-bot-identity:start 2654ms gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.215-05:00 [discord] failed to fetch bot identity: TypeError: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.219-05:00 [discord] startup [default] fetch-bot-identity:error 2666ms TypeError: fetch failed gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.222-05:00 [discord] logged in to discord

# Dead state persists for 12+ minutes until manual restart at 20:28 CDT (internet was back)
Mar 20 20:28:01 <hostname> sudo[PID]:       <user> : TTY=pts/0 ; PWD=/home/<user> ; USER=root ; COMMAND=/usr/bin/systemctl restart openclaw-gateway

# Successful startup after restart at 20:29 CDT
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.427-05:00 [discord] startup [default] gateway-debug 959ms WebSocket connection opened
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.568-05:00 [discord] startup [default] fetch-bot-identity:done 1100ms botUserId=REDACTED botUserName=Sparky gatewayConnected=false reconnectAttempts=0
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.608-05:00 [discord] logged in to discord as REDACTED (Sparky)

Key observations:

  • Failed startup: "logged in to discord" (no identity, no WebSocket)
  • Successful startup: "logged in to discord as REDACTED (Sparky)" (identity resolved, WebSocket opened)
  • reconnectAttempts=0 throughout: no retry logic executed
  • gatewayConnected=false throughout: the WebSocket never opened in the failed case
  • The "Non-fatal unhandled rejection" raised via SafeGatewayPlugin.registerClient -> fetchDiscordGatewayInfo is caught and swallowed

Impact and severity

Affected: Any OpenClaw instance with Discord channel configured that experiences network loss during gateway startup or restart
Severity: High. The Discord channel is completely non-functional with no self-recovery and requires manual intervention
Frequency: Deterministic. It always occurs when the network is unavailable during Discord plugin startup
Consequence: Missed messages, no agent responses on Discord until someone notices and manually restarts the service. Particularly impactful for headless/unattended deployments (Raspberry Pi, VPS) where the operator may not notice for hours.

Additional information

The code path is in the bundled reply-Bm8VrLQh.js; the source is likely in extensions/discord/src/setup-core.ts or extensions/discord/src/client.ts. The startup sequence calls fetchDiscordGatewayInfo, deployDiscordCommands, and fetchUser("@me"). All three fail without aborting startup: their errors are logged and then caught, so startup continues as if it had succeeded. The plugin then calls runDiscordGatewayLifecycle, which handles reconnection, but only after an initial successful connection.
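One possible way to close the gap, sketched under assumptions: split the startup calls into required steps whose failures propagate and an optional step whose failure is only logged. `DiscordStartupDeps`, `startDiscordOnce`, and the dependency names are hypothetical stand-ins for fetchDiscordGatewayInfo, deployDiscordCommands, and fetchUser("@me"). Treating the bot identity as required is a design choice for this sketch, not something the report establishes.

```typescript
// Hypothetical stand-ins for the three startup calls named in the report.
interface DiscordStartupDeps {
  fetchGatewayInfo: () => Promise<{ url: string }>; // ~ fetchDiscordGatewayInfo
  deployCommands: () => Promise<void>; // ~ deployDiscordCommands
  fetchBotIdentity: () => Promise<{ id: string; username: string }>; // ~ fetchUser("@me")
  log: (message: string) => void;
}

interface DiscordStartupResult {
  gatewayUrl: string;
  botId: string;
  botName: string;
}

// Rejects instead of returning a half-started plugin, so the caller can retry
// (or exit and let systemd restart the process).
async function startDiscordOnce(deps: DiscordStartupDeps): Promise<DiscordStartupResult> {
  // Required: without gateway information there is no WebSocket to open.
  const gateway = await deps.fetchGatewayInfo();

  // Optional: failing to deploy slash commands is logged but not fatal.
  try {
    await deps.deployCommands();
  } catch (err) {
    deps.log(`failed to deploy native commands: ${String(err)}`);
  }

  // Treated as required (assumption): only claim a login once identity is known.
  const me = await deps.fetchBotIdentity();
  deps.log(`logged in to discord as ${me.id} (${me.username})`);
  return { gatewayUrl: gateway.url, botId: me.id, botName: me.username };
}
```

Because this function rejects rather than logging a bare "logged in to discord", either recovery path from Expected behavior can be layered on top: wrap it in a backoff loop, or call `process.exit(1)` when it rejects.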

A retry.ts module exists in the Discord extension source (extensions/discord/src/retry.ts) but does not appear to be wired into the initial startup path.

Workaround: an external cron watchdog that checks journald for the failure pattern ("fetch failed" with no "WebSocket connection opened") and restarts the systemd service.
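A sketch of such a watchdog, assuming the gateway runs as the systemd user unit `openclaw-gateway` (as in the journalctl command from step 6) and that the last 2000 journal lines cover the most recent startup. The function names and the line budget are hypothetical. Detection keys on the difference noted in the key observations: a bare "logged in to discord" line with no identity, and no "WebSocket connection opened" after it.

```typescript
import { execFileSync } from "node:child_process";

// Assumption: systemd user unit name; drop "--user" below for a system unit.
const UNIT = "openclaw-gateway";
// Assumption: enough recent lines to include the latest Discord startup.
const JOURNAL_LINES = "2000";

// True when the most recent Discord login line lacks a bot identity and no
// WebSocket connection was opened after it (the dead state in the failed log).
function isDeadDiscordStartup(journal: string): boolean {
  const lines = journal.split("\n");
  let loginIdx = -1;
  for (let i = lines.length - 1; i >= 0; i--) {
    if (lines[i].includes("[discord] logged in to discord")) {
      loginIdx = i;
      break;
    }
  }
  if (loginIdx === -1) return false; // no startup in the window
  // A healthy startup logs "logged in to discord as <id> (<name>)".
  if (!lines[loginIdx].trimEnd().endsWith("logged in to discord")) return false;
  return !lines.slice(loginIdx).some((line) => line.includes("WebSocket connection opened"));
}

// Intended to run from cron: restart the unit if the last startup is dead.
function runWatchdog(): void {
  const journal = execFileSync(
    "journalctl",
    ["--user", "-u", UNIT, "-n", JOURNAL_LINES, "--no-pager", "-o", "cat"],
    { encoding: "utf8" },
  );
  if (isDeadDiscordStartup(journal)) {
    execFileSync("systemctl", ["--user", "restart", UNIT]);
  }
}
```

If the network is still down when the watchdog fires, the restarted service will likely fail the same way and be restarted again on the next cron run, which acts as a coarse retry. Under cron, `systemctl --user` may need `XDG_RUNTIME_DIR` set to reach the user's systemd instance.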
