[Bug]: Discord plugin does not reconnect after network outage during initial startup #51370
Description
Bug type
Behavior bug (incorrect output/state without crash)
Summary
When the Discord plugin's initial startup occurs while the network is unavailable (e.g., ISP outage), all API calls (fetchDiscordGatewayInfo, deployDiscordCommands, fetchUser("@me")) fail with "fetch failed". The plugin still logs "logged in to discord" despite never establishing a WebSocket connection, then enters a permanently dead state with no reconnection attempts. Because the service process stays alive, systemd Restart=always never triggers recovery. A manual restart is required after the network recovers.
Steps to reproduce
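- Run the OpenClaw gateway via a systemd user service with the Discord channel configured
- Lose upstream internet connectivity (ISP outage; local network and host remain up on UPS)
- Have the gateway start while the outage is ongoing (in the logged incident, startup at 20:16 happened while the internet was down); the service process then stays up and never crashes or restarts on its own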
- Internet connectivity recovers
- Observe that Discord remains disconnected indefinitely — no messages are sent or received
- Confirm via journalctl --user -u openclaw-gateway that the Discord plugin logged startup errors but never retried
Expected behavior
After the initial connection fails due to a transient network outage, the Discord plugin should retry the initial startup sequence with exponential backoff until it successfully connects. Alternatively, it should exit/crash on startup failure so that systemd's Restart=always handles recovery.
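A minimal sketch of the retry behavior described above, written in TypeScript to match the extension source. retryWithBackoff and backoffDelayMs are hypothetical names, not the plugin's API or the contents of retry.ts, and the delay parameters are assumptions. The idea is to wrap the whole startup sequence (fetchDiscordGatewayInfo, deployDiscordCommands, fetchUser("@me")) in one function that throws when a required step fails, and retry that function with capped exponential backoff.

```typescript
// Hypothetical options for retrying the Discord startup sequence.
type RetryOptions = {
  baseDelayMs: number; // delay after the first failure
  maxDelayMs: number; // cap on any single delay
  maxAttempts: number; // total attempts before giving up (Infinity = never give up)
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  onRetry?: (attempt: number, delayMs: number, err: unknown) => void;
};

// Capped exponential backoff: attempt 1 -> base, 2 -> 2*base, 3 -> 4*base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn until it resolves; rethrows the last error once maxAttempts is reached.
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.maxAttempts) throw err;
      const delayMs = backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs);
      opts.onRetry?.(attempt, delayMs, err);
      await sleep(delayMs);
    }
  }
}
```

Assuming a hypothetical startDiscordOnce() that throws on failure, the call would be along the lines of retryWithBackoff(startDiscordOnce, { baseDelayMs: 1_000, maxDelayMs: 60_000, maxAttempts: Infinity }), which waits 1 s, 2 s, 4 s, and so on up to 60 s between attempts. Adding random jitter to each delay would keep many instances from reconnecting in lockstep after a shared outage.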
Actual behavior
The plugin logs multiple "fetch failed" errors during startup, then logs "logged in to discord" (without a bot identity or an open WebSocket) and remains in that dead state permanently. The startup lines report reconnectAttempts=0 and gatewayConnected=false, and no retry ever occurs. The gateway lifecycle runner (runDiscordGatewayLifecycle) only handles reconnection after a successful initial connection; the gap is in the initial startup sequence.
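One way to avoid the misleading "logged in to discord" line is to treat startup as successful only when the steps the gateway depends on have succeeded. The sketch below is hypothetical: StepResult, StartupOutcome and evaluateStartup are invented names, and it assumes (not confirmed by the source) that gateway info and bot identity are required while command deployment is best-effort.

```typescript
// Hypothetical result of one startup step: either a value or the error message.
type StepResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical summary of the three startup calls named in this issue.
type StartupOutcome = {
  gatewayInfo: StepResult<{ url: string }>; // fetchDiscordGatewayInfo
  commandsDeployed: StepResult<number>; // deployDiscordCommands
  botIdentity: StepResult<{ id: string; name: string }>; // fetchUser("@me")
};

type StartupDecision =
  | { status: "ready"; logLine: string; warnings: string[] }
  | { status: "failed"; reasons: string[] };

// Decides whether startup really succeeded instead of logging "logged in" unconditionally.
function evaluateStartup(outcome: StartupOutcome): StartupDecision {
  const { gatewayInfo, commandsDeployed, botIdentity } = outcome;
  if (!gatewayInfo.ok || !botIdentity.ok) {
    const reasons: string[] = [];
    if (!gatewayInfo.ok) reasons.push(`gateway info: ${gatewayInfo.error}`);
    if (!botIdentity.ok) reasons.push(`bot identity: ${botIdentity.error}`);
    return { status: "failed", reasons };
  }
  // Assumption: a failed command deploy is worth a warning but not a startup failure.
  const warnings: string[] = commandsDeployed.ok ? [] : [`deploy commands: ${commandsDeployed.error}`];
  const { id, name } = botIdentity.value;
  // Same shape as the healthy log line: "logged in to discord as <id> (<name>)".
  return { status: "ready", logLine: `logged in to discord as ${id} (${name})`, warnings };
}
```

A caller would throw on status "failed" (feeding a retry loop) or exit non-zero so systemd restarts the service. This sketch only checks the REST calls; confirming that the WebSocket actually opened would still be the lifecycle runner's job.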
OpenClaw version
2026.3.13 (61d171a)
Operating system
Debian 12 (Bookworm) aarch64 — Raspberry Pi 5, kernel 6.12.75+rpt-rpi-2712
Install method
npm global (Homebrew node v25.8.1 via linuxbrew)
Model
anthropic/claude-opus-4-6
Provider / routing chain
openclaw -> anthropic (direct)
Additional provider/model setup details
NOT_ENOUGH_INFO — provider/model is not relevant to this bug. The issue is in the Discord channel plugin startup, not in model routing.
Logs, screenshots, and evidence
Full journald output from the failed startup (PID 769) through manual restart (PID 85563):
# Failed startup at 20:16 CDT — internet was down (ISP outage, host on UPS)
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.177-05:00 [discord] startup [default] deploy-commands:start 2624ms native=on commandCount=65 gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.181-05:00 [discord] startup [default] deploy-rest:put:start 2628ms path=/applications/REDACTED/commands commands=65 bytes=19920
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.187-05:00 [openclaw] Non-fatal unhandled rejection (continuing): Error: Failed to get gateway information from Discord: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: at createGatewayMetadataError (file:///...openclaw/dist/reply-Bm8VrLQh.js:133733:31)
Mar 20 20:16:28 <hostname> node[PID1]: at fetchDiscordGatewayInfo (file:///...openclaw/dist/reply-Bm8VrLQh.js:133747:9)
Mar 20 20:16:28 <hostname> node[PID1]: at SafeGatewayPlugin.registerClient (file:///...openclaw/dist/reply-Bm8VrLQh.js:133789:46)
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.194-05:00 [discord] startup [default] deploy-rest:put:error 2641ms path=/applications/REDACTED/commands requestMs=13 error=fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.197-05:00 [discord] failed to deploy native commands: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.203-05:00 [discord] startup [default] deploy-commands:done 2648ms gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.206-05:00 [discord] startup [default] fetch-bot-identity:start 2654ms gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.215-05:00 [discord] failed to fetch bot identity: TypeError: fetch failed
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.219-05:00 [discord] startup [default] fetch-bot-identity:error 2666ms TypeError: fetch failed gatewayConnected=false reconnectAttempts=0
Mar 20 20:16:28 <hostname> node[PID1]: 2026-03-20T20:16:28.222-05:00 [discord] logged in to discord
# Dead state persists for 12+ minutes until manual restart at 20:28 CDT (internet was back)
Mar 20 20:28:01 <hostname> sudo[PID]: <user> : TTY=pts/0 ; PWD=/home/<user> ; USER=root ; COMMAND=/usr/bin/systemctl restart openclaw-gateway
# Successful startup after restart at 20:29 CDT
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.427-05:00 [discord] startup [default] gateway-debug 959ms WebSocket connection opened
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.568-05:00 [discord] startup [default] fetch-bot-identity:done 1100ms botUserId=REDACTED botUserName=Sparky gatewayConnected=false reconnectAttempts=0
Mar 20 20:29:35 <hostname> node[PID2]: 2026-03-20T20:29:35.608-05:00 [discord] logged in to discord as REDACTED (Sparky)
Key observations:
- Failed startup: "logged in to discord" (no identity, no WebSocket)
- Successful startup: "logged in to discord as REDACTED (Sparky)" (identity resolved, WebSocket opened)
- reconnectAttempts=0 throughout: no retry logic executed
- gatewayConnected=false throughout: the WebSocket never opened in the failed case
- The "Non-fatal unhandled rejection" from SafeGatewayPlugin.registerClient → fetchDiscordGatewayInfo is caught and swallowed
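The swallowed rejection appears to come from a process-level handler that logs "Non-fatal unhandled rejection (continuing)". A minimal sketch of how such a handler could escalate this specific failure instead, matching on the error message prefix seen in the log (an assumption; an error class or code would be less brittle). isFatalStartupRejection and installRejectionPolicy are hypothetical names; process.on("unhandledRejection", ...) is the standard Node.js hook.

```typescript
// Assumption: this prefix matches the message built by createGatewayMetadataError.
const FATAL_STARTUP_MESSAGES: RegExp[] = [/^Failed to get gateway information from Discord\b/];

function isFatalStartupRejection(reason: unknown): boolean {
  const message = reason instanceof Error ? reason.message : String(reason);
  return FATAL_STARTUP_MESSAGES.some((re) => re.test(message));
}

// Hypothetical replacement for a "log and continue" unhandledRejection handler.
function installRejectionPolicy(log: (msg: string) => void = console.error): void {
  process.on("unhandledRejection", (reason) => {
    if (isFatalStartupRejection(reason)) {
      log(`fatal startup error, exiting for service-manager restart: ${String(reason)}`);
      // Any exit triggers Restart=always; a non-zero code also satisfies Restart=on-failure.
      process.exit(1);
    }
    log(`Non-fatal unhandled rejection (continuing): ${String(reason)}`);
  });
}
```

This is the crash-and-let-systemd-restart alternative from Expected behavior. It takes down the whole gateway, including other channels, so a retry inside the plugin is the gentler fix. If used, RestartSec should be large enough that repeated failures during an outage stay under the unit's StartLimitBurst/StartLimitIntervalSec, or systemd will stop restarting it.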
Impact and severity
Affected: Any OpenClaw instance with Discord channel configured that experiences network loss during gateway startup or restart
Severity: High — Discord channel is completely non-functional with no self-recovery; requires manual intervention
Frequency: Deterministic — always occurs when network is unavailable during Discord plugin startup
Consequence: Missed messages, no agent responses on Discord until someone notices and manually restarts the service. Particularly impactful for headless/unattended deployments (Raspberry Pi, VPS) where the operator may not notice for hours.
Additional information
The code path is in the bundled reply-Bm8VrLQh.js; the source is likely in extensions/discord/src/setup-core.ts or extensions/discord/src/client.ts. The startup sequence calls fetchDiscordGatewayInfo, deployDiscordCommands, and fetchUser("@me"). All three fail, and each error is logged and then caught, so startup continues as if it had succeeded. The plugin then calls runDiscordGatewayLifecycle, which handles reconnection, but only after an initial successful connection.
A retry.ts module exists in the Discord extension source (extensions/discord/src/retry.ts) but does not appear to be wired into the initial startup path.
Workaround: External cron watchdog that checks journald for the failure pattern (fetch failed + no WebSocket connection opened) and restarts the systemd service.
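A sketch of such a watchdog, in TypeScript for consistency with the rest of this issue. It uses a variant of the pattern above: it keys on the bare "logged in to discord" line (no "as <id> (<name>)") from Key observations and treats a later "WebSocket connection opened" as recovery. discordLooksDead is a hypothetical name; the sketch assumes a user-level unit named openclaw-gateway and reads only the current boot's journal.

```typescript
import { execFileSync } from "node:child_process";

// Matches both the failed ("logged in to discord") and healthy
// ("logged in to discord as <id> (<name>)") login lines seen in the journal.
const LOGIN_LINE = /\[discord\] logged in to discord(?: as (.+))?$/;

// True when the most recent login line lacks a bot identity and no WebSocket
// has opened since, i.e. the dead state described in this issue.
function discordLooksDead(lines: string[]): boolean {
  for (let i = lines.length - 1; i >= 0; i--) {
    const match = LOGIN_LINE.exec(lines[i].trimEnd());
    if (!match) continue;
    if (match[1] !== undefined) return false; // identity resolved: healthy login
    return !lines.slice(i + 1).some((line) => line.includes("WebSocket connection opened"));
  }
  return false; // no login line yet: nothing to judge
}

// Assumption: user-level unit "openclaw-gateway"; only the current boot is read.
function main(): void {
  const journal = execFileSync(
    "journalctl",
    ["--user", "-u", "openclaw-gateway", "-b", "-o", "cat", "--no-pager"],
    { encoding: "utf8" },
  );
  if (discordLooksDead(journal.split("\n"))) {
    execFileSync("systemctl", ["--user", "restart", "openclaw-gateway"]);
  }
}

// Only touch the system when explicitly asked, e.g. from cron with --run.
if (process.argv.includes("--run")) main();
```

Run every few minutes from cron, systemctl --user usually needs XDG_RUNTIME_DIR set to reach the user manager. If the network is still down when the watchdog fires, the restarted gateway logs another bare login line and the next run restarts it again, which gives a crude retry loop.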