Health monitor false-positive stale-socket restarts on idle Discord channels #58339

@fanyangCS

Problem

The channel health monitor restarts idle-but-connected Discord channels every ~35 minutes. When no dispatch events occur within the 30-minute staleEventThresholdMs window, evaluateChannelHealth() returns stale-socket — even though the WebSocket is alive.

Impact: 63 unnecessary restarts over 3 days (23 on Mar 29, 39 on Mar 30, and 1 on Mar 31 before the fix was deployed).

[health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
# repeats every ~35 min (30 min threshold + 5 min check interval)
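
To illustrate where the ~35 minute cadence comes from, here is a minimal sketch. It assumes the monitor checks at a fixed 5-minute interval starting from the last event (or restart) and uses the same strict `>` comparison as the policy branch. The constant and function names are hypothetical and not taken from the codebase.

```typescript
// Hypothetical constants mirroring the values quoted in this issue.
const STALE_EVENT_THRESHOLD_MS = 30 * 60_000; // 30 min threshold
const CHECK_INTERVAL_MS = 5 * 60_000;         // 5 min check interval (assumed fixed)

/**
 * Returns how long after the last event (in ms) the first periodic check
 * observes eventAge > thresholdMs, assuming checks run every intervalMs.
 */
function firstStaleCheckMs(thresholdMs: number, intervalMs: number): number {
  let t = intervalMs;
  // A check at exactly the threshold does not trip the strict ">" comparison,
  // so the monitor has to wait for the next tick.
  while (t <= thresholdMs) {
    t += intervalMs;
  }
  return t;
}

const minutesUntilRestart =
  firstStaleCheckMs(STALE_EVENT_THRESHOLD_MS, CHECK_INTERVAL_MS) / 60_000;
console.log(minutesUntilRestart);
```

Under these assumptions the check at the 30-minute mark sees an event age equal to the threshold and passes, so the restart fires on the following tick.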

Root Cause

In channel-health-policy.ts, the eventAge > staleEventThresholdMs branch returns stale-socket without checking snapshot.connected:

if (eventAge > policy.staleEventThresholdMs) {
  return { healthy: false, reason: "stale-socket" };  // no connected check
}

This conflates "no user activity" with "dead socket". Checking snapshot.connected is sufficient because the Discord Gateway uses a single WebSocket for both heartbeats and dispatch events (opcode 0). If the connection were truly dead, heartbeat ACKs would stop and discord.js would set connected = false. There is no scenario where heartbeats succeed but event delivery silently fails.

Fix

     const eventAge = policy.now - snapshot.lastEventAt;
     if (eventAge > policy.staleEventThresholdMs) {
+      if (snapshot.connected === true) {
+        return { healthy: true, reason: "healthy" };
+      }
       return { healthy: false, reason: "stale-socket" };
     }
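
For anyone who wants to reproduce the behaviour in isolation, the patched branch can be sketched as a standalone function. This is only an approximation: the `ChannelSnapshot`, `HealthPolicy`, and `HealthResult` shapes and the function name `evaluateEventStaleness` are assumptions for illustration, and the real evaluateChannelHealth() in channel-health-policy.ts has other branches that are not shown here.

```typescript
// Hypothetical types; the real ones in channel-health-policy.ts may differ.
interface ChannelSnapshot {
  connected: boolean;
  lastEventAt: number; // epoch ms of the last dispatch event
}

interface HealthPolicy {
  now: number; // epoch ms at evaluation time
  staleEventThresholdMs: number;
}

interface HealthResult {
  healthy: boolean;
  reason: "healthy" | "stale-socket";
}

// Covers only the event-age branch discussed in this issue.
function evaluateEventStaleness(
  snapshot: ChannelSnapshot,
  policy: HealthPolicy,
): HealthResult {
  const eventAge = policy.now - snapshot.lastEventAt;
  if (eventAge > policy.staleEventThresholdMs) {
    // Idle but still connected: treat as healthy instead of restarting.
    if (snapshot.connected === true) {
      return { healthy: true, reason: "healthy" };
    }
    return { healthy: false, reason: "stale-socket" };
  }
  // Recent events: this branch has nothing to report.
  return { healthy: true, reason: "healthy" };
}

const policy: HealthPolicy = { now: 40 * 60_000, staleEventThresholdMs: 30 * 60_000 };
const idleConnected = evaluateEventStaleness({ connected: true, lastEventAt: 0 }, policy);
const idleDisconnected = evaluateEventStaleness({ connected: false, lastEventAt: 0 }, policy);
console.log(idleConnected.reason, idleDisconnected.reason);
```

With a 40-minute event age, the connected snapshot is reported healthy while the disconnected one still yields stale-socket, which matches the intent of the diff above.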

Result

After deploying at 2026-03-31T00:42Z: 0 stale-socket restarts in 9+ hours (previously ~2/hour). healthVersion reached 215+, confirming repeated healthy evaluations.
