
[Bug P2] scheduleReconnect() has no max retry limit — infinite reconnect loop #45469

@chrislro

Summary

scheduleReconnect() in src/gateway/client.ts retries indefinitely with exponential backoff capped at 30s, but has no maximum retry count. If the gateway is unreachable, the node client will loop forever, accumulating WebSocket listeners and zombie processes.

Root Cause

private scheduleReconnect() {
  if (this.closed) return;
  const delay = this.backoffMs;
  this.backoffMs = Math.min(this.backoffMs * 2, 30_000);
  setTimeout(() => this.start(), delay).unref(); // always retries, no limit
}
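To make the delay progression concrete, here is a minimal sketch that reproduces the doubling-and-cap arithmetic from the snippet above. The helper name `backoffSchedule` and the 1,000 ms starting value are assumptions for illustration; the issue does not state the client's initial `backoffMs`.

```typescript
// Hypothetical helper: lists the delays scheduleReconnect() would use,
// given a starting backoff and a cap. Not part of src/gateway/client.ts.
function backoffSchedule(initialMs: number, capMs: number, attempts: number): number[] {
  const delays: number[] = [];
  let backoffMs = initialMs;
  for (let i = 0; i < attempts; i++) {
    delays.push(backoffMs);
    // Same update rule as the original: double, then clamp to the cap.
    backoffMs = Math.min(backoffMs * 2, capMs);
  }
  return delays;
}

// Assumed initial backoff of 1000 ms (not confirmed by the issue).
console.log(backoffSchedule(1000, 30_000, 7));
// → [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Under that assumption the delay reaches the 30s cap on the sixth attempt, and every later attempt fires at a fixed 30s interval with nothing to stop it.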

Impact

  • Infinite loop when gateway is permanently unreachable
  • Each reconnect attempt creates a new WebSocket + event listeners
  • Observed: 272 zombie processes in Railway container after 266-day uptime
  • Mac node CPU sustained at 16-18% during idle (expected <1%)

Suggested Fix

private retryCount = 0;
private readonly maxRetries = 10;

private scheduleReconnect() {
  if (this.closed) return;
  if (this.retryCount >= this.maxRetries) {
    this.opts.onConnectError?.(new Error("Max reconnection attempts reached"));
    return;
  }
  this.retryCount++;
  const delay = this.backoffMs;
  this.backoffMs = Math.min(this.backoffMs * 2, 30_000);
  setTimeout(() => this.start(), delay).unref();
}

// Reset on successful connect:
private onConnect() {
  this.retryCount = 0;
  // ... rest of connect logic
}
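The suggested fix can also be exercised in isolation. The following is a rough, self-contained sketch, not the actual client: the class name `ReconnectPolicy`, the option names, and the injectable `schedule` function are all hypothetical and exist only so the retry cap can be tested without real timers or sockets. It differs from the snippet above in two ways that are assumptions on my part: it also resets `backoffMs` on a successful connect, and it marks the policy as closed once the cap is reached so that the error callback fires only once.

```typescript
// Hypothetical scheduler signature so tests can replace setTimeout.
type ScheduleFn = (fn: () => void, delayMs: number) => void;

// Default scheduler. In Node, the original code also calls .unref() on the timer.
const defaultSchedule: ScheduleFn = (fn, delayMs) => {
  setTimeout(fn, delayMs);
};

interface ReconnectOptions {
  maxRetries: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  connect: () => void;             // hypothetical stand-in for this.start()
  onGiveUp: (err: Error) => void;  // hypothetical stand-in for opts.onConnectError
  schedule?: ScheduleFn;
}

class ReconnectPolicy {
  private retryCount = 0;
  private backoffMs: number;
  private closed = false;

  constructor(private readonly opts: ReconnectOptions) {
    this.backoffMs = opts.initialBackoffMs;
  }

  // Returns true if a retry was scheduled, false if the policy gave up or is closed.
  scheduleReconnect(): boolean {
    if (this.closed) return false;
    if (this.retryCount >= this.opts.maxRetries) {
      this.closed = true; // assumption: stop for good so onGiveUp fires once
      this.opts.onGiveUp(new Error("Max reconnection attempts reached"));
      return false;
    }
    this.retryCount++;
    const delay = this.backoffMs;
    this.backoffMs = Math.min(this.backoffMs * 2, this.opts.maxBackoffMs);
    (this.opts.schedule ?? defaultSchedule)(() => this.opts.connect(), delay);
    return true;
  }

  // Call on a successful connect: reset both the counter and the backoff.
  onConnected(): void {
    this.retryCount = 0;
    this.backoffMs = this.opts.initialBackoffMs;
  }
}
```

With `maxRetries: 3` and a recording scheduler, five consecutive calls to `scheduleReconnect()` schedule three retries and then invoke `onGiveUp` once. Whether the real client should close permanently at that point or merely surface the error is a design choice for the maintainers.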

Note

The LaunchAgent on macOS already handles process restart (30s interval), so the node client itself does not need to retry indefinitely. It can fail fast and let the OS-level supervisor handle the restart.

Environment
