fix(discord): break resume death spiral when session goes stale #25974
mr-sk wants to merge 5 commits into openclaw:main
When Discord sessions expire (network blip, server-side timeout), the gateway gets stuck in an infinite resume loop: it connects, Discord immediately closes with code 1005, but the client never clears stale session state, so canResume() keeps returning true. The bots go fully offline and only a manual restart recovers them.

Fix: when the HELLO timeout fires (30s with no HELLO from Discord), invalidate the session state before reconnecting so the next connect performs a fresh IDENTIFY instead of retrying a dead resume.

Also reduces maxAttempts from 50 to 10: 50 attempts at exponential backoff meant ~25 minutes of retrying before giving up.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
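The reset described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's actual code: `GatewayPluginStub` is a hypothetical stand-in for the library's gateway client, the `canResume()` rule shown is an assumption, and `nextHandshake()` exists only to make the effect visible. The three fields are the ones the review summary names (`sessionId`, `resumeGatewayUrl`, `sequence`).

```typescript
// Hypothetical stand-in for the real gateway client (not the library's API).
class GatewayPluginStub {
  sessionId: string | null = null;
  resumeGatewayUrl: string | null = null;
  sequence: number | null = null;

  // Assumed rule: a resume is attempted whenever session state is present.
  canResume(): boolean {
    return this.sessionId !== null && this.sequence !== null;
  }

  // Illustration helper: which handshake the next connect would send.
  nextHandshake(): "RESUME" | "IDENTIFY" {
    return this.canResume() ? "RESUME" : "IDENTIFY";
  }
}

// Sketch of the subclass idea: add a way to drop stale session state.
class ResilientGatewayPluginSketch extends GatewayPluginStub {
  resetSession(): void {
    this.sessionId = null;
    this.resumeGatewayUrl = null;
    this.sequence = null;
  }
}

const gateway = new ResilientGatewayPluginSketch();
gateway.sessionId = "stale-session"; // made-up values for the example
gateway.resumeGatewayUrl = "wss://gateway.example.invalid";
gateway.sequence = 42;

const handshakeBeforeReset = gateway.nextHandshake();
gateway.resetSession();
const handshakeAfterReset = gateway.nextHandshake();
console.log(handshakeBeforeReset, handshakeAfterReset);
```

Without the reset, the stub would keep choosing RESUME on every reconnect, which is the loop the commit message describes.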
The fallback now returns ResilientGatewayPlugin instead of plain GatewayPlugin, so the prototype check needs to match.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
CI note: The This PR only modifies files in |
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
CI update: All checks pass except |
…death-spiral # Conflicts: # src/discord/monitor/provider.lifecycle.ts
Closing — upstream has since implemented a more comprehensive version of this fix directly on main (consecutive stall counter, reconnect watchdog, status reporting, |
Summary
- Adds a `ResilientGatewayPlugin` subclass with `resetSession()` to invalidate stale sessions when the HELLO timeout fires (30s with no HELLO)
- Reduces `maxAttempts` from 50 to 10; 50 was ~25 minutes of pointless retrying

Test plan
- `npm run build` passes
- `npx vitest run src/discord/monitor/provider.lifecycle.test.ts` passes (3/3)

🤖 Generated with Claude Code
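To see why 50 attempts add up to tens of minutes, here is a rough back-of-the-envelope sketch. The backoff schedule is not shown in this PR, so the 1s base delay, the doubling, and the 30s cap are all assumptions chosen for illustration, and `totalBackoffMs` is a hypothetical helper.

```typescript
// Total time spent waiting between reconnect attempts, assuming the delay
// doubles from baseMs on each attempt and is capped at capMs.
// Both defaults are assumptions, not values taken from the PR.
function totalBackoffMs(maxAttempts: number, baseMs = 1000, capMs = 30000): number {
  let total = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    total += Math.min(baseMs * 2 ** attempt, capMs);
  }
  return total;
}

const minutesAt50 = totalBackoffMs(50) / 60000;
const minutesAt10 = totalBackoffMs(10) / 60000;
console.log(minutesAt50.toFixed(1), minutesAt10.toFixed(1));
```

Under these assumed parameters, 50 attempts wait about 23 minutes in total and 10 attempts about 3 minutes, which is in the same range as the ~25 minutes quoted above.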
Greptile Summary
Fixes Discord gateway resume death spiral by adding session reset capability when connections stall. The PR introduces `ResilientGatewayPlugin` that can invalidate stale session state (`sessionId`, `resumeGatewayUrl`, `sequence`) when the HELLO timeout fires (30s), forcing a fresh IDENTIFY instead of repeatedly attempting to resume a dead session. Also reduces reconnection attempts from 50 to 10 to avoid ~25 minutes of pointless retries.

Changes:
- Adds a `ResilientGatewayPlugin` subclass with a `resetSession()` method to clear session state
- Uses `ResilientGatewayPlugin` instead of base `GatewayPlugin`
- Reduces `maxAttempts` from 50 to 10 for faster failure detection

Confidence Score: 5/5
The `ResilientGatewayPlugin` subclass cleanly extends existing functionality without breaking changes. The session reset logic is properly guarded with an `instanceof` check. Tests pass and the implementation follows TypeScript best practices. Reducing `maxAttempts` from 50 to 10 is a sensible change that makes failures surface faster rather than hanging for 25+ minutes.

Last reviewed commit: b5566a2
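The `instanceof` guard mentioned in the review might look roughly like this. It is a sketch under stated assumptions: both classes are simplified stand-ins, and `onHelloTimeout` is a hypothetical name for whatever handler runs when the HELLO timeout fires.

```typescript
// Simplified stand-ins; the real classes carry far more state.
class GatewayPluginStub {
  sessionId: string | null = "stale-session";
}

class ResilientGatewayPluginSketch extends GatewayPluginStub {
  resetSession(): void {
    this.sessionId = null;
  }
}

// Hypothetical HELLO-timeout handler. Only gateways that support
// resetSession() are reset; a plain gateway is left untouched.
// Returns true when a reset happened.
function onHelloTimeout(gateway: GatewayPluginStub): boolean {
  if (gateway instanceof ResilientGatewayPluginSketch) {
    gateway.resetSession();
    return true;
  }
  return false;
}

const plainGateway = new GatewayPluginStub();
const resilientGateway = new ResilientGatewayPluginSketch();
const plainWasReset = onHelloTimeout(plainGateway);
const resilientWasReset = onHelloTimeout(resilientGateway);
console.log(plainWasReset, resilientWasReset);
```

The guard keeps the handler safe when the gateway is some other `GatewayPlugin` instance that lacks `resetSession()`, which is also why a prototype check in the tests has to expect the subclass.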