Event loop blocking causes WebSocket disconnections and cron failures #75414

@tangda18

Bug Description

Synchronous event loop blocking (up to 2581 seconds / 43 minutes) causes QQBot WebSocket disconnections and cron job failures.

Steps to Reproduce

  1. Run OpenClaw gateway with QQBot channel enabled
  2. After an extended session with large trajectory files (20 MB or more), observe:
    • eventLoopDelayMaxMs reaching 2,581,275 ms (about 43 minutes)
    • cpuCoreRatio=0, which suggests the process is not CPU-bound and is waiting on I/O
    • QQ WebSocket connection drops with close code 1006 (abnormal closure) or 4009 (session timeout)
    • OpenClaw automatically reconnects but cron jobs fail due to missed heartbeats
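
For anyone trying to reproduce the numbers above outside OpenClaw, here is a minimal sketch of how event loop delay and utilization can be sampled with Node's built-in `perf_hooks` module. The helper name `startLoopMonitor`, the thresholds, and the log format are my own assumptions, not OpenClaw's actual liveness code.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Histogram values from monitorEventLoopDelay are in nanoseconds.
function nsToMs(ns: number): number {
  return ns / 1e6;
}

// Hypothetical helper: samples event loop delay every `intervalMs` and warns
// when the worst delay seen in that interval exceeds `warnMs`.
// Returns a function that stops the monitor.
function startLoopMonitor(intervalMs = 30_000, warnMs = 1_000): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  let previous = performance.eventLoopUtilization();

  const timer = setInterval(() => {
    const current = performance.eventLoopUtilization();
    // Passing both readings yields the utilization for just this interval.
    const delta = performance.eventLoopUtilization(current, previous);
    previous = current;

    const maxMs = nsToMs(histogram.max);
    const p99Ms = nsToMs(histogram.percentile(99));
    if (maxMs > warnMs) {
      console.warn(
        `event loop stalled: p99=${p99Ms.toFixed(1)}ms max=${maxMs.toFixed(1)}ms ` +
          `utilization=${delta.utilization.toFixed(3)}`,
      );
    }
    histogram.reset();
  }, intervalMs);

  // Do not keep the process alive just for monitoring.
  timer.unref();

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Note that a monitor like this runs on the same event loop, so it can only report a stall after the loop becomes responsive again.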

Root Cause Analysis

The primary suspect is session compaction, which appears to write large trajectory files synchronously. When a trajectory.jsonl reaches 20 MB or more, the synchronous JSON serialization and file write block the Node.js event loop, so WebSocket heartbeats cannot be processed until the write returns.

Log excerpt:

liveness warning: reasons=event_loop_delay,event_loop_utilization interval=2590s 
eventLoopDelayP99Ms=32.2 eventLoopDelayMaxMs=2581275.3 eventLoopUtilization=0.996 cpuCoreRatio=0 
active=0 waiting=0 queued=0
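
As a quick sanity check on the excerpt, the sketch below parses the `key=value` fields and converts the maximum delay to minutes. The parser is a hypothetical helper written for this issue, not part of OpenClaw.

```typescript
// Parse `key=value` pairs from a liveness warning line into a record.
function parseLivenessLine(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const match of line.matchAll(/(\w+)=(\S+)/g)) {
    fields[match[1]] = match[2];
  }
  return fields;
}

const line =
  "liveness warning: reasons=event_loop_delay,event_loop_utilization interval=2590s " +
  "eventLoopDelayP99Ms=32.2 eventLoopDelayMaxMs=2581275.3 eventLoopUtilization=0.996 cpuCoreRatio=0 " +
  "active=0 waiting=0 queued=0";

const fields = parseLivenessLine(line);
const maxDelayMs = Number(fields.eventLoopDelayMaxMs);
const intervalSec = parseFloat(fields.interval); // "2590s" -> 2590

// Maximum delay in minutes, and the share of the reporting interval it covers.
const maxDelayMin = maxDelayMs / 60_000;
const stalledShare = maxDelayMs / 1000 / intervalSec;

console.log(maxDelayMin.toFixed(1)); // 43.0
console.log(stalledShare.toFixed(3)); // 0.997
```

The single longest stall covers almost the whole 2590 s reporting interval, which is roughly in line with the reported eventLoopUtilization of 0.996.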

Causal chain:

  1. Large session trajectory file (20.5MB) triggers compaction
  2. Synchronous write of compacted history blocks event loop
  3. WebSocket heartbeat cannot be processed -> QQ server closes connection (4009 session timeout)
  4. OpenClaw reconnects automatically, but cron jobs fail because of the missed heartbeats

Expected Behavior

  • Event loop should not block for extended periods even with large session files
  • WebSocket connections should be resilient to temporary event loop delays
  • Cron jobs should not fail due to platform-level event loop blocking

Environment

  • OS: Windows_NT 10.0.19045 (x64)
  • Node: v25.6.0
  • OpenClaw: latest (updated 2026.4.21)
  • Channel: QQBot (WebSocket)
  • Session trajectory files up to 20.5MB

Suggested Fixes

  1. Async compaction writes: Make session trajectory writes asynchronous or chunked so they cannot block the event loop.
  2. WebSocket heartbeat resilience: Increase the heartbeat timeout or implement connection-level keepalive.
  3. Protected config: maxActiveTranscriptBytes and truncateAfterCompaction are protected from runtime patching. Consider making them adjustable via config so that large trajectories cannot accumulate.
  4. Large file guard: Warn about or auto-archive trajectory files above a size threshold before they cause blocking events.
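
To illustrate fix 1, here is a rough sketch of a streamed JSONL write using only Node built-ins. I have not read the compaction code, so `writeTrajectoryAsync`, the entry shape, and the temp-file-then-rename step are assumptions about how this could look, not a description of the current implementation.

```typescript
import { createWriteStream } from "node:fs";
import { rename } from "node:fs/promises";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Serialize one entry per JSONL line, so no single JSON.stringify call has to
// handle the whole trajectory at once.
function* toJsonLines(entries: Iterable<unknown>): Generator<string> {
  for (const entry of entries) {
    yield JSON.stringify(entry) + "\n";
  }
}

// Hypothetical replacement for a synchronous trajectory write.
// pipeline() applies backpressure: when the file stream's buffer is full, the
// generator is paused until the pending disk writes complete, which gives the
// event loop a chance to run other work (such as heartbeats) in between.
async function writeTrajectoryAsync(
  path: string,
  entries: Iterable<unknown>,
): Promise<void> {
  // Assumption: writing to a temp file and renaming avoids leaving a
  // half-written trajectory behind if the process exits mid-write.
  const tmpPath = `${path}.tmp`;
  await pipeline(Readable.from(toJsonLines(entries)), createWriteStream(tmpPath));
  await rename(tmpPath, path);
}
```

A call such as `await writeTrajectoryAsync("trajectory.jsonl", compactedEntries)` would replace the blocking write, where `compactedEntries` stands in for whatever the compaction step produces.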

Workaround Applied

Archived the 20.5MB trajectory file to reduce compaction load. Largest remaining trajectory is ~2.6MB.
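
The manual workaround could be automated along these lines. This is only a sketch: the 10 MB threshold, the directory layout, and the function names are assumptions on my side, and I did this step by hand rather than with this script.

```typescript
import { mkdir, readdir, rename, stat } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical threshold; the file that caused trouble here was 20.5MB.
const MAX_TRAJECTORY_BYTES = 10 * 1024 * 1024;

function exceedsThreshold(sizeBytes: number, maxBytes = MAX_TRAJECTORY_BYTES): boolean {
  return sizeBytes > maxBytes;
}

// Move every .jsonl file in `sessionDir` that is larger than `maxBytes` into
// `archiveDir` and return the names of the files that were moved.
async function archiveLargeTrajectories(
  sessionDir: string,
  archiveDir: string,
  maxBytes = MAX_TRAJECTORY_BYTES,
): Promise<string[]> {
  await mkdir(archiveDir, { recursive: true });
  const archived: string[] = [];
  for (const name of await readdir(sessionDir)) {
    if (!name.endsWith(".jsonl")) continue;
    const source = join(sessionDir, name);
    const info = await stat(source);
    if (info.isFile() && exceedsThreshold(info.size, maxBytes)) {
      await rename(source, join(archiveDir, name));
      archived.push(name);
    }
  }
  return archived;
}
```

With this threshold the 20.5MB file would be archived and the remaining ~2.6MB file would be left in place. The gateway should be stopped first so that no session is writing to a file while it is moved.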
