Bug Description
Synchronous event loop blocking (up to 2581 seconds / 43 minutes) causes QQBot WebSocket disconnections and cron job failures.
Steps to Reproduce
- Run OpenClaw gateway with QQBot channel enabled
- After an extended session with large trajectory files (20MB+), observe:
  - eventLoopDelayMaxMs reaching 2,581,275ms (about 43 minutes)
  - cpuCoreRatio=0 (not CPU-bound, waiting on I/O)
- QQ WebSocket connection drops with code 1006 / 4009 (session timeout)
- OpenClaw automatically reconnects but cron jobs fail due to missed heartbeats
Root Cause Analysis
The primary suspect is session compaction synchronously writing large trajectory files. When a trajectory.jsonl reaches 20MB+, the synchronous JSON serialization and file write block the Node.js event loop completely, so WebSocket heartbeats cannot be processed.
Log excerpt:
liveness warning: reasons=event_loop_delay,event_loop_utilization interval=2590s
eventLoopDelayP99Ms=32.2 eventLoopDelayMaxMs=2581275.3 eventLoopUtilization=0.996 cpuCoreRatio=0
active=0 waiting=0 queued=0
Causal chain:
- Large session trajectory file (20.5MB) triggers compaction
- Synchronous write of compacted history blocks event loop
- WebSocket heartbeat cannot be processed -> QQ server closes connection (4009 session timeout)
- OpenClaw reconnects automatically but cron tasks fail
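One way to check whether a suspected synchronous call is responsible for a stall like the one in the causal chain is to sample event loop delay around it with Node's built-in `perf_hooks.monitorEventLoopDelay`. The sketch below is illustrative only: `measureMaxLoopDelayMs` and `blockFor` are hypothetical helpers, not OpenClaw code, and the busy-wait merely stands in for the suspected synchronous compaction write. Whether OpenClaw derives eventLoopDelayMaxMs from this API is an assumption.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical helper: runs `work` while sampling event loop delay and
// returns the largest delay observed, in milliseconds.
export async function measureMaxLoopDelayMs(
  work: () => void | Promise<void>,
): Promise<number> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  // Let the sampling timer start ticking before the work begins.
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  await work();
  // Give the sampling timer a chance to fire so the stall is recorded.
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
  histogram.disable();
  return histogram.max / 1e6; // histogram values are in nanoseconds
}

// Stand-in for a blocking call: holds the event loop for roughly `ms` milliseconds.
export function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait on purpose
  }
}
```

Wrapping the compaction step in `measureMaxLoopDelayMs` (or an equivalent) for a 20MB+ trajectory would show whether that step alone accounts for the delay reported in the liveness warning.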
Expected Behavior
- Event loop should not block for extended periods even with large session files
- WebSocket connections should be resilient to temporary event loop delays
- Cron jobs should not fail due to platform-level event loop blocking
Environment
- OS: Windows_NT 10.0.19045 (x64)
- Node: v25.6.0
- OpenClaw: latest (updated 2026.4.21)
- Channel: QQBot (WebSocket)
- Session trajectory files up to 20.5MB
Suggested Fixes
- Async compaction writes: Make session trajectory writes asynchronous or chunked to prevent event loop blocking
- WebSocket heartbeat resilience: Increase heartbeat timeout or implement connection-level keepalive
- Protected config: maxActiveTranscriptBytes and truncateAfterCompaction are protected from runtime patching. Consider whether these should be adjustable via config to prevent large trajectory accumulation
- Large file guard: Warn or auto-archive trajectory files above a size threshold before they cause blocking events
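As a rough illustration of the first and last suggestions, the sketch below streams a trajectory to disk one JSON line at a time and checks a file against a size threshold. It is a minimal sketch under stated assumptions, not OpenClaw's implementation: `toJsonLines`, `writeTrajectoryAsync`, and `exceedsSizeThreshold` are hypothetical names, and it assumes trajectory entries can be serialized independently, one per line.

```typescript
import { createWriteStream } from "node:fs";
import { stat } from "node:fs/promises";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: lazily serializes one entry per line so the full
// trajectory never has to exist as a single in-memory string.
function* toJsonLines(entries: Iterable<unknown>): Generator<string> {
  for (const entry of entries) {
    yield JSON.stringify(entry) + "\n";
  }
}

// Hypothetical helper: writes entries as JSONL without a synchronous file write.
// pipeline() handles backpressure and rejects if either stream errors.
export async function writeTrajectoryAsync(
  path: string,
  entries: Iterable<unknown>,
): Promise<void> {
  await pipeline(Readable.from(toJsonLines(entries)), createWriteStream(path));
}

// Hypothetical guard: reports whether a file is larger than `maxBytes`,
// so it can be warned about or archived before compaction touches it.
export async function exceedsSizeThreshold(
  path: string,
  maxBytes: number,
): Promise<boolean> {
  const { size } = await stat(path);
  return size > maxBytes;
}
```

Because `pipeline` pauses the generator whenever the write stream's buffer is full and resumes once the pending write completes, timers and socket callbacks such as heartbeats get a chance to run between chunks.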
Workaround Applied
Archived the 20.5MB trajectory file to reduce compaction load. Largest remaining trajectory is ~2.6MB.