Gateway event loop blocked during agent prep: max delay 32s, utilization 100%, API requests hang #75330

@erforschtbot-cmyk

Version

OpenClaw 2026.4.29 (a448042)

Environment

  • OS: CachyOS (Arch Linux), Kernel 6.19.12-1
  • Node: v25.9.0
  • Install: npm global
  • Hardware: Dell Latitude 5300

Symptoms

Gateway API requests become extremely slow or hang during agent preparation. The event loop shows massive blocking spikes. The allgemein main session is continuously reported as "stuck" at 140+ seconds.

Measured Event Loop Metrics

Metric                  Worst Value
eventLoopDelayMaxMs     32,581 ms
eventLoopDelayMaxMs     15,544 ms
eventLoopUtilization    1.0 (100%)
eventLoopUtilization    0.898
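
For anyone trying to reproduce these numbers outside the gateway, here is a minimal sketch using Node's built-in `perf_hooks`. The helper name `measureEventLoop` is hypothetical, and the field names only mirror the metric names above; how OpenClaw computes its own diagnostics is not known from this report.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Hypothetical helper (not part of OpenClaw): runs `work` and reports how badly
// it delayed the event loop and how busy the loop was while it ran.
async function measureEventLoop<T>(work: () => Promise<T> | T) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  // Warm-up: let the sampling timer fire a few times so it has a baseline.
  await new Promise((resolve) => setTimeout(resolve, 30));

  const eluStart = performance.eventLoopUtilization();
  const result = await work();
  // Give the sampling timer a chance to fire and record the delay caused by `work`.
  await new Promise((resolve) => setTimeout(resolve, 20));

  histogram.disable();
  const elu = performance.eventLoopUtilization(eluStart);
  return {
    result,
    eventLoopDelayMaxMs: histogram.max / 1e6, // histogram values are nanoseconds
    eventLoopUtilization: elu.utilization, // 0..1 over the measured window
  };
}
```

Wrapping a suspected prep stage in such a helper should show whether that stage alone accounts for the multi-second delays, assuming the stage can be invoked in isolation.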

API Request Degradation (during agent prep)

Request          Duration     Normal
models.list      21,119 ms    ~300-500 ms
sessions.list    5,209 ms     ~500-700 ms
node.list        6,360 ms     ~300-500 ms
node.list        5,777 ms     ~300-500 ms

Correlation: Prep stages ↔ event loop blocks

02:46:38 - eventLoopDelayMaxMs=15,544ms  eventLoopUtilization=0.90
02:49:20 - eventLoopDelayMaxMs=6,492ms   eventLoopUtilization=0.60  
02:51:20 - eventLoopUtilization=1.0       ← immediately followed by:
02:51:23 - totalMs=41,088 (agent prep completes)
02:56:25 - eventLoopDelayMaxMs=5,544ms   eventLoopUtilization=0.34
02:57:03 - totalMs=43,772 (agent prep completes)

The event loop blocks map directly to agent prep phases. The gateway is completely unresponsive to API calls during the 40+ second agent preparation window.

Stuck Session

The main session continuously triggers stuck-session diagnostics:

sessionId=allgemein state=processing age=146s queueDepth=1 reason=processing_with_queued_work

Recovery is skipped because an embedded run is active.

Additional Error (may be related)

During one stuck period, a lane task error appeared:

Error: channels.telegram.botToken: unresolved SecretRef "file:filemain:/telegram/botToken"
durationMs=10017

This 10-second error may contribute to the event loop blocking, but it is secondary to the main prep-stage issue.

Root Cause Hypothesis

The system-prompt building stage (#75329) appears to perform synchronous/blocking operations for its entire 30-35 second duration. This blocks the Node.js event loop and prevents the gateway from handling any other requests. The eventLoopUtilization=1.0 reading and the delay metrics are consistent with event loop starvation.

Possible causes:

  1. Synchronous file I/O during SKILL.md/manifest parsing
  2. frontmatter-Cc-V8aI2.js doing synchronous parsing that can fail and retry (possibly related to the json5 issue #75328, "Telegram channel crashes: Cannot find package 'json5' (ESM import failure in bundled deps)")
  3. Large in-memory string operations not being broken into chunks
  4. Missing await or setImmediate yields in the prep pipeline
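
To illustrate cause 4, here is a minimal sketch of a loop that hands control back to the event loop every few items. `mapWithYield` and its `batchSize` parameter are hypothetical names for illustration; the actual prep pipeline is not shown in this report, so this is not a patch against it.

```typescript
import { setImmediate as yieldToEventLoop } from "node:timers/promises";

// Hypothetical helper: applies `fn` to each item and yields to the event loop
// after every `batchSize` items so pending I/O and API requests can be served.
async function mapWithYield<T, R>(
  items: readonly T[],
  fn: (item: T) => R | Promise<R>,
  batchSize = 8,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i++) {
    // Awaiting a synchronous result only drains microtasks; it does not let
    // timers or I/O callbacks run, hence the explicit yield below.
    results.push(await fn(items[i]));
    if ((i + 1) % batchSize === 0) {
      await yieldToEventLoop();
    }
  }
  return results;
}
```

If cause 1 applies, the per-item function could read files with `readFile` from `node:fs/promises` rather than `readFileSync`, so that the I/O itself does not block either. A single very large parse would still block for its full duration, so this only helps when the work splits into many smaller items.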

Impact

Regression

Version 2026.4.23: event loop clean, API requests <1s.
