IMessageRpcClient.request() does not handle async stdin write errors → uncaughtException → gateway crash #75438

@SL4N

Summary

In dist/probe-*.js (the IMessageRpcClient class, an imsg rpc JSON-RPC client that talks to a child process over its stdin), request() calls this.child.stdin.write(line) with no write callback, and no 'error' listener is attached to the stream. When the child closes its stdin (rate limit, child crash, network hiccup), the write fails with EPIPE and Node reports the failure asynchronously as an 'error' event on the stream. A try/catch around request() cannot catch an asynchronous write failure, and with no 'error' listener attached the event is thrown as an uncaught exception; the process-level uncaughtException handler then calls process.exit(1).
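To make the mechanism concrete, here is a minimal sketch that does not use the real client. makeBrokenStdin and observeWriteFailure are hypothetical names, and the Writable is a stand-in for child.stdin that fails every write with EPIPE, roughly the way a closed pipe would. It shows that write() itself does not throw, so a surrounding try/catch sees nothing, while the 'error' listener receives the failure later.

```javascript
const { Writable } = require("node:stream");

// Hypothetical stand-in for child.stdin: every write fails with EPIPE,
// and the failure is reported asynchronously.
function makeBrokenStdin() {
    return new Writable({
        write(chunk, encoding, callback) {
            const err = Object.assign(new Error("write EPIPE"), { code: "EPIPE" });
            setImmediate(() => callback(err));
        },
    });
}

// Compares what a try/catch around write() sees with what an 'error' listener sees.
function observeWriteFailure() {
    return new Promise((resolve) => {
        const stdin = makeBrokenStdin();
        let caughtSync = false;
        // Without this listener the 'error' event would surface as an uncaught exception.
        stdin.on("error", (err) => resolve({ caughtSync, code: err.code }));
        try {
            stdin.write("{}\n");
        } catch {
            caughtSync = true; // not reached: write() does not throw for this failure
        }
    });
}
```

Calling observeWriteFailure() resolves with caughtSync still false and the EPIPE code delivered through the listener, which is the gap the unguarded write in request() falls into.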

Affected code (2026.4.29, file probe-DGfoCahw.js:113)

async request(method, params, opts) {
    if (!this.child || !this.child.stdin) throw new Error("imsg rpc not running");
    const id = this.nextId++;
    const line = `${JSON.stringify({ jsonrpc: "2.0", id, method, params: params ?? {} })}\n`;
    const timeoutMs = opts?.timeoutMs ?? 1e4;
    const response = new Promise((resolve, reject) => {
        const key = String(id);
        const timer = timeoutMs > 0 ? setTimeout(() => {
            this.pending.delete(key);
            reject(new Error(`imsg rpc timeout (${method})`));
        }, timeoutMs) : void 0;
        this.pending.set(key, { resolve, reject, timer });
    });
    this.child.stdin.write(line);   // ← no callback, no error handler
    return await response;
}

Symptom (real-world incident)

2026-05-01T10:50:57.032+08:00 [openclaw] Uncaught exception: Error: write EPIPE
  at <CodexAppServerClient or IMessageRpcClient>.writeMessage

Supervisor:

gateway exited (code=1, total=38044s, listen=38026s)

Gateway had been up 10.5 hours; one EPIPE killed it.

Same pattern was present in 2026.4.11

  • harness-CmLE805l.js:478 (CodexAppServerClient.writeMessage) — refactored away in 2026.4.29 (good)
  • probe-Bh4qEP-V.js:343 (IMessageRpcClient.request) — still present at probe-DGfoCahw.js:113

Other unfixed sites (lower risk but same class of bug)

In 2026.4.29:

  • exec-BgVqrNG-.js: child.stdin.write(input ?? "") after spawn (one-shot pattern)
  • supervisor-8fcihB5y.js: child.stdin.write(params.input) (one-shot, mirrored pattern)
  • bash-tools.exec-runtime-*.js (if still present): the PTY DSR cursor response is written from inside the stdout event handler
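For the one-shot sites, a guard could look roughly like the sketch below. writeInputOnce is a hypothetical helper, not something in the bundle, and it assumes the caller wants stdin closed after the single write. The 'error' listener stays attached after the promise settles, so a late failure on the pipe is still handled instead of being thrown.

```javascript
// Hypothetical helper for one-shot stdin writes: send the input, close stdin,
// and report failure through the returned promise instead of an unhandled
// 'error' event on the stream.
function writeInputOnce(stdin, input) {
    return new Promise((resolve, reject) => {
        // e.g. EPIPE if the child has already closed its end of the pipe
        stdin.once("error", reject);
        // 'finish' fires once all data has been handed to the underlying pipe
        stdin.once("finish", resolve);
        stdin.end(input ?? "");
    });
}
```

A call site would then become something like await writeInputOnce(child.stdin, input), wrapped in whatever error handling the surrounding spawn logic already has.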

Suggested patch

this.child.stdin.write(line, (err) => {
    if (err) {
        const pending = this.pending.get(String(id));
        if (pending) {
            this.pending.delete(String(id));
            if (pending.timer) clearTimeout(pending.timer);
            pending.reject(err);
        }
    }
});

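This rejects the awaiting request() promise with the EPIPE error, so the caller sees a clean exception, and leaves the rest of the pending-map cleanup to the existing failAll / stop paths. One caveat: Node invokes the write callback before emitting 'error' on the stream, and the event is still emitted afterwards, so the callback on its own does not prevent the uncaughtException. It needs an 'error' listener on child.stdin alongside it.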
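The patch can be exercised in isolation with a stripped-down client. MiniRpcClient below is a hypothetical reduction of IMessageRpcClient that keeps only the pending map and the write path; it takes the stdin stream directly, which is an assumption made so a fake stream can be passed in. A no-op 'error' listener is attached because Node still emits 'error' after calling the failed write's callback.

```javascript
// Hypothetical, stripped-down client: only the pending map and the write path.
class MiniRpcClient {
    constructor(stdin) {
        this.stdin = stdin;
        this.nextId = 1;
        this.pending = new Map();
        // Node emits 'error' after the failed write's callback has run, so a
        // listener is still needed to keep it from becoming an uncaught exception.
        this.stdin.on("error", () => {});
    }

    request(method, params, opts) {
        const id = this.nextId++;
        const key = String(id);
        const line = `${JSON.stringify({ jsonrpc: "2.0", id, method, params: params ?? {} })}\n`;
        const timeoutMs = opts?.timeoutMs ?? 1e4;
        const response = new Promise((resolve, reject) => {
            const timer = timeoutMs > 0 ? setTimeout(() => {
                this.pending.delete(key);
                reject(new Error(`imsg rpc timeout (${method})`));
            }, timeoutMs) : undefined;
            this.pending.set(key, { resolve, reject, timer });
        });
        // Per-write callback: reject only the request whose write failed.
        this.stdin.write(line, (err) => {
            if (!err) return;
            const entry = this.pending.get(key);
            if (!entry) return;
            this.pending.delete(key);
            if (entry.timer) clearTimeout(entry.timer);
            entry.reject(err);
        });
        return response;
    }
}
```

With a stream whose writes fail, request() rejects with the write error, the timeout timer is cleared, and the pending map ends up empty, so the caller can decide whether to retry or restart the child.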

Reproduce locally (rough)

  1. Start gateway with anything that uses imsg rpc (e.g. embedded Codex app-server, or one of the bundled plugins that use IMessageRpcClient).
  2. Force the child to close its stdin while a request is in-flight (for example, kill the child process or close the underlying transport).
  3. Trigger another request() on the (now closed) stdin → uncaught EPIPE → process.exit(1).

Environment

  • Node v22.22
  • Termux on Android (glibc wrapper), but EPIPE comes from the kernel's pipe semantics rather than from Termux, so it should reproduce on any POSIX platform.
  • Same code path exists on Linux/macOS Node installs.

Why an 'error' listener on the stream is also worth considering

Even with the per-write callback fix, a child.stdin.on('error', ...) listener attached once at client init (a no-op, or routed to failAll) is what keeps a failed write from reaching uncaughtException, because Node emits 'error' on the stream in addition to calling the write callback. It also covers errors that no specific request can claim, such as a pipe failure that surfaces when no write callback is outstanding.
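A rough sketch of that listener, routed to a failAll-style cleanup, is shown below. The failAll here is a hypothetical stand-in that takes the pending map as an argument; the real failAll in the bundle may have a different signature. guardStdin is also a made-up name.

```javascript
// Hypothetical stand-in for the client's failAll: reject and clear every
// in-flight request.
function failAll(pending, err) {
    for (const [key, entry] of pending) {
        if (entry.timer) clearTimeout(entry.timer);
        entry.reject(err);
        pending.delete(key); // deleting during Map iteration is safe in JavaScript
    }
}

// Attach once at client init; the listener outlives any single write, so it
// also handles errors that no per-write callback is waiting for.
function guardStdin(stdin, pending) {
    stdin.on("error", (err) => failAll(pending, err));
}
```

Since a Writable is destroyed after its first error, the client would most likely also need to mark the child as stopped at this point so that later request() calls fail fast instead of writing to a dead stream.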
