fix: clamp setTimeout values to 32-bit safe max to prevent gateway hang #9576

Closed
pycckuu wants to merge 2 commits into openclaw:main from pycckuu:fix/settimeout-overflow-9572

@pycckuu (Contributor) commented on Feb 5, 2026

Fix

Prevents setTimeout integer overflow that causes gateway hangs.

Closes #9572

Root Cause

NO_TIMEOUT_MS in timeout.ts was set to 30 days (2,592,000,000ms). Node.js setTimeout stores its delay as a 32-bit signed integer, so the largest usable value is 2,147,483,647ms (≈24.8 days). The sentinel already exceeds that limit on its own, and once subagent-registry.ts adds a 10s buffer the total is 2,592,010,000ms. Node.js does not honor a delay that large: it resets it to 1ms and emits only a TimeoutOverflowWarning.

This causes callGateway to time out instantly, blocking the session lane and eventually making the entire gateway unresponsive (process alive but not processing messages).
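The arithmetic behind the root cause can be checked directly. The following is a minimal sketch, not code from the PR: `exceedsTimerLimit`, `TIMER_MAX_MS`, `DAY_MS`, and `BUFFER_MS` are illustrative names, and only the numeric values come from the description above.

```typescript
// Largest delay Node.js timers accept: 2^31 - 1 milliseconds (about 24.8 days).
const TIMER_MAX_MS = 2 ** 31 - 1; // 2_147_483_647

const DAY_MS = 24 * 60 * 60 * 1000; // 86_400_000
const BUFFER_MS = 10_000; // the 10s buffer described above

// Hypothetical helper (not from the PR): true when a delay is larger than setTimeout accepts.
function exceedsTimerLimit(delayMs: number): boolean {
  return delayMs > TIMER_MAX_MS;
}

const oldSentinelMs = 30 * DAY_MS; // 2_592_000_000
const newSentinelMs = 24 * DAY_MS; // 2_073_600_000

console.log(exceedsTimerLimit(oldSentinelMs)); // true: too large even before the buffer
console.log(exceedsTimerLimit(oldSentinelMs + BUFFER_MS)); // true
console.log(exceedsTimerLimit(newSentinelMs + BUFFER_MS)); // false
console.log(TIMER_MAX_MS - (newSentinelMs + BUFFER_MS)); // headroom left, in ms
```

With a 24-day sentinel the remaining headroom is a little over 20 hours, so any buffer added on top has to stay below that for the value to remain valid.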

Changes

| File | Change |
| --- | --- |
| src/agents/timeout.ts | Reduce NO_TIMEOUT_MS from 30 days → 24 days (2,073,600,000ms), safely under 2^31-1 even with buffers |
| src/gateway/call.ts | Add MAX_SAFE_TIMEOUT_MS clamp as defense-in-depth; callGateway now caps any timeout to 2^31-1 before passing it to setTimeout |
| src/agents/timeout.test.ts | Regression tests verifying NO_TIMEOUT_MS stays within the 32-bit safe range, including with the 10s buffer |

Defense in Depth

Two layers of protection:

  1. Source fix (timeout.ts): The "no timeout" sentinel value is now inherently safe
  2. Safety net (call.ts): Even if future code passes a too-large value, callGateway will clamp it

Testing

New test in timeout.test.ts:

it("NO_TIMEOUT_MS + 10s buffer still fits in 32-bit signed integer", () => {
  const ms = resolveAgentTimeoutMs({ cfg: undefined, overrideSeconds: 0 });
  expect(ms + 10_000).toBeLessThan(2_147_483_647);
});

Greptile Overview

Greptile Summary

This PR addresses a real Node.js setTimeout overflow footgun by (1) lowering the “no timeout” sentinel in src/agents/timeout.ts to 24 days so it remains < 2^31-1 even after the subagent buffer, and (2) clamping callGateway’s timeoutMs to 2_147_483_647 in src/gateway/call.ts as defense-in-depth. It also adds a new vitest regression suite in src/agents/timeout.test.ts to lock in the sentinel value and the +10s buffer invariant used by subagent-registry.ts.

Confidence Score: 4/5

  • This PR is close to safe to merge, but there are a couple of edge cases that can still cause immediate timeouts or test-load failures.
  • The core fix (reduce NO_TIMEOUT_MS and clamp callGateway timeout) is correct and localized. Remaining concerns are around non-finite/NaN timeout inputs bypassing the intended clamp via Math.min producing NaN, and a potential vitest module-resolution issue due to importing ./timeout.js from a TS test file.
  • src/gateway/call.ts, src/agents/timeout.test.ts


Node.js setTimeout uses a 32-bit signed integer internally. Values
exceeding 2^31-1 (2,147,483,647ms) are reset to 1ms, breaking
timeout semantics and causing cascading gateway failures.

Root cause: NO_TIMEOUT_MS was 30 days (2,592,000,000ms) which exceeds
the 32-bit limit. When subagent-registry adds a 10s buffer, the total
(2,592,010,000ms) overflows, causing setTimeout to fire instantly (1ms)
instead of after 30 days.

Changes:
- timeout.ts: Reduce NO_TIMEOUT_MS from 30 days to 24 days
  (2,073,600,000ms), safely under the 32-bit limit even with buffers
- call.ts: Add MAX_SAFE_TIMEOUT_MS clamp as defense-in-depth so
  callGateway never passes an overflowing value to setTimeout
- timeout.test.ts: Add regression tests verifying NO_TIMEOUT_MS stays
  within 32-bit safe range, including with the 10s buffer

Fixes openclaw#9572
@openclaw-barnacle (bot) added the gateway (Gateway runtime) and agents (Agent runtime and tooling) labels on Feb 5, 2026

@greptile-apps (bot, Contributor) left a comment

2 files reviewed, 2 comments

    opts: CallGatewayOptions,
  ): Promise<T> {
-   const timeoutMs = opts.timeoutMs ?? 10_000;
+   const timeoutMs = Math.min(opts.timeoutMs ?? 10_000, MAX_SAFE_TIMEOUT_MS);

Timeout clamp misses NaN

Math.min(opts.timeoutMs ?? 10_000, MAX_SAFE_TIMEOUT_MS) will produce NaN if opts.timeoutMs is NaN (or a non-finite number typed as number), and setTimeout(..., NaN) effectively becomes 0, causing immediate timeouts. If timeoutMs is user/config-derived anywhere, this is a behavior change that can still “hang” workflows. Consider normalizing here (e.g., Number.isFinite + floor + >= 1) before clamping.

Path: src/gateway/call.ts
Line: 157:157

@pycckuu (Contributor, Author) replied:

Good catch — fixed in 4c9844b.
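The gap raised in this thread can be reproduced in isolation. The sketch below wraps the clamp expression from the diff in a hypothetical function, `clampOnly`, which is not part of the PR. It shows which inputs the bare `Math.min` clamp handles and which it lets through.

```typescript
// Value of the clamp constant as stated in the PR summary.
const MAX_SAFE_TIMEOUT_MS = 2_147_483_647;

// Hypothetical wrapper around the clamp expression under review.
function clampOnly(timeoutMs: number | undefined): number {
  return Math.min(timeoutMs ?? 10_000, MAX_SAFE_TIMEOUT_MS);
}

console.log(clampOnly(undefined)); // 10000: the default applies
console.log(clampOnly(2_592_010_000)); // 2147483647: oversized value is capped
console.log(clampOnly(Infinity)); // 2147483647: +Infinity is capped as well
console.log(Number.isNaN(clampOnly(NaN))); // true: NaN passes through the clamp
console.log(clampOnly(-5)); // -5: values below 1 also pass through
```

Note that `Math.min` already handles `+Infinity`. The inputs that escape are `NaN` and anything below 1, which is why the suggested fix adds a finiteness check and a lower bound.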

Comment on lines +1 to +2
import { describe, expect, it } from "vitest";
import { resolveAgentTimeoutMs } from "./timeout.js";

Test import path may fail

This new test imports resolveAgentTimeoutMs from "./timeout.js". If vitest is running against TS sources (common in this repo) rather than built ESM output, importing the .js extension can fail module resolution and make the test suite fail to load. Other tests typically import via ./timeout or ./timeout.ts depending on config—worth aligning this import with the repo’s established convention so tests run in all environments.

Path: src/agents/timeout.test.ts
Line: 1:2

@pycckuu (Contributor, Author) replied:

The .js extension is the established convention in this repo — all other test files in src/agents/*.test.ts use .js imports (e.g. agent-paths.test.ts, agent-scope.test.ts, auth-health.test.ts). This aligns with the ESM + TypeScript moduleResolution setup.

Address review feedback: Math.min with NaN produces NaN, which
setTimeout treats as 0 (immediate). Now normalize with
Number.isFinite + floor + minimum of 1, falling back to default
10s timeout for invalid values.
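The normalization described in this commit message could look roughly like the sketch below. The actual change is not shown in this thread, so `normalizeTimeoutMs` and `DEFAULT_TIMEOUT_MS` are hypothetical names. Treating `Infinity` as invalid (rather than clamping it to the maximum) is an assumption that follows from the `Number.isFinite` check.

```typescript
const MAX_SAFE_TIMEOUT_MS = 2_147_483_647; // 2^31 - 1, per the PR
const DEFAULT_TIMEOUT_MS = 10_000; // the 10s default from callGateway

// Hypothetical helper: returns an integer delay in [1, MAX_SAFE_TIMEOUT_MS].
function normalizeTimeoutMs(raw: number | undefined): number {
  // Missing, NaN, and +/-Infinity all fall back to the default (assumption).
  if (raw === undefined || !Number.isFinite(raw)) {
    return DEFAULT_TIMEOUT_MS;
  }
  // Drop fractional milliseconds, then enforce the lower and upper bounds.
  const floored = Math.floor(raw);
  return Math.min(Math.max(floored, 1), MAX_SAFE_TIMEOUT_MS);
}

console.log(normalizeTimeoutMs(NaN)); // 10000
console.log(normalizeTimeoutMs(0.4)); // 1
console.log(normalizeTimeoutMs(1500.9)); // 1500
console.log(normalizeTimeoutMs(2_592_010_000)); // 2147483647
```

Checking finiteness before clamping matters because `Math.min` and `Math.max` both propagate `NaN`, so a clamp alone cannot recover from it.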

pycckuu commented Feb 6, 2026

Related: PR #10355 adds a drain deadline and reduced timeouts to the announce queue, addressing the broader blocking issue in #10334. This setTimeout overflow fix is complementary and should be merged separately.

@pycckuu closed this on Feb 9, 2026

Labels

agents (Agent runtime and tooling), gateway (Gateway runtime)

Development

Successfully merging this pull request may close these issues.

sessions_spawn timeout overflow causes gateway hang (Node.js 32-bit setTimeout limit)
