fix(gateway): health check always times out when lsof is not installed #32613
riftzen-bit wants to merge 1 commit into openclaw:main
The `ownsPort` check in `inspectGatewayRestart` only verified port
ownership via `portUsage.listeners.some(...)` when `runtimePid` was
known. When `lsof` is not installed, `inspectPortUsage` returns
`{ status: "busy", listeners: [] }` because `checkPortInUse` detects
the port is occupied but cannot enumerate listeners. `.some()` on an
empty array always returns `false`, so `ownsPort` was always `false`,
causing the health-check loop to spin for the full 60 s timeout on
every restart.
Add the same `(status === "busy" && listeners.length === 0)` fallback
that already existed in the `runtimePid == null` branch so that a
running service with a known PID is treated as the port owner when
listener enumeration is unavailable.
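
The ownership decision described above can be sketched roughly as follows. This is a simplified illustration, not the code in `restart-health.ts`: the `Listener` and `PortUsage` shapes, the `"free"` and `"unknown"` status values, and the standalone `ownsPort` function are assumptions made for the example, and the real `runtimePid == null` branch may check more than is shown here.

```typescript
// Hypothetical shapes; the real types in restart-health.ts may differ.
type Listener = { pid?: number };
type PortUsage = { status: "free" | "busy" | "unknown"; listeners: Listener[] };

// Sketch of the port-ownership decision after the fix.
function ownsPort(portUsage: PortUsage, runtimePid: number | null): boolean {
  // The port is occupied, but no listeners could be enumerated
  // (for example because lsof is not installed).
  const enumerationUnavailable =
    portUsage.status === "busy" && portUsage.listeners.length === 0;

  if (runtimePid != null) {
    // Before the fix only the .some() check existed here, and .some()
    // on an empty array is always false.
    return (
      portUsage.listeners.some((listener) => listener.pid === runtimePid) ||
      enumerationUnavailable
    );
  }
  // The PID-unknown branch already had the fallback.
  return enumerationUnavailable;
}
```

With this shape, a busy port with an empty listener list counts as owned whether or not the PID is known, while a busy port whose listeners all belong to other PIDs still does not.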
Greptile Summary
This PR fixes a critical bug in
Changes:
The fix is minimal, well-scoped, and has no negative impact on platforms where
Confidence Score: 5/5
Last reviewed commit: 45fc88c
Pull request overview
Fixes a restart health-check false negative in the gateway CLI when process listener enumeration is unavailable (notably when `lsof` isn’t installed on Linux), which previously caused `openclaw gateway restart` to always time out despite a healthy running gateway.
Changes:
- Extend the `runtimePid != null` ownership check to fall back to “port is busy and listeners are empty” when listener enumeration can’t be performed.
- Add a regression test covering the `lsof`-missing / empty-listeners scenario with a known runtime PID.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/cli/daemon-cli/restart-health.ts | Aligns port-ownership fallback logic across PID-known and PID-unknown branches to avoid restart health-check timeouts when listeners can’t be enumerated. |
| src/cli/daemon-cli/restart-health.test.ts | Adds a test ensuring healthy === true when runtime PID is known, runtime is running, port is busy, and listener list is empty (e.g., lsof missing). |
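
To see why an always-false ownership check costs the full timeout, here is a small simulation of the polling budget. It is only a sketch: `simulatePolling` is a hypothetical helper that counts the delay instead of sleeping, and the defaults of 120 attempts at 500 ms are the figures quoted in this PR, not values read from the code.

```typescript
// Hypothetical helper: simulates a fixed-budget polling loop without sleeping.
function simulatePolling(
  check: () => boolean,
  attempts: number = 120,
  delayMs: number = 500,
): { healthy: boolean; waitedMs: number } {
  let waitedMs = 0;
  for (let i = 0; i < attempts; i++) {
    // Stop as soon as the health check passes.
    if (check()) {
      return { healthy: true, waitedMs };
    }
    // Otherwise account for the delay before the next attempt.
    waitedMs += delayMs;
  }
  // Every attempt failed: the whole budget was spent.
  return { healthy: false, waitedMs };
}
```

A check that never passes, such as `simulatePolling(() => false)`, uses all 120 attempts and reports `waitedMs` of 60000, whereas a check that passes on the first attempt returns immediately with `waitedMs` of 0.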
Summary
Severity: HIGH. This is a critical bug that causes `openclaw gateway restart` to always time out after 60 seconds on any Linux system without `lsof` installed (common on minimal/container installs, Arch Linux, Alpine, etc.).

Root Cause
In `inspectGatewayRestart()`, the `ownsPort` check when `runtimePid` is known only verifies ownership via `portUsage.listeners.some(listenerOwnedByRuntimePid)`. When `lsof` is not installed, `inspectPortUsage()` falls back to `checkPortInUse()` (which tries to bind and gets `EADDRINUSE`), returning `{ status: "busy", listeners: [] }`. Since `.some()` on an empty array always returns `false`, `ownsPort` is always `false` → `healthy` is always `false` → the health-check loop polls for 120 attempts × 500 ms = 60 s and then reports a timeout error, even though the gateway is running and healthy.

The `runtimePid == null` branch already had the correct fallback `(portUsage.status === "busy" && portUsage.listeners.length === 0)`, but the `runtimePid != null` branch was missing it.

Fix
Add the same `(status === "busy" && listeners.length === 0)` fallback to the `runtimePid != null` branch, so a running service with a known PID is treated as the port owner when listener enumeration is unavailable.

Impact
- `openclaw gateway restart` on a system without `lsof` was broken, always timing out after 60 s
- Affects systems that do not ship `lsof` by default (Arch, Alpine, minimal Debian/Ubuntu, containers)

Test plan
- "treats port as owned when runtime pid is known but listeners are empty (e.g. lsof missing)"
- `pnpm check` passes (lint + format + typecheck)