fix(gateway): signal-based stop/restart fallback for containers without service manager#36202
fix(gateway): signal-based stop/restart fallback for containers without service manager#36202RealKai42 wants to merge 8 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes Issues in the core signal-based fallback implementation:
Confidence Score: 2/5
Last reviewed commit: b16ae1b |
| } | ||
| if (!json) { | ||
| defaultRuntime.log(`Gateway restart signal sent. Waiting for health...`); | ||
| } |
There was a problem hiding this comment.
The log message "Gateway restart signal sent. Waiting for health..." is misleading — no health check actually runs in this code path.
When onNotLoaded returns true (line 157), the runServiceRestart function in lifecycle-core.ts exits immediately with return true; before the postRestartCheck callback is ever invoked. The postRestartCheck is only called inside the try block of the normal restart path (after service restart), so the entire health-polling loop, stale-PID cleanup, and timeout/failure logic is completely bypassed.
To fix this:
- Either move the health-check logic into the
onNotLoadedcallback itself (after sending SIGUSR1, before returning), or - Modify the
lifecycle-core.tscontract so thatpostRestartCheckis still invoked even whenonNotLoadedhandles the restart.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle.ts
Line: 154-157
Comment:
The log message **"Gateway restart signal sent. Waiting for health..."** is misleading — no health check actually runs in this code path.
When `onNotLoaded` returns `true` (line 157), the `runServiceRestart` function in `lifecycle-core.ts` exits immediately with `return true;` before the `postRestartCheck` callback is ever invoked. The `postRestartCheck` is only called inside the `try` block of the normal restart path (after service restart), so the entire health-polling loop, stale-PID cleanup, and timeout/failure logic is completely bypassed.
To fix this:
1. Either move the health-check logic into the `onNotLoaded` callback itself (after sending SIGUSR1, before returning), or
2. Modify the `lifecycle-core.ts` contract so that `postRestartCheck` is still invoked even when `onNotLoaded` handles the restart.
How can I resolve this? If you propose a fix, please make it concise.| const handled = await params.onNotLoaded().catch(() => false); | ||
| if (handled) { | ||
| return; | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing JSON response on signal-based stop. When onNotLoaded returns true and handles the stop, the function returns at line 225 without calling emit(...). In --json mode, this produces no output at all, breaking any JSON consumer (CI scripts, wrappers, etc.).
The normal stop path always emits { ok: true, result: "not-loaded" | "stopped", ... } before returning. The fallback path should do the same.
Add an emit call before the return; statement, similar to the normal "not-loaded" path on line 229:
if (params.onNotLoaded) {
const handled = await params.onNotLoaded().catch(() => false);
if (handled) {
emit({
ok: true,
result: "stopped",
message: "Gateway stopped via SIGTERM.",
service: buildDaemonServiceSnapshot(params.service, false),
});
return;
}
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle-core.ts
Line: 223-227
Comment:
**Missing JSON response on signal-based stop.** When `onNotLoaded` returns `true` and handles the stop, the function returns at line 225 without calling `emit(...)`. In `--json` mode, this produces no output at all, breaking any JSON consumer (CI scripts, wrappers, etc.).
The normal stop path always emits `{ ok: true, result: "not-loaded" | "stopped", ... }` before returning. The fallback path should do the same.
Add an `emit` call before the `return;` statement, similar to the normal "not-loaded" path on line 229:
```typescript
if (params.onNotLoaded) {
const handled = await params.onNotLoaded().catch(() => false);
if (handled) {
emit({
ok: true,
result: "stopped",
message: "Gateway stopped via SIGTERM.",
service: buildDaemonServiceSnapshot(params.service, false),
});
return;
}
}
```
How can I resolve this? If you propose a fix, please make it concise.| if (params.onNotLoaded) { | ||
| const handled = await params.onNotLoaded().catch(() => false); | ||
| if (handled) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Missing JSON response on signal-based restart. When onNotLoaded returns true and handles the restart, the function returns true at line 289 without calling emit(...). In --json mode, this produces no output at all.
The normal restart path always emits a structured JSON response before returning. The fallback path should do the same.
Add an emit call before the return true; statement:
if (params.onNotLoaded) {
const handled = await params.onNotLoaded().catch(() => false);
if (handled) {
emit({
ok: true,
result: "restarted",
message: "Gateway restarted via SIGUSR1.",
service: buildDaemonServiceSnapshot(params.service, true),
warnings: warnings.length ? warnings : undefined,
});
return true;
}
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle-core.ts
Line: 286-290
Comment:
**Missing JSON response on signal-based restart.** When `onNotLoaded` returns `true` and handles the restart, the function returns `true` at line 289 without calling `emit(...)`. In `--json` mode, this produces no output at all.
The normal restart path always emits a structured JSON response before returning. The fallback path should do the same.
Add an `emit` call before the `return true;` statement:
```typescript
if (params.onNotLoaded) {
const handled = await params.onNotLoaded().catch(() => false);
if (handled) {
emit({
ok: true,
result: "restarted",
message: "Gateway restarted via SIGUSR1.",
service: buildDaemonServiceSnapshot(params.service, true),
warnings: warnings.length ? warnings : undefined,
});
return true;
}
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b16ae1bb1a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const port = await resolveGatewayRestartPort().catch(() => | ||
| resolveGatewayPort(loadConfig(), process.env), | ||
| ); |
There was a problem hiding this comment.
Defer port resolution until stop fallback is needed
runDaemonStop now resolves port before calling runServiceStop, which eagerly executes loadConfig() via resolveGatewayRestartPort(). That makes openclaw gateway stop fail on malformed/legacy config even when the service manager is loaded and could stop the process without reading config, which is a regression in exactly the recovery scenario where stop is needed most. Resolve the port inside onNotLoaded so config parsing only happens for the container fallback path.
Useful? React with 👍 / 👎.
| if (handled) { | ||
| return true; |
There was a problem hiding this comment.
Continue restart verification after fallback signal
In the not-loaded path, a handled onNotLoaded restart returns immediately here, which skips the normal postRestartCheck health verification. In containers without systemd/launchd, this allows openclaw gateway restart to report success even if SIGUSR1 was not delivered/handled and the gateway never becomes healthy (despite lifecycle.ts logging “Waiting for health…”). The fallback path should still run restart health checks before returning success.
Useful? React with 👍 / 👎.
| if (handled) { | ||
| return; |
There was a problem hiding this comment.
Emit JSON output when stop fallback handles the request
When onNotLoaded returns true, this branch exits before any emit(...) call. As a result, in service-manager-less environments where the signal fallback succeeds, openclaw gateway stop --json produces no JSON payload, breaking automation that expects one structured response per command. Emit a success response for this handled-fallback path (and mirror the same behavior for restart).
Useful? React with 👍 / 👎.
Summary
Closes #36137.
openclaw gateway stopandopenclaw gateway restartboth failed in Docker/Podman/LXC containers that lack a service manager (systemd/launchd).isLoaded()returnsfalsein these environments, causing both commands to immediately print "Gateway service disabled" and exit without touching the running process.Fix: Add an optional
onNotLoadedcallback torunServiceStopandrunServiceRestartinlifecycle-core.ts. When the service manager does not recognize the service as loaded, the gateway-specific lifecycle functions inlifecycle.tsfall back to port-based PID discovery (viainspectPortUsage) and send OS signals directly:gateway stop→ sends SIGTERM to all gateway processes listening on the configured portgateway restart→ sends SIGUSR1 to trigger the gateway's in-process restart loop (the same signal used for hot restarts; handled byrun-loop.ts'sonSigusr1handler)This matches the behavior already described in the issue: the gateway already handles both signals natively — the CLI just couldn't reach them.
Changes
src/cli/daemon-cli/lifecycle-core.ts: add optionalonNotLoadedcallback torunServiceStopandrunServiceRestartsrc/cli/daemon-cli/lifecycle.ts: implement signal-based fallback usingfindGatewayPidsOnPort(port inspection viainspectPortUsage)Test plan
pnpm tsgo— no TypeScript errorspnpm check— passes lint and formatopenclaw gateway stop/openclaw gateway restartwith a foreground gateway process and verify signals are sent