Add Windows service support to gateway status#53121
Add Windows service support to gateway status#53121dammtyler1 wants to merge 10 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds transparent Windows service (NSSM / native SCM) detection to the gateway status flow. When a Windows service named
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/daemon/windows-service.ts
Line: 245-258
Comment:
**Synchronous `Atomics.wait` blocks the event loop**
`sleepSync` / `waitForServiceState` call `Atomics.wait` on the main Node.js thread. Unlike browsers, Node.js does allow `Atomics.wait` on the main thread, but it **fully blocks the event loop** — no I/O, no timers, no promises can progress while it spins. With a 15-second deadline and 250 ms ticks, callers of `stopWindowsService` / `restartWindowsService` can freeze the process for up to 15 seconds.
For a CLI that otherwise uses async I/O throughout, consider an async polling loop instead:
```ts
async function waitForServiceState(
serviceName: string,
matcher: (state?: string) => boolean,
): Promise<boolean> {
const deadline = Date.now() + 15_000;
while (Date.now() < deadline) {
const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout);
if (matcher(query.state)) return true;
await new Promise((resolve) => setTimeout(resolve, 250));
}
return false;
}
```
The callers (`stopWindowsService`, `restartWindowsService`) are already `async`, so this change is drop-in.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/daemon/service.ts
Line: 153-160
Comment:
**`sc.exe qc` is spawned twice in the Windows-service happy path**
`readWindowsCommandOrScheduledTask` calls `useWindowsService(env)` (→ `readScQc`) to decide which branch to take, then immediately calls `readWindowsServiceCommand(env)` which calls `readScQc` a second time. Since `sc.exe` is a subprocess spawn, this doubles the latency and process-spawn overhead for every status read on a Windows-service installation.
The same pattern recurs for `readWindowsRuntimeOrScheduledTask`, `stopWindowsRuntimeOrScheduledTask`, and `restartWindowsRuntimeOrScheduledTask`.
A simple way to address this is to have `readWindowsServiceCommand` accept an optional pre-read `qc` result, or to inline the detection into the read functions so the first successful `sc qc` is reused rather than discarded.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Add Windows service support to gateway s..." | Re-trigger Greptile |
src/daemon/windows-service.ts
Outdated
| function sleepSync(ms: number): void { | ||
| Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms); | ||
| } | ||
|
|
||
| function waitForServiceState(serviceName: string, matcher: (state?: string) => boolean): boolean { | ||
| const deadline = Date.now() + 15_000; | ||
| while (Date.now() < deadline) { | ||
| const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout); | ||
| if (matcher(query.state)) { | ||
| return true; | ||
| } | ||
| sleepSync(250); | ||
| } | ||
| return false; |
There was a problem hiding this comment.
Synchronous
Atomics.wait blocks the event loop
sleepSync / waitForServiceState call Atomics.wait on the main Node.js thread. Unlike browsers, Node.js does allow Atomics.wait on the main thread, but it fully blocks the event loop — no I/O, no timers, no promises can progress while it spins. With a 15-second deadline and 250 ms ticks, callers of stopWindowsService / restartWindowsService can freeze the process for up to 15 seconds.
For a CLI that otherwise uses async I/O throughout, consider an async polling loop instead:
async function waitForServiceState(
serviceName: string,
matcher: (state?: string) => boolean,
): Promise<boolean> {
const deadline = Date.now() + 15_000;
while (Date.now() < deadline) {
const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout);
if (matcher(query.state)) return true;
await new Promise((resolve) => setTimeout(resolve, 250));
}
return false;
}The callers (stopWindowsService, restartWindowsService) are already async, so this change is drop-in.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/windows-service.ts
Line: 245-258
Comment:
**Synchronous `Atomics.wait` blocks the event loop**
`sleepSync` / `waitForServiceState` call `Atomics.wait` on the main Node.js thread. Unlike browsers, Node.js does allow `Atomics.wait` on the main thread, but it **fully blocks the event loop** — no I/O, no timers, no promises can progress while it spins. With a 15-second deadline and 250 ms ticks, callers of `stopWindowsService` / `restartWindowsService` can freeze the process for up to 15 seconds.
For a CLI that otherwise uses async I/O throughout, consider an async polling loop instead:
```ts
async function waitForServiceState(
serviceName: string,
matcher: (state?: string) => boolean,
): Promise<boolean> {
const deadline = Date.now() + 15_000;
while (Date.now() < deadline) {
const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout);
if (matcher(query.state)) return true;
await new Promise((resolve) => setTimeout(resolve, 250));
}
return false;
}
```
The callers (`stopWindowsService`, `restartWindowsService`) are already `async`, so this change is drop-in.
How can I resolve this? If you propose a fix, please make it concise.| async function readWindowsCommandOrScheduledTask( | ||
| env: GatewayServiceEnv, | ||
| ): Promise<GatewayServiceCommandConfig | null> { | ||
| if (await useWindowsService(env)) { | ||
| return await readWindowsServiceCommand(env); | ||
| } | ||
| return await readScheduledTaskCommand(env); | ||
| } |
There was a problem hiding this comment.
sc.exe qc is spawned twice in the Windows-service happy path
readWindowsCommandOrScheduledTask calls useWindowsService(env) (→ readScQc) to decide which branch to take, then immediately calls readWindowsServiceCommand(env) which calls readScQc a second time. Since sc.exe is a subprocess spawn, this doubles the latency and process-spawn overhead for every status read on a Windows-service installation.
The same pattern recurs for readWindowsRuntimeOrScheduledTask, stopWindowsRuntimeOrScheduledTask, and restartWindowsRuntimeOrScheduledTask.
A simple way to address this is to have readWindowsServiceCommand accept an optional pre-read qc result, or to inline the detection into the read functions so the first successful sc qc is reused rather than discarded.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/service.ts
Line: 153-160
Comment:
**`sc.exe qc` is spawned twice in the Windows-service happy path**
`readWindowsCommandOrScheduledTask` calls `useWindowsService(env)` (→ `readScQc`) to decide which branch to take, then immediately calls `readWindowsServiceCommand(env)` which calls `readScQc` a second time. Since `sc.exe` is a subprocess spawn, this doubles the latency and process-spawn overhead for every status read on a Windows-service installation.
The same pattern recurs for `readWindowsRuntimeOrScheduledTask`, `stopWindowsRuntimeOrScheduledTask`, and `restartWindowsRuntimeOrScheduledTask`.
A simple way to address this is to have `readWindowsServiceCommand` accept an optional pre-read `qc` result, or to inline the detection into the read functions so the first successful `sc qc` is reused rather than discarded.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 51545516ca
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/daemon/windows-service.ts
Outdated
| waitForServiceState(serviceName, (state) => state?.toLowerCase().includes("running") ?? false); | ||
| stdout.write(`Restarted Windows service: ${serviceName}\n`); | ||
| return { outcome: "completed" }; |
There was a problem hiding this comment.
Propagate failed RUNNING transitions from Windows service restart
On Windows services that stay START_PENDING or crash immediately after sc start, waitForServiceState() returns false here but the result is discarded and the function still reports success. That becomes user-visible because runServiceStart in src/cli/daemon-cli/lifecycle-core.ts:227-258 only rechecks isLoaded(), and this adapter treats any installed Windows service as loaded, so openclaw gateway start / restart can claim success even though the service never reached RUNNING.
Useful? React with 👍 / 👎.
| return { | ||
| ...GATEWAY_SERVICE_REGISTRY.win32, | ||
| uninstall: uninstallWindowsRuntimeOrScheduledTask, | ||
| stop: stopWindowsRuntimeOrScheduledTask, | ||
| restart: restartWindowsRuntimeOrScheduledTask, |
There was a problem hiding this comment.
Avoid creating a second Windows supervisor that status ignores
This Windows adapter still inherits installScheduledTask via ...GATEWAY_SERVICE_REGISTRY.win32, but the new read/control paths below prefer the Windows service whenever sc qc succeeds. On a host that already has an NSSM/native service, openclaw gateway install --force now creates a Scheduled Task that later status, stop, and restart will never inspect or control, so the CLI can miss the gateway instance that is actually running. The install path needs a single backend or explicit dual-backend handling.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4425117ba6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| environment: { | ||
| OPENCLAW_WINDOWS_SERVICE_NAME: serviceName, | ||
| }, |
There was a problem hiding this comment.
Preserve Windows service env overrides in status reads
readWindowsServiceCommand() returns only OPENCLAW_WINDOWS_SERVICE_NAME here, so wrapper-based services lose any OPENCLAW_PROFILE, OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH, or OPENCLAW_GATEWAY_PORT overrides they actually run with. gatherDaemonStatus() and the config audit consume command.environment to resolve the daemon config path, auth, and fallback port, so a Windows service launched through a wrapper with env overrides can now show the CLI's config instead of the service's and report false probe/config mismatches.
Useful? React with 👍 / 👎.
| if (command.sourcePath?.startsWith("service:")) { | ||
| return; |
There was a problem hiding this comment.
Keep command validation for non-wrapper Windows services
This early return suppresses gateway-command-missing for every Windows service, because readWindowsServiceCommand() always tags them as sourcePath: service:<name> even when it fell back to sc qc's raw BINARY_PATH_NAME. That fixes NSSM wrapper false positives, but it also hides real misconfigurations for native services or broken entries that no longer launch openclaw gateway ..., so doctor/gateway status can report a clean service config when the service command is wrong.
Useful? React with 👍 / 👎.
4425117 to
60bfb6d
Compare
|
Addressed the two automated review suggestions in follow-up commit 262ceaf:
Local verification:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 262ceaf567
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/daemon/windows-service.ts
Outdated
| await waitForServiceState(serviceName, (state) => state?.toLowerCase().includes("stopped") ?? false); | ||
| stdout.write(`Stopped Windows service: ${serviceName}\n`); |
There was a problem hiding this comment.
Reject stop when service never reaches STOPPED
stopWindowsService ignores the boolean returned by waitForServiceState, so a service that remains STOP_PENDING/RUNNING after sc stop still leads to a success message and a successful return. In that timeout path, runServiceStop treats the stop as completed even though the Windows service is still active, which can mislead operators and leave the gateway running after a reported stop.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a395e17fe5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (qc.code !== 0) { | ||
| return null; |
There was a problem hiding this comment.
Distinguish service-missing from probe failures
probeWindowsService() currently treats every non-zero sc qc exit as “service not installed”, and the win32 adapter then falls back to Scheduled Task control paths for status/stop/restart/uninstall. When sc qc fails for reasons like permission errors or command timeouts (not service absence), the CLI can report/control the wrong backend or claim the service is missing even though it exists. Only true missing-service errors should return null; other probe failures should be surfaced.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86e955abce
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (statusOnly) { | ||
| const manager = new MemoryIndexManager({ | ||
| return new MemoryIndexManager({ | ||
| cacheKey: key, |
There was a problem hiding this comment.
Reuse or close status-only fallback managers
This branch now creates a fresh uncached MemoryIndexManager for every purpose: "status" request, but some production status callers still do not close what they receive (for example the memory_get tool path in src/agents/tools/memory-tool.ts), and getMemorySearchManager can route into this code when QMD falls back to builtin. In that scenario, repeated calls create new sqlite-backed managers without running close(), so handles accumulate in long-lived processes; before this commit, status managers were cached and reused, which avoided this churn.
Useful? React with 👍 / 👎.
86e955a to
a82fe5e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a82fe5e233
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (await probeWindowsService(args.env).catch(() => null)) { | ||
| await uninstallWindowsService(args); | ||
| return; |
There was a problem hiding this comment.
Uninstall both Windows backends when service is detected
This branch exits after deleting only the Windows service, but the win32 adapter still installs/maintains Scheduled Task as the default backend via GATEWAY_SERVICE_REGISTRY.win32. On hosts where both exist (for example after running openclaw gateway install --force while an NSSM/native service is present), gateway uninstall removes the service and leaves the task installed/running; the post-uninstall isLoaded check can then fail with "service still loaded" and the gateway can remain active despite an uninstall attempt.
Useful? React with 👍 / 👎.
a82fe5e to
0f8c477
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3da74e7b1f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.code !== 0 && !/service has not been started/i.test(detail)) { | ||
| throw new Error(`sc stop failed: ${detail || "unknown error"}`.trim()); |
There was a problem hiding this comment.
Handle already-stopped services by SC error code
The stop/restart paths only treat "already stopped" as non-fatal when sc.exe output contains the exact English text service has not been started. On non-English Windows installs, sc stop still returns non-zero (typically error 1062) but with localized text, so openclaw gateway stop and restart can fail even though the service is simply already stopped. Matching the numeric SCM error code (as done for missing-service handling with 1060) would make this behavior locale-safe.
Useful? React with 👍 / 👎.
Summary
Testing