Add Windows service support to gateway status #53121

Open
dammtyler1 wants to merge 10 commits into openclaw:main from
dammtyler1:windows-service-status-support

@dammtyler1

Summary

  • detect NSSM/native Windows services in gateway service status flows
  • prefer Windows service runtime/command metadata over Scheduled Task labels when present
  • keep Scheduled Task support as the fallback on Windows
  • avoid false config-audit warnings for service wrapper commands

Testing

  • pnpm exec vitest run src/daemon/windows-service.test.ts src/commands/status.service-summary.test.ts src/daemon/service.test.ts src/daemon/service-audit.test.ts
  • verified locally with an NSSM-managed OpenClaw gateway on Windows that openclaw gateway status and openclaw status --all report Windows service and a healthy running PID

@openclaw-barnacle bot added the gateway (Gateway runtime), cli (CLI command changes), commands (Command implementations), and size: L labels on Mar 23, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR adds transparent Windows service (NSSM / native SCM) detection to the gateway status flow. When a Windows service named OpenClawGateway (or a profile variant) is present, the runtime reads from it via sc.exe qc / sc queryex and the registry, falling back to the existing Scheduled Task path otherwise. Label metadata (label, loadedText, notLoadedText) is piggybacked on the GatewayServiceCommandConfig and propagated through both status.gather.ts and status.service-summary.ts, replacing the previous hard-coded service-level strings.

  • Two non-blocking concerns are worth tracking before or after merge:
    • Synchronous event-loop block (windows-service.ts lines 245–258): sleepSync / waitForServiceState use Atomics.wait on the main Node.js thread. This freezes the entire event loop for up to 15 s while polling for STOPPED/RUNNING state during stop/restart. Swapping for an async/setTimeout loop would be a straightforward fix.
    • Duplicate sc.exe qc spawn (service.ts lines 153–160): the Windows-service code path calls sc.exe qc twice per operation — once in useWindowsService to detect, and again inside readWindowsServiceCommand / the runtime helpers to read. Merging those would halve the subprocess overhead on the happy path.
  • The sourcePath: "service:" sentinel correctly prevents auditGatewayCommand from flagging NSSM/PowerShell wrapper commands as missing the gateway subcommand.
  • Install continues to use Scheduled Tasks; Windows service support is read-only detection, which aligns with the PR description.

Confidence Score: 4/5

  • Safe to merge; the two P2 items are performance/style improvements and do not affect correctness for the primary use case.
  • Logic is sound, fallback to Scheduled Tasks is preserved, audit false-positives are cleanly handled, and the new behavior is well-tested. The synchronous event-loop blocking in waitForServiceState and the duplicate sc.exe qc spawn are the only actionable concerns, and neither causes data loss or incorrect status reporting under normal conditions.
  • src/daemon/windows-service.ts: sleepSync / waitForServiceState block the main thread; src/daemon/service.ts: double sc.exe qc invocation per Windows-service operation.

Reviews (1): Last reviewed commit: "Add Windows service support to gateway s..."

Comment on lines +245 to +258 of src/daemon/windows-service.ts
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

function waitForServiceState(serviceName: string, matcher: (state?: string) => boolean): boolean {
  const deadline = Date.now() + 15_000;
  while (Date.now() < deadline) {
    const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout);
    if (matcher(query.state)) {
      return true;
    }
    sleepSync(250);
  }
  return false;

P2 Synchronous Atomics.wait blocks the event loop

sleepSync / waitForServiceState call Atomics.wait on the main Node.js thread. Unlike browsers, Node.js does allow Atomics.wait on the main thread, but it fully blocks the event loop: no I/O, timers, or promise callbacks can progress while it waits. With a 15-second deadline and 250 ms ticks, callers of stopWindowsService / restartWindowsService can freeze the process for up to 15 seconds.

For a CLI that otherwise uses async I/O throughout, consider an async polling loop instead:

async function waitForServiceState(
  serviceName: string,
  matcher: (state?: string) => boolean,
): Promise<boolean> {
  const deadline = Date.now() + 15_000;
  while (Date.now() < deadline) {
    const query = parseScQueryExOutput(readScQueryEx(serviceName).stdout);
    if (matcher(query.state)) return true;
    await new Promise((resolve) => setTimeout(resolve, 250));
  }
  return false;
}

The callers (stopWindowsService, restartWindowsService) are already async, so this change is drop-in.

Comment on lines +153 to +160 of src/daemon/service.ts
async function readWindowsCommandOrScheduledTask(
  env: GatewayServiceEnv,
): Promise<GatewayServiceCommandConfig | null> {
  if (await useWindowsService(env)) {
    return await readWindowsServiceCommand(env);
  }
  return await readScheduledTaskCommand(env);
}

P2 sc.exe qc is spawned twice in the Windows-service happy path

readWindowsCommandOrScheduledTask calls useWindowsService(env) (→ readScQc) to decide which branch to take, then immediately calls readWindowsServiceCommand(env) which calls readScQc a second time. Since sc.exe is a subprocess spawn, this doubles the latency and process-spawn overhead for every status read on a Windows-service installation.

The same pattern recurs for readWindowsRuntimeOrScheduledTask, stopWindowsRuntimeOrScheduledTask, and restartWindowsRuntimeOrScheduledTask.

A simple way to address this is to have readWindowsServiceCommand accept an optional pre-read qc result, or to inline the detection into the read functions so the first successful sc qc is reused rather than discarded.
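
A minimal sketch of the first option, assuming the probe result can be passed along. `ScResult`, `RunScQc`, `readServiceCommand`, and `readCommandOrFallback` are hypothetical names for illustration, not the PR's actual helpers, and the subprocess call is injected so the sketch stays self-contained.

```typescript
// Hypothetical shapes; the PR's real types may differ.
interface ScResult {
  code: number;
  stdout: string;
}

type RunScQc = (serviceName: string) => ScResult;

// Reads the service command, reusing a pre-read `sc qc` result when the
// caller already has one and spawning only when it does not.
function readServiceCommand(
  serviceName: string,
  runScQc: RunScQc,
  preRead?: ScResult,
): string | null {
  const qc = preRead ?? runScQc(serviceName);
  if (qc.code !== 0) return null;
  const match = /BINARY_PATH_NAME\s*:\s*(.+)/.exec(qc.stdout);
  return match ? match[1].trim() : null;
}

// Detects and reads with a single spawn; a non-zero probe means "fall back".
function readCommandOrFallback(
  serviceName: string,
  runScQc: RunScQc,
  readScheduledTask: () => string | null,
): string | null {
  const qc = runScQc(serviceName);
  if (qc.code === 0) return readServiceCommand(serviceName, runScQc, qc);
  return readScheduledTask();
}
```

With this shape, the Windows-service path runs the probe once and the Scheduled Task fallback is reached only when that probe fails.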

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51545516ca

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +290 to +292
waitForServiceState(serviceName, (state) => state?.toLowerCase().includes("running") ?? false);
stdout.write(`Restarted Windows service: ${serviceName}\n`);
return { outcome: "completed" };

P1 Propagate failed RUNNING transitions from Windows service restart

On Windows services that stay START_PENDING or crash immediately after sc start, waitForServiceState() returns false here but the result is discarded and the function still reports success. That becomes user-visible because runServiceStart in src/cli/daemon-cli/lifecycle-core.ts:227-258 only rechecks isLoaded(), and this adapter treats any installed Windows service as loaded, so openclaw gateway start / restart can claim success even though the service never reached RUNNING.
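
One possible shape for the fix, as a sketch under assumptions: the wait result is checked and a timeout becomes an error. `waitForState` and `confirmRunning` are hypothetical names, and the state reader is injected rather than calling sc.exe.

```typescript
// Hypothetical helper: polls `readState` until `matcher` accepts it or the
// deadline passes, yielding to the event loop between polls.
async function waitForState(
  readState: () => string | undefined,
  matcher: (state?: string) => boolean,
  timeoutMs = 15_000,
  intervalMs = 250,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (matcher(readState())) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// Restart step that surfaces a timeout instead of discarding the result.
async function confirmRunning(
  serviceName: string,
  readState: () => string | undefined,
  timeoutMs?: number,
  intervalMs?: number,
): Promise<{ outcome: "completed" }> {
  const reached = await waitForState(
    readState,
    (state) => state?.toLowerCase().includes("running") ?? false,
    timeoutMs,
    intervalMs,
  );
  if (!reached) {
    throw new Error(`Windows service ${serviceName} did not reach RUNNING`);
  }
  return { outcome: "completed" };
}
```

The success message would then be written only after `confirmRunning` resolves, so a service stuck in START_PENDING is reported as a failure.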

Comment on lines +203 to +207
return {
...GATEWAY_SERVICE_REGISTRY.win32,
uninstall: uninstallWindowsRuntimeOrScheduledTask,
stop: stopWindowsRuntimeOrScheduledTask,
restart: restartWindowsRuntimeOrScheduledTask,

P2 Avoid creating a second Windows supervisor that status ignores

This Windows adapter still inherits installScheduledTask via ...GATEWAY_SERVICE_REGISTRY.win32, but the new read/control paths below prefer the Windows service whenever sc qc succeeds. On a host that already has an NSSM/native service, openclaw gateway install --force now creates a Scheduled Task that later status, stop, and restart will never inspect or control, so the CLI can miss the gateway instance that is actually running. The install path needs a single backend or explicit dual-backend handling.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4425117ba6

Comment on lines +183 to +185
environment: {
OPENCLAW_WINDOWS_SERVICE_NAME: serviceName,
},

P1 Preserve Windows service env overrides in status reads

readWindowsServiceCommand() returns only OPENCLAW_WINDOWS_SERVICE_NAME here, so wrapper-based services lose any OPENCLAW_PROFILE, OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH, or OPENCLAW_GATEWAY_PORT overrides they actually run with. gatherDaemonStatus() and the config audit consume command.environment to resolve the daemon config path, auth, and fallback port, so a Windows service launched through a wrapper with env overrides can now show the CLI's config instead of the service's and report false probe/config mismatches.
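
A rough sketch of what preserving those overrides might look like, assuming the wrapper's environment is available as a list of "KEY=VALUE" strings. How that list is obtained is out of scope here, and `parseOverrides` and `buildServiceEnvironment` are hypothetical names.

```typescript
// Keys named in the review comment; treated here as the overrides to keep.
const OVERRIDE_KEYS: readonly string[] = [
  "OPENCLAW_PROFILE",
  "OPENCLAW_STATE_DIR",
  "OPENCLAW_CONFIG_PATH",
  "OPENCLAW_GATEWAY_PORT",
];

// Parses "KEY=VALUE" entries and keeps only the override keys listed above.
function parseOverrides(entries: string[]): Record<string, string> {
  const env: Record<string, string> = {};
  for (const entry of entries) {
    const eq = entry.indexOf("=");
    if (eq <= 0) continue; // skip malformed entries
    const key = entry.slice(0, eq).trim();
    if (OVERRIDE_KEYS.includes(key)) {
      env[key] = entry.slice(eq + 1);
    }
  }
  return env;
}

// Merges the overrides with the service-name marker the adapter already sets.
function buildServiceEnvironment(
  serviceName: string,
  wrapperEntries: string[],
): Record<string, string> {
  return {
    ...parseOverrides(wrapperEntries),
    OPENCLAW_WINDOWS_SERVICE_NAME: serviceName,
  };
}
```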

Comment on lines +199 to +200
if (command.sourcePath?.startsWith("service:")) {
return;

P2 Keep command validation for non-wrapper Windows services

This early return suppresses gateway-command-missing for every Windows service, because readWindowsServiceCommand() always tags them as sourcePath: service:<name> even when it fell back to sc qc's raw BINARY_PATH_NAME. That fixes NSSM wrapper false positives, but it also hides real misconfigurations for native services or broken entries that no longer launch openclaw gateway ..., so doctor/gateway status can report a clean service config when the service command is wrong.
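
A sketch of a narrower skip, assuming the reader can record whether a wrapper was actually detected. The `wrapper` field and this simplified `auditGatewayCommand` are illustrative only and do not reflect the PR's real signature.

```typescript
// Hypothetical shape; `wrapper` is an illustrative field, not the PR's.
interface ServiceCommand {
  programArguments: string[];
  sourcePath?: string;
  wrapper?: boolean; // true only when a known wrapper (e.g. NSSM) was detected
}

// Returns audit issue codes; skips the check only for detected wrappers.
function auditGatewayCommand(command: ServiceCommand): string[] {
  const fromService = command.sourcePath?.startsWith("service:") ?? false;
  if (fromService && command.wrapper === true) return [];
  const hasGateway = command.programArguments.some(
    (arg) => arg.toLowerCase() === "gateway",
  );
  return hasGateway ? [] : ["gateway-command-missing"];
}
```

Native services that fall back to the raw BINARY_PATH_NAME would still be validated under this scheme.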

@dammtyler1 force-pushed the windows-service-status-support branch from 4425117 to 60bfb6d on March 24, 2026 16:42
@dammtyler1
Author

Addressed the two automated review suggestions in follow-up commit 262ceaf:

  • replaced the synchronous Windows service polling loop with async timer-based waiting so stop/restart no longer blocks the event loop
  • centralized Windows service probing so the status path can reuse a single sc.exe qc result instead of probing twice before falling back to scheduled tasks

Local verification:

  • pnpm exec vitest run src/daemon/windows-service.test.ts src/commands/status.service-summary.test.ts src/daemon/service.test.ts src/daemon/service-audit.test.ts src/cli/daemon-cli/status.gather.test.ts

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 262ceaf567

Comment on lines +291 to +292
await waitForServiceState(serviceName, (state) => state?.toLowerCase().includes("stopped") ?? false);
stdout.write(`Stopped Windows service: ${serviceName}\n`);

P1 Reject stop when service never reaches STOPPED

stopWindowsService ignores the boolean returned by waitForServiceState, so a service that remains STOP_PENDING/RUNNING after sc stop still leads to a success message and a successful return. In that timeout path, runServiceStop treats the stop as completed even though the Windows service is still active, which can mislead operators and leave the gateway running after a reported stop.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a395e17fe5

Comment on lines +203 to +204
if (qc.code !== 0) {
return null;

P2 Distinguish service-missing from probe failures

probeWindowsService() currently treats every non-zero sc qc exit as “service not installed”, and the win32 adapter then falls back to Scheduled Task control paths for status/stop/restart/uninstall. When sc qc fails for reasons like permission errors or command timeouts (not service absence), the CLI can report/control the wrong backend or claim the service is missing even though it exists. Only true missing-service errors should return null; other probe failures should be surfaced.
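
A sketch of a three-way classification, assuming sc.exe reports the Win32 error either as its exit code or in a "[SC] ... 1060:" line. `ProbeOutcome` and `classifyScQc` are hypothetical names; 1060 is the Win32 code ERROR_SERVICE_DOES_NOT_EXIST.

```typescript
// Win32 error code for "the specified service does not exist".
const ERROR_SERVICE_DOES_NOT_EXIST = 1060;

type ProbeOutcome =
  | { kind: "installed"; stdout: string }
  | { kind: "missing" }
  | { kind: "error"; detail: string };

interface ScQcResult {
  code: number;
  stdout: string;
  stderr: string;
}

// Only a true "service does not exist" result maps to `missing`; every other
// non-zero exit is surfaced as an error instead of triggering the fallback.
function classifyScQc(result: ScQcResult): ProbeOutcome {
  if (result.code === 0) return { kind: "installed", stdout: result.stdout };
  const detail = `${result.stdout}\n${result.stderr}`.trim();
  const mentions1060 = /\[SC\][^\r\n]*?\b1060:/.test(detail);
  if (result.code === ERROR_SERVICE_DOES_NOT_EXIST || mentions1060) {
    return { kind: "missing" };
  }
  return { kind: "error", detail: detail || `sc qc exited with code ${result.code}` };
}
```

The adapter could then fall back to Scheduled Tasks on `missing` and raise on `error`.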

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86e955abce

Comment on lines 177 to 179
if (statusOnly) {
const manager = new MemoryIndexManager({
return new MemoryIndexManager({
cacheKey: key,

P2 Reuse or close status-only fallback managers

This branch now creates a fresh uncached MemoryIndexManager for every purpose: "status" request, but some production status callers still do not close what they receive (for example the memory_get tool path in src/agents/tools/memory-tool.ts), and getMemorySearchManager can route into this code when QMD falls back to builtin. In that scenario, repeated calls create new sqlite-backed managers without running close(), so handles accumulate in long-lived processes; before this commit, status managers were cached and reused, which avoided this churn.

@dammtyler1 force-pushed the windows-service-status-support branch from 86e955a to a82fe5e on March 24, 2026 18:10
@openclaw-barnacle bot added the docs (Improvements or additions to documentation) label on Mar 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a82fe5e233

Comment on lines +210 to +212
if (await probeWindowsService(args.env).catch(() => null)) {
await uninstallWindowsService(args);
return;

P1 Uninstall both Windows backends when service is detected

This branch exits after deleting only the Windows service, but the win32 adapter still installs/maintains Scheduled Task as the default backend via GATEWAY_SERVICE_REGISTRY.win32. On hosts where both exist (for example after running openclaw gateway install --force while an NSSM/native service is present), gateway uninstall removes the service and leaves the task installed/running; the post-uninstall isLoaded check can then fail with "service still loaded" and the gateway can remain active despite an uninstall attempt.
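
A minimal sketch of the alternative, assuming each supervisor can be modeled as a backend with its own detection and removal step. The `Backend` interface and `uninstallAllBackends` are hypothetical and only illustrate the control flow.

```typescript
// Hypothetical abstraction over the Windows service and Scheduled Task paths.
interface Backend {
  name: string;
  isInstalled: () => Promise<boolean>;
  uninstall: () => Promise<void>;
}

// Removes every installed backend instead of returning after the first,
// so a leftover Scheduled Task cannot keep the gateway loaded.
async function uninstallAllBackends(backends: Backend[]): Promise<string[]> {
  const removed: string[] = [];
  for (const backend of backends) {
    if (await backend.isInstalled()) {
      await backend.uninstall();
      removed.push(backend.name);
    }
  }
  return removed;
}
```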

@dammtyler1 force-pushed the windows-service-status-support branch from a82fe5e to 0f8c477 on March 24, 2026 18:51
@openclaw-barnacle bot added the channel: discord (Channel integration: discord) and scripts (Repository scripts) labels on Mar 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3da74e7b1f

Comment on lines +288 to +289
if (result.code !== 0 && !/service has not been started/i.test(detail)) {
throw new Error(`sc stop failed: ${detail || "unknown error"}`.trim());

P2 Handle already-stopped services by SC error code

The stop/restart paths only treat "already stopped" as non-fatal when sc.exe output contains the exact English text service has not been started. On non-English Windows installs, sc stop still returns non-zero (typically error 1062) but with localized text, so openclaw gateway stop and restart can fail even though the service is simply already stopped. Matching the numeric SCM error code (as done for missing-service handling with 1060) would make this behavior locale-safe.
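
A sketch of a code-based check, assuming the numeric code appears either as the sc.exe exit code or before the colon on the "[SC] ..." line. `ScStopResult`, `extractScErrorCode`, and `isAlreadyStopped` are hypothetical names; 1062 is the Win32 code ERROR_SERVICE_NOT_ACTIVE.

```typescript
// Win32 error code for "the service has not been started".
const ERROR_SERVICE_NOT_ACTIVE = 1062;

interface ScStopResult {
  code: number; // process exit code
  detail: string; // combined stdout/stderr text
}

// Pulls the numeric code from a line such as "[SC] ControlService FAILED 1062:".
// Only the digits are inspected, so the surrounding words may be localized.
function extractScErrorCode(detail: string): number | undefined {
  const match = /\[SC\][^\r\n]*?\b(\d+):/.exec(detail);
  return match ? Number(match[1]) : undefined;
}

// True when either the exit code or the parsed code says "not active".
function isAlreadyStopped(result: ScStopResult): boolean {
  return (
    result.code === ERROR_SERVICE_NOT_ACTIVE ||
    extractScErrorCode(result.detail) === ERROR_SERVICE_NOT_ACTIVE
  );
}
```

The stop path could then throw only when `result.code !== 0` and `isAlreadyStopped(result)` is false, without depending on the English message text.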

Labels

channel: discord (Channel integration: discord), cli (CLI command changes), commands (Command implementations), docs (Improvements or additions to documentation), gateway (Gateway runtime), scripts (Repository scripts), size: L
