fix(windows): use detached restart script for scheduled task restart#46869
fix(windows): use detached restart script for scheduled task restart#46869EvaZenith wants to merge 5 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a real Windows-specific bug where Two issues are worth addressing before merging:
Confidence Score: 3/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/daemon/schtasks.ts
Line: 727-730
Comment:
**Missing port-release verification after force-kill**
The original code had a second `waitForGatewayPortRelease(restartPort, 2_000)` guard after calling `terminateBusyPortListeners`. That 2-second check was the only mechanism that confirmed the port was actually free *before* the task was scheduled to start again. It has been removed here without a replacement.
The detached restart script's retry loop only re-runs `schtasks /Run` if the command itself fails (e.g. the task is still in a "stopping" state). It has no awareness of port availability. If `schtasks /Run` succeeds but the gateway process can't bind the port because a listener was not yet fully released, the gateway will exit immediately and silently — and the retry loop won't fire again because the initial `schtasks /Run` call returned exit code 0.
Consider restoring the port-release guard (or an equivalent check) before the function returns, for example:
```typescript
if (restartPort) {
const released = await waitForGatewayPortRelease(restartPort);
if (!released) {
await terminateBusyPortListeners(restartPort);
const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000);
if (!releasedAfterForce) {
// Port still busy; the detached script will likely start a gateway that can't bind.
// Log a warning, or throw so the caller surfaces the issue.
}
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/daemon/schtasks.ts
Line: 723-731
Comment:
**Potentially unreachable cleanup code**
The PR description states: *"`schtasks /End` terminates the current process immediately"* — yet all four cleanup steps (`terminateScheduledTaskGatewayListeners`, `terminateInstalledStartupRuntime`, `waitForGatewayPortRelease`, `terminateBusyPortListeners`) sit after the `/End` call and would never run if the process is killed synchronously.
If the process is indeed killed immediately on Windows, these lines are dead code in the common path, and any cleanup they were meant to do (releasing port listeners, stopping the startup runtime) is skipped entirely. The detached restart script then races against those still-running resources.
Consider either:
- Moving critical cleanup steps **before** `execSchtasks(["/End", …])` (mirrors what `relaunchGatewayScheduledTask` itself does by spawning before ending), or
- Adding a brief comment explaining these are intentionally best-effort and may not execute.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 266ff0f |
| const released = await waitForGatewayPortRelease(restartPort); | ||
| if (!released) { | ||
| await terminateBusyPortListeners(restartPort); | ||
| const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000); | ||
| if (!releasedAfterForce) { | ||
| throw new Error(`gateway port ${restartPort} is still busy before restart`); | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing port-release verification after force-kill
The original code had a second waitForGatewayPortRelease(restartPort, 2_000) guard after calling terminateBusyPortListeners. That 2-second check was the only mechanism that confirmed the port was actually free before the task was scheduled to start again. It has been removed here without a replacement.
The detached restart script's retry loop only re-runs schtasks /Run if the command itself fails (e.g. the task is still in a "stopping" state). It has no awareness of port availability. If schtasks /Run succeeds but the gateway process can't bind the port because a listener was not yet fully released, the gateway will exit immediately and silently — and the retry loop won't fire again because the initial schtasks /Run call returned exit code 0.
Consider restoring the port-release guard (or an equivalent check) before the function returns, for example:
if (restartPort) {
const released = await waitForGatewayPortRelease(restartPort);
if (!released) {
await terminateBusyPortListeners(restartPort);
const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000);
if (!releasedAfterForce) {
// Port still busy; the detached script will likely start a gateway that can't bind.
// Log a warning, or throw so the caller surfaces the issue.
}
}
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/schtasks.ts
Line: 727-730
Comment:
**Missing port-release verification after force-kill**
The original code had a second `waitForGatewayPortRelease(restartPort, 2_000)` guard after calling `terminateBusyPortListeners`. That 2-second check was the only mechanism that confirmed the port was actually free *before* the task was scheduled to start again. It has been removed here without a replacement.
The detached restart script's retry loop only re-runs `schtasks /Run` if the command itself fails (e.g. the task is still in a "stopping" state). It has no awareness of port availability. If `schtasks /Run` succeeds but the gateway process can't bind the port because a listener was not yet fully released, the gateway will exit immediately and silently — and the retry loop won't fire again because the initial `schtasks /Run` call returned exit code 0.
Consider restoring the port-release guard (or an equivalent check) before the function returns, for example:
```typescript
if (restartPort) {
const released = await waitForGatewayPortRelease(restartPort);
if (!released) {
await terminateBusyPortListeners(restartPort);
const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000);
if (!releasedAfterForce) {
// Port still busy; the detached script will likely start a gateway that can't bind.
// Log a warning, or throw so the caller surfaces the issue.
}
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
src/daemon/schtasks.ts
Outdated
| await execSchtasks(["/End", "/TN", taskName]); | ||
| await terminateScheduledTaskGatewayListeners(effectiveEnv); | ||
| await terminateInstalledStartupRuntime(effectiveEnv); | ||
| if (restartPort) { | ||
| const released = await waitForGatewayPortRelease(restartPort); | ||
| if (!released) { | ||
| await terminateBusyPortListeners(restartPort); | ||
| const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000); | ||
| if (!releasedAfterForce) { | ||
| throw new Error(`gateway port ${restartPort} is still busy before restart`); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Potentially unreachable cleanup code
The PR description states: "schtasks /End terminates the current process immediately" — yet all four cleanup steps (terminateScheduledTaskGatewayListeners, terminateInstalledStartupRuntime, waitForGatewayPortRelease, terminateBusyPortListeners) sit after the /End call and would never run if the process is killed synchronously.
If the process is indeed killed immediately on Windows, these lines are dead code in the common path, and any cleanup they were meant to do (releasing port listeners, stopping the startup runtime) is skipped entirely. The detached restart script then races against those still-running resources.
Consider either:
- Moving critical cleanup steps before
execSchtasks(["/End", …])(mirrors whatrelaunchGatewayScheduledTaskitself does by spawning before ending), or - Adding a brief comment explaining these are intentionally best-effort and may not execute.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/daemon/schtasks.ts
Line: 723-731
Comment:
**Potentially unreachable cleanup code**
The PR description states: *"`schtasks /End` terminates the current process immediately"* — yet all four cleanup steps (`terminateScheduledTaskGatewayListeners`, `terminateInstalledStartupRuntime`, `waitForGatewayPortRelease`, `terminateBusyPortListeners`) sit after the `/End` call and would never run if the process is killed synchronously.
If the process is indeed killed immediately on Windows, these lines are dead code in the common path, and any cleanup they were meant to do (releasing port listeners, stopping the startup runtime) is skipped entirely. The detached restart script then races against those still-running resources.
Consider either:
- Moving critical cleanup steps **before** `execSchtasks(["/End", …])` (mirrors what `relaunchGatewayScheduledTask` itself does by spawning before ending), or
- Adding a brief comment explaining these are intentionally best-effort and may not execute.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 266ff0f353
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const released = await waitForGatewayPortRelease(restartPort); | ||
| if (!released) { | ||
| await terminateBusyPortListeners(restartPort); | ||
| const releasedAfterForce = await waitForGatewayPortRelease(restartPort, 2_000); | ||
| if (!releasedAfterForce) { | ||
| throw new Error(`gateway port ${restartPort} is still busy before restart`); | ||
| } | ||
| } |
There was a problem hiding this comment.
Re-check port release before marking restart complete
This restart path now stops after terminateBusyPortListeners(restartPort) without verifying that the gateway port was actually freed, then reports success. In cases where a listener cannot be killed (for example, permission-restricted PID or a non-gateway process holding the port), the detached helper still calls schtasks /Run; that command can return success even if the gateway immediately dies on bind, so users get a false "Restarted" result and the service stays down. The previous logic explicitly re-waited and threw when the port remained busy.
Useful? React with 👍 / 👎.
|
Thanks for the thorough review! Both points are valid — addressed in the latest commit. Re: missing port-release verification Re: potentially unreachable cleanup code
This eliminates the race between the detached script and any still-running listeners. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 235a36a5bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| await execSchtasks(["/End", "/TN", taskName]); | ||
| stdout.write(`${formatLine("Restarted Scheduled Task", taskName)}\n`); |
There was a problem hiding this comment.
Validate task stop before reporting restart success
restartScheduledTask now unconditionally reports success right after execSchtasks(["/End", ...]), but it no longer checks any synchronous /Run result in this code path. When task control actually fails (for example, ACL-denied task control, or the task being removed/disabled between the earlier existence check and restart), the detached helper can fail silently and this function still returns { outcome: "completed" } with "Restarted Scheduled Task", which is a regression from the prior behavior that surfaced restart failures via the inline /Run status check.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c719874ea7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // task. On Windows, `schtasks /End` terminates the current process | ||
| // immediately (the agent runs inside the scheduled task), so any code after | ||
| // that call is unreliable dead code in the common path. | ||
| await terminateScheduledTaskGatewayListeners(effectiveEnv); |
There was a problem hiding this comment.
Spawn restart handoff before pre-end listener termination
restartScheduledTask now terminates gateway listeners before calling relaunchGatewayScheduledTask, which can kill the current process in the exact Windows fallback case where verified PID discovery misses. resolveScheduledTaskGatewayListenerPids falls back to inspectPortUsage and does not filter out process.pid, so when restart is triggered from the running scheduled-task gateway, terminateScheduledTaskGatewayListeners can end the caller before the detached /Run script is created, leaving the task stopped with no relaunch attempt.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bda85f4a43
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!attempt.ok) { | ||
| throw new Error(`failed to prepare restart script: ${attempt.detail ?? "unknown error"}`); |
There was a problem hiding this comment.
Provide inline /Run fallback when handoff setup fails
In restartScheduledTask, all stop/cleanup work runs before this check, so when relaunchGatewayScheduledTask returns ok: false (for example, temp script write/spawn failures handled in src/infra/windows-task-restart.ts), this throw aborts without any /Run attempt and can leave the gateway down after a restart request. This regression is specific to environments where detached-script setup fails, and an inline /Run fallback (when still possible) would avoid turning a transient handoff error into an outage.
Useful? React with 👍 / 👎.
|
The failing CI check (checks-windows (node, test, 3, 6, pnpm test)) is unrelated to this PR. The 6 failing tests are all in src/tts/tts.test.ts (summary model tests), which are not touched by this change. The other 5 Windows shards all pass. This appears to be a pre-existing flaky test or an upstream issue. |
|
Hi maintainers, The CI failure on checks-windows (node, test, 3, 6) is unrelated to this PR. The failing tests are in src/tts/tts.test.ts with this error:
This is a pre-existing mock issue in the TTS module, not caused by my changes to src/daemon/schtasks.ts. My PR only modifies the Windows gateway restart logic and has been verified locally on Windows 10. Could you please review and merge manually? Thanks! |
Summary
Fixes #5065, #27802, #41804
On Windows,
openclaw gateway restart(and the in-chat/restartcommand) stops the gateway but never relaunches it. The gateway remains offline until manually restarted.Root Cause
restartScheduledTask()insrc/daemon/schtasks.tscallsschtasks /Endandschtasks /Runinline in the same process. On Windows, the agent process is owned by the Scheduled Task — soschtasks /Endterminates the current process immediately, beforeschtasks /Runever executes.Fix
Call
relaunchGatewayScheduledTask()(already present insrc/infra/windows-task-restart.ts) before callingschtasks /End. This function:.cmdscript to the OpenClaw tmp directorystdio: ignore,child.unref())schtasks /Runwith up to 12 retriesThe gateway process then shuts down cleanly, and the detached script handles the relaunch independently.
Changes
src/daemon/schtasks.ts: importrelaunchGatewayScheduledTaskand call it beforeschtasks /End; remove the now-unnecessary inlineschtasks /RuncallTesting
Verified locally on Windows 10 (x64) with OpenClaw v2026.3.8 — gateway successfully restarts after
openclaw gateway restart.