Gateway: refresh service env after update.run#32575
Gateway: refresh service env after update.run#32575zwright8 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds a best-effort service-env refresh step to Key points:
Confidence Score: 4/5
Last reviewed commit: d9188dc |
| if (!refreshResult.ok) { | ||
| context?.logGateway?.warn( | ||
| `update.run service env refresh skipped ${formatControlPlaneActor(actor)} reason=${refreshResult.reason ?? "unknown"}`, | ||
| ); | ||
| } |
There was a problem hiding this comment.
Misleading log label for attempted-but-failed refresh
The warning always says "service env refresh skipped", but when attempted: true and ok: false (a candidate file was found, the command ran, and returned a non-zero exit code) the refresh was not skipped — it was attempted and failed. The word "skipped" only accurately describes the cases where the service is not installed (ok: true) or the command was never invoked (service check threw, or no candidate file existed).
Consider branching on refreshResult.attempted to distinguish the two situations:
if (!refreshResult.ok) {
const verb = refreshResult.attempted ? "failed" : "skipped";
context?.logGateway?.warn(
`update.run service env refresh ${verb} ${formatControlPlaneActor(actor)} reason=${refreshResult.reason ?? "unknown"}`,
);
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/update.ts
Line: 143-147
Comment:
**Misleading log label for attempted-but-failed refresh**
The warning always says `"service env refresh skipped"`, but when `attempted: true` and `ok: false` (a candidate file was found, the command ran, and returned a non-zero exit code) the refresh was not skipped — it was attempted and failed. The word "skipped" only accurately describes the cases where the service is not installed (`ok: true`) or the command was never invoked (service check threw, or no candidate file existed).
Consider branching on `refreshResult.attempted` to distinguish the two situations:
```typescript
if (!refreshResult.ok) {
const verb = refreshResult.attempted ? "failed" : "skipped";
context?.logGateway?.warn(
`update.run service env refresh ${verb} ${formatControlPlaneActor(actor)} reason=${refreshResult.reason ?? "unknown"}`,
);
}
```
How can I resolve this? If you propose a fix, please make it concise.| return { | ||
| attempted: true, | ||
| ok: false, | ||
| reason: `refresh failed (${candidate}): exit=${res.code}`, | ||
| }; |
There was a problem hiding this comment.
stderr omitted from failure reason
When runCommandWithTimeout returns a non-zero exit code, res.stderr is available but not captured in the reason string. Since this reason flows directly into the warning log, any error message printed by openclaw.mjs gateway install will be silently discarded, making production failures harder to diagnose.
return {
attempted: true,
ok: false,
reason: `refresh failed (${candidate}): exit=${res.code}${res.stderr ? ` stderr=${res.stderr.trim()}` : ""}`,
};Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/update.ts
Line: 83-87
Comment:
**`stderr` omitted from failure reason**
When `runCommandWithTimeout` returns a non-zero exit code, `res.stderr` is available but not captured in the `reason` string. Since this `reason` flows directly into the warning log, any error message printed by `openclaw.mjs gateway install` will be silently discarded, making production failures harder to diagnose.
```typescript
return {
attempted: true,
ok: false,
reason: `refresh failed (${candidate}): exit=${res.code}${res.stderr ? ` stderr=${res.stderr.trim()}` : ""}`,
};
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9188dc99e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| import { assertValidParams } from "./validation.js"; | ||
|
|
||
| const SERVICE_REFRESH_TIMEOUT_MS = 60_000; | ||
| const SERVICE_REFRESH_ARGS = ["gateway", "install", "--force", "--json"] as const; |
There was a problem hiding this comment.
Avoid self-restarting gateway while handling update.run
The refresh command now runs gateway install --force (src/gateway/server-methods/update.ts), but that install path actively restarts the service (src/daemon/systemd.ts:262 calls systemctl --user restart, and src/daemon/launchd.ts:411 bootouts/reloads the LaunchAgent). Because this refresh runs before sentinel writing and before the explicit SIGUSR1 scheduling, calling update.run from the daemonized gateway process can terminate the in-flight handler, which can drop the response/sentinel and bypass the controlled restart flow.
Useful? React with 👍 / 👎.
| ok: boolean; | ||
| reason?: string; | ||
| }> { | ||
| const service = resolveGatewayService(); |
There was a problem hiding this comment.
Catch unsupported service resolution in env refresh path
refreshGatewayServiceEnvFromUpdatedInstall calls resolveGatewayService() outside any try/catch, but resolveGatewayService throws on unsupported platforms (src/daemon/service.ts). That means a successful runGatewayUpdate can now be turned into an unhandled error in update.run on those platforms, which is a regression from the previous behavior where the handler always continued to build and return the update payload.
Useful? React with 👍 / 👎.
d9188dc to
8f8d356
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f8d356fe3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const res = await runCommandWithTimeout([nodeRunner, candidate, ...SERVICE_REFRESH_ARGS], { | ||
| timeoutMs: SERVICE_REFRESH_TIMEOUT_MS, | ||
| }); |
There was a problem hiding this comment.
Catch service-refresh spawn failures in update.run
refreshGatewayServiceEnvFromUpdatedInstall treats non-zero exits as recoverable, but the runCommandWithTimeout(...) call can also reject on child-process startup errors (see src/process/exec.ts where child.on("error") rejects). In environments where the refresh command cannot be spawned (for example missing node when the gateway runtime is Bun, or ENOENT/EACCES for the resolved entrypoint), this throws out of update.run after a successful update, so the handler can return an UNAVAILABLE error instead of writing the restart sentinel and scheduling the controlled restart.
Useful? React with 👍 / 👎.
|
Closing this PR because the author has more than 10 active PRs in this repo. Please reduce the active PR queue and reopen or resubmit once it is back under the limit. You can close your own PRs to get back under the limit. |
Summary
update.runso post-update runtime reads the latest env valuesValidation
pnpm exec vitest run src/gateway/server-methods/update.test.tsContext
This PR is one focused slice extracted from: