fix(gateway): validate config before restart to prevent crash + macOS permission loss (#35862)#35924
fix(gateway): validate config before restart to prevent crash + macOS permission loss (#35862)#35924lml2468 wants to merge 3 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds pre-flight config validation to Key findings:
The core fix for Confidence Score: 3/5
Last reviewed commit: 8516b73 |
| fail( | ||
| `${params.serviceNoun} restart aborted: config is invalid.\n${configIssues}\nFix the config and retry, or run "openclaw doctor" to repair.`, | ||
| ); |
There was a problem hiding this comment.
The error message says "restart aborted" but this code is in runServiceStart(). A user running openclaw gateway start would see a confusing "Gateway restart aborted: config is invalid." message, suggesting they ran a restart command instead.
| fail( | |
| `${params.serviceNoun} restart aborted: config is invalid.\n${configIssues}\nFix the config and retry, or run "openclaw doctor" to repair.`, | |
| ); | |
| fail( | |
| `${params.serviceNoun} start aborted: config is invalid.\n${configIssues}\nFix the config and retry, or run "openclaw doctor" to repair.`, | |
| ); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle-core.ts
Line: 197-199
Comment:
The error message says `"restart aborted"` but this code is in `runServiceStart()`. A user running `openclaw gateway start` would see a confusing `"Gateway restart aborted: config is invalid."` message, suggesting they ran a restart command instead.
```suggestion
fail(
`${params.serviceNoun} start aborted: config is invalid.\n${configIssues}\nFix the config and retry, or run "openclaw doctor" to repair.`,
);
```
How can I resolve this? If you propose a fix, please make it concise.
src/cli/daemon-cli/lifecycle-core.ts
Outdated
| // Pre-flight config validation: refuse to restart if config is invalid. | ||
| // A crash on startup after restart would spawn a new process that loses | ||
| // macOS Full Disk Access (TCC permissions are PID-bound). (#35862) |
There was a problem hiding this comment.
The inline comments say "refuse to restart" and "after restart", but this is runServiceStart(). They should describe the start path:
| // Pre-flight config validation: refuse to restart if config is invalid. | |
| // A crash on startup after restart would spawn a new process that loses | |
| // macOS Full Disk Access (TCC permissions are PID-bound). (#35862) | |
| // Pre-flight config validation: refuse to start if config is invalid. | |
| // A crash on startup would spawn a new process that loses | |
| // macOS Full Disk Access (TCC permissions are PID-bound). (#35862) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle-core.ts
Line: 178-180
Comment:
The inline comments say "refuse to restart" and "after restart", but this is `runServiceStart()`. They should describe the start path:
```suggestion
// Pre-flight config validation: refuse to start if config is invalid.
// A crash on startup would spawn a new process that loses
// macOS Full Disk Access (TCC permissions are PID-bound). (#35862)
```
How can I resolve this? If you propose a fix, please make it concise.| describe("runServiceRestart config pre-flight (#35862)", () => { | ||
| let runServiceRestart: typeof import("./lifecycle-core.js").runServiceRestart; | ||
|
|
||
| beforeAll(async () => { | ||
| ({ runServiceRestart } = await import("./lifecycle-core.js")); | ||
| }); | ||
|
|
||
| beforeEach(() => { | ||
| runtimeLogs.length = 0; | ||
| readConfigFileSnapshotMock.mockReset(); | ||
| readConfigFileSnapshotMock.mockResolvedValue({ | ||
| exists: true, | ||
| valid: true, | ||
| config: {}, | ||
| issues: [], | ||
| }); | ||
| loadConfig.mockReset(); | ||
| loadConfig.mockReturnValue({}); | ||
| service.isLoaded.mockClear(); | ||
| service.readCommand.mockClear(); | ||
| service.restart.mockClear(); | ||
| service.isLoaded.mockResolvedValue(true); | ||
| service.readCommand.mockResolvedValue({ environment: {} }); | ||
| service.restart.mockResolvedValue(undefined); | ||
| vi.unstubAllEnvs(); | ||
| vi.stubEnv("OPENCLAW_GATEWAY_TOKEN", ""); | ||
| vi.stubEnv("CLAWDBOT_GATEWAY_TOKEN", ""); | ||
| }); | ||
|
|
||
| it("aborts restart when config is invalid", async () => { | ||
| readConfigFileSnapshotMock.mockResolvedValue({ | ||
| exists: true, | ||
| valid: false, | ||
| config: {}, | ||
| issues: [{ path: "agents.defaults.pdfModel", message: "Unrecognized key" }], | ||
| }); | ||
|
|
||
| await expect( | ||
| runServiceRestart({ | ||
| serviceNoun: "Gateway", | ||
| service, | ||
| renderStartHints: () => [], | ||
| opts: { json: true }, | ||
| }), | ||
| ).rejects.toThrow("__exit__:1"); | ||
|
|
||
| expect(service.restart).not.toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| it("proceeds with restart when config is valid", async () => { | ||
| readConfigFileSnapshotMock.mockResolvedValue({ | ||
| exists: true, | ||
| valid: true, | ||
| config: {}, | ||
| issues: [], | ||
| }); | ||
|
|
||
| const result = await runServiceRestart({ | ||
| serviceNoun: "Gateway", | ||
| service, | ||
| renderStartHints: () => [], | ||
| opts: { json: true }, | ||
| }); | ||
|
|
||
| expect(result).toBe(true); | ||
| expect(service.restart).toHaveBeenCalledTimes(1); | ||
| }); | ||
|
|
||
| it("proceeds with restart when config file does not exist", async () => { | ||
| readConfigFileSnapshotMock.mockResolvedValue({ | ||
| exists: false, | ||
| valid: true, | ||
| config: {}, | ||
| issues: [], | ||
| }); | ||
|
|
||
| const result = await runServiceRestart({ | ||
| serviceNoun: "Gateway", | ||
| service, | ||
| renderStartHints: () => [], | ||
| opts: { json: true }, | ||
| }); | ||
|
|
||
| expect(result).toBe(true); | ||
| expect(service.restart).toHaveBeenCalledTimes(1); | ||
| }); | ||
|
|
||
| it("proceeds with restart when snapshot read throws", async () => { | ||
| readConfigFileSnapshotMock.mockRejectedValue(new Error("read failed")); | ||
|
|
||
| const result = await runServiceRestart({ | ||
| serviceNoun: "Gateway", | ||
| service, | ||
| renderStartHints: () => [], | ||
| opts: { json: true }, | ||
| }); | ||
|
|
||
| expect(result).toBe(true); | ||
| expect(service.restart).toHaveBeenCalledTimes(1); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
The new validation block was added to both runServiceStart and runServiceRestart, but test coverage only exists for runServiceRestart. The missing test cases for runServiceStart allowed the copy-paste error in lifecycle-core.ts line 198 (incorrect "restart aborted" message in the start path) to go undetected.
Consider adding a parallel describe("runServiceStart config pre-flight...") block with the same four test cases (invalid config aborts, valid config proceeds, file not found proceeds, snapshot read throws proceeds) to ensure consistency between both functions.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/daemon-cli/lifecycle-core.config-guard.test.ts
Line: 45-145
Comment:
The new validation block was added to both `runServiceStart` and `runServiceRestart`, but test coverage only exists for `runServiceRestart`. The missing test cases for `runServiceStart` allowed the copy-paste error in lifecycle-core.ts line 198 (incorrect "restart aborted" message in the start path) to go undetected.
Consider adding a parallel `describe("runServiceStart config pre-flight...")` block with the same four test cases (invalid config aborts, valid config proceeds, file not found proceeds, snapshot read throws proceeds) to ensure consistency between both functions.
How can I resolve this? If you propose a fix, please make it concise.… permission loss (openclaw#35862) When 'openclaw gateway restart' is run with an invalid config, the new process crashes on startup due to config validation failure. On macOS, this causes Full Disk Access (TCC) permissions to be lost because the respawned process has a different PID. Add getConfigValidationError() helper and pre-flight config validation in both runServiceRestart() and runServiceStart(). If config is invalid, abort with a clear error message instead of crashing. The config watcher's hot-reload path already had this guard (handleInvalidSnapshot), but the CLI restart/start commands did not. AI-assisted (OpenClaw agent, fully tested)
8516b73 to
251bd68
Compare
…it (openclaw#35862) When an in-process restart (SIGUSR1) triggers a config-triggered restart and the new config is invalid, params.start() throws and the while loop exits, killing the process. On macOS this loses TCC permissions. Wrap params.start() in try/catch: on failure, set server=null, log the error, and wait for the next SIGUSR1 instead of crashing.
Address Greptile review: add test coverage for runServiceStart path. The error message copy-paste issue was already fixed in the DRY refactor (uses params.serviceNoun instead of hardcoded 'restart').
|
Closing this PR because the author has more than 10 active PRs in this repo. Please reduce the active PR queue and reopen or resubmit once it is back under the limit. You can close your own PRs to get back under the limit. |
What
Add pre-flight config validation before
gateway restartandgateway startto prevent crashes from invalid config.Why
Fixes #35862 — When
openclaw gateway restartis run with an invalid config (e.g., unrecognized key likeagents.defaults.pdfModel), the new process crashes on startup due to config validation (.strict()rejection). On macOS, this causes Full Disk Access (TCC) permissions to be lost because launchd respawns a new process with a different PID, which doesn't inherit TCC grants.How
runServiceRestart()andrunServiceStart()inlifecycle-core.tsnow callreadConfigFileSnapshot()before proceedingopenclaw doctorto repairhandleInvalidSnapshotinconfig-reload.ts), but the CLI commands did notTesting
pnpm buildpassespnpm checkpassesFiles changed
src/cli/daemon-cli/lifecycle-core.tsrunServiceRestart+runServiceStartsrc/cli/daemon-cli/lifecycle-core.config-guard.test.ts