Commit ffb1628

fix: recover invalid gateway configs
1 parent dafc315 commit ffb1628

19 files changed

Lines changed: 1019 additions & 17 deletions

docs/.i18n/glossary.zh-CN.json

Lines changed: 12 additions & 0 deletions
@@ -374,5 +374,17 @@
   {
     "source": "Testing",
     "target": "测试"
+  },
+  {
+    "source": "/gateway/configuration#strict-validation",
+    "target": "/gateway/configuration#strict-validation"
+  },
+  {
+    "source": "/gateway/configuration#config-hot-reload",
+    "target": "/gateway/configuration#config-hot-reload"
+  },
+  {
+    "source": "/cli/config",
+    "target": "/cli/config"
   }
 ]

docs/cli/config.md

Lines changed: 28 additions & 0 deletions
@@ -336,6 +336,34 @@ If dry-run fails:
 - `Dry run note: skipped <n> exec SecretRef resolvability check(s)`: dry-run skipped exec refs; rerun with `--allow-exec` if you need exec resolvability validation.
 - For batch mode, fix failing entries and rerun `--dry-run` before writing.
 
+## Write safety
+
+`openclaw config set` and other OpenClaw-owned config writers validate the full
+post-change config before committing it to disk. If the new payload fails schema
+validation or looks like a destructive clobber, the active config is left alone
+and the rejected payload is saved beside it as `openclaw.json.rejected.*`.
+
+Prefer CLI writes for small edits:
+
+```bash
+openclaw config set gateway.reload.mode hybrid --dry-run
+openclaw config set gateway.reload.mode hybrid
+openclaw config validate
+```
+
+If a write is rejected, inspect the saved payload and fix the full config shape:
+
+```bash
+CONFIG="$(openclaw config file)"
+ls -lt "$CONFIG".rejected.* 2>/dev/null | head
+openclaw config validate
+```
+
+Direct editor writes are still allowed, but the running Gateway treats them as
+untrusted until they validate. Invalid direct edits can be restored from the
+last-known-good backup during startup or hot reload. See
+[Gateway troubleshooting](/gateway/troubleshooting#gateway-restored-last-known-good-config).
+
 ## Subcommands
 
 - `config file`: Print the active config file path (resolved from `OPENCLAW_CONFIG_PATH` or default location).
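
Reviewer note: the "Write safety" section added above describes a validate-then-commit flow. The sketch below illustrates that flow only; it is not OpenClaw's implementation. `writeConfigSafely`, `Validator`, `requireGatewayMode`, the in-memory `Map` standing in for the filesystem, and the timestamp suffix on the `.rejected.` name are all assumptions made for illustration.

```typescript
// Hypothetical sketch: none of these names are OpenClaw APIs.
// A validator returns a list of issues; an empty list means the payload is valid.
type Validator = (raw: string) => string[];

interface WriteOutcome {
  committed: boolean;
  rejectedPath?: string;
  issues: string[];
}

// Files are modelled as an in-memory map from path to contents.
function writeConfigSafely(
  files: Map<string, string>,
  configPath: string,
  candidateRaw: string,
  validate: Validator,
  now: () => number = Date.now,
): WriteOutcome {
  const issues = validate(candidateRaw);
  if (issues.length > 0) {
    // Leave the active config untouched and keep the payload for inspection.
    // The suffix format is an assumption; the docs only say `.rejected.*`.
    const rejectedPath = `${configPath}.rejected.${now()}`;
    files.set(rejectedPath, candidateRaw);
    return { committed: false, rejectedPath, issues };
  }
  files.set(configPath, candidateRaw);
  return { committed: true, issues: [] };
}

// Toy validator (assumption): plain JSON with a string `gateway.mode`.
const requireGatewayMode: Validator = (raw) => {
  try {
    const parsed = JSON.parse(raw) as { gateway?: { mode?: unknown } } | null;
    return typeof parsed?.gateway?.mode === "string" ? [] : ["gateway.mode: expected string"];
  } catch {
    return ["payload is not valid JSON"];
  }
};
```

The point of the design is that the rejected payload is never lost: it is written next to the active file, so the operator can diff it instead of retyping the change.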

docs/gateway/configuration.md

Lines changed: 24 additions & 0 deletions
@@ -96,6 +96,17 @@ When validation fails:
 - Run `openclaw doctor` to see exact issues
 - Run `openclaw doctor --fix` (or `--yes`) to apply repairs
 
+The Gateway also keeps a trusted last-known-good copy after a successful startup. If
+`openclaw.json` is later changed outside OpenClaw and no longer validates, startup
+and hot reload preserve the broken file as a timestamped `.clobbered.*` snapshot,
+restore the last-known-good copy, and log a loud warning with the recovery reason.
+The next main-agent turn also receives a system-event warning telling it that the
+config was restored and must not be blindly rewritten. Last-known-good promotion
+is updated after validated startup and after accepted hot reloads, including
+OpenClaw-owned config writes whose persisted file hash still matches the accepted
+write. Promotion is skipped when the candidate contains redacted secret
+placeholders such as `***` or shortened token values.
+
 ## Common tasks
 
 <AccordionGroup>
@@ -494,6 +505,19 @@ When validation fails:
 
 The Gateway watches `~/.openclaw/openclaw.json` and applies changes automatically — no manual restart needed for most settings.
 
+Direct file edits are treated as untrusted until they validate. The watcher waits
+for editor temp-write/rename churn to settle, reads the final file, and rejects
+invalid external edits by restoring the last-known-good config. OpenClaw-owned
+config writes use the same schema gate before writing; destructive clobbers such
+as dropping `gateway.mode` or shrinking the file by more than half are rejected
+and saved as `.rejected.*` for inspection.
+
+If you see `Config auto-restored from last-known-good` or
+`config reload restored last-known-good config` in logs, inspect the matching
+`.clobbered.*` file next to `openclaw.json`, fix the rejected payload, then run
+`openclaw config validate`. See [Gateway troubleshooting](/gateway/troubleshooting#gateway-restored-last-known-good-config)
+for the recovery checklist.
+
 ### Reload modes
 
 | Mode | Behavior |
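
Reviewer note: the hot-reload text added above names two clobber signals, dropping `gateway.mode` and shrinking the file by more than half. The sketch below shows one way such a check could look. `looksLikeDestructiveClobber` is a hypothetical name, the real checks are not part of this excerpt, and two simplifications are assumed: the config is parsed as plain JSON rather than JSON5, and size is measured in characters of the raw text.

```typescript
// Hypothetical helper, not taken from the commit.
type GatewayShape = { gateway?: { mode?: unknown } } | null;

// Returns a human-readable reason when the new payload looks destructive,
// or null when neither signal fires.
function looksLikeDestructiveClobber(previousRaw: string, nextRaw: string): string | null {
  const previous = JSON.parse(previousRaw) as GatewayShape;
  const next = JSON.parse(nextRaw) as GatewayShape;

  // Signal 1: a key that used to be present has disappeared.
  if (previous?.gateway?.mode !== undefined && next?.gateway?.mode === undefined) {
    return "dropped gateway.mode";
  }
  // Signal 2: the payload is less than half the size of what it replaces.
  if (previousRaw.length > 0 && nextRaw.length < previousRaw.length / 2) {
    return "file shrank by more than half";
  }
  return null;
}
```

Both signals are heuristics: a legitimate cleanup could also halve the file, which is presumably why the rejected payload is kept as `.rejected.*` rather than discarded.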

docs/gateway/troubleshooting.md

Lines changed: 57 additions & 0 deletions
@@ -262,6 +262,63 @@ Related:
 - [/gateway/configuration](/gateway/configuration)
 - [/gateway/doctor](/gateway/doctor)
 
+## Gateway restored last-known-good config
+
+Use this when the Gateway starts, but logs say it restored `openclaw.json`.
+
+```bash
+openclaw logs --follow
+openclaw config file
+openclaw config validate
+openclaw doctor
+```
+
+Look for:
+
+- `Config auto-restored from last-known-good`
+- `gateway: invalid config was restored from last-known-good backup`
+- `config reload restored last-known-good config after invalid-config`
+- A timestamped `openclaw.json.clobbered.*` file beside the active config
+- A main-agent system event that starts with `Config recovery warning`
+
+What happened:
+
+- The rejected config did not validate during startup or hot reload.
+- OpenClaw preserved the rejected payload as `.clobbered.*`.
+- The active config was restored from the last validated last-known-good copy.
+- The next main-agent turn is warned not to blindly rewrite the rejected config.
+
+Inspect and repair:
+
+```bash
+CONFIG="$(openclaw config file)"
+ls -lt "$CONFIG".clobbered.* "$CONFIG".rejected.* 2>/dev/null | head
+diff -u "$CONFIG" "$(ls -t "$CONFIG".clobbered.* 2>/dev/null | head -n 1)"
+openclaw config validate
+openclaw doctor
+```
+
+Common signatures:
+
+- `.clobbered.*` exists → an external direct edit or startup read was restored.
+- `.rejected.*` exists → an OpenClaw-owned config write failed schema or clobber checks before commit.
+- `Config write rejected:` → the write tried to drop required shape, shrink the file sharply, or persist invalid config.
+- `Config last-known-good promotion skipped` → the candidate contained redacted secret placeholders such as `***`.
+
+Fix options:
+
+1. Keep the restored active config if it is correct.
+2. Copy only the intended keys from `.clobbered.*` or `.rejected.*`, then apply them with `openclaw config set` or `config.patch`.
+3. Run `openclaw config validate` before restarting.
+4. If you edit by hand, keep the full JSON5 config, not just the partial object you wanted to change.
+
+Related:
+
+- [/gateway/configuration#strict-validation](/gateway/configuration#strict-validation)
+- [/gateway/configuration#config-hot-reload](/gateway/configuration#config-hot-reload)
+- [/cli/config](/cli/config)
+- [/gateway/doctor](/gateway/doctor)
+
 ## Gateway probe warnings
 
 Use this when `openclaw gateway probe` reaches something, but still prints a warning block.
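
Reviewer note: the "Common signatures" list in the new troubleshooting section is effectively a lookup table from evidence to cause. The sketch below encodes it that way, for example for a triage script. The matched strings and meanings are copied from the section above; `SIGNATURES` and `explainSignature` are hypothetical names, and matching by plain substring is an assumption.

```typescript
// Hypothetical triage helper built from the "Common signatures" list.
const SIGNATURES: ReadonlyArray<{ needle: string; meaning: string }> = [
  { needle: ".clobbered.", meaning: "an external direct edit or startup read was restored" },
  {
    needle: ".rejected.",
    meaning: "an OpenClaw-owned config write failed schema or clobber checks before commit",
  },
  {
    needle: "Config write rejected:",
    meaning:
      "the write tried to drop required shape, shrink the file sharply, or persist invalid config",
  },
  {
    needle: "Config last-known-good promotion skipped",
    meaning: "the candidate contained redacted secret placeholders",
  },
];

// Accepts a log line or a file name and returns the first matching explanation.
function explainSignature(evidence: string): string | undefined {
  return SIGNATURES.find((signature) => evidence.includes(signature.needle))?.meaning;
}
```

A file name such as `openclaw.json.clobbered.<timestamp>` and a log line such as `Config write rejected: ...` both go through the same function; anything unrecognised yields `undefined`.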

docs/help/faq.md

Lines changed: 13 additions & 3 deletions
@@ -1629,10 +1629,20 @@ for usage/billing and raise limits as needed.
 `config.apply` replaces the **entire config**. If you send a partial object, everything
 else is removed.
 
+Current OpenClaw protects many accidental clobbers:
+
+- OpenClaw-owned config writes validate the full post-change config before writing.
+- Invalid or destructive OpenClaw-owned writes are rejected and saved as `openclaw.json.rejected.*`.
+- If a direct edit breaks startup or hot reload, the Gateway restores the last-known-good config and saves the rejected file as `openclaw.json.clobbered.*`.
+- The main agent receives a boot warning after recovery so it does not blindly write the bad config again.
+
 Recover:
 
-- Restore from backup (git or a copied `~/.openclaw/openclaw.json`).
-- If you have no backup, re-run `openclaw doctor` and reconfigure channels/models.
+- Check `openclaw logs --follow` for `Config auto-restored from last-known-good`, `Config write rejected:`, or `config reload restored last-known-good config`.
+- Inspect the newest `openclaw.json.clobbered.*` or `openclaw.json.rejected.*` beside the active config.
+- Keep the active restored config if it works, then copy only the intended keys back with `openclaw config set` or `config.patch`.
+- Run `openclaw config validate` and `openclaw doctor`.
+- If you have no last-known-good or rejected payload, restore from backup, or re-run `openclaw doctor` and reconfigure channels/models.
 - If this was unexpected, file a bug and include your last known config or any backup.
 - A local coding agent can often reconstruct a working config from logs or history.
 
@@ -1644,7 +1654,7 @@ for usage/billing and raise limits as needed.
 - Use `config.patch` for partial RPC edits; keep `config.apply` for full-config replacement only.
 - If you are using the owner-only `gateway` tool from an agent run, it will still reject writes to `tools.exec.ask` / `tools.exec.security` (including legacy `tools.bash.*` aliases that normalize to the same protected exec paths).
 
-Docs: [Config](/cli/config), [Configure](/cli/configure), [Doctor](/gateway/doctor).
+Docs: [Config](/cli/config), [Configure](/cli/configure), [Gateway troubleshooting](/gateway/troubleshooting#gateway-restored-last-known-good-config), [Doctor](/gateway/doctor).
 
 </Accordion>
 
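
Reviewer note: the FAQ entry turns on the difference between `config.apply` (replace everything) and `config.patch` (partial edit). The sketch below shows why sending a partial object to a replace-style call wipes the rest of the config. `applyConfig` and `patchConfig` are hypothetical stand-ins, and the recursive merge used for the patch case is an assumption; the actual merge semantics of `config.patch` are not shown in this commit.

```typescript
// Hypothetical stand-ins for the two RPCs; not OpenClaw code.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Replace semantics: whatever is sent becomes the whole config.
function applyConfig(_current: JsonObject, payload: JsonObject): JsonObject {
  return payload;
}

// Patch semantics (assumed): merge objects recursively, replace everything else.
function patchConfig(current: JsonObject, partial: JsonObject): JsonObject {
  const merged: JsonObject = { ...current };
  for (const [key, value] of Object.entries(partial)) {
    const existing = merged[key];
    merged[key] = isObject(existing) && isObject(value) ? patchConfig(existing, value) : value;
  }
  return merged;
}
```

With a current config that has `gateway` and `channels`, sending only `{ gateway: { reload: { mode: "hybrid" } } }` through `applyConfig` drops `channels` and `gateway.mode`, while `patchConfig` keeps both and adds `gateway.reload.mode`.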

src/config/config.ts

Lines changed: 2 additions & 0 deletions
@@ -12,10 +12,12 @@ export {
   readBestEffortConfig,
   readSourceConfigBestEffort,
   parseConfigJson5,
+  promoteConfigSnapshotToLastKnownGood,
   readConfigFileSnapshot,
   readConfigFileSnapshotForWrite,
   readSourceConfigSnapshot,
   readSourceConfigSnapshotForWrite,
+  recoverConfigFromLastKnownGood,
   resetConfigRuntimeState,
   resolveConfigSnapshotHash,
   setRuntimeConfigSnapshotRefreshHandler,

src/config/io.audit.ts

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ import { resolveStateDir } from "./paths.js";
 
 const CONFIG_AUDIT_LOG_FILENAME = "config-audit.jsonl";
 
-export type ConfigWriteAuditResult = "rename" | "copy-fallback" | "failed";
+export type ConfigWriteAuditResult = "rename" | "copy-fallback" | "failed" | "rejected";
 
 export type ConfigWriteAuditRecord = {
   ts: string;
@@ -269,7 +269,7 @@ export function finalizeConfigWriteAuditRecord(params: {
     uid: null,
     gid: null,
   };
-  const success = params.result !== "failed";
+  const success = params.result !== "failed" && params.result !== "rejected";
   return {
     ...params.base,
     result: params.result,
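
Reviewer note: the audit change adds a fourth result, `"rejected"`, and excludes it from `success`. The standalone sketch below isolates that predicate so the mapping is easy to check. The union type and the boolean expression are copied from the diff; the wrapper name `isSuccessfulWrite` is hypothetical.

```typescript
// Union type as it reads after this commit.
type ConfigWriteAuditResult = "rename" | "copy-fallback" | "failed" | "rejected";

// Hypothetical wrapper around the expression used in finalizeConfigWriteAuditRecord.
function isSuccessfulWrite(result: ConfigWriteAuditResult): boolean {
  return result !== "failed" && result !== "rejected";
}

// Every member of the union, paired with the success flag the audit record would carry.
const auditOutcomes = (["rename", "copy-fallback", "failed", "rejected"] as const).map(
  (result) => ({ result, success: isSuccessfulWrite(result) }),
);
```

Only `"rename"` and `"copy-fallback"` count as successful writes; a rejected write is recorded in the audit log but marked unsuccessful, the same as a failed one.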

src/config/io.observe-recovery.test.ts

Lines changed: 85 additions & 0 deletions
@@ -7,8 +7,12 @@ import { afterAll, beforeAll, describe, expect, it, vi } from "vitest";
 import {
   maybeRecoverSuspiciousConfigRead,
   maybeRecoverSuspiciousConfigReadSync,
+  promoteConfigSnapshotToLastKnownGood,
+  recoverConfigFromLastKnownGood,
+  resolveLastKnownGoodConfigPath,
   type ObserveRecoveryDeps,
 } from "./io.observe-recovery.js";
+import type { ConfigFileSnapshot } from "./types.js";
 
 describe("config observe recovery", () => {
   let fixtureRoot = "";
@@ -33,6 +37,26 @@ describe("config observe recovery", () => {
     await fsp.writeFile(configPath, `${JSON.stringify(config, null, 2)}\n`, "utf-8");
   }
 
+  async function makeSnapshot(configPath: string, config: Record<string, unknown>) {
+    const raw = `${JSON.stringify(config, null, 2)}\n`;
+    await fsp.mkdir(path.dirname(configPath), { recursive: true });
+    await fsp.writeFile(configPath, raw, "utf-8");
+    return {
+      path: configPath,
+      exists: true,
+      raw,
+      parsed: config,
+      sourceConfig: config,
+      resolved: config,
+      valid: true,
+      runtimeConfig: config,
+      config,
+      issues: [],
+      warnings: [],
+      legacyIssues: [],
+    } satisfies ConfigFileSnapshot;
+  }
+
   function makeDeps(
     home: string,
     warn = vi.fn(),
@@ -158,4 +182,65 @@ describe("config observe recovery", () => {
       expect(observe?.lastKnownGoodIno ?? null).toBeNull();
     });
   });
+
+  it("promotes a valid startup config and restores it after an invalid direct edit", async () => {
+    await withSuiteHome(async (home) => {
+      const { deps, configPath, auditPath, warn } = makeDeps(home);
+      const snapshot = await makeSnapshot(configPath, {
+        gateway: { mode: "local", auth: { mode: "token", token: "secret-token" } },
+        channels: { discord: { enabled: true, dmPolicy: "pairing" } },
+      });
+
+      await expect(
+        promoteConfigSnapshotToLastKnownGood({ deps, snapshot, logger: deps.logger }),
+      ).resolves.toBe(true);
+      await expect(fsp.readFile(resolveLastKnownGoodConfigPath(configPath), "utf-8")).resolves.toBe(
+        snapshot.raw,
+      );
+
+      const brokenRaw = "{ gateway: { mode: 123 } }\n";
+      await fsp.writeFile(configPath, brokenRaw, "utf-8");
+      const restored = await recoverConfigFromLastKnownGood({
+        deps,
+        snapshot: {
+          ...snapshot,
+          raw: brokenRaw,
+          parsed: { gateway: { mode: 123 } },
+          valid: false,
+          issues: [{ path: "gateway.mode", message: "Expected string" }],
+        },
+        reason: "test-invalid-config",
+      });
+
+      expect(restored).toBe(true);
+      await expect(fsp.readFile(configPath, "utf-8")).resolves.toBe(snapshot.raw);
+      expect(warn).toHaveBeenCalledWith(
+        expect.stringContaining("Config auto-restored from last-known-good:"),
+      );
+      const lines = (await fsp.readFile(auditPath, "utf-8")).trim().split("\n").filter(Boolean);
+      const observe = lines
+        .map((line) => JSON.parse(line) as Record<string, unknown>)
+        .findLast((line) => line.event === "config.observe");
+      expect(observe?.restoredFromBackup).toBe(true);
+      expect(observe?.restoredBackupPath).toBe(resolveLastKnownGoodConfigPath(configPath));
+    });
+  });
+
+  it("refuses to promote redacted secret placeholders", async () => {
+    await withSuiteHome(async (home) => {
+      const warn = vi.fn();
+      const { deps, configPath } = makeDeps(home, warn);
+      const snapshot = await makeSnapshot(configPath, {
+        gateway: { mode: "local", auth: { mode: "token", token: "***" } },
+      });
+
+      await expect(
+        promoteConfigSnapshotToLastKnownGood({ deps, snapshot, logger: deps.logger }),
+      ).resolves.toBe(false);
+      await expect(fsp.stat(resolveLastKnownGoodConfigPath(configPath))).rejects.toThrow();
+      expect(warn).toHaveBeenCalledWith(
+        expect.stringContaining("Config last-known-good promotion skipped"),
+      );
+    });
+  });
 });
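
Reviewer note: the second new test expects promotion to be refused when a token is `***`. The detection logic itself is not in this excerpt, so the sketch below is only one plausible shape for it. `containsRedactedPlaceholder` is a hypothetical name, and it treats only strings made entirely of three or more asterisks as redacted; the "shortened token values" mentioned in the docs are deliberately not covered.

```typescript
// Hypothetical detector; the real promotion guard may differ.
// Walks any JSON-like value and reports whether some string is an all-asterisk mask.
function containsRedactedPlaceholder(value: unknown): boolean {
  if (typeof value === "string") {
    return /^\*{3,}$/.test(value);
  }
  if (Array.isArray(value)) {
    return value.some((item) => containsRedactedPlaceholder(item));
  }
  if (typeof value === "object" && value !== null) {
    return Object.values(value).some((item) => containsRedactedPlaceholder(item));
  }
  return false;
}
```

Applied to the two fixtures in the tests above, the config with `token: "secret-token"` passes and the one with `token: "***"` is flagged, which matches the `true` and `false` promotion results the tests assert.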
