Fix/config restore baseline by davidangularme · Pull Request #75568 · openclaw/openclaw

davidangularme · 2026-05-01T09:41:10Z

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

Problem:
Why it matters:
What changed:
What did NOT change (scope boundary):

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause:
Missing detection / guardrail:
Contributing context (if known):

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
Scenario the test should lock in:
Why this is the smallest reliable guardrail:
Existing test that already covers this (if any):
If no new test is added, why not:

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Diagram (if applicable)

For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.

Before:
[user action] -> [old state]

After:
[user action] -> [new state] -> [result]

Security Impact (required)

New permissions/capabilities? (Yes/No)
Secrets/tokens handling changed? (Yes/No)
New/changed network calls? (Yes/No)
Command/tool execution surface changed? (Yes/No)
Data access scope changed? (Yes/No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS:
Runtime/container:
Model/provider:
Integration/channel (if any):
Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
Edge cases checked:
What you did not verify:

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No)
Config/env changes? (Yes/No)
Migration needed? (Yes/No)
If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk:
- Mitigation:

Previously a failed copyFile during suspicious-read recovery was swallowed by a bare catch, still logged as 'Config auto-restored from backup', and the audit record was written with valid: true — making a silent failure look like success. Capture the error, log a distinct failure warning with the message, set valid to restoredFromBackup, and persist restoreErrorCode / restoreErrorMessage on the observe audit record for both async and sync paths.

Review follow-ups on openclaw#70515: add the parallel failure-injection test for maybeRecoverSuspiciousConfigReadSync so the sync path has the same coverage, and move the captured error message inside the suspicious- reason parentheses in the failure warning so the line no longer reads 'failed: <path> (...): <msg>' (double colon) — it now reads 'failed: <path> (..., <msg>)' with a single trailing colon after 'failed'.

clawsweeper · 2026-05-01T09:42:24Z

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as duplicate/superseded. The branch repeats the restore-copy audit/logging work from closed #70515, while open canonical PR #75441 already includes the same cherry-picked commits, broader config observe-recovery write-failure coverage, and the required changelog credit.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution:

Keep #75441 as the canonical path: it should land or be closed after maintainer review, because it carries this restore-copy audit fix with broader observe-recovery write-failure handling and changelog coverage.

Do we have a high-confidence way to reproduce the issue?

Yes. The failure path is reproducible by injecting copyFile/copyFileSync failure while a valid .bak exists; current main swallows that error and still records a success log/audit, and the proposed tests exercise that branch.

Is this the best way to solve the issue?

No as a standalone PR. The safer maintainable solution is the canonical #75441 branch, which includes the same fix plus adjacent write-failure handling and release-note coverage.

Security review:

Security review cleared: The diff only changes config recovery logging/audit fields and tests; it does not add dependencies, workflow execution, permissions, network access, or secret-handling expansion.

What I checked:

Current main still has the original false-success path: On current main, copyFile/copyFileSync failures in suspicious-read recovery are swallowed, then the code still logs Config auto-restored from backup and writes valid: true in the observe audit record. (src/config/io.observe-recovery.ts:639, e8afaf512e87)
Current audit schema lacks the PR's new error fields: Current main's ConfigObserveAuditRecord ends at restoredBackupPath and has no restoreErrorCode/restoreErrorMessage fields. (src/config/io.audit.ts:227, e8afaf512e87)
This PR repeats the earlier restore-copy audit patch: The patch for this PR adds restoreErrorCode/restoreErrorMessage plus async and sync copy failure tests, matching the closed fix(config): surface backup restore copy failures in audit and logs #70515 work and commits listed in the provided PR context. (src/config/io.observe-recovery.ts:671, 90af89e425ee)
Canonical PR fix(config): log observe recovery write failures #75441 supersedes the work: The open fix(config): log observe recovery write failures #75441 body says it includes and supersedes fix(config): surface backup restore copy failures in audit and logs #70515; its patch cherry-picks the restore-copy audit commits and adds a changelog entry crediting @davidangularme and @sallyom. (CHANGELOG.md:67, 184f022c77a1)
Earlier matching PR is already closed unmerged: GitHub API data for fix(config): surface backup restore copy failures in audit and logs #70515 shows the same author and same restore-copy failure objective, with state closed and merged false. (6376ce2fb00e)

Likely related people:

steipete: GitHub path history shows major config observe-recovery commits including the critical clobber recovery implementation and later helper refactors; local blame on the current shallow checkout also routes the current lines through a recent Peter-authored main commit. (role: introduced behavior and recent maintainer; confidence: high; commits: 7b2c9a6fa3d3, 83801c49f7a3, ad3e4dbcce1a; files: src/config/io.observe-recovery.ts, src/config/io.observe-recovery.test.ts)
sallyom: Open fix(config): log observe recovery write failures #75441 is the canonical maintainer-labeled PR for this exact config recovery audit/logging work, and recent current-main config-audit history also lists @sallyom in review/co-author context. (role: canonical follow-up owner and recent adjacent maintainer; confidence: medium; commits: 4d6a824a36aa, 184f022c77a1, a853c5e8c26f; files: CHANGELOG.md, src/config/io.audit.ts, src/config/io.observe-recovery.ts)
jalehman: Recent merged config recovery work around avoiding plugin-local rollback touched the same observe-recovery area and is likely relevant for review routing if the canonical PR needs domain context. (role: recent config recovery maintainer; confidence: medium; commits: f369939fedd5; files: src/config/io.observe-recovery.ts, src/config/recovery-policy.ts, src/config/io.observe-recovery.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e8afaf512e87.

davidangularme added 2 commits May 1, 2026 12:39

openclaw-barnacle Bot added size: S triage: blank-template Candidate: PR template appears mostly untouched. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context. labels May 1, 2026

clawsweeper Bot closed this May 1, 2026

This was referenced May 1, 2026

fix(config): log observe recovery write failures #75441

Merged

fix(config): surface backup restore copy failures in audit and logs #70515

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix/config restore baseline#75568

Fix/config restore baseline#75568
davidangularme wants to merge 2 commits intoopenclaw:mainfrom
davidangularme:fix/config-restore-baseline

davidangularme commented May 1, 2026

Uh oh!

clawsweeper Bot commented May 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

davidangularme commented May 1, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

clawsweeper Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clawsweeper Bot commented May 1, 2026 •

edited

Loading