Skip to content

Fix/config restore baseline#75568

Closed
davidangularme wants to merge 2 commits intoopenclaw:mainfrom
davidangularme:fix/config-restore-baseline
Closed

Fix/config restore baseline#75568
davidangularme wants to merge 2 commits intoopenclaw:mainfrom
davidangularme:fix/config-restore-baseline

Conversation

@davidangularme
Copy link
Copy Markdown
Contributor

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause:
  • Missing detection / guardrail:
  • Contributing context (if known):

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
  • Scenario the test should lock in:
  • Why this is the smallest reliable guardrail:
  • Existing test that already covers this (if any):
  • If no new test is added, why not:

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Diagram (if applicable)

For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.

Before:
[user action] -> [old state]

After:
[user action] -> [new state] -> [result]

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

Previously a failed copyFile during suspicious-read recovery was swallowed by
a bare catch, still logged as 'Config auto-restored from backup', and the
audit record was written with valid: true — making a silent failure look
like success. Capture the error, log a distinct failure warning with the
message, set valid to restoredFromBackup, and persist restoreErrorCode /
restoreErrorMessage on the observe audit record for both async and sync
paths.
Review follow-ups on openclaw#70515: add the parallel failure-injection test for
maybeRecoverSuspiciousConfigReadSync so the sync path has the same
coverage, and move the captured error message inside the suspicious-
reason parentheses in the failure warning so the line no longer reads
'failed: <path> (...): <msg>' (double colon) — it now reads
'failed: <path> (..., <msg>)' with a single trailing colon after 'failed'.
@openclaw-barnacle openclaw-barnacle Bot added size: S triage: blank-template Candidate: PR template appears mostly untouched. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context. labels May 1, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 1, 2026

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as duplicate/superseded. The branch repeats the restore-copy audit/logging work from closed #70515, while open canonical PR #75441 already includes the same cherry-picked commits, broader config observe-recovery write-failure coverage, and the required changelog credit.

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Review details

Best possible solution:

Keep #75441 as the canonical path: it should land or be closed after maintainer review, because it carries this restore-copy audit fix with broader observe-recovery write-failure handling and changelog coverage.

Do we have a high-confidence way to reproduce the issue?

Yes. The failure path is reproducible by injecting copyFile/copyFileSync failure while a valid .bak exists; current main swallows that error and still records a success log/audit, and the proposed tests exercise that branch.

Is this the best way to solve the issue?

No as a standalone PR. The safer maintainable solution is the canonical #75441 branch, which includes the same fix plus adjacent write-failure handling and release-note coverage.

Security review:

Security review cleared: The diff only changes config recovery logging/audit fields and tests; it does not add dependencies, workflow execution, permissions, network access, or secret-handling expansion.

What I checked:

Likely related people:

  • steipete: GitHub path history shows major config observe-recovery commits including the critical clobber recovery implementation and later helper refactors; local blame on the current shallow checkout also routes the current lines through a recent Peter-authored main commit. (role: introduced behavior and recent maintainer; confidence: high; commits: 7b2c9a6fa3d3, 83801c49f7a3, ad3e4dbcce1a; files: src/config/io.observe-recovery.ts, src/config/io.observe-recovery.test.ts)
  • sallyom: Open fix(config): log observe recovery write failures #75441 is the canonical maintainer-labeled PR for this exact config recovery audit/logging work, and recent current-main config-audit history also lists @sallyom in review/co-author context. (role: canonical follow-up owner and recent adjacent maintainer; confidence: medium; commits: 4d6a824a36aa, 184f022c77a1, a853c5e8c26f; files: CHANGELOG.md, src/config/io.audit.ts, src/config/io.observe-recovery.ts)
  • jalehman: Recent merged config recovery work around avoiding plugin-local rollback touched the same observe-recovery area and is likely relevant for review routing if the canonical PR needs domain context. (role: recent config recovery maintainer; confidence: medium; commits: f369939fedd5; files: src/config/io.observe-recovery.ts, src/config/recovery-policy.ts, src/config/io.observe-recovery.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e8afaf512e87.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: S triage: blank-template Candidate: PR template appears mostly untouched. triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant