
fix(config): serialize async config writes to prevent data loss on startup#40464

Open
sahancava wants to merge 14 commits into openclaw:main from sahancava:fix/40410-config-wipe-on-restart


@sahancava
Contributor

@sahancava sahancava commented Mar 9, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: When the OpenClaw Gateway restarts or encounters a connection error, the configuration file (openclaw.json) can be wiped or truncated to a minimal skeleton (~10 lines), resulting in data loss.
  • Why it matters: This causes the Gateway to fail startup with a "Gateway start blocked" error and forces users to rebuild their configuration manually.
  • What changed:
    • Introduced a sequential configWriteQueue in src/config/io.ts to strictly serialize config disk writes.
    • Refactored the ownerDisplaySecret auto-persist routine to perform an atomic read-modify-write inside the queue lock, rather than computing a merge patch using a stale config snapshot.
    • Replaced .loadConfig() with .readConfigFileSnapshotForWrite() during secrets persistence, so ephemeral runtime overrides are no longer permanently baked into the file on disk.
    • Captured createConfigIO() and runtimeConfigSnapshot state before entering the write queue, so path or environment changes between enqueueing and execution cannot redirect the write.
    • Preserved retry semantics for auto-secrets by throwing an explicit Error on invalid snapshots instead of silently returning.
    • Prevented queued writes from reviving cleared runtime snapshots by re-checking the live global state (runtimeConfigSnapshot) when each write runs.
  • What did NOT change (scope boundary): The structure of OpenClawConfig, JSON patching mechanisms (merge patch), and general config parsing behavior remain untouched.
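
The queue from the first "What changed" item can be sketched as a module-level promise chain. This is a minimal illustrative sketch, not the code in src/config/io.ts: it assumes a single Node.js process and borrows the names configWriteQueue and enqueueConfigWrite from this PR, but the body is hypothetical.

```typescript
// Illustrative sketch: a module-level promise chain that serializes async writes.
// Names mirror the PR description; the real io.ts implementation may differ.
let configWriteQueue: Promise<void> = Promise.resolve();

export function enqueueConfigWrite<T>(task: () => Promise<T>): Promise<T> {
  // Start `task` only after every previously enqueued task has settled.
  const result = configWriteQueue.then(task, task);
  // Store a tail that always resolves, so one failed write cannot block later ones.
  configWriteQueue = result.then(
    () => undefined,
    () => undefined,
  );
  // Callers still observe their own task's value or rejection.
  return result;
}
```

Each caller awaits only its own write and still sees that write's result or rejection, while the shared tail ensures no two writes overlap on disk.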

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Users will no longer lose their ~/.openclaw/openclaw.json configuration file when restarting the Gateway or the host system. No defaults or config schemas were changed.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux (Ubuntu/Debian) / macOS
  • Runtime/container: Node.js 24.x
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): Any configuration with existing keys (e.g., gateway.port, gateway.auth)

Steps

  1. Wait for the Gateway to start with a working, populated OpenClaw configuration file.
  2. Restart the system or the OpenClaw Gateway.
  3. Observe the ~/.openclaw/openclaw.json file.

Expected

  • The configuration file properties (such as existing auth configuration and port) remain intact.

Actual

  • Prior to this PR: The configuration file is overwritten with a nearly empty file containing only the ownerDisplaySecret and basic defaults.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

(Note: Added src/config/io.write-config-queue.test.ts, which reproduces the race condition exactly. The tests pass with this fix.)
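
The lost-update race can also be reproduced in memory without touching the filesystem. In this hypothetical sketch, disk, readDisk and writeDisk stand in for openclaw.json and the real I/O helpers, bootstrapAuth plays the slower concurrent writer, and persistSecretStale and persistSecretFresh contrast merging onto a snapshot taken before queuing with re-reading inside the queue slot. None of these names come from the codebase.

```typescript
type Config = Record<string, unknown>;

// Hypothetical in-memory stand-in for openclaw.json and its I/O helpers.
let disk: Config = { gateway: { port: 8080 } };
const clone = (c: Config): Config => JSON.parse(JSON.stringify(c));
const readDisk = async (): Promise<Config> => clone(disk);
const writeDisk = async (next: Config): Promise<void> => {
  disk = clone(next);
};
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Minimal serializing queue (same shape as the one described in this PR).
let queue: Promise<void> = Promise.resolve();
function enqueue<T>(task: () => Promise<T>): Promise<T> {
  const result = queue.then(task, task);
  queue = result.then(
    () => undefined,
    () => undefined,
  );
  return result;
}

// A slower concurrent writer (think auth-token bootstrap) that does a proper
// read-modify-write inside its own queue slot.
function bootstrapAuth(): Promise<void> {
  return enqueue(async () => {
    await sleep(10);
    const current = await readDisk();
    await writeDisk({ ...current, auth: { token: "t" } });
  });
}

// Buggy pattern: the merge base is read before the write is queued.
async function persistSecretStale(secret: string): Promise<void> {
  const snapshot = await readDisk();
  await enqueue(() => writeDisk({ ...snapshot, ownerDisplaySecret: secret }));
}

// Fixed pattern: read, modify and write all happen inside one queue slot.
function persistSecretFresh(secret: string): Promise<void> {
  return enqueue(async () => {
    const current = await readDisk();
    await writeDisk({ ...current, ownerDisplaySecret: secret });
  });
}

// Starts both writers together from a fresh "disk" and returns the final file.
export async function race(persist: (s: string) => Promise<void>): Promise<Config> {
  disk = { gateway: { port: 8080 } };
  await Promise.all([bootstrapAuth(), persist("s")]);
  return disk;
}
```

race(persistSecretStale) resolves to a config that has lost the auth key even though both writes went through the queue, while race(persistSecretFresh) keeps both auth and ownerDisplaySecret. Serialization alone is not enough; the read has to happen inside the same slot as the write.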

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: Ran targeted tests confirming sequential module-level writes do not corrupt configuration.
  • Edge cases checked: Simulated the exact ownerDisplaySecret pattern with concurrent writeConfigFile calls to verify the stale snapshot issue is fully mitigated by the internal queue lock.
  • What you did not verify: I did not manually verify the Windows scheduling behavior, as the fix resides at the platform-agnostic config file I/O layer.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert this specific PR commit.
  • Files/config to restore: Any previously backed up openclaw.json.
  • Known bad symptoms reviewers should watch for: Deadlocks or hangs during early Gateway startup (if the write queue resolves incorrectly).

Risks and Mitigations

  • Risk: The write queue promise (configWriteQueue) could stall if a config write fails in a way that never lets the queue advance.
    • Mitigation: Each write is chained with .then(fn, fn) rather than a .finally(), so the next write starts whether the previous one resolved or rejected, and the stored queue tail swallows rejections so one failed write cannot block the ones behind it. This rules out deadlocks caused by thrown errors; a write whose promise never settles would still hold back later writes.
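
The failure semantics of a .then(fn, fn) chain can be checked directly: a write that rejects still lets the next one run, whereas a write whose promise never settles holds back everything queued after it. A small sketch, using a hypothetical enqueue helper and a demo function rather than the real io.ts exports:

```typescript
// Hypothetical helper with the same shape as the queue described in this PR.
let tail: Promise<void> = Promise.resolve();

function enqueue<T>(run: () => Promise<T>): Promise<T> {
  // Run after the previous write settles, whether it resolved or rejected.
  const result = tail.then(run, run);
  // Keep the stored tail resolved so a rejection cannot poison the chain.
  tail = result.then(
    () => undefined,
    () => undefined,
  );
  return result;
}

// Resolves to true if `p` settles within `ms` milliseconds, false otherwise.
function settlesWithin(p: Promise<unknown>, ms: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    const done = () => {
      clearTimeout(timer);
      resolve(true);
    };
    p.then(done, done);
  });
}

export async function demo(): Promise<{ afterReject: boolean; afterHang: boolean }> {
  // A write that rejects: its caller sees the error, and the next write still runs.
  enqueue(async () => {
    throw new Error("simulated write failure");
  }).catch(() => undefined);
  const afterReject = await settlesWithin(enqueue(async () => "ok"), 50);

  // A write whose promise never settles: later writes wait behind it indefinitely.
  void enqueue(() => new Promise<never>(() => {}));
  const afterHang = await settlesWithin(enqueue(async () => "stuck"), 50);

  return { afterReject, afterHang };
}
```

demo() resolves to { afterReject: true, afterHang: false }: the chain tolerates rejections, but not a write that hangs.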

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR fixes a config data-loss bug (#40410) where openclaw.json was truncated to ~10 lines on Gateway restart. The root cause was a concurrent read-modify-write race: the async ownerDisplaySecret auto-persist routine captured a stale config snapshot at load time, then the resulting write clobbered any concurrent writes (e.g., auth token bootstrap) that had landed on disk in the interim.

What changed:

  • A module-level configWriteQueue promise chain (enqueueConfigWrite) is introduced to serialize all writeConfigFile calls, preventing concurrent rename races.
  • The ownerDisplaySecret auto-persist routine now performs an atomic read-modify-write inside the queue (via freshIo.loadConfig() + freshIo.writeConfigFile()), so it always sees the latest disk state after any prior writes complete.
  • The module-level writeConfigFile export is wrapped in enqueueConfigWrite, so all callers automatically benefit from serialization with zero API changes.
  • A new test file (io.write-config-queue.test.ts) reproduces the original race condition and verifies the fix.
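
Greptile's summary (last reviewed at f1063a7) describes the inner-queue read as freshIo.loadConfig(), while the PR description says secrets persistence reads through readConfigFileSnapshotForWrite(). The distinction matters whenever the effective config carries runtime-only overrides. A hypothetical sketch: loadEffectiveConfig and readSnapshotForWrite are illustrative stand-ins for those two readers, and the port values and override are invented for the example.

```typescript
type Config = { gateway?: { port?: number }; ownerDisplaySecret?: string };

// Hypothetical contents of openclaw.json on disk.
const fileOnDisk: Config = { gateway: { port: 8080 } };
// Hypothetical runtime-only override (for example from an environment variable).
const runtimeOverride: Config = { gateway: { port: 9090 } };

// Stand-in for a loader that returns the effective config, overrides applied.
function loadEffectiveConfig(): Config {
  return { ...fileOnDisk, gateway: { ...fileOnDisk.gateway, ...runtimeOverride.gateway } };
}

// Stand-in for a reader that returns only what is stored on disk.
function readSnapshotForWrite(): Config {
  return { ...fileOnDisk, gateway: { ...fileOnDisk.gateway } };
}

// Whatever `base` holds is exactly what gets written back to disk.
export function withOwnerSecret(base: Config, secret: string): Config {
  return { ...base, ownerDisplaySecret: secret };
}
```

Writing back withOwnerSecret(loadEffectiveConfig(), ...) would persist port 9090 permanently; building on readSnapshotForWrite() keeps the on-disk 8080 and only adds the secret.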

Concerns:

  • The rejection handler fn in configWriteQueue.then(fn, fn) is unreachable dead code because configWriteQueue is unconditionally kept as a resolved promise — consider simplifying to configWriteQueue.then(fn).
  • The ownerDisplaySecret path calls freshIo.writeConfigFile (instance method) which intentionally bypasses runtimeConfigSnapshot refresh and refreshHandler.refresh(). This is safe today but is an undocumented behavioral divergence from the previous code path that could silently break future callers relying on per-write snapshot/refresh guarantees.

Confidence Score: 4/5

  • This PR is safe to merge — it directly fixes a data-loss bug with a well-scoped, backward-compatible change. The queue implementation is correct and deadlock-free.
  • The core fix (serializing writes via enqueueConfigWrite and re-reading inside the queue for the ownerDisplaySecret path) is logically sound and correctly addresses the race condition. The two flagged concerns — the dead rejection handler and the undocumented bypass of runtime snapshot refresh — are documentation/style issues rather than correctness bugs in the current usage. No API contracts or config schemas changed, and the fix is confined to the config I/O layer.
  • Pay close attention to src/config/io.ts lines 826–831 (the ownerDisplaySecret inner-queue write path) if the refreshHandler contract is ever extended to require per-write notification.
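
The PR summary also mentions re-checking runtimeConfigSnapshot so that a queued write does not revive a snapshot that was cleared while it waited, which sits close to the refresh path flagged above. A hypothetical sketch of why that check has to run inside the queued task rather than before enqueueing; runtimeConfigSnapshot here is a local stand-in, and writeRefreshEarly, writeRefreshLive and the getter/setter are illustrative names.

```typescript
type Config = Record<string, unknown>;

// Hypothetical module state standing in for the globals in io.ts.
let runtimeConfigSnapshot: Config | null = null;
let disk: Config = {};

let queue: Promise<void> = Promise.resolve();
function enqueue<T>(task: () => Promise<T>): Promise<T> {
  const result = queue.then(task, task);
  queue = result.then(
    () => undefined,
    () => undefined,
  );
  return result;
}

export function setRuntimeSnapshot(c: Config | null): void {
  runtimeConfigSnapshot = c;
}

export function getRuntimeSnapshot(): Config | null {
  return runtimeConfigSnapshot;
}

// Buggy: decides whether to refresh the snapshot before the write is queued.
export function writeRefreshEarly(next: Config): Promise<void> {
  const hadSnapshot = runtimeConfigSnapshot !== null;
  return enqueue(async () => {
    disk = { ...next };
    if (hadSnapshot) runtimeConfigSnapshot = { ...next };
  });
}

// Fixed: re-reads the live global when the queued write actually runs.
export function writeRefreshLive(next: Config): Promise<void> {
  return enqueue(async () => {
    disk = { ...next };
    if (runtimeConfigSnapshot !== null) runtimeConfigSnapshot = { ...next };
  });
}
```

If the snapshot is cleared between enqueueing and execution, the early check brings it back as the newly written config, while the live check leaves it null.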

Last reviewed commit: f1063a7


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa9b3c5bbe

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: b5df3f4811

@openclaw-barnacle openclaw-barnacle bot removed the cli label (CLI command changes) Mar 9, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 1368ae87d6

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from 920eaf4 to cc28c67 March 9, 2026 03:36

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: cc28c670d5

@sahancava
Contributor Author

The two failing CI checks appear unrelated to this PR.

This PR only changes:

  • src/config/io.ts
  • src/config/io.write-config-queue.test.ts

The failing checks are in other areas (including Windows/secrets-related tests), and I was able to reproduce them independently of this change. I kept this PR scoped to the config write bugfix only.

@sahancava
Contributor Author

I also updated/synced the branch, and these two failures still appear unrelated to the config write fix in this PR.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 9e0dc138c9

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from ed62284 to 0a8155a March 9, 2026 19:16

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 333c48d6ac

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from f196dbf to 03f536e March 10, 2026 00:11
@openclaw-barnacle openclaw-barnacle bot added the channel: googlechat and extensions: memory-core labels Mar 10, 2026
@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from 9600691 to f442717 March 10, 2026 03:16
@openclaw-barnacle openclaw-barnacle bot removed the channel: googlechat and extensions: memory-core labels Mar 10, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: fb38add833

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from 6eb520d to f2dad57 March 10, 2026 04:56

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: f2dad578a3

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from f2dad57 to 6c50fc4 March 10, 2026 05:06

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 6c50fc473b

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from 6c50fc4 to a866885 March 10, 2026 05:16

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 568295217f

@sahancava sahancava force-pushed the fix/40410-config-wipe-on-restart branch from 5682952 to 4923f71 March 10, 2026 06:07

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Config file wiped on Gateway restart

1 participant