Storage resilience: atomic writes, safer temp cleanup, repair/restore tools #7733

@KakashiTech

Summary
OpenCode storage can be corrupted when a process crashes or the disk fills up mid-write. This issue proposes atomic writes, a safe repair routine, and a restore path for bringing quarantined files back.

Problem

  • Direct writes risk partial/truncated JSON.
  • Leftover temp files accumulate; repair lacks coarse-grained coordination and reporting.
  • Operators need dry-run and limits before touching data at scale.

Proposal

  • Atomic write: write to a temp file, fsync(file), rename over the target, then fsync(dir).
  • Repair: dry-run, prefix/limits, safe temp cleanup (.oc-*.tmp), skip-when-locked, JSON report.
  • Restore: bring files back from quarantine preserving structure.
  • Tests for dry-run/restore and temp cleanup.
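
The atomic write sequence in the proposal could be sketched roughly as follows. This is a minimal illustration using Node's synchronous `fs` API, not the actual implementation; the helper name `atomicWriteFileSync`, the random temp suffix, and the demo paths are assumptions. It places the temp file in the target's directory (so the rename stays on one filesystem) and treats the directory fsync as best effort.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical helper: replace `target` with `data` via temp + fsync + rename.
function atomicWriteFileSync(target: string, data: string): void {
  const dir = path.dirname(target);
  // Temp name follows the `.oc-*.tmp` shape mentioned in the proposal;
  // the random suffix is an assumption of this sketch.
  const tmp = path.join(dir, `.oc-${randomBytes(6).toString("hex")}.tmp`);

  let fd: number | undefined;
  try {
    fd = fs.openSync(tmp, "wx", 0o600); // "wx": fail if the temp already exists
    fs.writeFileSync(fd, data);
    fs.fsyncSync(fd); // flush file contents before the rename
    fs.closeSync(fd);
    fd = undefined;
    fs.renameSync(tmp, target); // atomic replace on POSIX filesystems
  } catch (err) {
    if (fd !== undefined) fs.closeSync(fd);
    fs.rmSync(tmp, { force: true }); // do not leave the temp behind
    throw err;
  }

  // Best-effort directory fsync so the rename itself survives a crash.
  // Opening a directory this way can fail on some platforms (e.g. Windows).
  try {
    const dirFd = fs.openSync(dir, "r");
    try {
      fs.fsyncSync(dirFd);
    } finally {
      fs.closeSync(dirFd);
    }
  } catch {
    // ignored: best effort on non-POSIX filesystems
  }
}

// Usage: write a file inside a scratch directory.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "oc-demo-"));
const file = path.join(scratch, "session.json");
atomicWriteFileSync(file, JSON.stringify({ ok: true }));
```

A reader of `session.json` sees either the old contents or the new contents, never a truncated mix, because the target is only ever touched by the rename.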

Non-goals

  • Schema-level validation or semantic corruption detection.
  • Cross-FS transactional guarantees (e.g., NFS/Windows beyond documented best effort).
  • Retention policy for quarantine (can be follow-up).

Risks/Trade-offs

  • rename + fsync(dir) adds slight I/O overhead.
  • Repair writes a report file (operationally useful).
  • Try-lock may skip files busy during repair; reported as skipped.
  • Portability: Bun APIs with Node fsync fallback; best-effort on non-POSIX FS.
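
To make the skip-when-locked trade-off concrete, here is a rough sketch of a temp cleanup pass with dry-run, a limit, and a JSON-serializable report. The issue does not say how locking works, so the `<name>.lock` exclusive-create convention, the `cleanTemps` function, and the report field names are all assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface RepairReport {
  dryRun: boolean;
  removed: string[]; // in dry-run mode: files that would be removed
  skipped: string[]; // files that were locked at the time of the pass
}

// Hypothetical try-lock: exclusive creation of a sibling lock file.
function tryLock(lockPath: string): boolean {
  try {
    fs.closeSync(fs.openSync(lockPath, "wx"));
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

function cleanTemps(dir: string, opts: { dryRun: boolean; limit: number }): RepairReport {
  const report: RepairReport = { dryRun: opts.dryRun, removed: [], skipped: [] };
  // Only names matching .oc-*.tmp are candidates; everything else is untouched.
  const temps = fs.readdirSync(dir).filter((n) => /^\.oc-.*\.tmp$/.test(n)).sort();

  for (const name of temps.slice(0, opts.limit)) {
    const full = path.join(dir, name);
    const lock = `${full}.lock`;
    // Dry-run only inspects the lock; a real run tries to acquire it.
    const locked = opts.dryRun ? fs.existsSync(lock) : !tryLock(lock);
    if (locked) {
      report.skipped.push(name);
      continue;
    }
    try {
      if (!opts.dryRun) fs.rmSync(full, { force: true });
      report.removed.push(name);
    } finally {
      if (!opts.dryRun) fs.rmSync(lock, { force: true }); // release our own lock
    }
  }
  return report;
}

// Usage: one free temp, one temp held by a simulated lock, one real data file.
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), "oc-repair-"));
fs.writeFileSync(path.join(sandbox, ".oc-a.tmp"), "");
fs.writeFileSync(path.join(sandbox, ".oc-b.tmp"), "");
fs.writeFileSync(path.join(sandbox, ".oc-b.tmp.lock"), "");
fs.writeFileSync(path.join(sandbox, "session.json"), "{}");

const preview = cleanTemps(sandbox, { dryRun: true, limit: 100 });
const applied = cleanTemps(sandbox, { dryRun: false, limit: 100 });
console.log(JSON.stringify(applied, null, 2));
```

Running the dry-run first and comparing `preview` with `applied` gives an operator a cheap way to check what a repair window would touch before any file is deleted.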

Verification

  • Unit tests added.
  • Manual: run repair in a sandbox with XDG paths; confirm JSON report and quarantined files.

Open questions

  • Retention policy for quarantine?
  • Global maintenance lock for repair windows?
