Storage resilience: atomic writes, safer temp cleanup, repair/restore tools #7733
Summary
OpenCode storage can be corrupted if the process crashes or the disk fills up mid-write. This proposes atomic writes, a safe repair routine, and a restore path for recovery.
Problem
- Writing directly to the target file can leave partial or truncated JSON behind.
- Leftover temp files accumulate, and repair lacks coarse-grained coordination and reporting.
- Operators need a dry-run mode and limits before touching data at scale.
Proposal
- Atomic write: write to a temp file, fsync(file), rename it over the target, then fsync(dir).
- Repair: dry-run, prefix filter and limits, safe cleanup of temp files (.oc-*.tmp), skip-when-locked, JSON report.
- Restore: move files back from quarantine, preserving their directory structure.
- Tests for dry-run/restore and temp cleanup.
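A minimal sketch of the atomic-write sequence, assuming Node-compatible `node:fs` sync APIs (Bun implements most of `node:fs`). `writeJsonAtomic` and `fsyncDir` are hypothetical names, not OpenCode's actual functions. The temp file sits in the target's directory so the rename never crosses filesystems, and the directory fsync is best effort because opening a directory may fail on some platforms (e.g. Windows).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Best-effort directory fsync: makes the rename itself durable.
// Opening a directory may fail on some platforms, in which case we skip it.
function fsyncDir(dir: string): void {
  let fd: number | undefined;
  try {
    fd = fs.openSync(dir, "r");
    fs.fsyncSync(fd);
  } catch {
    // Not supported here; durability of the rename is best effort.
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}

// Hypothetical helper name; not OpenCode's actual API.
export function writeJsonAtomic(target: string, value: unknown): void {
  const dir = path.dirname(target);
  // Same directory as the target, so rename() stays on one filesystem.
  const tmp = path.join(dir, `.oc-${randomBytes(6).toString("hex")}.tmp`);
  let fd: number | undefined = fs.openSync(tmp, "wx"); // "wx": never reuse an existing temp
  try {
    fs.writeFileSync(fd, JSON.stringify(value, null, 2));
    fs.fsyncSync(fd); // 1. file contents reach the disk
    fs.closeSync(fd);
    fd = undefined;
    fs.renameSync(tmp, target); // 2. replace the target in one step (atomic on POSIX)
  } catch (err) {
    if (fd !== undefined) fs.closeSync(fd);
    fs.rmSync(tmp, { force: true }); // don't leave a half-written temp behind
    throw err;
  }
  fsyncDir(dir); // 3. persist the directory entry change
}
```

A reader of `target` sees either the old complete file or the new complete file, never a truncated mix; a crash before step 2 leaves only an `.oc-*.tmp` file for repair to clean up.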
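A sketch of the repair pass under the same `node:fs` assumptions. The option names (`dryRun`, `prefix`, `limit`, `tempMinAgeMs`) and the report shape are hypothetical, not OpenCode's actual flags. "Corrupt" here only means the file fails `JSON.parse`, in line with the non-goal of semantic validation; the minimum temp age (60 s by default, an arbitrary choice) keeps cleanup from deleting a temp that a concurrent writer is still filling. Skip-when-locked is omitted to keep the sketch short.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical option names, for illustration only.
export interface RepairOptions {
  root: string;          // storage directory to scan
  quarantine: string;    // destination for corrupt files: outside root, same filesystem
  dryRun?: boolean;      // report what would happen, change nothing
  prefix?: string;       // only consider paths (relative to root) starting with this
  limit?: number;        // maximum number of actions (temp deletions + quarantines)
  tempMinAgeMs?: number; // leave younger temps alone: a writer may still own them
}

export interface RepairReport {
  dryRun: boolean;
  removedTemps: string[];
  quarantined: string[];
  skipped: string[]; // candidates left untouched (temp too fresh, or over the limit)
}

const TEMP_RE = /^\.oc-.*\.tmp$/;

function* walk(dir: string): Generator<string> {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) yield* walk(full);
    else if (entry.isFile()) yield full;
  }
}

function parses(file: string): boolean {
  try {
    JSON.parse(fs.readFileSync(file, "utf8"));
    return true;
  } catch {
    return false;
  }
}

export function repair(opts: RepairOptions): RepairReport {
  const { root, quarantine, dryRun = false, prefix = "", limit = Infinity, tempMinAgeMs = 60_000 } = opts;
  const report: RepairReport = { dryRun, removedTemps: [], quarantined: [], skipped: [] };
  let actions = 0;
  for (const file of walk(root)) {
    const rel = path.relative(root, file);
    if (!rel.startsWith(prefix)) continue;
    const isTemp = TEMP_RE.test(path.basename(file));
    const isCorrupt = !isTemp && file.endsWith(".json") && !parses(file);
    if (!isTemp && !isCorrupt) continue; // healthy file: nothing to do
    const fresh = isTemp && Date.now() - fs.statSync(file).mtimeMs < tempMinAgeMs;
    if (fresh || actions >= limit) {
      report.skipped.push(rel);
      continue;
    }
    if (isTemp) {
      if (!dryRun) fs.rmSync(file, { force: true });
      report.removedTemps.push(rel);
    } else {
      if (!dryRun) {
        // Keep the relative path so a later restore can put the file back.
        const dest = path.join(quarantine, rel);
        fs.mkdirSync(path.dirname(dest), { recursive: true });
        fs.renameSync(file, dest);
      }
      report.quarantined.push(rel);
    }
    actions++;
  }
  return report;
}
```

Writing `JSON.stringify(report, null, 2)` to a file gives the JSON report; running with `dryRun: true` first lists exactly what a real run with the same `prefix` and `limit` would touch.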
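Restore can be sketched as the inverse move. This hypothetical `restore` helper walks the quarantine directory and moves each file back to the same relative path under the storage root, reporting (rather than overwriting) any destination that already exists. The existence check and the rename are separate steps, so it is not race-free against concurrent writers; it assumes storage is otherwise idle while it runs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper; not OpenCode's actual API.
export function restore(
  quarantine: string,
  root: string,
  dryRun = false,
): { restored: string[]; conflicts: string[] } {
  const result = { restored: [] as string[], conflicts: [] as string[] };
  if (!fs.existsSync(quarantine)) return result;
  const visit = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const src = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        visit(src);
        continue;
      }
      const rel = path.relative(quarantine, src);
      const dest = path.join(root, rel);
      // Never clobber a file that was rewritten after being quarantined.
      if (fs.existsSync(dest)) {
        result.conflicts.push(rel);
        continue;
      }
      if (!dryRun) {
        fs.mkdirSync(path.dirname(dest), { recursive: true });
        fs.renameSync(src, dest);
      }
      result.restored.push(rel);
    }
  };
  visit(quarantine);
  return result;
}
```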
Non-goals
- Schema-level validation or semantic corruption detection.
- Cross-filesystem transactional guarantees (e.g., on NFS or Windows, beyond documented best effort).
- Retention policy for quarantine (can be follow-up).
Risks/Trade-offs
- rename + fsync(dir) adds slight I/O overhead.
- Repair writes a report file (an extra write, but operationally useful).
- Try-lock may skip files that are busy during repair; these are reported as skipped.
- Portability: uses Bun APIs with a Node fsync fallback; best effort on non-POSIX filesystems.
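One way to get the try-lock behaviour is an exclusive-create lock file: `fs.openSync(path, "wx")` fails with `EEXIST` if the file already exists, so a busy file is skipped instead of waited on. The `<file>.lock` convention and the `tryLock`/`forEachUnlocked` names are assumptions for illustration; OpenCode may coordinate writers differently (e.g. with in-process locks). A crashed holder leaves a stale lock file behind, which this sketch does not handle.

```typescript
import * as fs from "node:fs";

// Atomic create-if-absent; null means another holder already has the lock.
function openExclusive(p: string): number | null {
  try {
    return fs.openSync(p, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
}

// Hypothetical lock-file convention: "<file>.lock".
// Returns a release function, or null if the file is busy.
export function tryLock(file: string): (() => void) | null {
  const lockPath = `${file}.lock`;
  const fd = openExclusive(lockPath);
  if (fd === null) return null;
  try {
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
  } finally {
    fs.closeSync(fd);
  }
  return () => fs.rmSync(lockPath, { force: true });
}

// Runs `work` on each file, skipping (not waiting for) busy ones.
export function forEachUnlocked(
  files: string[],
  work: (f: string) => void,
): { done: string[]; skipped: string[] } {
  const out = { done: [] as string[], skipped: [] as string[] };
  for (const f of files) {
    const release = tryLock(f);
    if (!release) {
      out.skipped.push(f); // reported as skipped, as described above
      continue;
    }
    try {
      work(f);
      out.done.push(f);
    } finally {
      release();
    }
  }
  return out;
}
```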
Verification
- Unit tests added.
- Manual: run repair in a sandbox with the XDG paths pointed at it; confirm the JSON report and the quarantined files.
Open questions
- Retention policy for quarantine?
- Global maintenance lock for repair windows?