Storage resilience: atomic writes, safer temp cleanup, repair/restore tools

Summary
OpenCode storage can be corrupted when a crash or a full disk interrupts a write. This issue proposes atomic writes, a safe repair routine, and a restore path for recovering quarantined files.

Problem
- Direct writes risk partial/truncated JSON.
- Leftover temp files accumulate; repair lacks coarse-grained coordination and reporting.
- Operators need dry-run and limits before touching data at scale.

Proposal
- Atomic write: write to a temp file, fsync(file), rename over the target, then fsync(dir).
- Repair: dry-run, prefix/limits, safe temp cleanup (.oc-*.tmp), skip-when-locked, JSON report.
- Restore: bring files back from quarantine preserving structure.
- Tests for dry-run/restore and temp cleanup.
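
The atomic write sequence listed above could be sketched roughly as follows. This is a minimal illustration using Node's synchronous `fs` API, not the implementation from this issue: the helper name `atomicWriteFileSync` and the random suffix scheme are assumptions, and only the `.oc-*.tmp` naming pattern comes from the proposal.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical helper: name and signature are illustrative, not from the proposal.
function atomicWriteFileSync(target: string, data: string): void {
  const dir = path.dirname(target);
  // The temp file sits next to the target so the rename stays on one filesystem.
  const tmp = path.join(dir, `.oc-${randomBytes(6).toString("hex")}.tmp`);
  try {
    // "wx" fails if the temp name already exists instead of clobbering it.
    const fd = fs.openSync(tmp, "wx", 0o600);
    try {
      fs.writeFileSync(fd, data);
      fs.fsyncSync(fd); // flush file contents before the rename
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmp, target); // readers see either the old or the new file
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // do not leave a stray temp behind
    throw err;
  }
  // Best effort: flush the directory entry so the rename survives a crash.
  try {
    const dirFd = fs.openSync(dir, "r");
    try {
      fs.fsyncSync(dirFd);
    } finally {
      fs.closeSync(dirFd);
    }
  } catch {
    // Some platforms or filesystems do not allow fsync on a directory.
  }
}

// Usage: write a small JSON document into a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "oc-atomic-demo-"));
atomicWriteFileSync(path.join(demoDir, "session.json"), JSON.stringify({ ok: true }));
```

If the write or rename fails, the temp file is removed and the original target is left untouched, which is the property the proposal is after.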

Non-goals
- Schema-level validation or semantic corruption detection.
- Cross-FS transactional guarantees (e.g., NFS/Windows beyond documented best effort).
- Retention policy for quarantine (can be follow-up).

Risks/Trade-offs
- rename + fsync(dir) adds slight I/O overhead.
- Repair writes a report file (operationally useful).
- Try-lock may skip files busy during repair; reported as skipped.
- Portability: Bun APIs with Node fsync fallback; best-effort on non-POSIX FS.
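
To make the skip-and-report trade-off concrete, here is a hedged sketch of a temp cleanup pass with dry-run, a limit, and a JSON-serializable report. The function name `cleanTemps`, the report shape, and the `isLocked` callback are assumptions for illustration; the issue does not specify how locks are taken or what the report contains.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical report shape; the real JSON report format is not given in the issue.
interface CleanupReport {
  dryRun: boolean;
  removed: string[]; // in dry-run mode: files that would be removed
  skipped: string[]; // files left alone because they looked busy
}

interface CleanupOptions {
  dryRun: boolean;
  limit: number; // maximum number of temp files to handle in one run
  isLocked?: (file: string) => boolean; // stand-in for a real try-lock
}

function cleanTemps(root: string, opts: CleanupOptions): CleanupReport {
  const report: CleanupReport = { dryRun: opts.dryRun, removed: [], skipped: [] };
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop()!;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        stack.push(full);
        continue;
      }
      // Only touch files matching the .oc-*.tmp pattern from the proposal.
      if (!/^\.oc-.*\.tmp$/.test(entry.name)) continue;
      if (report.removed.length >= opts.limit) return report;
      if (opts.isLocked?.(full)) {
        report.skipped.push(full);
        continue;
      }
      if (!opts.dryRun) fs.rmSync(full, { force: true });
      report.removed.push(full);
    }
  }
  return report;
}

// Usage: preview what a cleanup would do in a throwaway directory.
const cleanupDemo = fs.mkdtempSync(path.join(os.tmpdir(), "oc-clean-demo-"));
fs.writeFileSync(path.join(cleanupDemo, ".oc-1234.tmp"), "partial");
console.log(JSON.stringify(cleanTemps(cleanupDemo, { dryRun: true, limit: 100 })));
```

Running with `dryRun: true` first lets an operator inspect the report before anything is deleted; skipped files show up in the report rather than failing the run.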

Verification
- Unit tests added for dry-run/restore and temp cleanup.
- Manual: run repair in a sandbox with XDG paths; confirm JSON report and quarantined files.
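
For the manual check, a sandbox round-trip for the restore path might look like the sketch below. It assumes a quarantine directory that mirrors the relative layout of the storage root, which is a guess at what "preserving structure" means here; `restoreFromQuarantine` and the report fields are hypothetical names. In a real run, both roots would sit under a sandboxed XDG data directory rather than a temp dir.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical report shape for illustration only.
interface RestoreReport {
  dryRun: boolean;
  restored: string[]; // relative paths moved back (or that would be, in dry-run)
  skipped: string[]; // relative paths whose destination already exists
}

// List all files under root as paths relative to root.
function listFiles(root: string, rel = ""): string[] {
  const out: string[] = [];
  for (const entry of fs.readdirSync(path.join(root, rel), { withFileTypes: true })) {
    const child = path.join(rel, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(root, child));
    else out.push(child);
  }
  return out;
}

// Assumes quarantineRoot mirrors the relative layout of storageRoot.
function restoreFromQuarantine(
  quarantineRoot: string,
  storageRoot: string,
  dryRun: boolean,
): RestoreReport {
  const report: RestoreReport = { dryRun, restored: [], skipped: [] };
  for (const rel of listFiles(quarantineRoot)) {
    const dest = path.join(storageRoot, rel);
    if (fs.existsSync(dest)) {
      report.skipped.push(rel); // never overwrite live data
      continue;
    }
    if (!dryRun) {
      fs.mkdirSync(path.dirname(dest), { recursive: true });
      // rename assumes both roots are on the same filesystem.
      fs.renameSync(path.join(quarantineRoot, rel), dest);
    }
    report.restored.push(rel);
  }
  return report;
}

// Usage: build a sandbox, quarantine one file by hand, then preview the restore.
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), "oc-restore-demo-"));
const quarantine = path.join(sandbox, "quarantine");
fs.mkdirSync(path.join(quarantine, "session", "abc"), { recursive: true });
fs.writeFileSync(path.join(quarantine, "session", "abc", "msg.json"), "{}");
console.log(JSON.stringify(restoreFromQuarantine(quarantine, path.join(sandbox, "storage"), true)));
```

Skipping destinations that already exist keeps a restore from clobbering files written after the quarantine happened.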

Open questions
- Retention policy for quarantine?
- Global maintenance lock for repair windows?

Storage resilience: atomic writes, safer temp cleanup, repair/restore tools #7733
