fix: add retry logic for Windows EPERM errors in atomic file writes #58163
tikitoki wants to merge 3 commits into openclaw:main from
Conversation
Greptile Summary
This PR adds Windows-specific exponential-backoff retry logic to the atomic file write in src/infra/json-files.ts.
Key observations:
Issues worth addressing before or after merge:
Confidence Score: 5/5
Safe to merge — the retry logic is correct and bounded; all findings are style/improvement suggestions (P2): mkdir placement redundancy, missing jitter via the existing backoff.ts utility, and a confusing (but correct) loop-bound expression. None of these causes incorrect behavior or data loss. The critical path — EPERM detection, temp-file cleanup, bounded retries, and non-Windows fast-fail — is implemented correctly. No files require special attention; all findings are in src/infra/json-files.ts and are non-blocking.
| Filename | Overview |
|---|---|
| src/infra/json-files.ts | Wraps the atomic write body in a retry loop with exponential backoff gated on IS_WINDOWS + EPERM. Logic is correct overall but has minor issues: mkdir runs on every retry outside the try-catch, the delay formula duplicates the existing backoff.ts utility (without jitter), and the loop bounds use two different expressions for the same ceiling. |
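The critical path summarized above can be sketched roughly as follows. Names here are hypothetical (the real implementation lives in src/infra/json-files.ts); the sleep function is injected so the loop's shape is easy to exercise without real delays.

```ts
// Sketch of the retry-wrapped write path: 1 initial attempt plus up to
// MAX_RETRIES retries, gated on Windows + EPERM, with exponential backoff.
const IS_WINDOWS = process.platform === "win32";
const MAX_RETRIES = 5;
const RETRY_BASE_DELAY_MS = 50;

async function withEpermRetry<T>(
  op: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  isWindows: boolean = IS_WINDOWS,
): Promise<T> {
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    try {
      return await op();
    } catch (err) {
      const code = (err as { code?: string }).code;
      // Fail fast off Windows or on non-EPERM errors; rethrow once the
      // retry budget is exhausted.
      if (!isWindows || code !== "EPERM" || attempt > MAX_RETRIES) {
        throw err;
      }
      await sleep(2 ** attempt * RETRY_BASE_DELAY_MS);
    }
  }
  throw new Error("unreachable");
}
```

With `op` failing twice with EPERM and then succeeding, the collected delays would be 100 ms and 200 ms — the un-jittered doubling the review comments below discuss.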
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 45-46
Comment:
**`mkdir` called on every retry, outside the try-catch**
`fs.mkdir` is called at the top of each loop iteration but sits outside the `try/catch` block. This means:
1. It runs redundantly on every retry — `mkdir({ recursive: true })` on an already-existing directory is a no-op but still involves a syscall.
2. If `mkdir` itself throws (e.g., an EPERM on the parent directory), that error bypasses the retry logic entirely.
Consider moving `mkdir` before the loop so it runs only once, or moving it inside the `try` block so EPERM on directory creation is also retried:
```ts
await fs.mkdir(parentDir, mkdirOptions);
for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
const tmp = `${filePath}.${randomUUID()}.tmp`;
try {
// ...
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 85-88
Comment:
**Existing `backoff.ts` utility not reused; no jitter**
The codebase already ships `computeBackoff` + `sleepWithAbort` in `src/infra/backoff.ts`. The inline formula `2 ** attempt * RETRY_BASE_DELAY_MS` duplicates that logic and omits jitter, which can cause retry storms when multiple concurrent writers all hit EPERM simultaneously and wake up at exactly the same interval.
Reusing the existing helper would both reduce duplication and add randomised spread:
```ts
import { computeBackoff, sleepWithAbort } from "./backoff.js";
const EPERM_BACKOFF: BackoffPolicy = {
initialMs: 50,
maxMs: 1_600,
factor: 2,
jitter: 0.25,
};
// inside catch:
const delay = computeBackoff(EPERM_BACKOFF, attempt);
await sleepWithAbort(delay);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 45-86
Comment:
**Confusing loop bounds — `attempt < MAX_RETRIES + 1` vs `attempt <= MAX_RETRIES`**
The loop guard is `attempt <= MAX_RETRIES + 1` and the retry condition inside `catch` is `attempt < MAX_RETRIES + 1`. These are correct but two different ways to express the same bound, which makes it harder to reason about off-by-one behaviour at a glance.
A clearer pattern:
```ts
for (let attempt = 0; attempt < MAX_RETRIES + 1; attempt++) {
// ...
if (IS_WINDOWS && err.code === "EPERM" && attempt < MAX_RETRIES) {
// retry
}
throw err;
}
```
Or simply rename `MAX_RETRIES` to `MAX_ATTEMPTS` and drive everything off the same constant.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "fix: add retry logic for Windows EPERM e..."
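The rename suggested in the last review comment above can be sketched as a single-constant retry loop (hypothetical helper names; everything derives from one `MAX_ATTEMPTS` constant so the loop guard and the retry condition cannot drift apart):

```ts
// Single-constant variant: 1 initial try + 5 retries = 6 total attempts.
const MAX_ATTEMPTS = 6;

function runWithRetries<T>(
  op: (attempt: number) => T,
  shouldRetry: (err: unknown) => boolean,
): T {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return op(attempt);
    } catch (err) {
      // Only retry while attempts remain; the final attempt rethrows.
      if (!shouldRetry(err) || attempt === MAX_ATTEMPTS) {
        throw err;
      }
    }
  }
  throw new Error("unreachable");
}
```

An always-failing operation runs exactly `MAX_ATTEMPTS` times before the error escapes, which is the off-by-one property the two mixed expressions in the original loop make hard to verify at a glance.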
src/infra/json-files.ts
Outdated
```ts
for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
  await fs.mkdir(parentDir, mkdirOptions);
```
Moved fs.mkdir outside the retry loop in the latest commit
src/infra/json-files.ts
Outdated
```ts
        err.code === "EPERM" &&
        attempt < MAX_RETRIES + 1
      ) {
        const delay = 2 ** attempt * RETRY_BASE_DELAY_MS;
```
Reused backoff.ts utility instead of inline retry logic in the latest commit
fixed in commit: Fix retry logic and backoff delay in json-files.ts
- Reuse backoff.ts utility instead of inline retry logic
- Move fs.mkdir outside the retry loop
fixed #52093
Don't know if 5 retries with a maximum of 3100 ms total retry time is too aggressive; maybe a core member needs to tweak it a bit.
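For reference, the 3100 ms figure above checks out under the inline formula `2 ** attempt * RETRY_BASE_DELAY_MS` quoted in the review, assuming the 50 ms base and 5 retries:

```ts
// Worst-case backoff totals for 5 retries at a 50 ms base, without jitter.
const RETRY_BASE_DELAY_MS = 50;
const MAX_RETRIES = 5;

const delays = Array.from(
  { length: MAX_RETRIES },
  (_, i) => 2 ** (i + 1) * RETRY_BASE_DELAY_MS,
);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
// delays: [100, 200, 400, 800, 1600]; totalMs: 3100
```

Note this is only the time spent sleeping; the attempts themselves add I/O time on top, and the suggested jittered policy would spread each delay by ±25%.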