
fix: add retry logic for Windows EPERM errors in atomic file writes#58163

Open
tikitoki wants to merge 3 commits into openclaw:main from tikitoki:fix-52093

Conversation

@tikitoki

  • Fix Windows EPERM error on atomic file writes
  • Add exponential backoff retry logic for file lock race conditions
  • Handle concurrent writes to pending.json on Windows platforms

Fixes #52093.
I don't know whether 5 retries with a maximum of ~3,100 ms total retry time is too aggressive; a core member may need to tweak these values a bit.
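For reference, the ~3,100 ms figure follows from the delay formula quoted later in the review (`2 ** attempt * RETRY_BASE_DELAY_MS`). A small sketch, assuming a base delay of 50 ms and attempts numbered 1 through 5 (both values inferred from the review comments, not confirmed against the actual source):

```typescript
// Hypothetical constants mirroring the formula quoted in the review;
// the real values live in src/infra/json-files.ts and may differ.
const RETRY_BASE_DELAY_MS = 50;
const MAX_RETRIES = 5;

// Delay before retry number `attempt` (1-based): 2 ** attempt * base.
function retryDelayMs(attempt: number): number {
  return 2 ** attempt * RETRY_BASE_DELAY_MS;
}

// Worst-case total wait: 100 + 200 + 400 + 800 + 1600 = 3100 ms.
function totalRetryTimeMs(): number {
  let total = 0;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    total += retryDelayMs(attempt);
  }
  return total;
}
```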

@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR adds Windows-specific exponential-backoff retry logic to writeTextAtomic in src/infra/json-files.ts to handle EPERM errors that arise from file-lock race conditions (e.g., concurrent writes to pending.json). The core approach — wrapping the atomic write in a bounded loop, sleeping between retries, and cleaning up temp files via the existing finally block — is sound and correct.

Key observations:

  • The retry guard (IS_WINDOWS && err.code === \"EPERM\") is correctly scoped to Windows only; non-Windows paths continue to fail fast.
  • The finally block correctly cleans up the unique-named temp file on every iteration, preventing orphaned files.
  • The retry count (5 retries, ~3,100 ms total) and the bounded loop exit are implemented correctly with no infinite-loop risk.
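The pattern these observations describe can be sketched as follows. This is a minimal illustration rather than the PR's actual code: the constants, the `sleep` helper, and the exact cleanup calls are assumptions; only the shape (bounded loop, Windows-scoped EPERM guard, temp-file cleanup in `finally`) comes from the review.

```typescript
import { promises as fs } from "node:fs";
import { randomUUID } from "node:crypto";
import path from "node:path";

const IS_WINDOWS = process.platform === "win32";
const MAX_RETRIES = 5;          // illustrative; real constants live in json-files.ts
const RETRY_BASE_DELAY_MS = 50; // illustrative

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Bounded retry loop around an atomic write-then-rename; retries are scoped
// to Windows EPERM only, and the unique temp file is cleaned up in `finally`
// on every iteration so no orphans are left behind.
export async function writeTextAtomic(filePath: string, text: string): Promise<void> {
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    const tmp = `${filePath}.${randomUUID()}.tmp`;
    try {
      await fs.writeFile(tmp, text, "utf8");
      await fs.rename(tmp, filePath); // atomic replace on the same volume
      return;
    } catch (err: any) {
      if (IS_WINDOWS && err.code === "EPERM" && attempt <= MAX_RETRIES) {
        await sleep(2 ** attempt * RETRY_BASE_DELAY_MS);
        continue;
      }
      throw err; // non-Windows or non-EPERM: fail fast
    } finally {
      await fs.rm(tmp, { force: true }); // no-op once the rename succeeded
    }
  }
}
```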

Issues worth addressing before or after merge:

  • fs.mkdir is called at the top of every loop iteration but sits outside the try/catch, so a transient directory EPERM on retry bypasses the retry logic. Moving it before the loop (or inside the try) would be cleaner.
  • The inline 2 ** attempt * RETRY_BASE_DELAY_MS formula duplicates computeBackoff in src/infra/backoff.ts and omits jitter — which matters when multiple concurrent writers hit EPERM simultaneously and could all wake at the same interval.
  • The loop bound uses two different formulations of the same ceiling (attempt <= MAX_RETRIES + 1 vs attempt < MAX_RETRIES + 1) which makes off-by-one reasoning harder than necessary.
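On the jitter point: the internals of `src/infra/backoff.ts` are not shown in this review, but a jittered exponential backoff of the shape its policy fields suggest might look like this (a hypothetical sketch modelled only on the `initialMs`/`maxMs`/`factor`/`jitter` fields quoted below; the real helper may differ):

```typescript
// Hypothetical policy shape, inferred from the fields quoted in the review.
interface BackoffPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number; // fraction of the delay to randomise, 0..1
}

function computeBackoffSketch(policy: BackoffPolicy, attempt: number): number {
  // Exponential growth capped at maxMs.
  const base = Math.min(policy.initialMs * policy.factor ** (attempt - 1), policy.maxMs);
  const spread = base * policy.jitter;
  // Randomise within [base - spread, base + spread) so concurrent writers
  // that fail together do not all wake at exactly the same moment.
  return base - spread + Math.random() * 2 * spread;
}
```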

Confidence Score: 5/5

Safe to merge — the retry logic is correct and bounded; all findings are style/improvement suggestions (P2).

All issues found are P2: mkdir placement redundancy, missing jitter via the existing backoff.ts utility, and a confusing (but correct) loop-bound expression. None of these cause incorrect behavior or data loss. The critical path — EPERM detection, temp-file cleanup, bounded retries, and non-Windows fast-fail — is implemented correctly.

No files require special attention; all findings are in src/infra/json-files.ts and are non-blocking.

Important Files Changed

Filename | Overview
src/infra/json-files.ts | Wraps the atomic write body in a retry loop with exponential backoff gated on IS_WINDOWS + EPERM. Logic is correct overall but has minor issues: mkdir runs on every retry outside the try-catch, the delay formula duplicates the existing backoff.ts utility (without jitter), and the loop bounds use two different expressions for the same ceiling.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 45-46

Comment:
**`mkdir` called on every retry, outside the try-catch**

`fs.mkdir` is called at the top of each loop iteration but sits outside the `try/catch` block. This means:

1. It runs redundantly on every retry — `mkdir({ recursive: true })` on an already-existing directory is a no-op but still involves a syscall.
2. If `mkdir` itself throws (e.g., an EPERM on the parent directory), that error bypasses the retry logic entirely.

Consider moving `mkdir` before the loop so it runs only once, or moving it inside the `try` block so EPERM on directory creation is also retried:

```ts
await fs.mkdir(parentDir, mkdirOptions);
for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
  const tmp = `${filePath}.${randomUUID()}.tmp`;
  try {
    // ...
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 85-88

Comment:
**Existing `backoff.ts` utility not reused; no jitter**

The codebase already ships `computeBackoff` + `sleepWithAbort` in `src/infra/backoff.ts`. The inline formula `2 ** attempt * RETRY_BASE_DELAY_MS` duplicates that logic and omits jitter, which can cause retry storms when multiple concurrent writers all hit EPERM simultaneously and wake up at exactly the same interval.

Reusing the existing helper would both reduce duplication and add randomised spread:

```ts
import { computeBackoff, sleepWithAbort } from "./backoff.js";

const EPERM_BACKOFF: BackoffPolicy = {
  initialMs: 50,
  maxMs: 1_600,
  factor: 2,
  jitter: 0.25,
};

// inside catch:
const delay = computeBackoff(EPERM_BACKOFF, attempt);
await sleepWithAbort(delay);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/json-files.ts
Line: 45-86

Comment:
**Confusing loop bounds — `attempt < MAX_RETRIES + 1` vs `attempt <= MAX_RETRIES`**

The loop guard is `attempt <= MAX_RETRIES + 1` and the retry condition inside `catch` is `attempt < MAX_RETRIES + 1`. These are correct but two different ways to express the same bound, which makes it harder to reason about off-by-one behaviour at a glance.

A clearer pattern:

```ts
for (let attempt = 0; attempt < MAX_RETRIES + 1; attempt++) {
  // ...
  if (IS_WINDOWS && err.code === "EPERM" && attempt < MAX_RETRIES) {
    // retry
  }
  throw err;
}
```

Or simply rename `MAX_RETRIES` to `MAX_ATTEMPTS` and drive everything off the same constant.
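One way the renaming suggestion could play out, as a self-contained sketch: a single `MAX_ATTEMPTS` constant drives both the loop bound and the retry guard, so the off-by-one reasoning happens in exactly one place. The helper name and shape are illustrative, not taken from the PR:

```typescript
const MAX_ATTEMPTS = 6; // 1 initial try + 5 retries, driven by one constant

// Runs `op` up to MAX_ATTEMPTS times; retries while `shouldRetry(err)` holds
// and the attempt budget is not yet exhausted, otherwise rethrows.
async function withAttempts<T>(
  op: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
): Promise<T> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (shouldRetry(err) && attempt < MAX_ATTEMPTS) continue;
      throw err;
    }
  }
  throw new Error("unreachable"); // every iteration returns or throws
}
```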

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: add retry logic for Windows EPERM e..."

Comment on lines +45 to +46

```ts
for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
await fs.mkdir(parentDir, mkdirOptions);
```
Contributor


P2 mkdir called on every retry, outside the try-catch

fs.mkdir is called at the top of each loop iteration but sits outside the try/catch block. This means:

  1. It runs redundantly on every retry — mkdir({ recursive: true }) on an already-existing directory is a no-op but still involves a syscall.
  2. If mkdir itself throws (e.g., an EPERM on the parent directory), that error bypasses the retry logic entirely.

Consider moving mkdir before the loop so it runs only once, or moving it inside the try block so EPERM on directory creation is also retried:

```ts
await fs.mkdir(parentDir, mkdirOptions);
for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
  const tmp = `${filePath}.${randomUUID()}.tmp`;
  try {
    // ...
```

Author


Moved `fs.mkdir` outside the retry loop in the latest commit.

Comment on lines +85 to +88

```ts
err.code === "EPERM" &&
attempt < MAX_RETRIES + 1
) {
const delay = 2 ** attempt * RETRY_BASE_DELAY_MS;
```
Contributor


P2 Existing backoff.ts utility not reused; no jitter

The codebase already ships computeBackoff + sleepWithAbort in src/infra/backoff.ts. The inline formula 2 ** attempt * RETRY_BASE_DELAY_MS duplicates that logic and omits jitter, which can cause retry storms when multiple concurrent writers all hit EPERM simultaneously and wake up at exactly the same interval.

Reusing the existing helper would both reduce duplication and add randomised spread:

```ts
import { computeBackoff, sleepWithAbort } from "./backoff.js";

const EPERM_BACKOFF: BackoffPolicy = {
  initialMs: 50,
  maxMs: 1_600,
  factor: 2,
  jitter: 0.25,
};

// inside catch:
const delay = computeBackoff(EPERM_BACKOFF, attempt);
await sleepWithAbort(delay);
```

Author


Reused the `backoff.ts` utility instead of inline retry logic in the latest commit.

Author


Fixed in commit "Fix retry logic and backoff delay in json-files.ts":

  • Reuse backoff.ts utility instead of inline retry logic
  • Move fs.mkdir outside the retry loop


Development

Successfully merging this pull request may close these issues.

Windows: EPERM error on atomic write to devices/pending.json (file lock race condition)

1 participant