
fix: add retry logic for file lock errors (EBUSY, EACCES, EPERM)#1

Merged
techfitmaster merged 5 commits into main from fix/issue-39446 on Mar 8, 2026

@techfitmaster
Owner

Summary

Handle file lock errors gracefully when the .openclaw directory is synced via a cloud storage service (OneDrive, Dropbox, Google Drive, Baidu Netdisk, etc.).

When a cloud sync service temporarily locks files during upload or download, the OpenClaw Gateway crashed with unhandled EBUSY, EACCES, or EPERM errors.

Changes

  • Add retry logic (up to 3 attempts) with exponential backoff to the writeTextAtomic function in src/infra/json-files.ts
  • Retry on EBUSY, EACCES, and EPERM errors
  • This affects all session store writes, transcript writes, and config writes
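A minimal sketch of how such a retry could wrap an atomic write, for readers who want to see the shape of the change. It is not the merged implementation: the function name `writeTextAtomic`, the constants `FILE_LOCK_ERRORS`, `MAX_RETRIES`, and `RETRY_DELAY_MS`, the three error codes, and the 3-attempt limit come from this PR and its review, while the temp-file naming, the helper `isFileLockError`, and the 100 ms base delay (inferred from the review's "100ms, 200ms" example) are assumptions. It uses the exponential delay the description promises.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Codes treated as transient locks (e.g. a sync client briefly holding the file).
const FILE_LOCK_ERRORS = new Set(["EBUSY", "EACCES", "EPERM"]);
const MAX_RETRIES = 3; // total attempts, matching "up to 3 attempts"
const RETRY_DELAY_MS = 100; // assumed base delay

// Hypothetical helper: true when err carries one of the lock-related codes.
export function isFileLockError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && FILE_LOCK_ERRORS.has(code);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sketch of an atomic write: write a temp file next to the target, then rename over it.
// Lock errors are retried with a doubling delay; any other error is rethrown at once.
export async function writeTextAtomic(filePath: string, text: string): Promise<void> {
  let lastError: unknown;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    // Hypothetical temp-file name, unique per process and attempt.
    const tmp = path.join(
      path.dirname(filePath),
      `.${path.basename(filePath)}.${process.pid}.${attempt}.tmp`,
    );
    try {
      await fs.writeFile(tmp, text, "utf8");
      await fs.rename(tmp, filePath); // replaces the target in one step (atomic on POSIX)
      return;
    } catch (err) {
      lastError = err;
      if (attempt < MAX_RETRIES - 1 && isFileLockError(err)) {
        await sleep(RETRY_DELAY_MS * 2 ** attempt); // 100 ms, then 200 ms
        continue;
      }
      throw err;
    } finally {
      // Remove the temp file if rename did not consume it; ignore cleanup errors.
      await fs.rm(tmp, { force: true }).catch(() => undefined);
    }
  }
  // Only reachable if MAX_RETRIES <= 0, when lastError is still undefined.
  throw lastError ?? new Error(`Failed to write ${filePath} after ${MAX_RETRIES} attempts`);
}
```

A lock error on the final attempt, or any other code such as ENOENT, is rethrown unchanged, so callers still see the original failure.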

Testing

  • Manual test: Verified retry logic works with mocked file lock errors

Related Issue

Fixes openclaw#39446

steipete and others added 5 commits March 8, 2026 04:03
Handle file lock errors gracefully when .openclaw directory is synced
via cloud storage services (OneDrive, Dropbox, Google Drive, etc.).
Retry up to 3 times with exponential backoff before failing.

Fixes openclaw#39446
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability of file operations by introducing robust retry mechanisms for common file lock errors, which frequently occur with cloud synchronization services. It also streamlines the MiniMax model integration by deprecating and removing an outdated model, ensuring consistency across the application and its documentation. Furthermore, the changes include refinements to the Docker build and testing environment, improving reliability and clarity for development workflows.

Highlights

  • File Operation Robustness: Implemented retry logic with exponential backoff for atomic file write operations to gracefully handle temporary file lock errors (EBUSY, EACCES, EPERM), preventing crashes when cloud storage services interfere.
  • MiniMax Model Deprecation: Removed all references to the deprecated MiniMax-M2.5-Lightning model from built-in provider catalogs, onboarding metadata, documentation, and test configurations, standardizing on MiniMax-M2.5-highspeed.
  • Build and Test Environment Improvements: Enhanced Docker build processes by activating the exact pinned package manager and refined live-test runner scripts to improve source staging and overall reliability of development workflows.
Changelog
  • CHANGELOG.md
    • Added an entry for stopping the advertisement of the removed MiniMax-M2.5-Lightning model in built-in provider catalogs, onboarding metadata, and docs.
Activity
  • The author manually tested the retry logic with mocked file lock errors.
  • This pull request addresses and fixes issue openclaw/openclaw#39446.


@gemini-code-assist bot left a comment

Code Review

This pull request introduces several improvements. The main change is the addition of retry logic with backoff for file write operations, which will gracefully handle file lock errors from cloud sync services. This is a solid improvement for robustness.

The PR also includes a number of other changes:

  • Removal of the deprecated MiniMax-M2.5-Lightning model across the codebase.
  • Refinements to the build scripts to control verbosity and reduce duplication.
  • Updates to the Dockerfile and testing scripts to improve the containerized testing workflow.

The changes are well-implemented. I have a couple of suggestions for the retry logic in src/infra/json-files.ts to make it even more robust and align it with the description.

lastError = err as Error;
const errWithCode = err as { code?: string };
if (attempt < MAX_RETRIES - 1 && FILE_LOCK_ERRORS.has(errWithCode.code ?? "")) {
await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS * (attempt + 1)));

medium

The pull request description mentions "exponential backoff", but this implementation uses a linear backoff (100ms, 200ms, etc.). For true exponential backoff, you could use Math.pow(2, attempt). This can be more effective if lock contention lasts longer than a few hundred milliseconds.

Suggested change
- await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS * (attempt + 1)));
+ await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS * (2 ** attempt)));
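One detail worth checking before choosing between the two formulas: with MAX_RETRIES = 3, the loop only sleeps after attempts 0 and 1, so the linear and exponential schedules produce identical waits; they diverge only from the third wait onward. A small sketch comparing them, assuming the 100 ms base delay implied by the review's example; `linearDelay`, `exponentialDelay`, and `schedule` are hypothetical helpers, not code from this PR.

```typescript
const RETRY_DELAY_MS = 100; // base delay implied by the review's "100ms, 200ms" example

// Delay before the retry that follows failed attempt `attempt` (0-based).
function linearDelay(attempt: number): number {
  return RETRY_DELAY_MS * (attempt + 1); // current code: 100, 200, 300, ...
}
function exponentialDelay(attempt: number): number {
  return RETRY_DELAY_MS * 2 ** attempt; // suggested change: 100, 200, 400, ...
}

// A run with `maxAttempts` attempts sleeps after every attempt except the last,
// so it waits maxAttempts - 1 times.
function schedule(maxAttempts: number, delay: (attempt: number) => number): number[] {
  return Array.from({ length: Math.max(0, maxAttempts - 1) }, (_, attempt) => delay(attempt));
}

console.log(schedule(3, linearDelay)); // [ 100, 200 ]
console.log(schedule(3, exponentialDelay)); // [ 100, 200 ]
console.log(schedule(5, linearDelay)); // [ 100, 200, 300, 400 ]
console.log(schedule(5, exponentialDelay)); // [ 100, 200, 400, 800 ]
```

The suggestion therefore only changes behavior if MAX_RETRIES is later raised above 3; at the current limit, the total wait is 300 ms either way.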

} finally {
await fs.rm(tmp, { force: true }).catch(() => undefined);
}
throw lastError;

medium

If MAX_RETRIES were configured to be 0, this line would be reached with lastError as undefined, causing an undefined value to be thrown. While MAX_RETRIES is currently 3, it's safer to handle this edge case to prevent future issues.

Suggested change
- throw lastError;
+ throw lastError ?? new Error(`Failed to write file after ${MAX_RETRIES} attempts.`);
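The edge case can be checked with a small, self-contained sketch. `retryOnFileLock` is a hypothetical generic helper, not part of the PR, that mirrors the loop's structure: non-lock errors and the final failure are rethrown inside the loop, so the statement after the loop is reached only when the attempt count is 0 or less. The injectable `sleep` parameter and the 100 ms base delay are assumptions made so the behavior can be tested without real timers.

```typescript
// Codes treated as transient locks, as in the PR.
const FILE_LOCK_ERRORS = new Set(["EBUSY", "EACCES", "EPERM"]);

// Hypothetical helper: run `op` up to `maxAttempts` times, retrying only on lock errors.
async function retryOnFileLock<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const code = (err as { code?: unknown } | null)?.code;
      const isLock = typeof code === "string" && FILE_LOCK_ERRORS.has(code);
      // Rethrow non-lock errors and the last failed attempt unchanged.
      if (!isLock || attempt === maxAttempts - 1) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // Reached only when the loop body never ran (maxAttempts <= 0). Without the
  // fallback, `throw lastError` would throw undefined here.
  throw lastError ?? new Error(`Failed to write file after ${maxAttempts} attempts.`);
}
```

With the fallback in place, a misconfigured limit of 0 surfaces as a normal Error with a readable message instead of a bare `undefined` rejection.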

@techfitmaster merged commit e84266e into main on Mar 8, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Gateway crashes with EBUSY errors when .openclaw is synced via cloud storage

2 participants