memory-core sync emits raw EAGAIN (errno -11) on iCloud/FileProvider-backed reads instead of retry-with-backoff #85252

@richardmqq

Description

Summary

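The memory-core plugin's sync path emits Unknown system error -11: Unknown system error -11, read when reading mmap-backed memory state files that live on an iCloud Drive / FileProvider-backed path. The errno is treated as EAGAIN throughout this report; note that Darwin's <sys/errno.h> defines EAGAIN as 35 and 11 as EDEADLK (11 is EAGAIN on Linux), so the symbolic name is worth confirming on an affected machine. The error is surfaced raw and never retried: the affected operation (session-start, session-startup-catchup, search, watch, or session-delta) logs WARN memory sync failed and aborts.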

Environment

  • OpenClaw: 2026.5.20 (/opt/homebrew/lib/node_modules/openclaw)
  • Node: 25.9.0
  • OS: macOS 15 (Darwin 25.4)
  • Memory backend uses ~/.openclaw/memory/*.sqlite plus markdown notes; some workspaces have RMObsidian at ~/Library/Mobile Documents/iCloud~md~obsidian/Documents/RMObsidian/ mounted via iCloud FileProvider.

Observation

14 occurrences on 2026-05-22, clustered around iCloud sync windows after gateway warm-up:

2026-05-22T09:14:22  [WARN] memory sync failed (session-start): Error: Unknown system error -11: Unknown system error -11, read
2026-05-22T09:14:23  [WARN] memory sync failed (session-startup-catchup): ...
2026-05-22T09:15:09  [ERROR] compileMemoryWikiVault → lintMemoryWikiVault: Error: Unknown system error -11: Unknown system error -11, read
2026-05-22T09:31:51  [WARN] memory sync failed (session-delta): ...
2026-05-22T12:41:39  [WARN] memory sync failed (search): ...        (×3)
2026-05-22T13:02:04  [WARN] memory sync failed (session-delta): ...

0 occurrences the previous day (2026-05-21). Today's pattern correlates with a gateway restart at 09:33 plus heavy memory activity at 12:41 (librarian wiki compile) and 13:02 (post stuck-session recovery).
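
For anyone who wants to reproduce the per-reason counts from their own gateway log, a small tally script along these lines may help. It is a sketch only: the function name countSyncFailures is hypothetical, and the regular expression assumes the log line format shown in the excerpt above.

```javascript
// Tally "memory sync failed (<reason>)" warnings per sync reason.
// Assumes the line format from the excerpt: "... memory sync failed (reason): ..."
function countSyncFailures(logText) {
  const counts = {};
  for (const line of logText.split(/\r?\n/)) {
    const match = /memory sync failed \(([^)]+)\)/.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] ?? 0) + 1;
    }
  }
  return counts;
}
```

Lines that do not contain the "memory sync failed" marker, such as the compileMemoryWikiVault error, are ignored by this sketch and would need a second pattern.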

Why this matters

  • compileMemoryWikiVault failures cascade into a broken librarian pipeline (today's librarian wiki ingest could not proceed).
  • session-start failures mean a session may begin without its prior memory snapshot, leading to inconsistent agent state and possible amnesia for the user's reference context.
  • The raw Unknown system error -11 is opaque — EAGAIN on a read() of an iCloud file is recoverable by retry-with-backoff but the current code surfaces it as a hard error.

Suspected cause

read() against an iCloud/FileProvider page that is currently being fetched / verified by the macOS file coordinator returns EAGAIN instead of blocking. Node wraps it as Unknown system error -11. Most local filesystem reads do not produce EAGAIN, so the surrounding code paths assume a read() either succeeds or fails terminally.
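
Because errno numbering is platform-specific, it may be worth confirming on an affected machine which symbolic name the raw -11 maps to. The sketch below looks the value up in Node's os.constants.errno table; the helper name errnoNames is hypothetical.

```javascript
const os = require("node:os");

// Return every symbolic errno name whose numeric value on this platform
// matches the given errno. Node reports errnos negated, so -11 is
// looked up as 11.
function errnoNames(errno) {
  const value = Math.abs(errno);
  return Object.entries(os.constants.errno)
    .filter(([, n]) => n === value)
    .map(([name]) => name)
    .sort();
}

// Prints the platform and the names that share errno 11 on it.
console.log(process.platform, errnoNames(-11));
```

Running this on the affected macOS host and on a Linux host should show whether the retry condition needs to match a different symbolic code than EAGAIN.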

Related, not duplicates

  • openclaw/openclaw#68738: readPageSummaries: no concurrency limit on fs.readFile, triggers EDEADLK on iCloud/FileProvider-backed vaults (P2). Same family (iCloud-induced read failures) but a different call site (page summaries vs memory-core sync). That issue reports EDEADLK where this one assumes EAGAIN; since Darwin numbers EDEADLK as 11, the two may turn out to be the same errno. Same underlying cause class: iCloud FileProvider returning non-success errnos on reads.
  • openclaw/openclaw#84817: gateway install --force plugin load failure with Unknown system error -11 under launchd (P1). The same errno surfaces in a different code path, so a unified handling story would be worthwhile.

Proposed fix shape

Wrap mmap-backed read() calls in memory-core with a bounded retry-with-backoff specifically for EAGAIN:

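import { read } from "node:fs";
import { promisify } from "node:util";
import { setTimeout as sleep } from "node:timers/promises";

// fs.read is callback-based; the promisified form resolves to { bytesRead, buffer }.
const readAsync = promisify(read);

async function readWithEAGAINRetry(fd, buffer, offset, length, position, opts = {}) {
  const maxRetries = opts.maxRetries ?? 5;
  const baseDelayMs = opts.baseDelayMs ?? 25;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await readAsync(fd, buffer, offset, length, position);
    } catch (err) {
      // The logs show Node reporting this errno without a symbolic name,
      // so match the raw value as well as the codes.
      const isEAGAIN = err.errno === -11 || err.code === "EAGAIN" || err.code === "EWOULDBLOCK";
      if (!isEAGAIN || attempt === maxRetries - 1) throw err;
      // Exponential backoff: 25, 50, 100, 200 ms with the defaults.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}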

Apply at every read() in dist/plugin-sdk/.../memory-core/sync.ts and the compileMemoryWikiVault / lintMemoryWikiVault paths.
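
For the higher-level paths, where wrapping each individual read() may be impractical, the same policy could be applied to a whole operation. The following is a minimal sketch, not OpenClaw code: withReadRetry, isRetryable, and the injectable sleep option are hypothetical names, and including EDEADLK in the retryable set is an assumption based on the related issue above.

```javascript
// Hypothetical helper: retry an async operation when it fails with a
// transient errno. Names and defaults are illustrative only.
const RETRYABLE_ERRNOS = new Set([-11]); // raw errno seen in the logs
const RETRYABLE_CODES = new Set(["EAGAIN", "EWOULDBLOCK", "EDEADLK"]); // EDEADLK is an assumption

function isRetryable(err) {
  return RETRYABLE_ERRNOS.has(err?.errno) || RETRYABLE_CODES.has(err?.code);
}

async function withReadRetry(operation, opts = {}) {
  const maxRetries = opts.maxRetries ?? 5;
  const baseDelayMs = opts.baseDelayMs ?? 25;
  // The sleep function is injectable so tests can avoid real delays.
  const sleep = opts.sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Rethrow non-transient errors immediately, and transient ones
      // once the attempt budget is exhausted.
      if (!isRetryable(err) || attempt >= maxRetries - 1) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

A call site would pass a closure, for example a function that invokes the wiki compile step. This is only safe if the wrapped operation can be re-run from the start without side effects, which would need to be checked for compileMemoryWikiVault and lintMemoryWikiVault.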

Bonus: detect iCloud-backed paths up front (xattr -p com.apple.metadata:com_apple_backup_excludeItem or getattrlist against the file's volume) and log a warning at configuration time so operators know their memory store is on a non-local filesystem.
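
As a cheaper first pass than the xattr or getattrlist checks, a path-prefix heuristic could flag the common case. This sketch assumes that iCloud Drive content appears under ~/Library/Mobile Documents, as the RMObsidian path in the Environment section does. The function name isLikelyICloudPath is hypothetical, and the check does not resolve symlinks.

```javascript
const os = require("node:os");
const path = require("node:path");

// Heuristic only: treat anything at or below ~/Library/Mobile Documents
// as iCloud-backed. Symlinks are not resolved here; a caller could run
// fs.realpath on the target first.
function isLikelyICloudPath(target, home = os.homedir()) {
  const root = path.join(home, "Library", "Mobile Documents");
  const rel = path.relative(root, path.resolve(target));
  if (rel === "") return true; // the root itself
  const escapesRoot = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapesRoot && !path.isAbsolute(rel);
}
```

A configuration loader could call this once per configured memory path and emit a single warning, leaving the more precise volume-level check for cases the heuristic misses.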

Workaround

None currently. The warnings are non-fatal for session-* syncs (the next sync usually succeeds), but compileMemoryWikiVault failures interrupt the librarian pipeline and require a retry. Moving the memory store off iCloud is possible but not desirable here, because the vault is intentionally synced.

Metadata

Assignees

No one assigned

    Labels

    P2 - Normal backlog priority with limited blast radius.
    clawsweeper:source-repro - ClawSweeper found a high-confidence source-level issue reproduction.
    impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
    issue-rating: 🦞 diamond lobster - Very strong issue quality with high-confidence source-level or clear reproduction.
