
server: legacy flock uses LOCK_EX so two 0.1.26 instances can't coexist (should be LOCK_SH) #444

@memtomem

Problem

Running memtomem-server in one Claude Code session blocks all other sessions from connecting. Observed on 0.1.26.

Repro:

  1. Open Claude Code in project A. memtomem connects ✓
  2. Open Claude Code in project B. memtomem → Failed to connect
  3. Run pkill -f memtomem-server; rm -f ~/.memtomem/.server.pid to kill every server and remove the legacy pid file
  4. Reconnect in B → ✓. Reconnect in A → Failed to connect

Only one session can hold the connection at a time.

Root cause

_try_hold_legacy_flock in server/__init__.py:230-240 uses LOCK_EX (exclusive) on ~/.memtomem/.server.pid, and on failure calls sys.exit(1):

fcntl.flock(legacy_fp, fcntl.LOCK_EX | fcntl.LOCK_NB)

This is asymmetric with the XDG path (lines 274-288), which treats EX contention as "another instance running — warn and continue." The legacy path was designed as a #412 B1 cross-version mutex against pre-0.1.25 servers (whose data model could corrupt the WAL if run concurrently with a 0.1.26+ server). But because it uses exclusive locking, it also blocks two 0.1.26 instances from coexisting, which is a side effect, not the intent.

Intended: "0.1.26 ⊗ pre-0.1.25 (no concurrent run)."
Actual: "0.1.26 ⊗ pre-0.1.25" AND "0.1.26 ⊗ 0.1.26" — the second breaks multi-project usage.
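A minimal repro of the contention, outside the server. This is a sketch, not the project's code: try_lock is a hypothetical helper and a temp file stands in for ~/.memtomem/.server.pid. It relies on flock treating separately opened descriptors independently, so two opens in one process behave like two servers.

```python
import fcntl
import os
import tempfile


def try_lock(path, flags):
    """Open path and attempt a non-blocking flock; return the file or None."""
    fp = open(path, "a+")
    try:
        fcntl.flock(fp, flags | fcntl.LOCK_NB)
    except OSError:
        fp.close()
        return None
    return fp


# Temp file standing in for ~/.memtomem/.server.pid (assumption for the demo).
fd, pid_path = tempfile.mkstemp(suffix=".server.pid")
os.close(fd)

first = try_lock(pid_path, fcntl.LOCK_EX)   # plays the project A server
second = try_lock(pid_path, fcntl.LOCK_EX)  # plays the project B server

print("first acquired:", first is not None)
print("second acquired:", second is not None)

if first is not None:
    first.close()  # closing the file releases the lock
os.unlink(pid_path)
```

The second LOCK_EX attempt should be refused while the first is held, which matches the "only one session at a time" symptom.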

Fix

Change the legacy flock to LOCK_SH (shared). Shared locks compose with other shared locks but still conflict with exclusive locks, so:

  • Multiple 0.1.26 instances each hold LOCK_SH → coexist fine.
  • Pre-0.1.25 server (which uses LOCK_EX on the same path, per the docstring at lines 186-187 from #412, "server: relocate .server.pid to $XDG_RUNTIME_DIR so ~/.memtomem/ stays lazy") tries to acquire → fails → pre-0.1.25 exits. ✓ cross-version mutex preserved.
  • Reverse case: 0.1.26 starts when pre-0.1.25 already holds LOCK_EX → 0.1.26's LOCK_SH fails → 0.1.26 exits. ✓ still safe.
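The three cases above can be checked directly against flock. A small sketch, assuming a temp file in place of the legacy pid file; second_lock_succeeds is a hypothetical helper, not something in the codebase.

```python
import fcntl
import os
import tempfile


def second_lock_succeeds(path, held, wanted):
    """Hold `held` on one descriptor, then try `wanted` (non-blocking) on another."""
    with open(path, "a+") as holder, open(path, "a+") as contender:
        fcntl.flock(holder, held)
        try:
            fcntl.flock(contender, wanted | fcntl.LOCK_NB)
        except OSError:
            return False
        return True  # both files close on exit, releasing the locks


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    results = {
        # two 0.1.26 servers
        "SH then SH": second_lock_succeeds(path, fcntl.LOCK_SH, fcntl.LOCK_SH),
        # 0.1.26 running, pre-0.1.25 starts
        "SH then EX": second_lock_succeeds(path, fcntl.LOCK_SH, fcntl.LOCK_EX),
        # pre-0.1.25 running, 0.1.26 starts
        "EX then SH": second_lock_succeeds(path, fcntl.LOCK_EX, fcntl.LOCK_SH),
    }
finally:
    os.unlink(path)

for case, ok in results.items():
    print(f"{case}: {'acquired' if ok else 'refused'}")
```

Only the SH-then-SH pair should be acquired; both mixed pairs should be refused.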

Also soften the exit path on flock failure — log a warning and return None (skip), mirroring the XDG path's "proceed with a warning" behavior. The pre-0.1.25 exit side of the mutex is preserved by the pre-0.1.25 process's own LOCK_EX logic.
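Roughly what the softened helper could look like. This is a hypothetical sketch under stated assumptions, not the current body of _try_hold_legacy_flock: the function name, signature, logger name, and message text are all made up for illustration.

```python
import fcntl
import logging
import tempfile
from pathlib import Path
from typing import IO, Optional

log = logging.getLogger("memtomem.server")  # logger name is an assumption


def try_hold_legacy_flock_sketch(legacy_pid_path: Path) -> Optional[IO[str]]:
    """Hypothetical softened variant: shared lock, warn and return None on failure.

    On success the caller must keep the returned file open to keep the lock.
    """
    try:
        fp = open(legacy_pid_path, "a+")
    except OSError as exc:
        log.warning("cannot open legacy pid file %s (%s); skipping", legacy_pid_path, exc)
        return None
    try:
        fcntl.flock(fp, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as exc:
        fp.close()
        log.warning("legacy flock on %s not acquired (%s); continuing", legacy_pid_path, exc)
        return None
    return fp


# Usage: two "servers" in a scratch directory both get the shared lock.
with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / ".server.pid"
    a = try_hold_legacy_flock_sketch(pid_file)
    b = try_hold_legacy_flock_sketch(pid_file)
    both_held = a is not None and b is not None
    for fp in (a, b):
        if fp is not None:
            fp.close()

print("both held:", both_held)
```

One thing to decide when wiring this in: with a soft failure path, a server that finds LOCK_EX already held logs and returns None instead of exiting, so whether the reverse case still ends in an exit depends on what the caller does with None.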

Test plan

  • New unit test: two in-process fcntl.flock(…, LOCK_SH) on the same legacy file both succeed.
  • New integration test: two memtomem-server subprocesses started back-to-back both reach the ready state, both write pid files, both exit cleanly on SIGTERM.
  • Existing test_server_refuses_when_legacy_lock_held must be updated: the "pre-0.1.25 holder" simulator must use LOCK_EX (which is what real pre-0.1.25 did), and the new server's LOCK_SH attempt must still fail (so the existing mutex test still pins cross-version protection).
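For the updated mutex test, the "pre-0.1.25 holder" simulator could be a small subprocess that takes LOCK_EX and parks. A sketch only: the HOLDER script and shared_lock_refused_while_ex_held are hypothetical names, and the real test would start memtomem-server rather than call flock directly.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script: takes LOCK_EX (what pre-0.1.25 is described as doing) and waits.
HOLDER = """
import fcntl, sys
fp = open(sys.argv[1], "a+")
fcntl.flock(fp, fcntl.LOCK_EX)
print("locked", flush=True)
sys.stdin.read()  # hold the lock until the parent closes stdin
"""


def shared_lock_refused_while_ex_held(path):
    """Return True if a non-blocking LOCK_SH fails while another process holds LOCK_EX."""
    holder = subprocess.Popen(
        [sys.executable, "-c", HOLDER, path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Wait until the child reports that it owns the lock.
        assert holder.stdout.readline().strip() == "locked"
        with open(path, "a+") as fp:
            try:
                fcntl.flock(fp, fcntl.LOCK_SH | fcntl.LOCK_NB)
            except OSError:
                return True
            return False
    finally:
        holder.stdin.close()  # EOF lets the child exit and release the lock
        holder.wait(timeout=10)
        holder.stdout.close()


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    refused = shared_lock_refused_while_ex_held(path)
finally:
    os.unlink(path)

print("LOCK_SH refused while LOCK_EX held:", refused)
```

Using a separate process here keeps the simulator close to the real cross-version scenario, while the in-process LOCK_SH pair is enough for the first unit test.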

Scope

Small. _try_hold_legacy_flock flags + exit-path change + docstring update + 2-3 tests. CHANGELOG entry. Target v0.1.27.

Labels: bug
