fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444) #445
Merged
Conversation
fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444)

`_try_hold_legacy_flock` took `LOCK_EX` and called `sys.exit(1)` on contention. The intent (#412 B1) was a cross-version mutex against pre-0.1.25 servers whose WAL write model could corrupt shared state. The side effect: two *current-version* servers couldn't coexist either, so only one Claude Code session at a time could connect.

Fix: take `LOCK_SH` instead. Shared locks compose with other shared locks but still conflict with exclusive, so:

- 0.1.26 ⋈ 0.1.26: both hold `LOCK_SH` → coexist. ✓ user goal.
- 0.1.26 starts while pre-0.1.25 holds `LOCK_EX`: our `LOCK_SH` fails, we fall through to the XDG path with a warning (which is the authoritative lock for the current generation anyway).
- pre-0.1.25 starts while we hold `LOCK_SH`: pre-0.1.25's own `LOCK_EX` attempt fails → pre-0.1.25 exits on its own concurrent-detection path. ✓ cross-version mutex preserved.

Also soften the failure path: log a warning + return `None` instead of `sys.exit(1)`. The XDG flock is the real primary generation lock; aborting in the transition probe was strictly worse UX than a noisy concurrent start.

Tests:

- Unit: `test_legacy_lock_sh_allows_multiple_holders` pins the fcntl primitive — two `LOCK_SH` compose, a concurrent `LOCK_EX` still fails.
- Integration: `test_two_post_412_servers_coexist_with_shared_lock` spawns two `memtomem-server` subprocesses with shared `HOME`, separate `XDG_RUNTIME_DIR`; both reach the pid-file-written state and survive. This is the live repro of #444.
- Renamed `test_server_refuses_when_legacy_lock_held` → `test_server_warns_but_proceeds_when_legacy_lock_held_exclusively`; semantics flipped from "must exit non-zero" to "must survive and write XDG pid" when legacy `LOCK_EX` is held, because the old strictness was exactly what #444 observes.

Full CI-filter suite: 2253 passed. ruff, mypy clean.

Co-Authored-By: Claude <[email protected]>
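The integration scenario this message names (two current-generation servers sharing `HOME` but not `XDG_RUNTIME_DIR`) can be reproduced standalone. A hedged sketch, assuming only that `memtomem-server` is on PATH; the real test polls for the pid-file-written state rather than sleeping:

```python
import os
import subprocess
import tempfile
import time


def spawn_server(shared_home: str) -> subprocess.Popen:
    # Each server gets its own XDG_RUNTIME_DIR but the same HOME,
    # so both touch the legacy flock under ~/.memtomem/.
    env = dict(os.environ, HOME=shared_home, XDG_RUNTIME_DIR=tempfile.mkdtemp())
    return subprocess.Popen(["memtomem-server"], env=env)


def main() -> None:
    shared_home = tempfile.mkdtemp()
    first, second = spawn_server(shared_home), spawn_server(shared_home)
    try:
        time.sleep(3)  # crude stand-in for "pid file written"
        assert first.poll() is None, "first server exited"
        assert second.poll() is None, "second server exited (the #444 symptom)"
        print("both current-generation servers coexist")
    finally:
        first.terminate()
        second.terminate()


if __name__ == "__main__":
    main()
```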
Summary
Closes #444. Two `memtomem-server` instances (e.g. one per Claude Code session, across multiple projects) can now coexist. Previously the second one was killed by the legacy-flock probe.

Why
`_try_hold_legacy_flock` in `server/__init__.py` was intended as a #412 B1 cross-version mutex: stop a new 0.1.26+ server from running concurrently with a pre-0.1.25 server whose WAL model would corrupt shared state. It did this with `LOCK_EX | LOCK_NB` → `sys.exit(1)` on failure.

That correctly blocks pre-0.1.25 ⟂ 0.1.26, but as a side effect it also blocks 0.1.26 ⟂ 0.1.26. That's the user-visible bug: open Claude Code in project A → memtomem connects. Open Claude Code in project B → "Failed to connect", and the only way to recover A is to `pkill -f memtomem-server; rm -f ~/.memtomem/.server.pid` and switch sides.

The XDG path (the authoritative lock for the current generation) already treats contention as "warn and continue" (`server/__init__.py:277-288`). The legacy path just hadn't been updated to match.

Fix
Switch the legacy flock to `LOCK_SH`:

- 0.1.26 ⋈ 0.1.26: both hold `LOCK_SH` → coexist.
- 0.1.26 starting while pre-0.1.25 holds `LOCK_EX`: our `LOCK_SH` is blocked by their `LOCK_EX` → we warn & fall through.
- pre-0.1.25 starting while we hold `LOCK_SH`: their `LOCK_EX` is blocked by our `LOCK_SH` → they exit on their own detector.

Also soften the failure path: `logger.warning` + `return None` instead of `sys.exit(1)`. The XDG path is the real lock; refusing to start on legacy contention was strictly worse UX than a noisy concurrent start.
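Not the actual diff: a minimal sketch of what the softened probe could look like after the change. The function name is from this PR; the lock-file path and the return convention (a held fd or `None`) are assumptions, the real ones live in `server/__init__.py`.

```python
import fcntl
import logging
import os
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical path; the real lock-file location is defined in server/__init__.py.
LEGACY_LOCK_PATH = Path.home() / ".memtomem" / ".server.pid"


def _try_hold_legacy_flock() -> Optional[int]:
    """Probe the legacy lock with LOCK_SH; warn and return None on contention.

    Shared locks compose with other 0.1.26+ holders but still conflict with a
    pre-0.1.25 server's LOCK_EX, so the #412 B1 cross-version mutex survives.
    """
    try:
        fd = os.open(LEGACY_LOCK_PATH, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError as exc:
        logger.warning("legacy lock file unavailable: %s", exc)
        return None
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)  # was LOCK_EX | LOCK_NB
    except BlockingIOError:
        # An older server holds LOCK_EX. Previously: sys.exit(1). Now we warn
        # and fall through; the XDG flock is the authoritative generation lock.
        os.close(fd)
        logger.warning("legacy flock held exclusively (pre-0.1.25 server?); continuing")
        return None
    return fd  # caller keeps the descriptor open so the shared lock persists
```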
Test plan

- `uv run pytest packages/memtomem/tests/test_server_sigterm.py -v` — 9 passed (3 new/modified)
  - `test_legacy_lock_sh_allows_multiple_holders` pins the fcntl primitive (sketched after this list).
  - `test_two_post_412_servers_coexist_with_shared_lock` spawns two servers (shared `HOME`, separate `XDG_RUNTIME_DIR`); both reach pid-file-written state. This is the live repro of "server: legacy flock uses LOCK_EX so two 0.1.26 instances can't coexist (should be LOCK_SH)" #444.
  - `test_server_refuses_when_legacy_lock_held` → `test_server_warns_but_proceeds_when_legacy_lock_held_exclusively`. Semantics flipped — the "refuse" behavior was exactly the bug.
- `uv run pytest packages/memtomem/tests/ -m "not ollama"` — 2253 passed
- `ruff check` / `ruff format --check` / `mypy` — clean
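Not the repository's test verbatim: a minimal pytest sketch of the primitive being pinned, assuming only stdlib `fcntl` and the `tmp_path` fixture (the real test lives in `packages/memtomem/tests/test_server_sigterm.py`).

```python
import fcntl

import pytest


def test_legacy_lock_sh_allows_multiple_holders(tmp_path):
    """Two LOCK_SH holders compose; a concurrent LOCK_EX is still refused."""
    lock_file = tmp_path / "legacy.lock"
    lock_file.touch()

    with open(lock_file) as first, open(lock_file) as second, open(lock_file) as third:
        # Shared locks taken through separate descriptors of the same file coexist.
        fcntl.flock(first, fcntl.LOCK_SH | fcntl.LOCK_NB)
        fcntl.flock(second, fcntl.LOCK_SH | fcntl.LOCK_NB)

        # An exclusive lock must fail while any shared holder remains,
        # which is what keeps the pre-0.1.25 cross-version mutex intact.
        with pytest.raises(BlockingIOError):
            fcntl.flock(third, fcntl.LOCK_EX | fcntl.LOCK_NB)
```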
Out of scope

- Removing the legacy probe itself (the "move `.server.pid` to `$XDG_RUNTIME_DIR` so `~/.memtomem/` stays lazy" #412 transition end). Still needed as long as 0.1.24 installs are plausible in the field.

Follow-up release
Target v0.1.27. Same hotfix shape as 0.1.26.
🤖 Generated with Claude Code