fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444) #445
Merged
Conversation
fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444)

`_try_hold_legacy_flock` took `LOCK_EX` and called `sys.exit(1)` on contention. The intent (#412 B1) was a cross-version mutex against pre-0.1.25 servers whose WAL write model could corrupt shared state. The side effect: two *current-version* servers couldn't coexist either, so only one Claude Code session at a time could connect.

Fix: take `LOCK_SH` instead. Shared locks compose with other shared locks but still conflict with exclusive, so:

- 0.1.26 ⋈ 0.1.26: both hold `LOCK_SH` → coexist. ✓ user goal.
- 0.1.26 starts while pre-0.1.25 holds `LOCK_EX`: our `LOCK_SH` fails, we fall through to the XDG path with a warning (which is the authoritative lock for the current generation anyway).
- pre-0.1.25 starts while we hold `LOCK_SH`: pre-0.1.25's own `LOCK_EX` attempt fails → pre-0.1.25 exits on its own concurrent-detection path. ✓ cross-version mutex preserved.

Also soften the failure path: log a warning + return `None` instead of `sys.exit(1)`. The XDG flock is the real primary generation lock; aborting in the transition probe was strictly worse UX than a noisy concurrent start.

Tests:

- Unit: `test_legacy_lock_sh_allows_multiple_holders` pins the fcntl primitive — two `LOCK_SH` compose, a concurrent `LOCK_EX` still fails.
- Integration: `test_two_post_412_servers_coexist_with_shared_lock` spawns two `memtomem-server` subprocesses with shared `HOME`, separate `XDG_RUNTIME_DIR`; both reach the pid-file-written state and survive. This is the live repro of #444.
- Renamed `test_server_refuses_when_legacy_lock_held` → `test_server_warns_but_proceeds_when_legacy_lock_held_exclusively`; semantics flipped from "must exit non-zero" to "must survive and write XDG pid" when legacy `LOCK_EX` is held, because the old strictness was exactly what #444 observes.

Full CI-filter suite: 2253 passed. ruff, mypy clean.

Co-Authored-By: Claude <[email protected]>
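The integration scenario this message names (two current-generation servers sharing `HOME` but not `XDG_RUNTIME_DIR`) can be reproduced standalone. A hedged sketch, assuming only that `memtomem-server` is on PATH; the real test polls for the pid-file-written state rather than sleeping:

```python
import os
import subprocess
import tempfile
import time


def spawn_server(shared_home: str) -> subprocess.Popen:
    # Each server gets its own XDG_RUNTIME_DIR but the same HOME,
    # so both touch the legacy flock under ~/.memtomem/.
    env = dict(os.environ, HOME=shared_home, XDG_RUNTIME_DIR=tempfile.mkdtemp())
    return subprocess.Popen(["memtomem-server"], env=env)


def main() -> None:
    shared_home = tempfile.mkdtemp()
    first, second = spawn_server(shared_home), spawn_server(shared_home)
    try:
        time.sleep(3)  # crude stand-in for "pid file written"
        assert first.poll() is None, "first server exited"
        assert second.poll() is None, "second server exited (the #444 symptom)"
        print("both current-generation servers coexist")
    finally:
        first.terminate()
        second.terminate()


if __name__ == "__main__":
    main()
```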
Summary
Closes #444. Two `memtomem-server` instances (e.g. one per Claude Code session, across multiple projects) can now coexist. Previously the second one was killed by the legacy-flock probe.

Why
`_try_hold_legacy_flock` in `server/__init__.py` was intended as a #412 B1 cross-version mutex: stop a new 0.1.26+ server from running concurrently with a pre-0.1.25 server whose WAL model would corrupt shared state. It did this with `LOCK_EX | LOCK_NB` → `sys.exit(1)` on failure.

That correctly blocks pre-0.1.25 ⟂ 0.1.26, but as a side effect it also blocks 0.1.26 ⟂ 0.1.26. That's the user-visible bug: open Claude Code in project A → memtomem connects. Open Claude Code in project B → "Failed to connect", and the only way to recover A is to `pkill -f memtomem-server; rm -f ~/.memtomem/.server.pid` and switch sides.

The XDG path (the authoritative lock for the current generation) already treats contention as "warn and continue" (`server/__init__.py:277-288`). The legacy path just hadn't been updated to match.

Fix
Switch the legacy flock to `LOCK_SH`:

- 0.1.26 ⋈ 0.1.26: both hold `LOCK_SH` → coexist.
- 0.1.26 starting while pre-0.1.25 holds `LOCK_EX`: our `LOCK_SH` is blocked by their `LOCK_EX` → we warn & fall through.
- pre-0.1.25 starting while we hold `LOCK_SH`: their `LOCK_EX` is blocked by our `LOCK_SH` → they exit on their own detector.

Also soften the failure path: `logger.warning` + `return None` instead of `sys.exit(1)`. The XDG path is the real lock; refusing to start on legacy contention was strictly worse UX than a noisy concurrent start.
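Not the actual diff: a minimal sketch of what the softened probe could look like after the change. The function name is from this PR; the lock-file path and the return convention (a held fd or `None`) are assumptions, the real ones live in `server/__init__.py`.

```python
import fcntl
import logging
import os
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical path; the real lock-file location is defined in server/__init__.py.
LEGACY_LOCK_PATH = Path.home() / ".memtomem" / ".server.pid"


def _try_hold_legacy_flock() -> Optional[int]:
    """Probe the legacy lock with LOCK_SH; warn and return None on contention.

    Shared locks compose with other 0.1.26+ holders but still conflict with a
    pre-0.1.25 server's LOCK_EX, so the #412 B1 cross-version mutex survives.
    """
    try:
        fd = os.open(LEGACY_LOCK_PATH, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError as exc:
        logger.warning("legacy lock file unavailable: %s", exc)
        return None
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)  # was LOCK_EX | LOCK_NB
    except BlockingIOError:
        # An older server holds LOCK_EX. Previously: sys.exit(1). Now we warn
        # and fall through; the XDG flock is the authoritative generation lock.
        os.close(fd)
        logger.warning("legacy flock held exclusively (pre-0.1.25 server?); continuing")
        return None
    return fd  # caller keeps the descriptor open so the shared lock persists
```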
Test plan

- `uv run pytest packages/memtomem/tests/test_server_sigterm.py -v` — 9 passed (3 new/modified)
  - `test_legacy_lock_sh_allows_multiple_holders` pins the fcntl primitive (sketched after this list).
  - `test_two_post_412_servers_coexist_with_shared_lock` spawns two servers (shared `HOME`, separate `XDG_RUNTIME_DIR`); both reach pid-file-written state. This is the live repro of "server: legacy flock uses LOCK_EX so two 0.1.26 instances can't coexist (should be LOCK_SH)" #444.
  - `test_server_refuses_when_legacy_lock_held` → `test_server_warns_but_proceeds_when_legacy_lock_held_exclusively`. Semantics flipped — the "refuse" behavior was exactly the bug.
- `uv run pytest packages/memtomem/tests/ -m "not ollama"` — 2253 passed
- `ruff check` / `ruff format --check` / `mypy` — clean
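Not the repository's test verbatim: a minimal pytest sketch of the primitive being pinned, assuming only stdlib `fcntl` and the `tmp_path` fixture (the real test lives in `packages/memtomem/tests/test_server_sigterm.py`).

```python
import fcntl

import pytest


def test_legacy_lock_sh_allows_multiple_holders(tmp_path):
    """Two LOCK_SH holders compose; a concurrent LOCK_EX is still refused."""
    lock_file = tmp_path / "legacy.lock"
    lock_file.touch()

    with open(lock_file) as first, open(lock_file) as second, open(lock_file) as third:
        # Shared locks taken through separate descriptors of the same file coexist.
        fcntl.flock(first, fcntl.LOCK_SH | fcntl.LOCK_NB)
        fcntl.flock(second, fcntl.LOCK_SH | fcntl.LOCK_NB)

        # An exclusive lock must fail while any shared holder remains,
        # which is what keeps the pre-0.1.25 cross-version mutex intact.
        with pytest.raises(BlockingIOError):
            fcntl.flock(third, fcntl.LOCK_EX | fcntl.LOCK_NB)
```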
Out of scope

- Removing the legacy probe itself (the "move `.server.pid` to `$XDG_RUNTIME_DIR` so `~/.memtomem/` stays lazy" #412 transition end). Still needed as long as 0.1.24 installs are plausible in the field.

Follow-up release
Target v0.1.27. Same hotfix shape as 0.1.26.
🤖 Generated with Claude Code