
fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444) #445

Merged
memtomem merged 1 commit into main from fix/444-legacy-lock-sh on Apr 24, 2026
Conversation

@memtomem
Owner

Summary

Closes #444. Two memtomem-server instances (e.g. one per Claude Code session, across multiple projects) can now coexist. Previously the second one was killed by the legacy-flock probe.

Why

`_try_hold_legacy_flock` in `server/__init__.py` was intended as a #412 B1 cross-version mutex: stop a new 0.1.26+ server from running concurrently with a pre-0.1.25 server whose WAL model would corrupt shared state. It did this with `LOCK_EX | LOCK_NB` and `sys.exit(1)` on failure.

That correctly blocks pre-0.1.25 ⟂ 0.1.26, but as a side effect it also blocks 0.1.26 ⟂ 0.1.26. That's the user-visible bug: open Claude Code in project A → memtomem connects. Open Claude Code in project B → "Failed to connect", and the only way to recover A is to pkill -f memtomem-server; rm -f ~/.memtomem/.server.pid and switch sides.

The XDG path (the authoritative lock for the current generation) already treats contention as "warn and continue" (server/__init__.py:277-288). The legacy path just hadn't been updated to match.
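
For reference, a minimal sketch of the pre-fix probe's shape, reconstructed from the description above (the lock-file path, helper suffix, and structure are assumptions for illustration, not the actual source):

```python
import fcntl
import sys
from pathlib import Path

# Assumed legacy lock location, inferred from the workaround in the bug report.
LEGACY_LOCK = Path.home() / ".memtomem" / ".server.pid"

def _try_hold_legacy_flock_pre_fix(path: Path = LEGACY_LOCK):
    """Illustrative pre-fix shape: exclusive lock, hard exit on contention."""
    fh = open(path, "a+")
    try:
        # Exclusive + non-blocking: fails if ANY other holder exists,
        # including another 0.1.26+ server running the same probe.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        sys.exit(1)  # this is where the second session's server died
    return fh  # held for the process lifetime
```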

Fix

Switch the legacy flock to LOCK_SH:

| Scenario | Legacy flock interaction | Net effect |
| --- | --- | --- |
| 0.1.26 ⋈ 0.1.26 | both take LOCK_SH | ✅ coexist (the fix) |
| 0.1.26 after pre-0.1.25 | our LOCK_SH blocked by their LOCK_EX → we warn & fall through | ✅ XDG path still lets us run, and our SH-less state doesn't break anything for them |
| pre-0.1.25 after 0.1.26 | their LOCK_EX blocked by our LOCK_SH → they exit on their own detector | ✅ cross-version mutex preserved |

Also soften the failure path: logger.warning + return None instead of sys.exit(1). The XDG path is the real lock; refusing to start on legacy contention was strictly worse UX than a noisy concurrent start.
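
A minimal sketch of the fixed probe under the same assumptions (the real code lives in `server/__init__.py`; everything here other than the `_try_hold_legacy_flock` name and the LOCK_SH/warn-and-continue behavior is illustrative):

```python
import fcntl
import logging
from pathlib import Path
from typing import IO, Optional

logger = logging.getLogger(__name__)

def _try_hold_legacy_flock(path: Path) -> Optional[IO[str]]:
    """Illustrative post-fix shape: shared lock, warn-and-continue on contention."""
    fh = open(path, "a+")
    try:
        # Shared + non-blocking: composes with other LOCK_SH holders (0.1.26+ peers)
        # but still conflicts with a pre-0.1.25 server's LOCK_EX.
        fcntl.flock(fh, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        # A pre-0.1.25 server holds LOCK_EX. Don't abort: the XDG flock is the
        # authoritative lock for this generation; just make the overlap visible.
        logger.warning("legacy server lock held exclusively; continuing anyway")
        fh.close()
        return None
    return fh  # keep the handle open so an old server's LOCK_EX still sees contention
```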

Test plan

  • uv run pytest packages/memtomem/tests/test_server_sigterm.py -v → 9 passed (3 new/modified)
    • New unit: test_legacy_lock_sh_allows_multiple_holders pins the fcntl primitive (see the sketch after this list).
    • New integration: test_two_post_412_servers_coexist_with_shared_lock spawns two servers (shared HOME, separate XDG_RUNTIME_DIR); both reach the pid-file-written state. This is the live repro of #444 (server: legacy flock uses LOCK_EX so two 0.1.26 instances can't coexist; should be LOCK_SH).
    • Modified: test_server_refuses_when_legacy_lock_held → test_server_warns_but_proceeds_when_legacy_lock_held_exclusively. Semantics flipped: the "refuse" behavior was exactly the bug.
  • uv run pytest packages/memtomem/tests/ -m "not ollama" → 2253 passed
  • ruff check / ruff format --check / mypy — clean
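
For orientation, here is the rough shape of the fcntl-pinning unit test named above; this is a hedged sketch, and the real test in test_server_sigterm.py may be structured differently:

```python
import fcntl
import pytest

def test_legacy_lock_sh_allows_multiple_holders(tmp_path):
    lock_path = tmp_path / "legacy.lock"
    a = open(lock_path, "a+")
    b = open(lock_path, "a+")

    # Two shared holders compose: neither call raises.
    fcntl.flock(a, fcntl.LOCK_SH | fcntl.LOCK_NB)
    fcntl.flock(b, fcntl.LOCK_SH | fcntl.LOCK_NB)

    # An exclusive attempt against the same file still fails,
    # which is what preserves the pre-0.1.25 cross-version mutex.
    c = open(lock_path, "a+")
    with pytest.raises(BlockingIOError):
        fcntl.flock(c, fcntl.LOCK_EX | fcntl.LOCK_NB)

    for fh in (a, b, c):
        fh.close()
```

Each open() creates a separate open file description, so flock treats them as independent holders even inside one process, which is what lets a single-process test stand in for two servers.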

Out of scope

Follow-up release

Target v0.1.27. Same hotfix shape as 0.1.26.

🤖 Generated with Claude Code

fix(server): legacy flock LOCK_EX → LOCK_SH so 0.1.26+ instances coexist (closes #444)

`_try_hold_legacy_flock` took `LOCK_EX` and called `sys.exit(1)` on
contention. The intent (#412 B1) was a cross-version mutex against
pre-0.1.25 servers whose WAL write model could corrupt shared state.
The side effect: two *current-version* servers couldn't coexist either,
so only one Claude Code session at a time could connect.

Fix: take `LOCK_SH` instead. Shared locks compose with other shared
locks but still conflict with exclusive, so:

- 0.1.26 ⋈ 0.1.26: both hold `LOCK_SH` → coexist. ✓ user goal.
- 0.1.26 starts while pre-0.1.25 holds `LOCK_EX`: our `LOCK_SH` fails,
  we fall through to the XDG path with a warning (which is the
  authoritative lock for current-generation anyway).
- pre-0.1.25 starts while we hold `LOCK_SH`: pre-0.1.25's own
  `LOCK_EX` attempt fails → pre-0.1.25 exits on its own concurrent-
  detection path. ✓ cross-version mutex preserved.

Also soften the failure path: log a warning + return `None` instead
of `sys.exit(1)`. The XDG flock is the real primary generation lock;
aborting in the transition probe was strictly worse UX than a noisy
concurrent start.

Tests:
- Unit: `test_legacy_lock_sh_allows_multiple_holders` pins the fcntl
  primitive — two `LOCK_SH` compose, a concurrent `LOCK_EX` still
  fails.
- Integration: `test_two_post_412_servers_coexist_with_shared_lock`
  spawns two `memtomem-server` subprocesses with shared `HOME`,
  separate `XDG_RUNTIME_DIR`; both reach the pid-file-written state
  and survive. This is the live repro of #444.
- Renamed `test_server_refuses_when_legacy_lock_held` →
  `test_server_warns_but_proceeds_when_legacy_lock_held_exclusively`;
  semantics flipped from "must exit non-zero" to "must survive and
  write XDG pid" when legacy `LOCK_EX` is held, because the old
  strictness was exactly what #444 observes.

Full CI-filter suite: 2253 passed. ruff, mypy clean.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit 61c0167 into main Apr 24, 2026
7 checks passed
@memtomem memtomem deleted the fix/444-legacy-lock-sh branch April 24, 2026 01:55
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 24, 2026
