fix(server): unlink legacy .server.pid on atexit and SIGTERM (closes #437) #439

Merged
memtomem merged 1 commit into main from fix/437-legacy-flock-teardown
Apr 23, 2026

@memtomem
Owner

Summary

  • ~/.memtomem/.server.pid now unlinks on both atexit (normal stdin-EOF exit) and SIGTERM, matching the teardown the new $XDG_RUNTIME_DIR/memtomem/server.pid path already has.
  • _install_sigterm_handler becomes variadic (*pid_files: Path) so main() can pass the new and legacy pid files together when _try_hold_legacy_flock acquired the legacy lock.

Fixes #437. Root cause investigation and live repro are in the issue comments.

The bug surfaces as ✘ Failed to connect in Claude Code / claude mcp list after any prior memtomem-server shutdown, because the legacy file remains on disk and parallel MCP probes race on open("a+b") + flock(LOCK_EX|LOCK_NB), then exit via the "likely a pre-0.1.25 install" branch. rm ~/.memtomem/.server.pid was the manual workaround.
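
As a rough illustration of the locking pattern named above, here is a minimal sketch, assuming a POSIX system. `try_hold_flock` and `demo_flock_contention` are hypothetical names for this example, not the project's `_try_hold_legacy_flock`. It only shows that a second non-blocking attempt fails while another holder has the lock, and that releasing the lock does not remove the file.

```python
import fcntl
import tempfile
from pathlib import Path
from typing import BinaryIO, Optional, Tuple


def try_hold_flock(pid_file: Path) -> Optional[BinaryIO]:
    """Return an open handle holding an exclusive flock, or None if it is taken.

    Hypothetical helper for illustration only.
    """
    handle = open(pid_file, "a+b")  # creates the file if it does not exist
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo_flock_contention() -> Tuple[bool, bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / ".server.pid"
        first = try_hold_flock(pid_file)
        # A separate open() is a separate open file description, so it conflicts.
        second = try_hold_flock(pid_file)
        if first is not None:
            first.close()  # releases the lock, but the file stays on disk
        left_behind = pid_file.exists()
        third = try_hold_flock(pid_file)
        reacquired = third is not None
        if third is not None:
            third.close()
        return first is not None, second is None, left_behind, reacquired


if __name__ == "__main__":
    print(demo_flock_contention())
```

The sketch uses two handles in one process to stand in for two concurrent probes; the real race involves separate processes.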

Why variadic instead of two separate calls

Two separate signal.signal(SIGTERM, ...) calls would have the second overwrite the first — only the last-registered handler fires. Bundling both paths into one handler is the shape that keeps the invariant ("SIGTERM cleans up every pid file this process owns").
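
The single-handler shape can be sketched roughly as follows. This is not the code from the PR: `make_sigterm_handler`, `install_sigterm_handler`, and `demo_handler_unlinks_all` are hypothetical names, and exiting with `128 + signum` after cleanup is an assumption, since the description does not say how the real handler terminates.

```python
import signal
import tempfile
from pathlib import Path
from types import FrameType
from typing import Callable, Optional

Handler = Callable[[int, Optional[FrameType]], None]


def make_sigterm_handler(*pid_files: Path) -> Handler:
    """Build one handler that removes every pid file it was given."""

    def _handler(signum: int, frame: Optional[FrameType]) -> None:
        for pid_file in pid_files:
            pid_file.unlink(missing_ok=True)
        # Assumption: exit with the conventional 128 + signal number.
        raise SystemExit(128 + signum)

    return _handler


def install_sigterm_handler(*pid_files: Path) -> None:
    # signal.signal replaces any previously installed handler, so all
    # paths are bundled into a single registration.
    signal.signal(signal.SIGTERM, make_sigterm_handler(*pid_files))


def demo_handler_unlinks_all() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        new_pid = Path(tmp) / "server.pid"
        legacy_pid = Path(tmp) / ".server.pid"
        new_pid.write_text("1")
        legacy_pid.write_text("1")
        handler = make_sigterm_handler(new_pid, legacy_pid)
        try:
            handler(signal.SIGTERM, None)  # call directly; no signal is sent
        except SystemExit:
            pass
        return not new_pid.exists() and not legacy_pid.exists()


if __name__ == "__main__":
    print(demo_handler_unlinks_all())
```

Keeping the handler factory separate from the `signal.signal` call makes the unlink behaviour checkable without delivering a real signal.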

Test plan

  • uv run pytest packages/memtomem/tests/test_server_sigterm.py -v — 7 passed
    • New: test_sigterm_handler_unlinks_all_pid_files (variadic unit)
    • New: test_sigterm_unlinks_legacy_pid_file_end_to_end (subprocess + SIGTERM + legacy pid unlink assertion)
  • uv run pytest packages/memtomem/tests/ -m "not ollama" — 2251 passed
  • uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src
  • uv run mypy packages/memtomem/src/memtomem/server/__init__.py — no issues
  • Existing test_server_refuses_when_legacy_lock_held still passes (unchanged contract: when a real holder has the flock, new server must still abort)

Out of scope

🤖 Generated with Claude Code

…437)

Previously ~/.memtomem/.server.pid was held via fcntl.flock for the
process lifetime but the file was never deleted on shutdown. The flock
itself is released by kernel cleanup when the process dies, so a
single next invocation would succeed — but parallel MCP health probes
(e.g. `claude mcp list` scanning multiple servers) can race on
`open("a+b") + flock(LOCK_EX|LOCK_NB)` against the stale file, hitting
the "another memtomem-server holds a lock (likely a pre-0.1.25 install)"
abort branch. The user-visible symptom is a transient "Failed to connect"
in Claude Code that recovers only after manual `rm ~/.memtomem/.server.pid`.

This mirrors the teardown already in place for the new
`$XDG_RUNTIME_DIR/memtomem/server.pid`:

- `atexit.register` gains an `unlink(missing_ok=True)` for the legacy
  path when `_try_hold_legacy_flock` succeeded (LIFO-ordered before the
  flock close so the unlink still targets an owned file).
- `_install_sigterm_handler` is now variadic (`*pid_files`) and `main()`
  passes both the new and legacy pid files so SIGTERM cleans up both.
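
The LIFO ordering described in the first bullet can be sketched like this. `register_legacy_teardown` and `demo_teardown_order` are hypothetical names, and the injectable `register` parameter exists only so the order can be checked without exiting the interpreter; it is not part of the change.

```python
import atexit
import tempfile
from pathlib import Path
from typing import Any, BinaryIO, Callable, List, Tuple


def register_legacy_teardown(
    pid_file: Path,
    handle: BinaryIO,
    register: Callable[..., Any] = atexit.register,
) -> None:
    # Registered first, so it runs last: closing the handle releases the flock.
    register(handle.close)
    # Registered last, so it runs first: the unlink happens while the lock is
    # still held by this process.
    register(pid_file.unlink, missing_ok=True)


def demo_teardown_order() -> Tuple[bool, bool, bool]:
    calls: List[Tuple[Callable[..., Any], tuple, dict]] = []

    def fake_register(func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        calls.append((func, args, kwargs))

    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / ".server.pid"
        handle = open(pid_file, "a+b")
        register_legacy_teardown(pid_file, handle, register=fake_register)

        # atexit runs callbacks last-in, first-out; replay them the same way.
        func, args, kwargs = calls.pop()
        func(*args, **kwargs)
        unlinked_first = not pid_file.exists()
        still_open_at_unlink = not handle.closed

        func, args, kwargs = calls.pop()
        func(*args, **kwargs)
        return unlinked_first, still_open_at_unlink, handle.closed


if __name__ == "__main__":
    print(demo_teardown_order())
```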

Tested:
- New unit test (`test_sigterm_handler_unlinks_all_pid_files`) pins the
  variadic-unlink contract.
- New end-to-end test (`test_sigterm_unlinks_legacy_pid_file_end_to_end`)
  spawns `memtomem-server` with an existing `~/.memtomem/`, sends SIGTERM,
  and asserts the legacy pid file is gone. Would fail without this change.
- Existing `test_server_refuses_when_legacy_lock_held` still passes
  (unchanged: when a real holder has the flock, new server must abort).
- Full CI-filter suite (`-m "not ollama"`): 2251 passed.
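
The end-to-end test follows a subprocess-plus-SIGTERM pattern. The sketch below reproduces that pattern with a stand-in child script rather than `memtomem-server`; the `CHILD` script, the `ready` handshake line, and `sigterm_removes_pid_files` are all assumptions made for this example and do not come from the project's test file.

```python
import signal
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Stand-in for the server: writes its pid files, installs one SIGTERM
# handler that unlinks all of them, then idles until signalled.
CHILD = textwrap.dedent(
    """
    import signal, sys, time
    from pathlib import Path

    pid_files = [Path(p) for p in sys.argv[1:]]

    def handler(signum, frame):
        for pid_file in pid_files:
            pid_file.unlink(missing_ok=True)
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
    for pid_file in pid_files:
        pid_file.write_text("x")
    print("ready", flush=True)
    while True:
        time.sleep(0.1)
    """
)


def sigterm_removes_pid_files() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        new_pid = Path(tmp) / "server.pid"
        legacy_pid = Path(tmp) / ".server.pid"
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, str(new_pid), str(legacy_pid)],
            stdout=subprocess.PIPE,
            text=True,
        )
        try:
            # Wait until the child has written its files and installed the handler.
            ready = proc.stdout.readline().strip() == "ready"
            proc.send_signal(signal.SIGTERM)
            proc.wait(timeout=10)
        finally:
            if proc.poll() is None:
                proc.kill()
                proc.wait()
            proc.stdout.close()
        return ready and not new_pid.exists() and not legacy_pid.exists()


if __name__ == "__main__":
    print(sigterm_removes_pid_files())
```

Waiting for the handshake line before signalling avoids sending SIGTERM before the child's handler is installed, which would otherwise make the check flaky.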

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit ae43a06 into main Apr 23, 2026
7 checks passed
@memtomem memtomem deleted the fix/437-legacy-flock-teardown branch April 23, 2026 23:50
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 23, 2026
Development

Successfully merging this pull request may close these issues.

server: failed-handshake leaves legacy .server.pid flock locked; reconnects loop
