server: failed-handshake leaves legacy .server.pid flock locked; reconnects loop #437

Summary

When a memtomem-server child process started by an MCP client (Claude Code) fails its stdio handshake but the process stays alive, it continues to hold the legacy ~/.memtomem/.server.pid flock. Every subsequent reconnect attempt from the client spawns a fresh child that aborts immediately with:

error: another memtomem-server holds a lock at /Users/.../.memtomem/.server.pid (likely a pre-0.1.25 install). Stop it before starting a new server; `mm uninstall` will also refuse until it is gone.

Manual ps+kill is required to recover. The error message blames "pre-0.1.25" but the lock holder here is a current-build server — the legacy path just happens to still be the one that gets held.

Repro (observed live; not yet minimized)

  1. Claude Code opens a project with .mcp.json pointing at memtomem-server.
  2. First server child spawns, acquires legacy flock at ~/.memtomem/.server.pid.
  3. Handshake fails for some reason (exact cause TBD — the child stayed up as an orphan), Claude Code's UI shows ✘ failed.
  4. Reconnect → new child hits the flock held by (1) → aborts with the message above.
  5. Loop until the user manually kills the orphan.

Reliable trigger path not yet isolated. Filed on one repro because the recovery story is poor regardless of how the handshake fails: a dead-on-arrival server handshake shouldn't produce a lock that requires manual cleanup.

Where (starting points)

  • packages/memtomem/src/memtomem/server/__init__.py:241-255: _try_hold_legacy_flock(legacy_server_pid_path()) is acquired early, before the MCP stdio handshake. If the server exits via an unhandled path (or stays alive despite handshake failure) the lock can outlive a useful session.
  • _runtime_paths.py:148-161: server_pid_path() (the new $XDG_RUNTIME_DIR/memtomem/server.pid) vs legacy_server_pid_path() (~/.memtomem/.server.pid). Both are held in parallel during migration (server: relocate .server.pid to $XDG_RUNTIME_DIR so ~/.memtomem/ stays lazy, #412).
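For context, a minimal sketch of what the early flock acquisition and a leak-proof release could look like. This is an assumption, not the actual _try_hold_legacy_flock body (which isn't reproduced here); only the cited names are from the source, everything else (try_hold_flock, release_flock, run_server) is hypothetical, using standard fcntl.flock semantics:

```python
import fcntl
import os

def try_hold_flock(path):
    """Open (creating if needed) and take an exclusive, non-blocking
    flock on `path`. Returns the open fd on success, None if another
    process (or open file description) already holds the lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd

def release_flock(fd):
    """Release the lock and close the fd. Safe from any teardown path."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)

def run_server(pid_path):
    """Hypothetical entry point: the key property is that the release
    sits in a `finally`, so it runs even when the handshake raises
    something outside Exception (CancelledError, SystemExit)."""
    fd = try_hold_flock(pid_path)
    if fd is None:
        raise SystemExit("another server holds the lock")
    try:
        pass  # handshake + serve loop would go here
    finally:
        release_flock(fd)
```

Note that flock dies with the process, so the lock can only outlive its usefulness when the holder stays alive (the orphan case in this issue) or when the exit path skips the release.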

Suggested investigation (hypothesis)

  1. Confirm whether orphan processes here are the original handshake failure or a reconnect storm that the lock itself perpetuates.
  2. If the handshake fails, the server should release the legacy flock on teardown. Check the exit-path and signal handling; this matches prior experience in feedback_asyncio_swallows_systemexit.md and feedback_cancelled_error_except_gap.md: except Exception misses CancelledError, so teardowns need except BaseException plus selective re-raise.
  3. Consider replacing the "refuse to start" behavior with a liveness probe: if the PID on the legacy file is no longer a live memtomem-server process, take over the lock rather than abort. The current "pre-0.1.25 install" message is misleading when the holder is a current-build orphan.
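The liveness probe in (3) could be sketched as below. This is a hypothetical helper (lock_holder_alive is not an existing function), assuming the lock file records the holder's PID; a real implementation should also verify the process is actually a memtomem-server (e.g. via its command line) to guard against PID reuse:

```python
import os

def lock_holder_alive(pid_path):
    """Best-effort liveness probe: read the PID recorded in the lock
    file and check whether that process still exists. Returns False
    when the file is missing/garbled or the PID is gone, i.e. when
    the lock is safe to take over instead of aborting."""
    try:
        with open(pid_path) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

Note this only fixes the stale-file case and the misleading "pre-0.1.25" message; in the repro above the orphan is alive, so the probe would correctly report a live holder and the real fix remains releasing the lock on handshake failure.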

Out of scope

Recovery (for anyone hitting this now)

pgrep -f memtomem-server          # identify the orphan
kill <pid>
rm -f ~/.memtomem/.server.pid     # only after confirming no live holder

Labels: bug