Skip to content

fix(server): swap fcntl→portalocker so memtomem-server runs on Windows (closes #817)#818

Merged
memtomem merged 2 commits intomainfrom
fix/817-windows-server-portalocker
May 6, 2026
Merged

fix(server): swap fcntl→portalocker so memtomem-server runs on Windows (closes #817)#818
memtomem merged 2 commits intomainfrom
fix/817-windows-server-portalocker

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 6, 2026

Summary

Closes #817. Follow-up to #625 — that issue's close note promised this work as a separate PR if a Windows server target ever landed.

memtomem-server (the MCP server entry point) was the last POSIX-only surface in the codebase: PR #652 swapped the three CLI-side fcntl callsites onto portalocker, but deliberately left server/__init__.py's two callsites for a follow-up. With users now wanting to point Windows MCP hosts (Claude Desktop / Code) at memtomem directly, that follow-up is here.

Changes

packages/memtomem/src/memtomem/server/__init__.py — three edits:

  • _try_hold_legacy_flock early-returns on Windows. The legacy flock interlocks with pre-0.1.25 memtomem-server instances, which were Linux-only by construction. Skipping the probe is semantically correct (no Windows-side counterparty exists) and sidesteps the question of whether portalocker's Windows backend selection (MsvcrtLocker vs Win32Locker) uniformly implements LOCK_SH. On POSIX, the LOCK_SH | LOCK_NB lock swaps to portalocker.lock(...) 1:1.
  • main() — XDG-runtime pid lock now uses portalocker.lock(LOCK_EX | LOCK_NB). Exception tuple widened to (portalocker.LockException, BlockingIOError, OSError) to match the canonical pattern in cli/_liveness.py:probe_pid_file.
  • _install_sigterm_handler POSIX-gates the signal.signal(SIGTERM, ...) registration. Python's SIGTERM is a no-op on Windows; on Windows the teardown path is FastMCP's stdin-EOF + atexit.

Tests:

  • test_server_sigterm.py — three signal-capture unit tests now Windows-skipped; new Windows-only test_install_sigterm_handler_is_noop_on_windows pins the no-op contract; integration-skip reasons refreshed; test_contended_server_start_preserves_pid_file_content and test_server_uses_tempdir_fallback_when_xdg_unset now Windows-runnable (simulator on portalocker, env dict carries HOME+USERPROFILE+TMP/TEMP/TMPDIR, uid falls back to 0, mode-bit asserts POSIX-gated).
  • New cross-platform test_server_main_acquires_portalocker_pid_lock runs main() in-process behind aggressive isolation (stubs _install_sigterm_handler and mcp.run, captures atexit.register and runs callbacks LIFO in finally) — regression net against any future "swap portalocker back to fcntl" attempt.
  • test_uninstall_cmd.py::_hold_pid_lock helper now portalocker-based; TestServerAliveRefuses class no longer Windows-skipped.
  • test_no_posix_only_imports.py docstring refreshed — server/__init__.py is now portalocker-only, so the lazy-import exemption is general infrastructure rather than that specific callsite. AST scan logic is unchanged.

Out of scope (separate follow-ups)

  • Windows ODR registration (MSIX manifest / MCPB bundle / odr.exe). That's packaging, not server code — the Windows MCP overview is registration-side concern and orthogonal to making the server actually run on Windows.
  • Flipping the CI Windows job from continue-on-error: true to required. That's gated on the Windows-triage umbrella ci: add Windows runner to CI matrix #634.
  • Windows-specific shutdown via SetConsoleCtrlHandler. Stdin-EOF + atexit covers MCP-host disconnect cleanly; the explicit-kill story can come later if real users need it.

Notes for Windows users

After memtomem-server shutdown on Windows, ~/.memtomem/.server.pid is not created — the legacy-flock probe short-circuits because pre-0.1.25 Windows servers cannot exist. The pid file lives at %LOCALAPPDATA%\Temp\memtomem-0\server.pid and is unlinked via atexit on host disconnect.

Test plan

  • uv run pytest -m "not ollama" packages/memtomem/tests/test_no_posix_only_imports.py packages/memtomem/tests/test_server_sigterm.py packages/memtomem/tests/test_uninstall_cmd.py — 280 passed, 2 skipped (Windows-only)
  • uv run pytest -m "not ollama" packages/memtomem/tests/ — 4228 passed, 9 skipped
  • uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src
  • Windows runner CI (informational) — newly Windows-runnable tests should pass under the existing matrix cell.
  • Manual: memtomem-server boots on a real Windows host, MCP host (Claude Desktop / Code) lists tools and answers mem_status, pid file appears at %LOCALAPPDATA%\Temp\memtomem-0\server.pid and is unlinked at shutdown.

🤖 Generated with Claude Code

memtomem and others added 2 commits May 6, 2026 14:52
closes #817)

PR #652 (v0.1.34) made the `mm` CLI Windows-clean by moving three
callsites onto portalocker, but deliberately deferred the MCP server
itself. The two `fcntl.flock` callsites in `server/__init__.py` sat
inside lazy `import fcntl` blocks — keeping `mm` loadable on Windows,
but crashing `memtomem-server` the moment `main()` reached the lock
acquisition path. The #625 close note named this as the explicit
follow-up.

Three changes in `server/__init__.py`:

- `_try_hold_legacy_flock` early-returns on Windows. The legacy flock
  exists solely to interlock with pre-0.1.25 `memtomem-server`
  instances, which were Linux-only by construction (the `mm` CLI did
  not load on Windows until #652 / 0.1.34). Skipping the probe is
  semantically correct — the cross-version mutex has no Windows-side
  counterparty — and sidesteps the question of whether portalocker's
  Windows backend selection (`MsvcrtLocker` vs `Win32Locker`)
  uniformly implements `LOCK_SH`. On POSIX, `LOCK_SH | LOCK_NB`
  swaps to `portalocker.lock(LOCK_SH | LOCK_NB)` 1:1.
- `main()`'s XDG-runtime pid lock swaps `LOCK_EX | LOCK_NB` to
  `portalocker.lock(...)`, with the exception tuple widened to
  `(portalocker.LockException, BlockingIOError, OSError)` to mirror
  the canonical pattern in `cli/_liveness.py:probe_pid_file`.
- `_install_sigterm_handler` POSIX-gates `signal.signal(SIGTERM, ...)`.
  Python's SIGTERM is a no-op on Windows (the C runtime does not
  deliver it), so registering a handler would silently mislead readers
  into thinking shutdown was wired. The Windows path relies on
  FastMCP's stdin-EOF teardown + `atexit` instead.

Test surface adjustments:

- `test_server_sigterm.py`: keep SIGTERM-firing and legacy-flock
  integration tests POSIX-skipped (with refreshed `reason` strings),
  add Windows skips to the three signal-capture unit tests, and add
  a Windows-only `test_install_sigterm_handler_is_noop_on_windows`
  pin so the no-op contract has a regression net.
- `test_contended_server_start_preserves_pid_file_content` is now
  Windows-runnable: simulator switched to portalocker (`"rb+"` open
  for `MsvcrtLocker` compatibility), env dict carries both `HOME`
  and `USERPROFILE`, mode-bit chmods POSIX-gated.
- `test_server_uses_tempdir_fallback_when_xdg_unset` is now
  Windows-runnable: subprocess env sets HOME / USERPROFILE / TMP /
  TEMP / TMPDIR (Windows reads multiple candidates), uid suffix
  collapses to 0 when `os.geteuid` is missing, mode-bit assert
  POSIX-gated. The full reasoning is in the test docstring.
- New cross-platform `test_server_main_acquires_portalocker_pid_lock`
  runs `main()` in-process behind aggressive isolation: stubs
  `_install_sigterm_handler` and `mcp.run`, pins `Path.home()` and
  `XDG_RUNTIME_DIR` to tmp paths, captures `atexit.register` calls
  and runs them LIFO in `finally`. This is the regression net for
  any future "swap portalocker back to fcntl" attempt.
- `test_uninstall_cmd.py::_hold_pid_lock` helper switched to
  portalocker; `TestServerAliveRefuses` class no longer Windows-skipped.
- `test_no_posix_only_imports.py` docstring refreshed — `server/__init__.py`
  is now portalocker-only, so the lazy-import exemption is general
  infrastructure rather than that specific callsite. The AST scan logic
  is unchanged.

Out of scope for this PR (separate follow-ups if needed): Windows
ODR registration via MSIX manifest / MCPB bundle / `odr.exe`, and
flipping the CI Windows job from informational to required (gated on
the broader Windows-triage umbrella #634).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…#818)

PR #818 review: with SIGTERM POSIX-gated, the only graceful teardown
route on Windows is the existing two-step ``atexit`` registration:

    atexit.register(lambda: _lock_fp.close())   # LIFO: runs second
    atexit.register(lambda: pid_file.unlink(...))   # LIFO: runs first

That ordering is correct on POSIX (unlink-while-open keeps the inode
alive until close — defensive against another process replacing the
path between operations) but breaks on Windows: NTFS refuses to delete
an open or locked handle, so ``unlink`` raises ``PermissionError``
(WinError 32). The MCP host disconnects, atexit fires, the unlink is
swallowed by atexit's exception isolation, and a stale ``server.pid``
plus an ignored "Error in atexit._run_exitfuncs" message survive every
clean shutdown — and the next start misreads the leftover file.

Replace both two-call patterns (XDG path and the legacy-flock path)
with a single composite cleanup function that branches on platform:

- POSIX: unlink first, then close. Preserves the ``#437`` invariant
  (delete exactly the inode we own the flock on).
- Windows: close first, then unlink. The close releases the
  portalocker lock and the file handle, after which NTFS lets us
  unlink. The trade-off — a sub-millisecond window where the path
  exists without an active holder — is dominated by ``probe_pid_file``,
  which already opens its own handle to decide alive/dead and so is
  authoritative independent of this race.

The legacy-flock cleanup gets the same composite shape even though
``_try_hold_legacy_flock`` short-circuits to ``None`` on Windows today.
That short-circuit is the only thing keeping the legacy path
Windows-safe; if a future change relaxes it, the same bug shape would
silently re-appear. Using the same pattern in both blocks makes the
defense uniform.

Test pin: ``test_server_main_acquires_portalocker_pid_lock`` now runs
the captured ``atexit`` callbacks and asserts the pid file is gone
afterwards. On POSIX the assertion is satisfied by either the old or
the new ordering, but on Windows the old ordering would raise inside
``unlink`` and leave the file behind — so any future revert to two
``atexit.register`` calls fails this test on Windows CI rather than
shipping a stale-pid bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@memtomem memtomem merged commit 9680acb into main May 6, 2026
10 of 11 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 6, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: memtomem-server is POSIX-only — fcntl→portalocker swap to enable Windows MCP host

1 participant