Skip to content

fix(tests): unbreak Windows CI on post-#818 server tests (closes #819)#820

Merged
memtomem merged 2 commits intomainfrom
fix/819-windows-ci-followup
May 6, 2026
Merged

fix(tests): unbreak Windows CI on post-#818 server tests (closes #819)#820
memtomem merged 2 commits intomainfrom
fix/819-windows-ci-followup

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 6, 2026

Summary

Closes #819. PR #818's Windows CI cell surfaced 6 failures that POSIX CI didn't catch. Two distinct root causes, both addressed here.

Root cause A — Windows LockFileEx blocks reads from other handles

POSIX flock/fcntl is advisory; reads from any handle proceed unaffected. Windows LockFileEx(LOCKFILE_EXCLUSIVE_LOCK) blocks all I/O on the locked byte range from other handles. So pid_file.read_text() from a fresh handle raises PermissionError (ERROR_LOCK_VIOLATION) while another handle holds the lock.

  • test_contended_server_start_preserves_pid_file_content now reads through the lock-owning holder (holder.seek(0); read()) instead of opening a separate handle. Cross-platform safe.
  • test_server_main_acquires_portalocker_pid_lock can't reach main()'s closure-scoped _lock_fp. Pid-content equality is now POSIX-only; the load-bearing cross-platform assertions remain (exists(), probe_pid_file().alive is True, post-cleanup not exists).

Root cause B — TestServerAliveRefuses class-level un-skip was too coarse

#818 removed the class-level Windows skip because _hold_pid_lock now uses portalocker. The lock mechanics work, but four tests assert POSIX-specific output text (lsof, recorded pid in message) or POSIX-only --force behavior (unlink-while-open per #730). Per-test skipif re-added with reasons pointing at this issue. Other tests in the class continue to run on Windows.

Test plan

  • POSIX targeted run: uv run pytest packages/memtomem/tests/test_server_sigterm.py packages/memtomem/tests/test_uninstall_cmd.py — 42 passed, 2 skipped
  • Ruff clean
  • Windows CI cell — should pass cleanly on this branch

Follow-up

Widening the TestServerAliveRefuses assertions to accept Windows wording (so we keep Windows coverage of the "Server alive → refuse" path) is a separate UX-side concern — would change the uninstall command's user-facing text on Windows or accept platform-specific alternates in the asserts. Out of scope for this fix.

🤖 Generated with Claude Code

memtomem and others added 2 commits May 6, 2026 15:22
…819)

PR #818's Windows CI cell surfaced six failures that POSIX CI hadn't
caught. Two distinct root causes:

1. Windows ``LockFileEx(LOCK_EX)`` blocks reads from *other* handles,
   not just other lock attempts (POSIX ``flock`` is advisory and lets
   reads through). So ``pid_file.read_text()`` from a fresh handle
   raises ``PermissionError`` (ERROR_LOCK_VIOLATION) while the test
   or ``main()`` still holds the lock.

   - ``test_contended_server_start_preserves_pid_file_content`` now
     reads through the lock-owning ``holder`` (``holder.seek(0); read()``)
     instead of opening a separate handle. Cross-platform safe — POSIX
     was happy either way.
   - ``test_server_main_acquires_portalocker_pid_lock`` cannot reach
     ``main()``'s closure-scoped ``_lock_fp`` to read through it. The
     pid-content equality assertion is now POSIX-only; the
     cross-platform load-bearing assertions (``pid_file.exists()``,
     ``probe_pid_file().alive is True``, post-cleanup ``not exists``)
     still run on every OS. ``probe_pid_file`` is already designed to
     treat the Windows lock-blocks-reads case as alive (catches
     ``OSError`` at open), so the alive-probe is the right check there.

2. ``TestServerAliveRefuses`` got its class-level Windows skip removed
   in #818 because ``_hold_pid_lock`` now uses portalocker. The lock
   mechanics work cross-platform, but four cases assert POSIX-specific
   user-facing text or POSIX-specific ``--force`` behavior:

   - ``test_refuses_when_server_alive_at_legacy_path``,
     ``test_refuses_when_server_alive_at_runtime_path``: assert
     ``"Server still running"`` and ``str(os.getpid())`` in output;
     Windows substitutes a Sysinternals/handle.exe hint that omits
     the recorded pid.
   - ``test_refuses_with_unknown_pid_branch_when_pid_file_empty``:
     asserts ``"lsof"``; Windows uses Resource Monitor / handle.exe.
   - ``test_force_overrides_liveness``: asserts ``--force`` succeeds;
     Windows refuses by design (#730 — ``unlink-while-open`` raises
     WinError 32).

   These four get per-test ``skipif(sys.platform == "win32")`` back
   with a reason that points at #819. The other tests in the class
   continue to run on Windows. Widening the assertions to accept
   the Windows wording is a separate UX-side follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…w on #820)

PR #820 review: skipping the four ``TestServerAliveRefuses`` cases on
Windows passes CI but loses regression coverage on the load-bearing
contract this whole branch is supposed to deliver — "the
portalocker-held pid lock is honored as a live writer on Windows."
Skipping the tests means a future change that breaks
``probe_pid_file`` on Windows would slip through.

Change shape:

- ``test_refuses_when_server_alive_at_legacy_path`` and
  ``test_refuses_when_server_alive_at_runtime_path``: drop the
  Windows skip. Common assertions stay (``exit_code == 2``,
  ``"Server still running"`` substring, state preserved). The pid
  assertion is now platform-aware: POSIX still asserts the recorded
  pid is in the output (``lsof`` path), Windows asserts a
  Sysinternals/Resource Monitor hint is present and accepts that the
  recorded pid is absent — ``probe_pid_file`` cannot read the locked
  file on Windows (LockFileEx blocks reads from other handles), so
  ``pid`` resolves to ``None`` and the message routes through the
  unknown-pid branch with Windows-native holder hints. Both branches
  pin "Server alive → uninstall refuses → state preserved", which is
  the actual contract this PR is asserting.

- ``test_refuses_with_unknown_pid_branch_when_pid_file_empty``:
  Windows skip kept. On Windows ``pid=None`` is the default for any
  locked file, so the test's POSIX-specific signal ("empty pid file
  + held lock = the truncate-race fingerprint") cannot be
  distinguished from "any locked file." Keeping it POSIX-only is
  honest about what the test pins.

- ``test_force_overrides_liveness``: stays POSIX-only by design (the
  test asserts ``--force`` succeeds via ``unlink-while-open``, which
  is a POSIX-only contract). Skip reason updated to point at the new
  Windows mirror below.

- New ``test_force_refuses_on_windows_when_pid_locked``: Windows-only
  mirror of ``test_force_overrides_liveness``. Pins the #730
  destructive-CLI contract on the pid-lock side: ``--force`` must
  refuse cleanly (``exit_code == 2``) when NTFS would refuse
  ``unlink-while-open`` (WinError 32), preserving state rather than
  half-wiping it. Complements the existing
  ``test_force_refuses_on_windows_when_writer_alive`` which pins the
  same contract on the DB-lock side.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@memtomem memtomem merged commit 0762a54 into main May 6, 2026
20 of 21 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 6, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: Windows CI fails on test_server_sigterm.py + TestServerAliveRefuses after #818

1 participant