ci: matrix infrastructure + macOS gating; Windows informational (#634)#638
Merged
ci: matrix infrastructure + macOS gating; Windows informational (#634)#638
Conversation
…e green) Promote Windows + macOS from implicitly-unsupported to CI-tested by matrixing the `test:` job. The first run is expected to surface latent breakage that's been hiding behind the ubuntu-only blind spot — that's the point. PR-A (#636) already cleared the predictable POSIX literal noise; what surfaces now should be real Windows/macOS issues. ## Per-job rationale Only `test:` gets the matrix. Other jobs stay ubuntu: - `lint` / `typecheck`: platform-agnostic; matrix would just slow PR feedback for no signal. - `test-llm-nightly`: opt-in nightly, not on PR fast path. - `notebooks`: low priority — Jupyter on Windows has its own quirks; defer until base `test:` is green. - `test-golden-path`: the cache key already uses `${{ runner.os }}`, but ONNX/fastembed cross-platform behavior deserves its own focused follow-up. Defer until we know how much wallclock the matrix adds. `fail-fast: false` keeps all 3 cells running so we triage the full picture per PR push instead of ratchet-ing one OS at a time. Action pins (`actions/checkout@v6`, `astral-sh/setup-uv@v7`) preserved to match the rest of the file. `setup-uv` is one major behind latest (v8.1.0); bumping is a separate concern — mixing it here would muddle the red-on-first-run signal. ## Pre-emptive Windows skips on `test_runtime_paths.py` (#637) Three test classes assume POSIX semantics that don't translate cleanly to NTFS: - `TestRuntimeDir` — the shared `_make_safe_xdg` fixture does `os.chmod(xdg, 0o700)`. NTFS reports synthesized POSIX bits that don't match `0o700`, so `_is_safe_dir` rejects the dir and the XDG branch silently falls through to tempdir — breaking tests that assert the XDG path. - `TestEnsureRuntimeDir` — direct mode-bit assertions (`S_IMODE(...) == 0o700`), `os.umask(0o177)` (Windows ignores umask), 0o755-vs-0o700 distinction. - `TestServerPidPath` — also routes through `_make_safe_xdg`. Skipped via `@pytest.mark.skipif(not hasattr(os, "geteuid"), ...)` — the existing convention in this same file (lines 110, 188, 264 use the identical idiom for `geteuid()`-based owner checks). Functionally equivalent to `sys.platform == "win32"` for our purposes; matching the local idiom keeps the file coherent. `TestLegacyServerPidPath` and `TestUidFallback` left unmarked — the latter is already POSIX-gated; the former uses HOME monkeypatch which may or may not work on Windows (let CI tell us, fix in iterate phase if needed). The runtime code itself (`packages/memtomem/src/memtomem/_runtime_paths.py`) already has a Windows-aware path through `tempfile.gettempdir()` + `uid=0`, but its NTFS behavior is unverified — that's what #637 tracks. ## Iterate-on-red expectations Acceptance per #634 allows triaged failures to ship as either fix or skip + tracking sub-issue. As Windows/macOS reds surface on this PR's matrix run, expect follow-up commits with either: - in-place cross-platform fix (cheap cases), or - platform skip + sub-issue, linked from this PR. Don't merge until all 3 cells are green or remaining reds are explicitly tracked. ## Verification - [x] `uv run ruff check ... && uv run ruff format --check ...` clean - [x] `uv run pytest --ignore=packages/memtomem/tests/test_golden_path.py -m "not ollama and not llm"` — 3554 passed locally on macOS dev shell (skipif markers inactive since `hasattr(os, "geteuid")` is True) - [ ] All 3 matrix cells green on the PR's `test` job Refs #634, #637 Co-Authored-By: Claude <[email protected]>
…r guard PR-B's first matrix run (#638) showed Windows fails import-time on 46 test modules with a single root cause: \`memtomem.context._atomic\` unconditionally \`import fcntl\` at module top, and \`fcntl\` is POSIX-only. The error chains through every \`mm\` CLI entry point (\`cli/__init__.py:33\` → \`context_cmd:9\` → \`context.agents:43\` → \`context._atomic:23\`), so even tests that don't touch \`_file_lock\` couldn't be collected. Fixes (in order of severity): 1. Module-level \`import fcntl\` → conditional on \`sys.platform\`, matching the existing in-house pattern in \`indexing/debounce.py\` and \`cli/_liveness.py\`. On Windows we import \`msvcrt\` instead. 2. \`_file_lock\` — POSIX still uses \`fcntl.flock(LOCK_EX)\`. Windows uses \`msvcrt.locking(LK_LOCK, 1)\` for a byte-range lock at offset 0. The semantic differences (advisory vs mandatory, whole-file vs 1-byte) don't matter for our use case — the lockfile is private to this contextmanager, all contenders go through the same code path, so they serialize correctly. Concurrency contract from PR #548 (the 20-thread test pinning sidecar lockfile correctness) is preserved on both platforms. 3. \`atomic_write_bytes\` — \`os.fchmod\` was added to Windows in Python 3.13; CI runs Python 3.12 on Windows, where the call would raise \`AttributeError\`. Guarded with \`hasattr(os, "fchmod")\`. On Windows Python < 3.13 the file is created with default permissions, which NTFS largely ignores beyond the read-only flag anyway — the integrity guarantee (atomic-replace, no half-written file) is unchanged. Other fcntl call sites in src already had the right pattern (\`indexing/debounce.py:155-181\`, \`cli/_liveness.py:40-74\` — both use \`if sys.platform != "win32"\` + lazy import inside the guard, with documented "Windows = no-op" trade-offs). \`server/__init__.py:265,297\` are inside server-startup functions that don't run during test collection, so they don't block the matrix; tracking those for a follow-up if \`mm\` server runs land in scope. Locally: 3554 unit tests still pass on macOS (POSIX path unchanged); CI matrix run on this PR will exercise the Windows branch. Disclosure: this is a drive-by Windows-portability fix surfaced by PR-B's matrix expansion — kept in this PR rather than split off because (a) without it, no Windows test collects, so PR-B's red signal is unreadable; (b) the change is small (~20 lines, one file, no new dependency); (c) it follows an established in-house pattern. Refs #634 Co-Authored-By: Claude <[email protected]>
Reframes PR-B from "close #634" to "land matrix infrastructure + gating ubuntu/macOS green; Windows informational while triage lands": 1. ``continue-on-error: ${{ matrix.os == 'windows-latest' }}`` on the ``test:`` job. ubuntu + macOS remain required-green; the Windows cell runs and reports but doesn't block the PR. Flip to false (or delete the line) when the Windows triage sub-issues close. Per-job inline comment points at #634 (umbrella). 2. Two review-flagged docstring nits in ``_atomic.py``: - ``_file_lock``: call out the ~10s timeout of ``msvcrt.locking(LK_LOCK)`` vs POSIX ``flock(LOCK_EX)``'s indefinite block. Lock window narrow enough that 10s is generous, but heavy I/O contention (AV scans, interactive debuggers) could surface as transient ``OSError``. - ``atomic_write_bytes``: explain why dropping ``os.fchmod`` on Windows Python 3.12 doesn't compromise the user-private intent of ``mode=0o600`` — NTFS ACL inheritance from ``%LOCALAPPDATA%``-style parents already enforces user-only access in practice. No behavior change beyond the matrix gating. Refs #634 Co-Authored-By: Claude <[email protected]>
This was referenced May 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lands the CI matrix infrastructure for issue #634 — `test:` job now runs on `[ubuntu-latest, windows-latest, macos-latest]` with `fail-fast: false` so all three cells report per push. ubuntu + macOS are gating-green; Windows is informational (`continue-on-error`) while the cross-platform test triage tracked in #637 / #643 / #644 / #645 / #646 / #647 lands incrementally.
PR-A (#636) cleared the predictable POSIX literal noise. This PR adds the matrix, a class-level skip on `test_runtime_paths.py` for the POSIX mode-bit XDG tests (#637), and a portability fix to `memtomem.context._atomic` whose unconditional `import fcntl` blocked Windows test collection on the first matrix run.
#634 stays open as the umbrella; it closes when each Windows triage sub-issue closes and we flip `continue-on-error` off in a tiny follow-up PR.
Why scope was reframed mid-PR
The plan called for "fix in-place if cheap, otherwise platform-skip + sub-issue". On the first matrix push, Windows reported 121 failures across 23 test files — not noise, real Windows-specific bugs spread across 5 distinct categories (path separators, `Path.home()` vs `HOME` env, NTFS mtime resolution, `'home'` fs alias, path canonicalization). Skipping all of them in this PR would have meant module/class-level `skipif` markers across ~10 files plus 5 sub-issues — substantial scope creep for a single PR.
The `continue-on-error` reframe lets the matrix infrastructure ship now, trades "single PR closes #634" for "ubuntu/macOS gating immediately + concrete Windows backlog", and keeps the Windows triage parallelizable across multiple small follow-up PRs.
Changes
1. `.github/workflows/ci.yml` — `test:` job:
```yaml
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
continue-on-error: ${{ matrix.os == 'windows-latest' }}
```
Other jobs stay ubuntu (per the per-job rationale in #634):
2. `packages/memtomem/tests/test_runtime_paths.py` — three class-level skips for POSIX mode-bit semantics that don't translate to NTFS, tracked in #637. Uses `@pytest.mark.skipif(not hasattr(os, "geteuid"), ...)` to match the existing convention at lines 110, 188, 264 of the same file.
3. `packages/memtomem/src/memtomem/context/_atomic.py` — drive-by Windows-portability fix surfaced by the matrix:
Without this fix, Windows test collection failed at import time on 46 modules (`mm` CLI imports → `context.agents` → `context._atomic`), masking the actual cross-platform test failures we wanted to triage.
Iterate-on-red findings
PR's matrix run after the `_atomic.py` fix:
Test plan
/vs\\) #643 / Windows: HOME monkeypatch doesn't reachPath.home()on Windows (uses USERPROFILE) #644 / Windows: config_signature unchanged after write — NTFS mtime resolution / cache #645 / Windows:'home'fs alias + tilde expansion in web fs-list endpoint #646 / Windows: Path canonicalization in indexing —kinddetection + memory_dir comparisons #647References
/vs\\) #643 (path separator assertions)Path.home()on Windows (uses USERPROFILE) #644 (`Path.home()` / HOME monkeypatch)'home'fs alias + tilde expansion in web fs-list endpoint #646 (`'home'` fs alias)kinddetection + memory_dir comparisons #647 (Path canonicalization in indexing)🤖 Generated with Claude Code