Skip to content

ci: matrix infrastructure + macOS gating; Windows informational (#634)#638

Merged
memtomem merged 5 commits intomainfrom
ci/test-matrix-windows-macos
May 1, 2026
Merged

ci: matrix infrastructure + macOS gating; Windows informational (#634)#638
memtomem merged 5 commits intomainfrom
ci/test-matrix-windows-macos

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 1, 2026

Summary

Lands the CI matrix infrastructure for issue #634 — `test:` job now runs on `[ubuntu-latest, windows-latest, macos-latest]` with `fail-fast: false` so all three cells report per push. ubuntu + macOS are gating-green; Windows is informational (`continue-on-error`) while the cross-platform test triage tracked in #637 / #643 / #644 / #645 / #646 / #647 lands incrementally.

PR-A (#636) cleared the predictable POSIX literal noise. This PR adds the matrix, a class-level skip on `test_runtime_paths.py` for the POSIX mode-bit XDG tests (#637), and a portability fix to `memtomem.context._atomic` whose unconditional `import fcntl` blocked Windows test collection on the first matrix run.

#634 stays open as the umbrella; it closes when each Windows triage sub-issue closes and we flip `continue-on-error` off in a tiny follow-up PR.

Why scope was reframed mid-PR

The plan called for "fix in-place if cheap, otherwise platform-skip + sub-issue". On the first matrix push, Windows reported 121 failures across 23 test files — not noise, real Windows-specific bugs spread across 5 distinct categories (path separators, `Path.home()` vs `HOME` env, NTFS mtime resolution, `'home'` fs alias, path canonicalization). Skipping all of them in this PR would have meant module/class-level `skipif` markers across ~10 files plus 5 sub-issues — substantial scope creep for a single PR.

The `continue-on-error` reframe lets the matrix infrastructure ship now, trades "single PR closes #634" for "ubuntu/macOS gating immediately + concrete Windows backlog", and keeps the Windows triage parallelizable across multiple small follow-up PRs.

Changes

1. `.github/workflows/ci.yml` — `test:` job:

```yaml
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
continue-on-error: ${{ matrix.os == 'windows-latest' }}
```

Other jobs stay ubuntu (per the per-job rationale in #634):

Job OS Why
`lint` ubuntu Ruff is platform-agnostic
`typecheck` ubuntu mypy is platform-agnostic
`test` matrix (ubuntu/windows/macos) The actual surface
`test-golden-path` ubuntu (for now) ONNX/fastembed cross-platform deserves its own focused PR; cache key already uses `runner.os`
`test-llm-nightly` ubuntu Opt-in nightly
`notebooks` ubuntu Jupyter+Windows quirks; defer

2. `packages/memtomem/tests/test_runtime_paths.py` — three class-level skips for POSIX mode-bit semantics that don't translate to NTFS, tracked in #637. Uses `@pytest.mark.skipif(not hasattr(os, "geteuid"), ...)` to match the existing convention at lines 110, 188, 264 of the same file.

3. `packages/memtomem/src/memtomem/context/_atomic.py` — drive-by Windows-portability fix surfaced by the matrix:

  • Module-level `import fcntl` → `if sys.platform == "win32": import msvcrt; else: import fcntl` (matches the existing `indexing/debounce.py` and `cli/_liveness.py` pattern).
  • `_file_lock`: POSIX still uses `fcntl.flock`; Windows uses `msvcrt.locking` (mandatory byte-range lock at offset 0). Documented timeout difference (`msvcrt.locking(LK_LOCK)` is ~10s vs `flock(LOCK_EX)` indefinite).
  • `atomic_write_bytes`: `os.fchmod` was added to Windows in Python 3.13; CI runs Python 3.12. Guarded with `hasattr(os, "fchmod")`. NTFS ACL inheritance from `%LOCALAPPDATA%`-style parents preserves the `mode=0o600` user-private intent in practice.

Without this fix, Windows test collection failed at import time on 46 modules (`mm` CLI imports → `context.agents` → `context._atomic`), masking the actual cross-platform test failures we wanted to triage.

Iterate-on-red findings

PR's matrix run after the `_atomic.py` fix:

  • `test (ubuntu-latest)`: ✅ 3554 pass
  • `test (macos-latest)`: ✅ 3554 pass
  • `test (windows-latest)`: ⚠️ 3399 pass / 121 fail / 34 skip — see ci: add Windows runner to CI matrix #634 for the triage breakdown by category and tracker

Test plan

References

🤖 Generated with Claude Code

pandas-studio and others added 3 commits May 1, 2026 13:03
…e green)

Promote Windows + macOS from implicitly-unsupported to CI-tested by
matrixing the `test:` job. The first run is expected to surface
latent breakage that's been hiding behind the ubuntu-only blind spot —
that's the point. PR-A (#636) already cleared the predictable POSIX
literal noise; what surfaces now should be real Windows/macOS issues.

## Per-job rationale

Only `test:` gets the matrix. Other jobs stay ubuntu:

- `lint` / `typecheck`: platform-agnostic; matrix would just slow PR
  feedback for no signal.
- `test-llm-nightly`: opt-in nightly, not on PR fast path.
- `notebooks`: low priority — Jupyter on Windows has its own quirks;
  defer until base `test:` is green.
- `test-golden-path`: the cache key already uses `${{ runner.os }}`,
  but ONNX/fastembed cross-platform behavior deserves its own focused
  follow-up. Defer until we know how much wallclock the matrix adds.

`fail-fast: false` keeps all 3 cells running so we triage the full
picture per PR push instead of ratchet-ing one OS at a time.

Action pins (`actions/checkout@v6`, `astral-sh/setup-uv@v7`) preserved
to match the rest of the file. `setup-uv` is one major behind latest
(v8.1.0); bumping is a separate concern — mixing it here would muddle
the red-on-first-run signal.

## Pre-emptive Windows skips on `test_runtime_paths.py` (#637)

Three test classes assume POSIX semantics that don't translate cleanly
to NTFS:

- `TestRuntimeDir` — the shared `_make_safe_xdg` fixture does
  `os.chmod(xdg, 0o700)`. NTFS reports synthesized POSIX bits that
  don't match `0o700`, so `_is_safe_dir` rejects the dir and the XDG
  branch silently falls through to tempdir — breaking tests that
  assert the XDG path.
- `TestEnsureRuntimeDir` — direct mode-bit assertions
  (`S_IMODE(...) == 0o700`), `os.umask(0o177)` (Windows ignores
  umask), 0o755-vs-0o700 distinction.
- `TestServerPidPath` — also routes through `_make_safe_xdg`.

Skipped via `@pytest.mark.skipif(not hasattr(os, "geteuid"), ...)` —
the existing convention in this same file (lines 110, 188, 264 use
the identical idiom for `geteuid()`-based owner checks). Functionally
equivalent to `sys.platform == "win32"` for our purposes; matching
the local idiom keeps the file coherent.

`TestLegacyServerPidPath` and `TestUidFallback` left unmarked —
the latter is already POSIX-gated; the former uses HOME monkeypatch
which may or may not work on Windows (let CI tell us, fix in iterate
phase if needed).

The runtime code itself
(`packages/memtomem/src/memtomem/_runtime_paths.py`) already has a
Windows-aware path through `tempfile.gettempdir()` + `uid=0`, but its
NTFS behavior is unverified — that's what #637 tracks.

## Iterate-on-red expectations

Acceptance per #634 allows triaged failures to ship as either fix or
skip + tracking sub-issue. As Windows/macOS reds surface on this PR's
matrix run, expect follow-up commits with either:

- in-place cross-platform fix (cheap cases), or
- platform skip + sub-issue, linked from this PR.

Don't merge until all 3 cells are green or remaining reds are
explicitly tracked.

## Verification

- [x] `uv run ruff check ... && uv run ruff format --check ...` clean
- [x] `uv run pytest --ignore=packages/memtomem/tests/test_golden_path.py -m "not ollama and not llm"` — 3554 passed locally on macOS dev shell (skipif markers inactive since `hasattr(os, "geteuid")` is True)
- [ ] All 3 matrix cells green on the PR's `test` job

Refs #634, #637

Co-Authored-By: Claude <[email protected]>
…r guard

PR-B's first matrix run (#638) showed Windows fails import-time on 46
test modules with a single root cause: \`memtomem.context._atomic\`
unconditionally \`import fcntl\` at module top, and \`fcntl\` is
POSIX-only. The error chains through every \`mm\` CLI entry point
(\`cli/__init__.py:33\` → \`context_cmd:9\` → \`context.agents:43\`
→ \`context._atomic:23\`), so even tests that don't touch
\`_file_lock\` couldn't be collected.

Fixes (in order of severity):

1. Module-level \`import fcntl\` → conditional on \`sys.platform\`,
   matching the existing in-house pattern in
   \`indexing/debounce.py\` and \`cli/_liveness.py\`. On Windows we
   import \`msvcrt\` instead.

2. \`_file_lock\` — POSIX still uses \`fcntl.flock(LOCK_EX)\`. Windows
   uses \`msvcrt.locking(LK_LOCK, 1)\` for a byte-range lock at offset 0.
   The semantic differences (advisory vs mandatory, whole-file vs
   1-byte) don't matter for our use case — the lockfile is private to
   this contextmanager, all contenders go through the same code path,
   so they serialize correctly. Concurrency contract from PR #548
   (the 20-thread test pinning sidecar lockfile correctness) is
   preserved on both platforms.

3. \`atomic_write_bytes\` — \`os.fchmod\` was added to Windows in
   Python 3.13; CI runs Python 3.12 on Windows, where the call would
   raise \`AttributeError\`. Guarded with \`hasattr(os, "fchmod")\`.
   On Windows Python < 3.13 the file is created with default
   permissions, which NTFS largely ignores beyond the read-only flag
   anyway — the integrity guarantee (atomic-replace, no
   half-written file) is unchanged.

Other fcntl call sites in src already had the right pattern
(\`indexing/debounce.py:155-181\`, \`cli/_liveness.py:40-74\` —
both use \`if sys.platform != "win32"\` + lazy import inside the
guard, with documented "Windows = no-op" trade-offs).
\`server/__init__.py:265,297\` are inside server-startup functions
that don't run during test collection, so they don't block the
matrix; tracking those for a follow-up if \`mm\` server runs
land in scope.

Locally: 3554 unit tests still pass on macOS (POSIX path
unchanged); CI matrix run on this PR will exercise the Windows
branch.

Disclosure: this is a drive-by Windows-portability fix surfaced by
PR-B's matrix expansion — kept in this PR rather than split off
because (a) without it, no Windows test collects, so PR-B's red
signal is unreadable; (b) the change is small (~20 lines, one
file, no new dependency); (c) it follows an established in-house
pattern.

Refs #634

Co-Authored-By: Claude <[email protected]>
Reframes PR-B from "close #634" to "land matrix infrastructure +
gating ubuntu/macOS green; Windows informational while triage lands":

1. ``continue-on-error: ${{ matrix.os == 'windows-latest' }}`` on the
   ``test:`` job. ubuntu + macOS remain required-green; the Windows
   cell runs and reports but doesn't block the PR. Flip to false (or
   delete the line) when the Windows triage sub-issues close.
   Per-job inline comment points at #634 (umbrella).

2. Two review-flagged docstring nits in ``_atomic.py``:

   - ``_file_lock``: call out the ~10s timeout of
     ``msvcrt.locking(LK_LOCK)`` vs POSIX ``flock(LOCK_EX)``'s
     indefinite block. Lock window narrow enough that 10s is
     generous, but heavy I/O contention (AV scans, interactive
     debuggers) could surface as transient ``OSError``.

   - ``atomic_write_bytes``: explain why dropping ``os.fchmod`` on
     Windows Python 3.12 doesn't compromise the user-private intent
     of ``mode=0o600`` — NTFS ACL inheritance from
     ``%LOCALAPPDATA%``-style parents already enforces user-only
     access in practice.

No behavior change beyond the matrix gating.

Refs #634

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit ada6c3b into main May 1, 2026
8 of 9 checks passed
@memtomem memtomem deleted the ci/test-matrix-windows-macos branch May 1, 2026 07:23
@github-actions github-actions Bot locked and limited conversation to collaborators May 1, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ci: add Windows runner to CI matrix

2 participants