Skip to content

fix(indexing): use os.sep for memory_dir prefix so Windows paths match (#647)#717

Merged
memtomem merged 2 commits intomainfrom
fix/647-windows-memory-dir-prefix
May 2, 2026
Merged

fix(indexing): use os.sep for memory_dir prefix so Windows paths match (#647)#717
memtomem merged 2 commits intomainfrom
fix/647-windows-memory-dir-prefix

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 2, 2026

Summary

  • Make norm_dir_prefix (and its copy-pasted twin in system.py:remove_memory_dir) append os.sep instead of a hardcoded "/", so the trailing-separator prefix matches norm_path's native-separator output on Windows.
  • Fold the inline prefix.endswith("/") block in system.py into a single norm_dir_prefix(resolved) call so the rule lives in one place.

Why

norm_dir_prefix produces a str.startswith prefix from a path that has been through Path.resolve() — which returns native-separator form (C:\Users\foo on Windows, /Users/foo on POSIX). The hardcoded "/" append yielded a mixed-form prefix C:\Users\foo/ on Windows that no chunk source path could ever match under target.startswith(prefix).

Concretely, on Windows this broke:

  • resolve_owning_memory_dir — every chunk classified as orphan (kind=None).
  • memory_dir_stats — per-dir bucketing returned 0 chunks for every dir.
  • /api/sources route — sources list missed kind attribution.
  • /api/memory-dirs/remove with delete_chunks=true — removed 0 rows.

POSIX was unaffected because os.sep == "/" there; the bug only showed up once the Windows leg of the CI matrix went informational (#634 / PR #638).

Approach

Symmetric, minimal:

# engine.py:norm_dir_prefix
base = norm_path(p)
if not base.endswith(os.sep):
    base += os.sep
return base

norm_path itself is not touched — both the write side (sqlite_backend.upsert_chunks) and the read side (startswith comparisons) round-trip through the same Path.resolve() + Unicode NFC normalisation, so drive-letter case, 8.3 short names, and the \\?\ long-path prefix all collapse to a single canonical form via the OS. No DB migration is needed; the persisted chunks.source_file column is bit-for-bit identical to today on every platform.

This mirrors the bug class fixed by #716 / 82feed1 (fix(config): support Windows backslash paths in categorize_memory_dir), but the analog here is os.sep — not .replace("\\", "/") — because the surrounding comparison surface in engine.py and web/routes/* is already native-form on both sides.

What's not in this PR

  • No norm_path change. Forcing forward-slash inside norm_path would invalidate every existing chunks.source_file row on a populated Windows install (Windows CI is currently informational, but the contract is shared with 16 call sites — not worth changing for marginal benefit).
  • No new tests. The 23 currently-failing Windows tests in test_indexing_engine.py::TestResolveOwningMemoryDir (5), ::TestMemoryDirStats (18), and test_web_routes.py::TestSources / ::TestRemoveMemoryDirChunkCleanup / ::TestUploadsUsage are the validation surface — they construct paths via tmp_path (native form on every OS) and assert via Path equality, so they go green on Windows-CI once the prefix uses os.sep. POSIX stays bytewise green.
  • No test-side path-portability sweep. A handful of substring assertions in test_wiki_cmd_override.py still hardcode /skills/... literals — covered separately as a sibling of test: normalize path separators in 7 cross-platform assertions (#643) #711.

Test plan

  • uv run pytest packages/memtomem/tests/test_indexing_engine.py packages/memtomem/tests/test_web_routes.py -m "not ollama" — 253 passed (POSIX no-op confirmed).
  • uv run ruff check packages/memtomem/src/memtomem/indexing/engine.py packages/memtomem/src/memtomem/web/routes/system.py — clean.
  • uv run ruff format --check on the two changed files — clean.
  • grep -rn 'endswith("/")\|prefix.*+= *"/"' packages/memtomem/src/ --include="*.py" — zero hits remaining.
  • Windows-CI leg: 23 listed failures (TestResolveOwningMemoryDir x5, TestMemoryDirStats x18, TestSources/TestRemoveMemoryDirChunkCleanup/TestUploadsUsage) turn green.

Closes #647. Refs #634 (Windows + macOS CI matrix umbrella).

🤖 Generated with Claude Code

pandas-studio and others added 2 commits May 2, 2026 16:00
#647)

`norm_dir_prefix` and a copy-pasted twin in `system.py` built the
trailing-separator prefix by appending a literal "/", but the input
string comes from `norm_path()` which calls `Path.resolve()` and returns
native-separator form on Windows (e.g. `C:\Users\foo`). The result was
a mixed-form prefix `C:\Users\foo/` that no source path under
`target.startswith(prefix)` could ever match — so on Windows
`resolve_owning_memory_dir`, `memory_dir_stats`, and the
`/api/sources` + `/api/memory-dirs/remove` flows all silently misbehaved
(kind=None, deleted_chunks=0, etc.).

Fix is symmetric and minimal:

- engine.py: `norm_dir_prefix` now appends `os.sep` rather than `"/"`.
  `Path.resolve()` and `os.sep` are both native form, so prefix and
  target share the same separator on every OS. POSIX is bytewise
  unchanged (`os.sep == "/"`).
- system.py: the inline prefix-with-trailing-slash block in the
  `delete_chunks=True` branch of `remove_memory_dir` is folded into a
  call to `norm_dir_prefix(resolved)`, so there is one definition of
  the prefix rule.

`norm_path` itself is not touched. Both write side
(`sqlite_backend.upsert_chunks`) and read side go through the same
`Path.resolve()` + NFC normalisation, so drive-letter case, 8.3 short
names, and the `\\?\` long-path prefix all collapse to a single
canonical form via the OS — no DB migration is needed.

The 23 currently-failing Windows tests in
`test_indexing_engine.py::TestResolveOwningMemoryDir` (5),
`::TestMemoryDirStats` (18), and `test_web_routes.py::TestSources` /
`::TestRemoveMemoryDirChunkCleanup` / `::TestUploadsUsage` are the
validation surface; they go green on Windows-CI without test changes
because they construct paths via `tmp_path` (already native form) and
assert via `Path` equality.

Closes #647.
Refs #634 (Windows + macOS CI matrix umbrella).

Co-Authored-By: Claude <[email protected]>
Follow-up on the #647 review: a POSIX-only dev box can't catch a
regression where someone "tidies" `os.sep` back to a literal `"/"` —
the function still returns the right thing on Linux/macOS (where
`os.sep == "/"`) and only breaks on the Windows CI leg.

Add a parametrized unit test that monkey-patches both `norm_path` (to
emit a fake POSIX or Windows shape) and `os.sep`, then asserts the
appended trailing separator matches `os.sep` for the row's platform.
Mutation-tested locally: reverting `os.sep` to `"/"` in
`norm_dir_prefix` makes the two Windows-shape rows fail on macOS,
exactly as intended (per `feedback_pin_invert_symmetric_assertion.md`
and `feedback_claim_test_parity.md`).

No production change; just the pin.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit e25da42 into main May 2, 2026
8 of 9 checks passed
@memtomem memtomem deleted the fix/647-windows-memory-dir-prefix branch May 2, 2026 07:18
@github-actions github-actions Bot locked and limited conversation to collaborators May 2, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows: Path canonicalization in indexing — kind detection + memory_dir comparisons

2 participants