fix(indexing): use os.sep for memory_dir prefix so Windows paths match (#647)#717
Merged
fix(indexing): use os.sep for memory_dir prefix so Windows paths match (#647)#717
Conversation
#647) `norm_dir_prefix` and a copy-pasted twin in `system.py` built the trailing-separator prefix by appending a literal "/", but the input string comes from `norm_path()` which calls `Path.resolve()` and returns native-separator form on Windows (e.g. `C:\Users\foo`). The result was a mixed-form prefix `C:\Users\foo/` that no source path under `target.startswith(prefix)` could ever match — so on Windows `resolve_owning_memory_dir`, `memory_dir_stats`, and the `/api/sources` + `/api/memory-dirs/remove` flows all silently misbehaved (kind=None, deleted_chunks=0, etc.). Fix is symmetric and minimal: - engine.py: `norm_dir_prefix` now appends `os.sep` rather than `"/"`. `Path.resolve()` and `os.sep` are both native form, so prefix and target share the same separator on every OS. POSIX is bytewise unchanged (`os.sep == "/"`). - system.py: the inline prefix-with-trailing-slash block in the `delete_chunks=True` branch of `remove_memory_dir` is folded into a call to `norm_dir_prefix(resolved)`, so there is one definition of the prefix rule. `norm_path` itself is not touched. Both write side (`sqlite_backend.upsert_chunks`) and read side go through the same `Path.resolve()` + NFC normalisation, so drive-letter case, 8.3 short names, and the `\\?\` long-path prefix all collapse to a single canonical form via the OS — no DB migration is needed. The 23 currently-failing Windows tests in `test_indexing_engine.py::TestResolveOwningMemoryDir` (5), `::TestMemoryDirStats` (18), and `test_web_routes.py::TestSources` / `::TestRemoveMemoryDirChunkCleanup` / `::TestUploadsUsage` are the validation surface; they go green on Windows-CI without test changes because they construct paths via `tmp_path` (already native form) and assert via `Path` equality. Closes #647. Refs #634 (Windows + macOS CI matrix umbrella). Co-Authored-By: Claude <[email protected]>
Follow-up on the #647 review: a POSIX-only dev box can't catch a regression where someone "tidies" `os.sep` back to a literal `"/"` — the function still returns the right thing on Linux/macOS (where `os.sep == "/"`) and only breaks on the Windows CI leg. Add a parametrized unit test that monkey-patches both `norm_path` (to emit a fake POSIX or Windows shape) and `os.sep`, then asserts the appended trailing separator matches `os.sep` for the row's platform. Mutation-tested locally: reverting `os.sep` to `"/"` in `norm_dir_prefix` makes the two Windows-shape rows fail on macOS, exactly as intended (per `feedback_pin_invert_symmetric_assertion.md` and `feedback_claim_test_parity.md`). No production change; just the pin. Co-Authored-By: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
norm_dir_prefix(and its copy-pasted twin insystem.py:remove_memory_dir) appendos.sepinstead of a hardcoded"/", so the trailing-separator prefix matchesnorm_path's native-separator output on Windows.prefix.endswith("/")block insystem.pyinto a singlenorm_dir_prefix(resolved)call so the rule lives in one place.Why
norm_dir_prefixproduces astr.startswithprefix from a path that has been throughPath.resolve()— which returns native-separator form (C:\Users\fooon Windows,/Users/fooon POSIX). The hardcoded"/"append yielded a mixed-form prefixC:\Users\foo/on Windows that no chunk source path could ever match undertarget.startswith(prefix).Concretely, on Windows this broke:
resolve_owning_memory_dir— every chunk classified as orphan (kind=None).memory_dir_stats— per-dir bucketing returned 0 chunks for every dir./api/sourcesroute — sources list missedkindattribution./api/memory-dirs/removewithdelete_chunks=true— removed 0 rows.POSIX was unaffected because
os.sep == "/"there; the bug only showed up once the Windows leg of the CI matrix went informational (#634 / PR #638).Approach
Symmetric, minimal:
norm_pathitself is not touched — both the write side (sqlite_backend.upsert_chunks) and the read side (startswithcomparisons) round-trip through the samePath.resolve()+ Unicode NFC normalisation, so drive-letter case, 8.3 short names, and the\\?\long-path prefix all collapse to a single canonical form via the OS. No DB migration is needed; the persistedchunks.source_filecolumn is bit-for-bit identical to today on every platform.This mirrors the bug class fixed by #716 /
82feed1(fix(config): support Windows backslash paths in categorize_memory_dir), but the analog here isos.sep— not.replace("\\", "/")— because the surrounding comparison surface inengine.pyandweb/routes/*is already native-form on both sides.What's not in this PR
norm_pathchange. Forcing forward-slash insidenorm_pathwould invalidate every existingchunks.source_filerow on a populated Windows install (Windows CI is currently informational, but the contract is shared with 16 call sites — not worth changing for marginal benefit).test_indexing_engine.py::TestResolveOwningMemoryDir(5),::TestMemoryDirStats(18), andtest_web_routes.py::TestSources/::TestRemoveMemoryDirChunkCleanup/::TestUploadsUsageare the validation surface — they construct paths viatmp_path(native form on every OS) and assert viaPathequality, so they go green on Windows-CI once the prefix usesos.sep. POSIX stays bytewise green.test_wiki_cmd_override.pystill hardcode/skills/...literals — covered separately as a sibling of test: normalize path separators in 7 cross-platform assertions (#643) #711.Test plan
uv run pytest packages/memtomem/tests/test_indexing_engine.py packages/memtomem/tests/test_web_routes.py -m "not ollama"— 253 passed (POSIX no-op confirmed).uv run ruff check packages/memtomem/src/memtomem/indexing/engine.py packages/memtomem/src/memtomem/web/routes/system.py— clean.uv run ruff format --checkon the two changed files — clean.grep -rn 'endswith("/")\|prefix.*+= *"/"' packages/memtomem/src/ --include="*.py"— zero hits remaining.TestResolveOwningMemoryDirx5,TestMemoryDirStatsx18,TestSources/TestRemoveMemoryDirChunkCleanup/TestUploadsUsage) turn green.Closes #647. Refs #634 (Windows + macOS CI matrix umbrella).
🤖 Generated with Claude Code