You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Sibling of #647 (closed via #717) but in a different code path: the search pipeline's source_filter parameter does plain substring matching against the chunk's source_file, with no separator normalisation. On Windows that means a caller-supplied filter like /tmp/keep/ (which a test or a user types in POSIX form) never matches a chunk whose stored source_file is \tmp\keep\policy.md — separators differ, substring fails.
The test sets source_filter="/tmp/keep/" and indexes a chunk with source_file=Path("/tmp/keep/policy.md"). On Windows str(WindowsPath("/tmp/keep/policy.md")) is \tmp\keep\policy.md; the substring "/tmp/keep/" is not in "\tmp\keep\policy.md" → the filter rejects the chunk → empty result.
This is not what test_and_with_source_filter is exercising — it's exercising the AND between validity and source filter, but the test never gets a chance to assert AND-semantics because the source filter alone already rejects everything on Windows.
#717 normalised the trailing separator inside norm_dir_prefix (os.sep instead of a hardcoded "/"). That helper is only used by engine.py:resolve_owning_memory_dir, engine.py:memory_dir_stats, and the delete_chunks=true branch of web/routes/system.py:remove_memory_dir. Search-pipeline source_filter is a separate code path: it bypasses norm_dir_prefix and does its own ad-hoc in str(...) / fnmatch(...) matching.
norm_path itself is symmetric (write side and read side both go through Path.resolve() + NFC, so the stored chunks.source_file is in native form on every OS), so the asymmetry is purely the caller's input — they pass a separator-naive substring/glob.
Fix direction
Two options, equivalent in correctness:
A) Caller-side normalise — the MemSearchInput / CLI / API layer that accepts source_filter should normalise separators to native form before passing in (e.g. source_filter.replace("/", os.sep) when not a glob, or the equivalent for fnmatch patterns). One layer to change, but every entry point has to remember.
B) Match-side normalise (recommended) — in _match_source and expire_chunks, normalise both filter_str and source_path to a single canonical form before comparing. Mirrors PR #716's read-time .replace("\\", "/") pattern in categorize_memory_dir. One helper, all callers covered. Suggested:
def_match_source(filter_str: str, source_path: str) ->bool:
"""Match source_filter: glob when pattern chars present, substring otherwise. Both sides are folded to forward-slash form before comparing so a user-typed `/tmp/keep/` matches a Windows-stored `\\tmp\\keep\\file` (#NNN, sibling of #647). """norm_filter=filter_str.replace("\\", "/")
norm_source=source_path.replace("\\", "/")
ifany(cinnorm_filterforcin ("*", "?", "[")):
returnfnmatch(norm_source, norm_filter)
returnnorm_filterinnorm_source
The decay.py:expire_chunks site should call into the same helper (currently inlined).
Test plan
The Windows-CI failure of test_temporal_validity::test_and_with_source_filter is the existing validation surface — should turn green once _match_source normalises.
Background
Sibling of #647 (closed via #717) but in a different code path: the search pipeline's
source_filterparameter does plain substring matching against the chunk'ssource_file, with no separator normalisation. On Windows that means a caller-supplied filter like/tmp/keep/(which a test or a user types in POSIX form) never matches a chunk whose storedsource_fileis\tmp\keep\policy.md— separators differ, substring fails.Affected code
packages/memtomem/src/memtomem/search/pipeline.py:40-44—_match_source:packages/memtomem/src/memtomem/search/decay.py:111-112—expire_chunks:Both compare a user/caller-supplied substring against a stored native-form path. Neither normalises separators on either side.
Symptom on Windows CI
The test sets
source_filter="/tmp/keep/"and indexes a chunk withsource_file=Path("/tmp/keep/policy.md"). On Windowsstr(WindowsPath("/tmp/keep/policy.md"))is\tmp\keep\policy.md; the substring"/tmp/keep/"is not in"\tmp\keep\policy.md"→ the filter rejects the chunk → empty result.This is not what
test_and_with_source_filteris exercising — it's exercising the AND between validity and source filter, but the test never gets a chance to assert AND-semantics because the source filter alone already rejects everything on Windows.Why this didn't get fixed by #717
#717 normalised the trailing separator inside
norm_dir_prefix(os.sepinstead of a hardcoded"/"). That helper is only used byengine.py:resolve_owning_memory_dir,engine.py:memory_dir_stats, and thedelete_chunks=truebranch ofweb/routes/system.py:remove_memory_dir. Search-pipelinesource_filteris a separate code path: it bypassesnorm_dir_prefixand does its own ad-hocin str(...)/fnmatch(...)matching.norm_pathitself is symmetric (write side and read side both go throughPath.resolve()+ NFC, so the storedchunks.source_fileis in native form on every OS), so the asymmetry is purely the caller's input — they pass a separator-naive substring/glob.Fix direction
Two options, equivalent in correctness:
A) Caller-side normalise — the
MemSearchInput/ CLI / API layer that acceptssource_filtershould normalise separators to native form before passing in (e.g.source_filter.replace("/", os.sep)when not a glob, or the equivalent forfnmatchpatterns). One layer to change, but every entry point has to remember.B) Match-side normalise (recommended) — in
_match_sourceandexpire_chunks, normalise bothfilter_strandsource_pathto a single canonical form before comparing. Mirrors PR #716's read-time.replace("\\", "/")pattern incategorize_memory_dir. One helper, all callers covered. Suggested:The
decay.py:expire_chunkssite should call into the same helper (currently inlined).Test plan
test_temporal_validity::test_and_with_source_filteris the existing validation surface — should turn green once_match_sourcenormalises.TestNormDirPrefix)..replace("\\", "/")on a POSIX path is identity.Refs
kinddetection + memory_dir comparisons #647 — sibling (closed via fix(indexing): use os.sep for memory_dir prefix so Windows paths match (#647) #717), same bug class inengine.norm_dir_prefixos.sepfor trailing separator on the prefix)categorize_memory_dirread-time.replace("\\", "/"))🤖 Generated with Claude Code