feat(search): add temporal-validity filter stage to pipeline (Goal 4) #534
Merged
…Goal 4)

Implements RFC §Pipeline integration: chunks tagged with frontmatter ``valid_from`` / ``valid_to`` are now filtered out of search results when the request's ``as_of`` timestamp falls outside their validity window. ``(None, None)`` chunks remain always-valid (opt-in default).

Why this PR:
- Goal 1+2+3 (#533) wired the metadata through the indexer and made ``bm25_search`` / ``dense_search`` round-trip the new columns. The filter stage was the next atomic step: chunks already carry the window, but nothing was acting on it.
- Decoupled from Goal 5+6 (``mem_search(as_of=...)`` + ``mm search --as-of`` + ``mm list`` validity column) so review can focus on the filter semantics alone.

Pipeline placement (β position, AND with source/tag filter):

    ... → cross-encoder rerank → source/tag filter → validity_filter → time-decay → MMR → ...

Cache semantics (RFC §Comparison semantics + cache_ttl interaction):
- ``as_of_unix=None`` (default) → ``int(time.time())`` fallback; result lands in the existing TTL cache. Up-to-cache_ttl staleness near a date boundary is acceptable because RFC bounds are date-only (24h granularity).
- Explicit ``as_of_unix`` → bypasses BOTH cache read and cache write, so historical queries (e.g. "as of last week") never poison the default-path cache slot.

Tests added (15 new):
- ``TestApplyValidityFilter``: pure helper unit (inside / before / after / boundary-lower / boundary-upper / half-bounded {lower, upper} / always-valid / order-preserved / mixed-input).
- ``TestValidityFilterPipelineWiring``: pipeline integration via ``SearchPipeline.search`` with AsyncMock storage: default uses ``time.time()``; explicit ``as_of_unix`` filters historical window; AND with source filter; default-path caches; explicit-path bypasses cache read AND write.

CLAUDE.md "Search pipeline order" invariant updated to include the new stage.

Goal 5+6 (``mem_search(as_of=...)`` API + CLI surfaces) and Goal 7 (Web UI badge) remain in separate PRs per RFC §Implementation sketch.

Co-Authored-By: Claude <[email protected]>
Review feedback on PR #534:
- Remove ``assert pipeline_mod._apply_validity_filter is not None`` from ``test_default_uses_current_time``. The symbol is imported a few lines above so it cannot be ``None``; the assertion only verifies its own existence and adds no signal beyond what the chunk-content equality already proves.
- Add a docstring note to ``test_default_path_caches_filtered_result`` explaining that pinning ``time.time`` to a constant intentionally disables the TTL boundary; the test asserts the cache reuse path (one storage call across two searches), not TTL expiry. This prevents a reader from misreading the monkeypatch as a TTL exercise.

Co-Authored-By: Claude <[email protected]>
Summary
Implements RFC §Pipeline integration of the temporal-validity feature. Goal 1+2+3 (#533) wired the metadata through the indexer; this PR adds the actual filter stage so chunks with expired / not-yet-active windows are dropped from search results.

- ``_apply_validity_filter`` helper in ``search/pipeline.py``: inclusive at both ends, ``None`` = unbounded, ``(None, None)`` = always-valid.
- ``SearchPipeline.search()`` applies the filter at the β position: AND-combined with the source/tag filter, before time-decay. Pipeline order: ``... → rerank → source/tag filter → validity_filter → time-decay → MMR → ...``
- ``as_of_unix: int | None = None`` parameter on the pipeline entrypoint. ``None`` falls back to ``int(time.time())``; explicit values bypass cache read AND write.
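The helper semantics described above (inclusive at both ends, ``None`` as an unbounded side, input order preserved) can be illustrated with a minimal sketch. This is not the code from ``search/pipeline.py``: the ``Chunk`` dataclass, the public name ``apply_validity_filter``, and the assumption that both bounds are stored as Unix seconds are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for the project's chunk type."""

    text: str
    # Assumption: bounds are Unix seconds; None means that side is unbounded.
    valid_from: Optional[int] = None
    valid_to: Optional[int] = None


def apply_validity_filter(chunks: list[Chunk], as_of_unix: int) -> list[Chunk]:
    """Keep chunks whose validity window contains as_of_unix.

    Both bounds are inclusive, and the input order is preserved.
    """
    kept: list[Chunk] = []
    for chunk in chunks:
        # Not yet active: as_of is strictly before the lower bound.
        if chunk.valid_from is not None and as_of_unix < chunk.valid_from:
            continue
        # Expired: as_of is strictly after the upper bound.
        if chunk.valid_to is not None and as_of_unix > chunk.valid_to:
            continue
        kept.append(chunk)
    return kept


chunks = [
    Chunk("always"),                 # (None, None): always valid
    Chunk("window", 100, 200),       # bounded on both sides
    Chunk("until", None, 150),       # upper bound only
    Chunk("since", 180, None),       # lower bound only
]
at_lower_boundary = [c.text for c in apply_validity_filter(chunks, 100)]
at_upper_boundary = [c.text for c in apply_validity_filter(chunks, 200)]
```

Because the comparison is strict on the rejecting side, a chunk whose ``valid_from`` or ``valid_to`` equals the query instant is kept, which matches the inclusive-boundary behavior the tests in this PR describe.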
Cache semantics (Option A — explicit bypass, default cached)
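- ``as_of_unix=None`` (default): falls back to ``int(time.time())``; the result lands in the existing TTL cache.
- ``as_of_unix=<explicit>``: bypasses both cache read and cache write.

A granularity note in the helper docstring documents the cache-TTL ↔ date-boundary trade-off so future readers don't rediscover it.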
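The cache behavior can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions, not the pipeline's actual code: ``CachedSearch``, its ``backend`` callable, the injectable ``clock``, and the query-keyed dictionary cache are hypothetical names, and the real ``SearchPipeline.search`` is tested with AsyncMock storage, so it is presumably asynchronous.

```python
import time
from typing import Callable, Optional


class CachedSearch:
    """Hypothetical wrapper showing the default-cached / explicit-bypass split."""

    def __init__(
        self,
        backend: Callable[[str, int], list[str]],
        cache_ttl: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._backend = backend
        self._cache_ttl = cache_ttl
        self._clock = clock
        # query -> (time stored, results); illustrative cache layout only.
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def search(self, query: str, as_of_unix: Optional[int] = None) -> list[str]:
        if as_of_unix is not None:
            # Explicit as_of: skip both the cache read and the cache write,
            # so a historical query cannot poison the default-path slot.
            return self._backend(query, as_of_unix)

        now = self._clock()
        hit = self._cache.get(query)
        if hit is not None and now - hit[0] < self._cache_ttl:
            return hit[1]

        # Default path: fall back to the current time and cache the result.
        results = self._backend(query, int(now))
        self._cache[query] = (now, results)
        return results


calls: list[tuple[str, int]] = []


def fake_backend(query: str, as_of: int) -> list[str]:
    """Records each call so cache hits can be observed."""
    calls.append((query, as_of))
    return [f"{query}@{as_of}"]


searcher = CachedSearch(fake_backend, cache_ttl=60.0, clock=lambda: 1000.0)
first = searcher.search("notes")
second = searcher.search("notes")                    # served from cache
historical = searcher.search("notes", as_of_unix=5)  # bypasses the cache
after_historical = searcher.search("notes")          # default slot untouched
```

Pinning the clock to a constant, as the injected ``clock`` does here, keeps the entry from ever expiring, so the sketch exercises cache reuse rather than TTL expiry.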
Scope (atomic PR)
Not in this PR: ``mem_search(as_of=...)`` MCP API, ``mm search --as-of`` CLI, ``mm list`` validity column (Goal 5+6), Web UI badge (Goal 7). Each is a separate PR.
Tests added (15)
``TestApplyValidityFilter`` (10), pure helper unit:
- inside / before / after the window
- lower and upper boundary (inclusive)
- half-bounded (``valid_from`` only, ``valid_to`` only)
- ``(None, None)`` always-valid
- order preserved, mixed input

``TestValidityFilterPipelineWiring`` (5), ``SearchPipeline.search`` integration:
- ``as_of_unix=None`` uses ``time.time()`` (monkeypatch fixed instant)
- explicit ``as_of_unix`` filters window correctly
- AND with ``source_filter``: chunk must pass both
- default path caches (``bm25_search`` invoked once across two calls)
- explicit ``as_of_unix`` bypasses cache read AND write

Local sweep: 3063 passed, 46 deselected (ollama). ruff clean. mypy advisory clean for ``pipeline.py``.

Test plan
(manual smoke after merge; Goal 5+6 will add the surface to test it end-to-end via ``mm search --as-of``)

🤖 Generated with Claude Code