feat(memory): Add MMR re-ranking and temporal decay for hybrid search#13391
feat(memory): Add MMR re-ranking and temporal decay for hybrid search#13391rodrigouroz wants to merge 4 commits intoopenclaw:mainfrom
Conversation
|
|
||
| // Use original score as tiebreaker (higher is better) | ||
| if ( | ||
| mmrScore > bestMMRScore || | ||
| (mmrScore === bestMMRScore && bestItem && candidate.score > bestItem.score) | ||
| ) { |
There was a problem hiding this comment.
MMR tiebreaker never triggers
In mmrRerank, the tie-break condition can’t select a higher-scoring candidate when bestItem is still null, because it’s gated by bestItem && .... If multiple candidates produce the same mmrScore (common early on when selected is empty, or when many items normalize to the same relevance), the first iterated candidate “wins” due to Set iteration order rather than score.
This should compare against bestItem?.score and allow setting bestItem when bestItem is null (e.g., include !bestItem in the tie path) so deterministic, score-based ordering is preserved on ties.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/mmr.ts
Line: 148:153
Comment:
**MMR tiebreaker never triggers**
In `mmrRerank`, the tie-break condition can’t select a higher-scoring candidate when `bestItem` is still `null`, because it’s gated by `bestItem && ...`. If multiple candidates produce the same `mmrScore` (common early on when `selected` is empty, or when many items normalize to the same relevance), the first iterated candidate “wins” due to Set iteration order rather than score.
This should compare against `bestItem?.score` and allow setting `bestItem` when `bestItem` is null (e.g., include `!bestItem` in the tie path) so deterministic, score-based ordering is preserved on ties.
How can I resolve this? If you propose a fix, please make it concise.
CHANGELOG.md
Outdated
| - Memory: add MMR (Maximal Marginal Relevance) re-ranking for hybrid search diversity. Configurable via `memorySearch.query.hybrid.mmr`. Thanks @rodrigouroz. https://docs.clawd.bot/concepts/memory | ||
| - Memory: add opt-in temporal decay for hybrid search scoring, with configurable half-life via `memorySearch.query.hybrid.temporalDecay`. Thanks @rodrigouroz. | ||
|
|
||
| ## 2026.2.9 | ||
|
|
||
| ### Added |
There was a problem hiding this comment.
Broken / inconsistent docs link
The changelog entry links to https://docs.clawd.bot/concepts/memory, but the repo header above uses https://docs.openclaw.ai and the PR description references /concepts/memory under the OpenClaw docs. This link is very likely incorrect and will send users to the wrong place.
Please update the URL to the correct OpenClaw docs domain/path used elsewhere in the repo.
Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 9:14
Comment:
**Broken / inconsistent docs link**
The changelog entry links to `https://docs.clawd.bot/concepts/memory`, but the repo header above uses `https://docs.openclaw.ai` and the PR description references `/concepts/memory` under the OpenClaw docs. This link is very likely incorrect and will send users to the wrong place.
Please update the URL to the correct OpenClaw docs domain/path used elsewhere in the repo.
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (1)
Align Prompt To Fix With AIThis is a comment left during a code review.
Path: src/memory/mmr.test.ts
Line: 916:920
Comment:
**Default config contradicts feature flag**
`DEFAULT_MMR_CONFIG.enabled` is asserted as `true` here, but the rest of the PR (docs/config defaults and `mergeHybridResults` usage) describes MMR as *opt-in* / default disabled. This test will fail if the runtime default is corrected to match config defaults, and it currently bakes in the opposite behavior.
Align `DEFAULT_MMR_CONFIG.enabled` (and this test) with the intended default (`false`) so “opt-in” behavior is consistent across runtime, schema docs, and tests.
How can I resolve this? If you propose a fix, please make it concise. |
1f16e23 to
5747b23
Compare
|
Check code file sizes is failing but I only added 6 and 13 lines, I don't want to tackle a refactor of those files here. |
|
@greptile-apps can you review again? |
| return Promise.all( | ||
| params.results.map(async (entry) => { | ||
| const cacheKey = `${entry.source}:${entry.path}`; | ||
| if (!timestampCache.has(cacheKey)) { | ||
| const timestamp = await extractTimestamp({ | ||
| filePath: entry.path, | ||
| source: entry.source, | ||
| workspaceDir: params.workspaceDir, | ||
| }); | ||
| timestampCache.set(cacheKey, timestamp); | ||
| } |
There was a problem hiding this comment.
timestampCache doesn't work as intended inside Promise.all — all mapper callbacks start concurrently, so entries with the same source:path will all pass the !timestampCache.has(cacheKey) check before any await extractTimestamp() resolves, resulting in duplicate fs.stat calls.
Not a correctness bug (all calls produce the same result), but the cache is effectively a no-op. If deduplication matters, consider collecting unique cache keys first and resolving them sequentially or via a dedup map of promises.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/temporal-decay.ts
Line: 137:147
Comment:
`timestampCache` doesn't work as intended inside `Promise.all` — all mapper callbacks start concurrently, so entries with the same `source:path` will all pass the `!timestampCache.has(cacheKey)` check before any `await extractTimestamp()` resolves, resulting in duplicate `fs.stat` calls.
Not a correctness bug (all calls produce the same result), but the cache is effectively a no-op. If deduplication matters, consider collecting unique cache keys first and resolving them sequentially or via a dedup map of promises.
How can I resolve this? If you propose a fix, please make it concise.
CHANGELOG.md
Outdated
| - Agents: honor tools.exec.safeBins in exec allowlist checks. (#2281) | ||
| - Docs: tighten Fly private deployment steps. (#2289) Thanks @dguido. |
There was a problem hiding this comment.
These two entries appear to be unrelated to this PR and were added to an already-released version section (2026.1.29). Likely a merge/rebase artifact — consider reverting these lines from this PR's diff.
Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 413:414
Comment:
These two entries appear to be unrelated to this PR and were added to an already-released version section (`2026.1.29`). Likely a merge/rebase artifact — consider reverting these lines from this PR's diff.
How can I resolve this? If you propose a fix, please make it concise.|
@greptile-apps please check now and if all good update the PR description with your score |
Adds Maximal Marginal Relevance (MMR) re-ranking to hybrid search results. - New mmr.ts with tokenization, Jaccard similarity, and MMR algorithm - Integrated into mergeHybridResults() with optional mmr config - 40 comprehensive tests covering edge cases and diversity behavior - Configurable lambda parameter (default 0.7) to balance relevance vs diversity - Updated CHANGELOG.md and memory docs This helps avoid redundant results when multiple chunks contain similar content.
Exponential decay (half-life configurable, default 30 days) applied
before MMR re-ranking. Dated daily files (memory/YYYY-MM-DD.md) use
filename date; evergreen files (MEMORY.md, topic files) are not
decayed; other sources fall back to file mtime.
Config: memorySearch.query.hybrid.temporalDecay.{enabled, halfLifeDays}
Default: disabled (backwards compatible, opt-in).
- DEFAULT_MMR_CONFIG.enabled = false (opt-in, was incorrectly true) - Tie-break: handle bestItem === null so first candidate always wins - CHANGELOG URL: docs.clawd.bot → docs.openclaw.ai - Tests updated to pass enabled: true explicitly where needed
0f54bb4 to
85a73d4
Compare
bfc1ccb to
f92900f
Compare
|
Closing as superseded by a refreshed PR rebased/ported to current Replacement PR: #18097 |
Summary
Two complementary, opt-in post-processing features for hybrid memory search that improve result quality without changing existing behavior.
MMR Re-ranking (Maximal Marginal Relevance)
Reduces redundancy by penalizing results too similar to already-selected ones.
memorySearch.query.hybrid.mmr.enabled(default:false)memorySearch.query.hybrid.mmr.lambda(default:0.7, range 0–1)Example — query: "home network setup"
Without MMR, top 3 might be three near-identical snippets about the same router config from different daily notes. With MMR, duplicates drop out and diverse results (DNS setup, reference doc) surface instead.
Temporal Decay (Recency Boost)
Exponential score decay based on memory age, so recent notes rank higher.
memorySearch.query.hybrid.temporalDecay.enabled(default:false)memorySearch.query.hybrid.temporalDecay.halfLifeDays(default:30)memory/YYYY-MM-DD.md) use filename dateMEMORY.md, topic files) are not decayedWith half-life=30: today=100%, 7d=84%, 30d=50%, 90d=12.5%
Pipeline
Design decisions
Tests
127 tests passing across 17 test files. New test coverage:
mmr.test.ts— 40 tests (algorithm, diversity, edge cases)temporal-decay.test.ts— 5 tests (decay math, half-life, evergreen files, mtime fallback, integration)Greptile Overview
Greptile Summary
Adds two opt-in post-processing stages to the hybrid memory search pipeline: MMR re-ranking for result diversity (reduces near-duplicate snippets) and temporal decay for recency-aware scoring (exponential decay based on memory age). Both are disabled by default and fully backwards compatible.
src/memory/mmr.ts(MMR algorithm with Jaccard similarity),src/memory/temporal-decay.ts(exponential decay with evergreen file exemption)mergeHybridResultsinhybrid.tschanged from sync to async to accommodate filesystem calls in temporal decaymemory-search.tsconfig resolution with proper validation/clampingWeighted Merge → Temporal Decay → Sort → MMR → Top-K?? -Infinityfallback) and thetimestampCacherace condition (now fixed by cachingPromiseobjects instead of resolved values)Confidence Score: 4/5
mergeHybridResultsis the main surface area for regressions, and existing tests were correctly updated. No critical issues found beyond the previously flagged items which appear addressed.mmr.ts,temporal-decay.ts) are well-tested and the integration points inhybrid.tsandmanager.tsare minimal.