Skip to content

feat(memory): Add MMR re-ranking and temporal decay for hybrid search#13391

Closed
rodrigouroz wants to merge 4 commits intoopenclaw:mainfrom
rodrigouroz:feat/mmr-reranking-clean
Closed

feat(memory): Add MMR re-ranking and temporal decay for hybrid search#13391
rodrigouroz wants to merge 4 commits intoopenclaw:mainfrom
rodrigouroz:feat/mmr-reranking-clean

Conversation

@rodrigouroz
Copy link
Copy Markdown
Contributor

@rodrigouroz rodrigouroz commented Feb 10, 2026

Summary

Two complementary, opt-in post-processing features for hybrid memory search that improve result quality without changing existing behavior.

MMR Re-ranking (Maximal Marginal Relevance)

Reduces redundancy by penalizing results too similar to already-selected ones.

  • Config: memorySearch.query.hybrid.mmr.enabled (default: false)
  • Lambda: memorySearch.query.hybrid.mmr.lambda (default: 0.7, range 0–1)
  • Uses cosine + Jaccard similarity

Example — query: "home network setup"

Without MMR, top 3 might be three near-identical snippets about the same router config from different daily notes. With MMR, duplicates drop out and diverse results (DNS setup, reference doc) surface instead.

Temporal Decay (Recency Boost)

Exponential score decay based on memory age, so recent notes rank higher.

  • Config: memorySearch.query.hybrid.temporalDecay.enabled (default: false)
  • Half-life: memorySearch.query.hybrid.temporalDecay.halfLifeDays (default: 30)
  • Dated files (memory/YYYY-MM-DD.md) use filename date
  • Evergreen files (MEMORY.md, topic files) are not decayed
  • Other sources fall back to file mtime

With half-life=30: today=100%, 7d=84%, 30d=50%, 90d=12.5%

Pipeline

Vector + Keyword → Weighted Merge → Temporal Decay → Sort → MMR → Top-K

Design decisions

  • Both opt-in (default disabled) — fully backwards compatible
  • Both independent — enable one, the other, or both
  • Applied at different pipeline stages (no conflict)
  • Config wired through zod schema, types, field labels/help

Tests

127 tests passing across 17 test files. New test coverage:

  • mmr.test.ts — 40 tests (algorithm, diversity, edge cases)
  • temporal-decay.test.ts — 5 tests (decay math, half-life, evergreen files, mtime fallback, integration)

Greptile Overview

Greptile Summary

Adds two opt-in post-processing stages to the hybrid memory search pipeline: MMR re-ranking for result diversity (reduces near-duplicate snippets) and temporal decay for recency-aware scoring (exponential decay based on memory age). Both are disabled by default and fully backwards compatible.

  • New files: src/memory/mmr.ts (MMR algorithm with Jaccard similarity), src/memory/temporal-decay.ts (exponential decay with evergreen file exemption)
  • mergeHybridResults in hybrid.ts changed from sync to async to accommodate filesystem calls in temporal decay
  • Config wired through zod schema, types, and memory-search.ts config resolution with proper validation/clamping
  • Pipeline order: Weighted Merge → Temporal Decay → Sort → MMR → Top-K
  • Tests: comprehensive coverage for both features including edge cases
  • Previous review threads identified the MMR tiebreaker bug (now fixed via ?? -Infinity fallback) and the timestampCache race condition (now fixed by caching Promise objects instead of resolved values)

Confidence Score: 4/5

  • This PR is safe to merge — both features are opt-in (disabled by default) with no changes to existing behavior.
  • Well-structured, opt-in features with comprehensive test coverage. The code is clean with proper input validation, clamping, and edge case handling. The sync-to-async change in mergeHybridResults is the main surface area for regressions, and existing tests were correctly updated. No critical issues found beyond the previously flagged items which appear addressed.
  • No files require special attention. The core algorithm files (mmr.ts, temporal-decay.ts) are well-tested and the integration points in hybrid.ts and manager.ts are minimal.

@openclaw-barnacle openclaw-barnacle bot added docs Improvements or additions to documentation agents Agent runtime and tooling labels Feb 10, 2026
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +148 to +153

// Use original score as tiebreaker (higher is better)
if (
mmrScore > bestMMRScore ||
(mmrScore === bestMMRScore && bestItem && candidate.score > bestItem.score)
) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MMR tiebreaker never triggers

In mmrRerank, the tie-break condition can’t select a higher-scoring candidate when bestItem is still null, because it’s gated by bestItem && .... If multiple candidates produce the same mmrScore (common early on when selected is empty, or when many items normalize to the same relevance), the first iterated candidate “wins” due to Set iteration order rather than score.

This should compare against bestItem?.score and allow setting bestItem when bestItem is null (e.g., include !bestItem in the tie path) so deterministic, score-based ordering is preserved on ties.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/mmr.ts
Line: 148:153

Comment:
**MMR tiebreaker never triggers**

In `mmrRerank`, the tie-break condition can’t select a higher-scoring candidate when `bestItem` is still `null`, because it’s gated by `bestItem && ...`. If multiple candidates produce the same `mmrScore` (common early on when `selected` is empty, or when many items normalize to the same relevance), the first iterated candidate “wins” due to Set iteration order rather than score.

This should compare against `bestItem?.score` and allow setting `bestItem` when `bestItem` is null (e.g., include `!bestItem` in the tie path) so deterministic, score-based ordering is preserved on ties.

How can I resolve this? If you propose a fix, please make it concise.

CHANGELOG.md Outdated
Comment on lines 9 to 14
- Memory: add MMR (Maximal Marginal Relevance) re-ranking for hybrid search diversity. Configurable via `memorySearch.query.hybrid.mmr`. Thanks @rodrigouroz. https://docs.clawd.bot/concepts/memory
- Memory: add opt-in temporal decay for hybrid search scoring, with configurable half-life via `memorySearch.query.hybrid.temporalDecay`. Thanks @rodrigouroz.

## 2026.2.9

### Added
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken / inconsistent docs link

The changelog entry links to https://docs.clawd.bot/concepts/memory, but the repo header above uses https://docs.openclaw.ai and the PR description references /concepts/memory under the OpenClaw docs. This link is very likely incorrect and will send users to the wrong place.

Please update the URL to the correct OpenClaw docs domain/path used elsewhere in the repo.

Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 9:14

Comment:
**Broken / inconsistent docs link**

The changelog entry links to `https://docs.clawd.bot/concepts/memory`, but the repo header above uses `https://docs.openclaw.ai` and the PR description references `/concepts/memory` under the OpenClaw docs. This link is very likely incorrect and will send users to the wrong place.

Please update the URL to the correct OpenClaw docs domain/path used elsewhere in the repo.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 10, 2026

Additional Comments (1)

src/memory/mmr.test.ts
Default config contradicts feature flag

DEFAULT_MMR_CONFIG.enabled is asserted as true here, but the rest of the PR (docs/config defaults and mergeHybridResults usage) describes MMR as opt-in / default disabled. This test will fail if the runtime default is corrected to match config defaults, and it currently bakes in the opposite behavior.

Align DEFAULT_MMR_CONFIG.enabled (and this test) with the intended default (false) so “opt-in” behavior is consistent across runtime, schema docs, and tests.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/mmr.test.ts
Line: 916:920

Comment:
**Default config contradicts feature flag**

`DEFAULT_MMR_CONFIG.enabled` is asserted as `true` here, but the rest of the PR (docs/config defaults and `mergeHybridResults` usage) describes MMR as *opt-in* / default disabled. This test will fail if the runtime default is corrected to match config defaults, and it currently bakes in the opposite behavior.

Align `DEFAULT_MMR_CONFIG.enabled` (and this test) with the intended default (`false`) so “opt-in” behavior is consistent across runtime, schema docs, and tests.

How can I resolve this? If you propose a fix, please make it concise.

@rodrigouroz rodrigouroz force-pushed the feat/mmr-reranking-clean branch from 1f16e23 to 5747b23 Compare February 10, 2026 13:14
@rodrigouroz
Copy link
Copy Markdown
Contributor Author

Check code file sizes is failing but I only added 6 and 13 lines, I don't want to tackle a refactor of those files here.

@rodrigouroz
Copy link
Copy Markdown
Contributor Author

@greptile-apps can you review again?

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +137 to +147
return Promise.all(
params.results.map(async (entry) => {
const cacheKey = `${entry.source}:${entry.path}`;
if (!timestampCache.has(cacheKey)) {
const timestamp = await extractTimestamp({
filePath: entry.path,
source: entry.source,
workspaceDir: params.workspaceDir,
});
timestampCache.set(cacheKey, timestamp);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestampCache doesn't work as intended inside Promise.all — all mapper callbacks start concurrently, so entries with the same source:path will all pass the !timestampCache.has(cacheKey) check before any await extractTimestamp() resolves, resulting in duplicate fs.stat calls.

Not a correctness bug (all calls produce the same result), but the cache is effectively a no-op. If deduplication matters, consider collecting unique cache keys first and resolving them sequentially or via a dedup map of promises.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/temporal-decay.ts
Line: 137:147

Comment:
`timestampCache` doesn't work as intended inside `Promise.all` — all mapper callbacks start concurrently, so entries with the same `source:path` will all pass the `!timestampCache.has(cacheKey)` check before any `await extractTimestamp()` resolves, resulting in duplicate `fs.stat` calls.

Not a correctness bug (all calls produce the same result), but the cache is effectively a no-op. If deduplication matters, consider collecting unique cache keys first and resolving them sequentially or via a dedup map of promises.

How can I resolve this? If you propose a fix, please make it concise.

CHANGELOG.md Outdated
Comment on lines +413 to +414
- Agents: honor tools.exec.safeBins in exec allowlist checks. (#2281)
- Docs: tighten Fly private deployment steps. (#2289) Thanks @dguido.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two entries appear to be unrelated to this PR and were added to an already-released version section (2026.1.29). Likely a merge/rebase artifact — consider reverting these lines from this PR's diff.

Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 413:414

Comment:
These two entries appear to be unrelated to this PR and were added to an already-released version section (`2026.1.29`). Likely a merge/rebase artifact — consider reverting these lines from this PR's diff.

How can I resolve this? If you propose a fix, please make it concise.

@rodrigouroz
Copy link
Copy Markdown
Contributor Author

@greptile-apps please check now and if all good update the PR description with your score

@rodrigouroz rodrigouroz changed the title feat(memory): Add MMR re-ranking and temporal decay for hybrid searchFeat/mmr reranking clean feat(memory): Add MMR re-ranking and temporal decay for hybrid search Feb 11, 2026
Adds Maximal Marginal Relevance (MMR) re-ranking to hybrid search results.

- New mmr.ts with tokenization, Jaccard similarity, and MMR algorithm
- Integrated into mergeHybridResults() with optional mmr config
- 40 comprehensive tests covering edge cases and diversity behavior
- Configurable lambda parameter (default 0.7) to balance relevance vs diversity
- Updated CHANGELOG.md and memory docs

This helps avoid redundant results when multiple chunks contain similar content.
Exponential decay (half-life configurable, default 30 days) applied
before MMR re-ranking. Dated daily files (memory/YYYY-MM-DD.md) use
filename date; evergreen files (MEMORY.md, topic files) are not
decayed; other sources fall back to file mtime.

Config: memorySearch.query.hybrid.temporalDecay.{enabled, halfLifeDays}
Default: disabled (backwards compatible, opt-in).
- DEFAULT_MMR_CONFIG.enabled = false (opt-in, was incorrectly true)
- Tie-break: handle bestItem === null so first candidate always wins
- CHANGELOG URL: docs.clawd.bot → docs.openclaw.ai
- Tests updated to pass enabled: true explicitly where needed
@rodrigouroz
Copy link
Copy Markdown
Contributor Author

Closing as superseded by a refreshed PR rebased/ported to current main, with the same feature set and updated validation.

Replacement PR: #18097

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant