feat(memory): semantic response caching with embedding similarity #2029
Merged
Implement a semantic cache alongside exact-match caching to reduce LLM API calls by matching user queries on embedding similarity (default threshold 0.95) rather than on exact text. A reworded query can then be served the cached response of a semantically similar earlier question.

**Key changes:**
- `ResponseCache` extended with `get_semantic()`, `put_with_embedding()`, `invalidate_embeddings_for_model()`, and `cleanup()` methods
- `CacheCheckResult` enum ensures the query embedding is computed only once per request (CRIT-01)
- SQL `WHERE` clause filters by `embedding_model` to prevent cross-model false positives (CRIT-02)
- Config: `semantic_cache_enabled` (default `false`), `semantic_cache_threshold` (default `0.95`), `semantic_cache_max_candidates` (default `10`)
- Agent loop: exact match is tried first (sub-ms); semantic lookup is the fallback (~150 ms)
- Tool-call guard: the semantic cache is skipped for context-sensitive tool responses
- Migration 037: adds `embedding` (BLOB), `embedding_model`, and `embedding_ts` columns
- `bytemuck` zero-copy serialization for `Vec<f32>` embeddings

**Test coverage:** 13 new unit tests; all 5953 existing tests pass.

**Security:** LOW risk: parameterized SQL, local threat model, safe `bytemuck` casts.

**Performance:** Computing the embedding once per request saves 50–200 ms per cache miss.

**Backward compat:** Old cache entries (NULL embeddings) still work via exact match.

Closes #1521
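A minimal in-memory sketch of the semantic lookup described above, for illustration only. `SemanticCacheConfig`, `CachedEntry`, and this free-function `get_semantic` signature are hypothetical; the PR's `ResponseCache::get_semantic()` reads candidates from SQLite. Here the model filter stands in for the SQL `WHERE` clause on `embedding_model`, and `take(max_candidates)` assumes candidates arrive newest-first, as a `LIMIT` query might return them.

```rust
/// Hypothetical mirror of the three config keys listed above.
#[derive(Debug, Clone)]
pub struct SemanticCacheConfig {
    pub enabled: bool,
    pub threshold: f32,
    pub max_candidates: usize,
}

impl Default for SemanticCacheConfig {
    fn default() -> Self {
        // Defaults as stated in the PR description.
        Self { enabled: false, threshold: 0.95, max_candidates: 10 }
    }
}

/// Hypothetical in-memory stand-in for one cache row.
#[derive(Debug, Clone)]
pub struct CachedEntry {
    pub response: String,
    pub embedding: Vec<f32>,
    pub embedding_model: String,
}

/// Cosine similarity. Returns 0.0 for empty vectors, mismatched dimensions,
/// or zero-norm vectors, so such candidates never clear a positive threshold.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns the best-scoring cached response whose similarity is at least
/// `cfg.threshold`, considering only entries embedded with `model`.
pub fn get_semantic<'a>(
    entries: &'a [CachedEntry],
    query: &[f32],
    model: &str,
    cfg: &SemanticCacheConfig,
) -> Option<&'a str> {
    if !cfg.enabled {
        return None;
    }
    entries
        .iter()
        // Same-model filter: vectors from different models are not comparable.
        .filter(|e| e.embedding_model == model)
        // Bound the number of similarity computations per lookup.
        .take(cfg.max_candidates)
        .map(|e| (cosine_similarity(query, &e.embedding), e))
        .filter(|(score, _)| *score >= cfg.threshold)
        .max_by(|(a, _), (b, _)| a.total_cmp(b))
        .map(|(_, e)| e.response.as_str())
}
```

Filtering by model before taking candidates means entries from another embedding model never use up the candidate budget.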
This was referenced Mar 20, 2026
bug-ops added a commit that referenced this pull request on Mar 20, 2026
Add 6 tests to `response_cache.rs` to verify that `cosine_similarity()` and `get_semantic()` gracefully handle embedding dimension mismatches:
- `test_semantic_get_dimension_mismatch_returns_none`: stored dim=3, query dim=2
- `test_semantic_get_dimension_mismatch_query_longer`: stored dim=2, query dim=3
- `test_semantic_get_mixed_dimensions_picks_correct_match`: mixed dimensions; verifies that the correct entry matches
- `test_semantic_get_empty_embedding_skipped`: an empty embedding (BLOB `x''`) is skipped
- `test_semantic_get_corrupt_blob_skipped`: a corrupt BLOB is skipped gracefully
- `test_semantic_get_all_corrupt_returns_none`: all candidates are corrupt or empty

Fixes #2034. Addresses PR #2029 review feedback.

All tests use threshold=0.01 so that a dimension mismatch, for which `cosine_similarity()` returns 0.0, cannot produce a false hit: with a threshold of 0.0, the check 0.0 >= 0.0 would pass, so the threshold must be strictly greater than 0.

Tested: 5524 tests pass, no regressions. Verified by tester, perf, security, and impl-critic agents.
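The empty-BLOB, corrupt-BLOB, and dimension-mismatch cases covered by these tests can be sketched with the standard library alone. This is not the PR's code: it swaps `bytemuck` for `f32::to_ne_bytes`/`f32::from_ne_bytes` (native-endian, the same byte layout a `bytemuck` cast of `&[f32]` produces), and `encode_embedding`, `decode_embedding`, and `best_match` are hypothetical names.

```rust
/// Encode an embedding as BLOB bytes (native-endian f32s).
pub fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_ne_bytes()).collect()
}

/// Decode BLOB bytes; `None` for an empty BLOB or one whose length is not a
/// multiple of 4 (treated here as corrupt).
pub fn decode_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.is_empty() || blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity; 0.0 on dimension mismatch or zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Index of the best stored embedding scoring at least `threshold`,
/// silently skipping BLOBs that fail to decode.
pub fn best_match(blobs: &[Vec<u8>], query: &[f32], threshold: f32) -> Option<usize> {
    blobs
        .iter()
        .enumerate()
        .filter_map(|(i, b)| decode_embedding(b).map(|v| (i, cosine_similarity(query, &v))))
        .filter(|&(_, s)| s >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With `threshold = 0.0`, a stored 3-dimensional vector "matches" a 2-dimensional query because its 0.0 score passes `0.0 >= 0.0`; any positive threshold such as 0.01 rejects it.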
bug-ops added a commit (#2046) that referenced this pull request on Mar 20, 2026, with the same commit message as above.
Summary
Implement semantic cache alongside exact-match caching to reduce LLM API calls by matching user queries based on embedding similarity rather than exact text match.
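A minimal sketch of the two-tier check (exact match first, semantic fallback) with a `CacheCheckResult`-style enum that carries the query embedding out of a miss, so it is computed at most once per request. Everything here is hypothetical: the `TwoTierCache` type, the enum variants, and this `put_with_embedding` signature illustrate the idea and are not the PR's API. Embeddings are assumed unit-normalized, so cosine similarity reduces to a dot product.

```rust
use std::collections::HashMap;

/// Hypothetical cache-check outcome. A miss carries the embedding (if one
/// was computed) so the caller can store the new response without re-embedding.
#[derive(Debug, PartialEq)]
pub enum CacheCheckResult {
    Hit(String),
    Miss { embedding: Option<Vec<f32>> },
}

/// Dot product; equals cosine similarity for unit-normalized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical two-tier cache: exact text key, then embedding similarity.
pub struct TwoTierCache {
    exact: HashMap<String, String>,
    semantic: Vec<(Vec<f32>, String)>,
    threshold: f32,
    semantic_enabled: bool,
}

impl TwoTierCache {
    pub fn new(threshold: f32, semantic_enabled: bool) -> Self {
        Self { exact: HashMap::new(), semantic: Vec::new(), threshold, semantic_enabled }
    }

    /// `embed` is `FnOnce`, so `check` can call it at most once.
    pub fn check<F>(&self, query: &str, embed: F) -> CacheCheckResult
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        // 1. Exact match: a hash lookup, no embedding required.
        if let Some(r) = self.exact.get(query) {
            return CacheCheckResult::Hit(r.clone());
        }
        if !self.semantic_enabled {
            return CacheCheckResult::Miss { embedding: None };
        }
        // 2. Semantic fallback: embed once and scan stored vectors.
        let emb = embed(query);
        let best = self
            .semantic
            .iter()
            .map(|(e, r)| (dot(e, &emb), r))
            .filter(|(s, _)| *s >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, r)| r.clone());
        match best {
            Some(r) => CacheCheckResult::Hit(r),
            None => CacheCheckResult::Miss { embedding: Some(emb) },
        }
    }

    /// Store under the exact key and, if an embedding is given, for semantic lookup.
    pub fn put_with_embedding(&mut self, query: &str, response: &str, embedding: Option<Vec<f32>>) {
        self.exact.insert(query.to_string(), response.to_string());
        if let Some(e) = embedding {
            self.semantic.push((e, response.to_string()));
        }
    }
}
```

Taking the embedder as `FnOnce` makes the single-embed rule visible in the type of `check`; the caller passes the embedding from `Miss` to the store step instead of recomputing it.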
Changes
Test Plan
Follow-up Issues (will file)
Closes #1521