feat(memory): semantic response caching with embedding similarity#2029

Merged
bug-ops merged 5 commits into main from issue-1521-semantic-caching
Mar 20, 2026

Conversation

Owner

@bug-ops bug-ops commented Mar 20, 2026

Summary

Implement semantic cache alongside exact-match caching to reduce LLM API calls by matching user queries based on embedding similarity rather than exact text match.

Changes

  • ResponseCache extended with get_semantic(), put_with_embedding(), invalidate_embeddings_for_model(), cleanup()
  • CacheCheckResult enum ensures single embedding generation per request
  • Config: semantic_cache_enabled, semantic_cache_threshold (0.95), semantic_cache_max_candidates (10)
  • Agent loop integration: exact-match → semantic fallback → LLM call
  • Tool-call guard: skips semantic cache for context-sensitive responses
  • Migration 037: adds embedding BLOB, embedding_model, embedding_ts columns
  • 13 new unit tests, all 5953 tests pass
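The single-embed flow the CacheCheckResult bullet describes can be sketched as follows. Only the `CacheCheckResult` name comes from this PR; the variant names and the `check_cache` helper are illustrative assumptions, not the actual implementation.

```rust
// Sketch of the single-embed cache check. The embedding is computed once,
// before this function runs, and on a miss it travels with the result so
// put_with_embedding() can reuse it instead of embedding the query again.
#[derive(Debug)]
pub enum CacheCheckResult {
    ExactHit(String),             // found by exact text match (sub-ms path)
    SemanticHit(String),          // found by embedding-similarity fallback
    Miss { embedding: Vec<f32> }, // miss: keep the embedding for the later put
}

pub fn check_cache(
    exact: Option<String>,
    semantic: Option<String>,
    embedding: Vec<f32>,
) -> CacheCheckResult {
    // Exact match is tried first; semantic lookup is only the fallback.
    if let Some(resp) = exact {
        return CacheCheckResult::ExactHit(resp);
    }
    if let Some(resp) = semantic {
        return CacheCheckResult::SemanticHit(resp);
    }
    CacheCheckResult::Miss { embedding }
}
```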

Test Plan

  • All unit tests pass (5953/5953)
  • cargo fmt --check clean
  • cargo clippy --workspace --features full -- -D warnings: 0 warnings
  • Exact-match cache still works (backward compat)
  • Semantic cache skipped for tool-call responses
  • Embedding computed once per request (CacheCheckResult)
  • SQL filters by embedding_model (cross-model prevention)
  • Security audit: LOW risk
  • Performance: single-embed saves 50–200ms per miss
  • Integration test with live Ollama (manual, requires Ollama running)

Follow-up Issues (will file)

  • PERF-SC-01 (MEDIUM): expires_at not in index
  • PERF-SC-02 (LOW): max_candidates undocumented recall limit
  • PERF-SC-03 (LOW): cleanup() not atomic
  • TC-01 (LOW): corrupted BLOB test
  • TC-02 (LOW): dimension mismatch test
  • MINOR-02 (LOW): integration test stubs for Ollama
  • SEC-01 (LOW): threshold NaN/Inf/range validation

Closes #1521

…ching

Implement semantic cache alongside exact-match caching to reduce LLM API calls by
matching user queries based on embedding similarity (~0.95 threshold) rather than
exact text match. Rephrased queries are matched to cached responses from
semantically similar earlier questions.

**Key changes:**
- ResponseCache extended with `get_semantic()`, `put_with_embedding()`,
  `invalidate_embeddings_for_model()`, `cleanup()` methods
- CacheCheckResult enum ensures embedding computed once per request (CRIT-01)
- SQL WHERE clause filters by embedding_model to prevent cross-model false positives (CRIT-02)
- Config: semantic_cache_enabled (default false), semantic_cache_threshold (0.95),
  semantic_cache_max_candidates (10)
- Agent loop: exact-match tried first (sub-ms), semantic as fallback (~150ms)
- Tool-call guard: semantic cache skipped for context-sensitive tool responses
- Migration 037: adds embedding BLOB, embedding_model, embedding_ts columns
- bytemuck zero-copy serialization for Vec<f32> embeddings
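As a rough sketch of the BLOB encoding: the PR uses bytemuck's zero-copy casts, while the explicit little-endian helpers below are hypothetical equivalents (the function names are not from the codebase).

```rust
// Encode a Vec<f32> embedding as a little-endian byte BLOB.
fn embedding_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|f| f.to_le_bytes()).collect()
}

// Decode a BLOB back into an embedding. A corrupt BLOB whose length is not
// a multiple of 4 is rejected rather than panicking, matching the
// "graceful skip" behavior the follow-up tests verify.
fn blob_to_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```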

**Test coverage:** 13 new unit tests, all 5953 existing tests pass.
**Security:** LOW risk, parameterized SQL, local threat model, bytemuck safe.
**Performance:** Single-embed optimization saves 50–200ms per cache miss.
**Backward compat:** Old cache entries (NULL embeddings) still work via exact-match.

Closes #1521
@github-actions github-actions bot added labels: enhancement (New feature or request), size/XL (Extra large PR, 500+ lines) — Mar 20, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 20, 2026 12:44
@bug-ops bug-ops merged commit 9e332a7 into main Mar 20, 2026
25 checks passed
@bug-ops bug-ops deleted the issue-1521-semantic-caching branch March 20, 2026 13:07
bug-ops added a commit that referenced this pull request Mar 20, 2026
Add 6 tests to response_cache.rs to verify cosine_similarity() and get_semantic()
gracefully handle embedding dimension mismatches:

- test_semantic_get_dimension_mismatch_returns_none: store dim=3, query dim=2
- test_semantic_get_dimension_mismatch_query_longer: store dim=2, query dim=3
- test_semantic_get_mixed_dimensions_picks_correct_match: mixed dims, verify correct match
- test_semantic_get_empty_embedding_skipped: empty embedding (BLOB x'') handling
- test_semantic_get_corrupt_blob_skipped: corrupt BLOB graceful skip
- test_semantic_get_all_corrupt_returns_none: all candidates corrupt/empty

Fixes #2034. Addresses PR #2029 review feedback. All tests use threshold=0.01
to correctly verify that cosine_similarity(mismatch)=0.0 does not produce false
hits (0.0 >= 0.0 would be true, so threshold must be > 0).

Tested: 5524 tests pass, no regressions. Verified by tester, perf, security,
and impl-critic agents.
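The mismatch behavior these tests rely on can be sketched as below. The real cosine_similarity() in response_cache.rs may differ in detail, and is_semantic_hit() is a hypothetical helper added for illustration.

```rust
// Dimension mismatches and empty embeddings yield similarity 0.0, which a
// strictly positive threshold (e.g. the 0.01 used in the tests) rejects.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0; // mismatch / empty: never a match
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn is_semantic_hit(sim: f32, threshold: f32) -> bool {
    sim >= threshold
}
```

With a threshold of 0.0 the mismatch case would pass (`0.0 >= 0.0`), which is exactly the false positive the 0.01 test threshold rules out.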
bug-ops added a commit that referenced this pull request Mar 20, 2026 (#2046)

Labels

core (zeph-core crate), documentation (Improvements or additions to documentation), enhancement (New feature or request), memory (zeph-memory crate, SQLite), rust (Rust code changes), size/XL (Extra large PR, 500+ lines)


Development

Successfully merging this pull request may close these issues.

Research: semantic response caching for LLM API cost reduction

1 participant