feat: implement SQLiteVec storage backend #1196
Open
anocerino-ai wants to merge 1 commit into MemPalace:develop from
Add SqliteVecBackend / SqliteVecCollection as a fully working,
zero-dependency alternative to ChromaDB. All core backend protocol
requirements (RFC 001) are met; 104 new tests cover every code path.
Key changes
-----------
mempalace/backends/sqlite_vec.py (new)
- Multi-collection: each collection_name maps to its own SQL table
inside palace.db; _safe_table_name() prevents SQL injection.
- Dynamic embedding dimension: vec virtual table is created lazily on
the first write that contains an embedding; the actual dimension is
detected from len(embedding) and stored. DimensionMismatchError is
raised if a subsequent write arrives with a different dimension.
On reconnect the dimension is read back from sqlite_master so the
vec table is never recreated with the wrong dim.
- ANN over-fetch + post-filter: when sqlite-vec is available the query
fetches n_results * _ANN_OVERFETCH (10x) ANN candidates, applies
Python-side where / where_document filters, and returns the top
n_results survivors. If survivors < n_results the query falls back
to a full brute-force cosine scan so correctness is always guaranteed.
When sqlite-vec is absent the brute-force path is used directly.
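
The over-fetch, post-filter, and fallback flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in sqlite_vec.py: the names ANN_OVERFETCH, cosine_distance, brute_force, and query_with_overfetch are hypothetical, rows are held in memory as (id, embedding, metadata) tuples, and ann_search stands in for whatever the sqlite-vec extension returns (assumed here to be ids ordered by distance).

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

ANN_OVERFETCH = 10  # illustrative constant mirroring the 10x factor described above

Row = Tuple[str, List[float], Dict]  # (id, embedding, metadata); hypothetical layout


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def brute_force(rows: List[Row], query: List[float], n_results: int,
                keep: Callable[[Dict], bool]) -> List[str]:
    """Score every row that passes the filter and return the closest ids."""
    scored = [(cosine_distance(query, emb), rid)
              for rid, emb, meta in rows if keep(meta)]
    scored.sort()
    return [rid for _, rid in scored[:n_results]]


def query_with_overfetch(
    rows: List[Row],
    query: List[float],
    n_results: int,
    keep: Callable[[Dict], bool],
    ann_search: Optional[Callable[[List[float], int], List[str]]] = None,
) -> List[str]:
    """Over-fetch ANN candidates, filter in Python, fall back to a full scan."""
    if ann_search is None:
        # No ANN index available: use the brute-force path directly.
        return brute_force(rows, query, n_results, keep)
    candidates = ann_search(query, n_results * ANN_OVERFETCH)
    meta_by_id = {rid: meta for rid, _, meta in rows}
    survivors = [rid for rid in candidates if keep(meta_by_id[rid])]
    if len(survivors) < n_results:
        # Too few candidates survived the filter: scan everything instead.
        return brute_force(rows, query, n_results, keep)
    return survivors[:n_results]
```

The fallback is what keeps the result correct when a selective filter removes most ANN candidates; the cost is a full scan in that case.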
mempalace/backends/registry.py
- _register_builtins() registers SqliteVecBackend under "sqlite_vec"
so get_backend("sqlite_vec") and resolve_backend_for_palace() work
out of the box.
mempalace/backends/__init__.py
- SqliteVecBackend and SqliteVecCollection added to public surface and __all__.
tests/test_sqlite_vec.py (new, 104 tests / 2 skipped without sqlite-vec)
- Section 1: pure-Python utility functions (_pack_f32, _unpack_f32,
_cosine_distance, _cosine_brute).
- Section 2: _meta_matches -- all operators ($eq, $ne, $in, $nin, $gt,
$gte, $lt, $lte, $contains, $and, $or, nested).
- Section 3: SqliteVecCollection CRUD (add, upsert, update, delete, get,
count), LIMIT/OFFSET, close/health.
- Section 4: query() brute-force path (no sqlite-vec needed, CI-safe).
- Section 5: SqliteVecBackend lifecycle (caching, close, health).
- Section 6: detect() classmethod.
- Section 7: registry integration.
- Section 8: end-to-end integration (round-trips, concurrency, persistence).
- Section 9: multi-collection isolation and _safe_table_name validation.
- Section 10: dynamic dimension detection and ANN over-fetch behaviour.
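
For readers unfamiliar with the filter operators that Section 2 exercises, a metadata matcher of this kind might look like the sketch below. It is an assumption-laden illustration, not the _meta_matches implementation in this PR: the function name meta_matches, the treatment of missing keys (compared as None), and the error raised for unknown operators are all choices made here for the example.

```python
from typing import Any, Callable, Dict, Optional

# Comparison operators; a missing metadata key arrives here as None.
_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$gt": lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt": lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$contains": lambda v, arg: isinstance(v, str) and arg in v,
}


def meta_matches(meta: Dict[str, Any], where: Optional[Dict[str, Any]]) -> bool:
    """Return True if meta satisfies every condition in where."""
    if not where:
        return True  # no filter means everything matches
    for key, cond in where.items():
        if key == "$and":
            if not all(meta_matches(meta, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(meta_matches(meta, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Operator form, e.g. {"year": {"$gte": 2024, "$lt": 2025}}
            value = meta.get(key)
            for op, arg in cond.items():
                if op not in _COMPARATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not _COMPARATORS[op](value, arg):
                    return False
        elif meta.get(key) != cond:
            # Shorthand form, e.g. {"wing": "north"} means equality
            return False
    return True
```

Because $and and $or recurse into meta_matches, nested combinations need no extra handling.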
This was referenced Apr 26, 2026
What and why

MemPalace is local-first by design, but ChromaDB is the only available backend. This PR adds SqliteVecBackend, a fully working second backend built on SQLite, which is already present as a dependency for the knowledge graph.

The result: users get a zero-setup, single-file alternative that works completely offline with no extra services. The optional sqlite-vec extension enables ANN search; without it the backend degrades gracefully to a brute-force cosine scan, so it works in any environment, including CI.

What I built

sqlite_vec.py
- Multi-collection: each collection_name is its own SQL table inside palace.db; _safe_table_name() blocks SQL injection at construction time.
- Dynamic embedding dimension: detected from len(embedding), DimensionMismatchError raised on mismatch, dim read back from sqlite_master on reconnect; no hardcoded assumptions.
- ANN over-fetch + post-filter: fetch n_results × 10 candidates via sqlite-vec → apply Python-side where / where_document → return top N; falls back to a full brute-force scan if not enough survive, so correctness is always guaranteed.
- Protocol methods: add() / upsert() / update() / delete() / get() / query() / health() / close()

registry.py
- Backend registered in _register_builtins() so get_backend("sqlite_vec") and resolve_backend_for_palace() work out of the box.

__init__.py
- Exported to public surface.

tests/test_sqlite_vec.py
- 104 tests across 10 sections covering utilities, filter logic, CRUD, query, backend lifecycle, detect(), registry, integration, multi-collection, and dynamic dim + ANN behaviour. The 2 tests requiring the sqlite-vec C extension skip gracefully in CI.

Test results