
feat: implement SQLiteVec storage backend #1196

Open
anocerino-ai wants to merge 1 commit into MemPalace:develop from anocerino-ai:feat/sqlitevec-backend


What and why

MemPalace is local-first by design, but ChromaDB is the only available backend. This PR adds SqliteVecBackend — a fully working second backend built on SQLite, which is already present as a dependency for the knowledge graph.

The result: users get a zero-setup, single-file alternative that works completely offline with no extra services. The optional sqlite-vec extension enables ANN search; without it the backend degrades gracefully to brute-force cosine scan, so it works in any environment including CI.
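The brute-force fallback mentioned above can be sketched in pure Python. This is a minimal illustration, not the PR's code: it assumes embeddings are stored as little-endian float32 blobs, and the function names are hypothetical stand-ins loosely modelled on the helpers listed in the test plan (`_pack_f32`, `_unpack_f32`, `_cosine_distance`).

```python
import math
import struct


def pack_f32(vec):
    """Serialize a sequence of floats to little-endian float32 bytes (assumed storage format)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_f32(blob):
    """Inverse of pack_f32: each float32 occupies 4 bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_scan(query, rows, n_results):
    """Score every (id, blob) row against the query and keep the n_results closest."""
    scored = [(cosine_distance(query, unpack_f32(blob)), row_id) for row_id, blob in rows]
    scored.sort()
    return [(row_id, dist) for dist, row_id in scored[:n_results]]
```

The scan is linear in the number of stored rows, which is the trade-off for needing nothing beyond the standard library.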

What I built

sqlite_vec.py

  • Multi-collection: each collection_name is its own SQL table inside palace.db; _safe_table_name() blocks SQL injection at construction time
  • Dynamic embedding dimension: vec table created lazily on first write, dim inferred from len(embedding), DimensionMismatchError raised on mismatch, dim read back from sqlite_master on reconnect — no hardcoded assumptions
  • ANN over-fetch + post-filter: fetch n_results × 10 candidates via sqlite-vec → apply Python-side where / where_document → return top N; if fewer than N candidates survive the filters, fall back to a full brute-force scan, so post-filtering never produces a short or empty result while matching rows exist
  • Full RFC 001 protocol: add() / upsert() / update() / delete() / get() / query() / health() / close()
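The over-fetch + post-filter flow from the list above might look roughly like the following. This is a sketch under stated assumptions: `ann_search`, `exact_scan`, and `predicate` are hypothetical callables standing in for the sqlite-vec query, the brute-force scan, and the combined where / where_document filter. Only the 10x factor comes from the PR text.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float, Dict[str, object]]  # (id, distance, metadata)

ANN_OVERFETCH = 10  # the PR describes a 10x over-fetch factor


def query_with_postfilter(
    ann_search: Callable[[int], List[Hit]],
    exact_scan: Callable[[], List[Hit]],
    predicate: Callable[[Dict[str, object]], bool],
    n_results: int,
) -> List[Hit]:
    """Over-fetch candidates, filter in Python, fall back to an exact scan if needed."""
    # Step 1: ask the index for more candidates than requested (assumed sorted by distance).
    candidates = ann_search(n_results * ANN_OVERFETCH)
    # Step 2: apply the metadata/document filter on the Python side.
    survivors = [hit for hit in candidates if predicate(hit[2])]
    if len(survivors) >= n_results:
        return survivors[:n_results]
    # Step 3: too few survivors, so filter and rank every row instead.
    exact = [hit for hit in exact_scan() if predicate(hit[2])]
    exact.sort(key=lambda hit: hit[1])
    return exact[:n_results]
```

The fallback only triggers for selective filters, so the common case keeps the cost of a single index query.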

registry.py — backend registered in _register_builtins() so get_backend("sqlite_vec") and resolve_backend_for_palace() work out of the box

__init__.py — exported to public surface

tests/test_sqlite_vec.py — 104 tests across 10 sections covering utilities, filter logic, CRUD, query, backend lifecycle, detect(), registry, integration, multi-collection, and dynamic dim + ANN behaviour. The 2 tests requiring the sqlite-vec C extension skip gracefully in CI.

Test results

pytest tests/test_sqlite_vec.py              →  104 passed, 2 skipped
pytest tests/ --ignore=tests/benchmarks     →  1395 passed, 7 skipped, 0 failed

Add SqliteVecBackend / SqliteVecCollection as a fully working
alternative to ChromaDB with no required dependencies beyond SQLite
(sqlite-vec is optional). All core backend protocol requirements
(RFC 001) are met; 104 new tests cover every code path.

Key changes
-----------
mempalace/backends/sqlite_vec.py (new)
  - Multi-collection: each collection_name maps to its own SQL table
    inside palace.db; _safe_table_name() prevents SQL injection.
  - Dynamic embedding dimension: vec virtual table is created lazily on
    the first write that contains an embedding; the actual dimension is
    detected from len(embedding) and stored.  DimensionMismatchError is
    raised if a subsequent write arrives with a different dimension.
    On reconnect the dimension is read back from sqlite_master so the
    vec table is never recreated with the wrong dim.
  - ANN over-fetch + post-filter: when sqlite-vec is available the query
    fetches n_results * _ANN_OVERFETCH (10x) ANN candidates, applies
    Python-side where / where_document filters, and returns the top
    n_results survivors.  If survivors < n_results the query falls back
    to a full brute-force cosine scan so correctness is always guaranteed.
    When sqlite-vec is absent the brute-force path is used directly.
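
For illustration, the table-name validation and lazy dimension handling described above could be sketched as follows. Everything here is an assumption rather than the PR's implementation: the allowed-name pattern, the helper names, and the idea that the dimension can be recovered from a `float[N]` column declaration in the CREATE statement stored in sqlite_master.

```python
import re


class DimensionMismatchError(ValueError):
    """Raised when an embedding's length differs from the collection's dimension."""


# Assumed rule: identifiers only, so the name can be interpolated into SQL safely.
_TABLE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")
# Assumed schema shape: a vec0 column declared as "float[N]".
_DIM_RE = re.compile(r"float\[(\d+)\]")


def safe_table_name(collection_name):
    """Reject anything that is not a plain identifier before it reaches a SQL string."""
    if not _TABLE_NAME_RE.fullmatch(collection_name):
        raise ValueError(f"unsafe collection name: {collection_name!r}")
    return collection_name


def dim_from_schema(create_sql):
    """Recover the embedding dimension from a stored CREATE statement, if present."""
    match = _DIM_RE.search(create_sql or "")
    return int(match.group(1)) if match else None


class DimTracker:
    """Fix the dimension on first write; reject later writes that disagree."""

    def __init__(self, dim=None):
        self.dim = dim  # None until the first embedding arrives or the schema is read

    def check(self, embedding):
        n = len(embedding)
        if self.dim is None:
            self.dim = n  # the first write decides the dimension
        elif n != self.dim:
            raise DimensionMismatchError(f"expected dim {self.dim}, got {n}")
        return self.dim
```

On reconnect, a tracker would be seeded with `DimTracker(dim_from_schema(sql))`, where `sql` is the row read from sqlite_master.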

mempalace/backends/registry.py
  - _register_builtins() registers SqliteVecBackend under "sqlite_vec"
    so get_backend("sqlite_vec") and resolve_backend_for_palace() work
    out of the box.

mempalace/backends/__init__.py
  - SqliteVecBackend and SqliteVecCollection added to public surface and __all__.

tests/test_sqlite_vec.py (new, 104 tests / 2 skipped without sqlite-vec)
  - Section 1: pure-Python utility functions (_pack_f32, _unpack_f32,
    _cosine_distance, _cosine_brute).
  - Section 2: _meta_matches -- all operators ($eq, $ne, $in, $nin, $gt,
    $gte, $lt, $lte, $contains, $and, $or, nested).
  - Section 3: SqliteVecCollection CRUD (add, upsert, update, delete, get,
    count), LIMIT/OFFSET, close/health.
  - Section 4: query() brute-force path (no sqlite-vec needed, CI-safe).
  - Section 5: SqliteVecBackend lifecycle (caching, close, health).
  - Section 6: detect() classmethod.
  - Section 7: registry integration.
  - Section 8: end-to-end integration (round-trips, concurrency, persistence).
  - Section 9: multi-collection isolation and _safe_table_name validation.
  - Section 10: dynamic dimension detection and ANN over-fetch behaviour.
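
The operator set exercised in Section 2 can be illustrated with a small filter evaluator. This is a hedged sketch, not the PR's `_meta_matches`: the semantics for missing keys and for `$contains` (substring match on string values) are assumptions.

```python
from typing import Any, Dict

# Comparison operators applied to a single metadata value.
_COMPARATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$gt": lambda v, x: v is not None and v > x,
    "$gte": lambda v, x: v is not None and v >= x,
    "$lt": lambda v, x: v is not None and v < x,
    "$lte": lambda v, x: v is not None and v <= x,
    "$contains": lambda v, x: isinstance(v, str) and x in v,  # assumed semantics
}


def meta_matches(meta: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies every clause in the filter."""
    for key, cond in where.items():
        if key == "$and":
            if not all(meta_matches(meta, clause) for clause in cond):
                return False
        elif key == "$or":
            if not any(meta_matches(meta, clause) for clause in cond):
                return False
        elif isinstance(cond, dict):
            value = meta.get(key)  # a missing key is treated as None (assumption)
            for op, operand in cond.items():
                if op not in _COMPARATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not _COMPARATORS[op](value, operand):
                    return False
        elif meta.get(key) != cond:
            # A bare value is shorthand for $eq.
            return False
    return True
```

Nesting works because `$and` and `$or` recurse into the same function, so a clause list can itself contain further `$and` / `$or` entries.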

Labels: enhancement, storage
