Summary
When MemPalace adds drawers via add_drawer (and similar paths), ChromaDB may persist metadata to chroma.sqlite3 while the HNSW vector index on disk (data_level0.bin, etc.) lags behind or never reaches parity with the metadata segment under default hnsw:sync_threshold (often 1000). Users then observe:
- collection.get() / listing shows new documents
- collection.query() / semantic search returns no or stale matches for those documents (e.g. similarity 0.0 or unrelated hits)
- On-disk HNSW file timestamps stay old while chroma.sqlite3 updates
This is consistent with ChromaDB’s design: vectors are buffered and the on-disk HNSW segment is flushed only after enough new vectors accumulate (see the Chroma configuration options hnsw:sync_threshold and hnsw:batch_size).
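The timestamp symptom above can be checked with a small script. This is a minimal sketch, not part of MemPalace or ChromaDB: the function name `stale_hnsw_files` and the 60-second tolerance are hypothetical, and it assumes the persist directory holds `chroma.sqlite3` at its top level with `data_level0.bin` files in per-segment subdirectories. An old timestamp is only a hint that vectors are still buffered, not proof that the segments have diverged.

```python
from pathlib import Path


def stale_hnsw_files(palace_dir, tolerance_seconds=60.0):
    """Return (path, lag_in_seconds) for each data_level0.bin that is older
    than chroma.sqlite3 by more than tolerance_seconds.

    Hypothetical helper: the name and the tolerance are illustrative only.
    """
    root = Path(palace_dir).expanduser()
    sqlite_path = root / "chroma.sqlite3"
    if not sqlite_path.is_file():
        raise FileNotFoundError(f"no chroma.sqlite3 under {root}")

    sqlite_mtime = sqlite_path.stat().st_mtime
    stale = []
    # Assumed layout: one subdirectory per vector segment, each with its own
    # data_level0.bin file.
    for index_file in sorted(root.rglob("data_level0.bin")):
        lag = sqlite_mtime - index_file.stat().st_mtime
        if lag > tolerance_seconds:
            stale.append((index_file, lag))
    return stale
```

Running `stale_hnsw_files("~/.mempalace/palace/")` right after a batch of `add_drawer` calls would list any index files that were not rewritten along with the metadata.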
Expected behavior
After a successful add_drawer, semantic search should reliably find the new content without requiring manual maintenance, or MemPalace should document that operators must run chromadb-ops / adjust HNSW settings.
Suggested fixes (any one or combination)
- Set a lower default sync_threshold (e.g. 100) when creating or opening the mempalace_drawers collection, so HNSW is flushed to disk more often for typical incremental adds.
- Expose optional config (env var or config.json) for hnsw:sync_threshold / hnsw:batch_size so power users can tune persistence vs performance.
- After bulk adds, optionally call into Chroma’s maintenance path (e.g. WAL commit / HNSW rebuild via supported APIs) where applicable for the pinned chromadb version.
- Document the interaction between SQLite metadata and HNSW persistence, and point to chromadb-ops (chops db info, chops hnsw rebuild) for recovery if metadata and vector segments diverge.
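The first two suggestions could look roughly like the sketch below. Several things here are assumptions and not existing MemPalace code: the environment variable names `MEMPALACE_HNSW_SYNC_THRESHOLD` and `MEMPALACE_HNSW_BATCH_SIZE`, the helper names, and the choice to cap the batch size at the sync threshold. It also assumes the pinned chromadb version reads `hnsw:*` keys from collection metadata at creation time. Newer releases may expect these settings through a separate configuration argument, and an existing collection may keep the values it was created with.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical environment variable names; MemPalace does not define these today.
ENV_SYNC = "MEMPALACE_HNSW_SYNC_THRESHOLD"
ENV_BATCH = "MEMPALACE_HNSW_BATCH_SIZE"


def hnsw_metadata(env: Optional[Mapping[str, str]] = None,
                  default_sync_threshold: int = 100) -> dict:
    """Build collection metadata carrying the HNSW persistence settings."""
    env = os.environ if env is None else env

    sync_threshold = int(env.get(ENV_SYNC, default_sync_threshold))
    if sync_threshold < 1:
        raise ValueError("sync threshold must be a positive integer")
    metadata = {"hnsw:sync_threshold": sync_threshold}

    if ENV_BATCH in env:
        batch_size = int(env[ENV_BATCH])
        if batch_size < 1:
            raise ValueError("batch size must be a positive integer")
        # Conservative choice made for this sketch: never let the batch size
        # exceed the sync threshold.
        metadata["hnsw:batch_size"] = min(batch_size, sync_threshold)

    return metadata


def open_drawers(client: Any, name: str = "mempalace_drawers"):
    """Open or create the drawers collection with the settings above.

    `client` is expected to be a chromadb client such as a PersistentClient.
    """
    return client.get_or_create_collection(name=name, metadata=hnsw_metadata())
```

With a client created against `~/.mempalace/palace/`, calling `open_drawers(client)` on a fresh palace would create the collection with a sync threshold of 100 unless the environment overrides it.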
Environment (example)
- ChromaDB 0.6.x / 1.x (depending on MemPalace release)
- PersistentClient against ~/.mempalace/palace/
- Windows + Linux (devcontainer bind mount): issue reproduced when metadata was updated but the HNSW files stayed stale
Additional context
We recovered by exporting all drawers, recreating the collection with sync_threshold=100, and re-importing. A first-class MemPalace-side default or documented knob would prevent that class of failure.
Thank you for MemPalace — happy to help test a PR if useful.