ChromaDB: HNSW persistence vs add_drawer — default sync_threshold can leave semantic search stale #823

## Summary

When MemPalace adds drawers via `add_drawer` (and similar paths), ChromaDB may persist **metadata** to `chroma.sqlite3` while the **HNSW vector index** on disk (`data_level0.bin`, etc.) lags behind. Under ChromaDB's default `hnsw:sync_threshold` of **1000**, a palace that receives fewer new vectors than that may never bring the on-disk index back into parity with the metadata segment. Users then observe:

- `collection.get()` / listing shows new documents
- `collection.query()` / semantic search returns **no** or **stale** matches for those documents (e.g. similarity `0.0` or unrelated hits)
- On-disk HNSW file timestamps stay old while `chroma.sqlite3` updates
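
To check a palace for the last symptom, the sketch below compares the modification time of `chroma.sqlite3` with each segment directory's `data_level0.bin`. It assumes the usual `PersistentClient` layout (one `chroma.sqlite3` at the root, one sub-directory per HNSW segment); the function name and the `slack_s` tolerance are hypothetical. Timestamps are only a heuristic, since `chroma.sqlite3` also changes on writes that do not touch vectors, so treat a hit as a prompt to run `chops db info` rather than as proof of divergence.

```python
from __future__ import annotations

from pathlib import Path


def stale_hnsw_segments(persist_dir: str, slack_s: float = 60.0) -> list[tuple[str, float]]:
    """Return (segment_dir_name, lag_seconds) for HNSW files older than chroma.sqlite3.

    Heuristic only. Assumes <persist_dir>/chroma.sqlite3 plus one sub-directory
    per HNSW segment containing data_level0.bin. `slack_s` is a hypothetical
    tolerance so that near-simultaneous writes are not reported.
    """
    root = Path(persist_dir).expanduser()
    sqlite_file = root / "chroma.sqlite3"
    if not sqlite_file.exists():
        raise FileNotFoundError(sqlite_file)
    sqlite_mtime = sqlite_file.stat().st_mtime

    stale = []
    for seg_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        level0 = seg_dir / "data_level0.bin"
        if not level0.exists():
            continue  # not an HNSW segment directory
        lag = sqlite_mtime - level0.stat().st_mtime
        if lag > slack_s:
            stale.append((seg_dir.name, lag))
    return stale


# Usage: stale_hnsw_segments("~/.mempalace/palace")
```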

The symptoms above are consistent with ChromaDB’s design: vectors are buffered, and the on-disk HNSW segment is flushed only after enough **new** vectors accumulate (see `sync_threshold` and `batch_size` in the [Chroma configuration](https://cookbook.chromadb.dev/core/configuration/) docs).
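
A deliberately simplified model of that buffering follows. It is **not** Chroma's implementation: it ignores `batch_size`, the write-ahead log, and what an in-process query can see, and only tracks how many added vectors have reached the on-disk HNSW files for a given `sync_threshold`. `ToyHnswSync` is a hypothetical name.

```python
class ToyHnswSync:
    """Toy model: added vectors reach the on-disk HNSW files only when a sync fires."""

    def __init__(self, sync_threshold: int = 1000) -> None:
        self.sync_threshold = sync_threshold
        self.pending = 0  # added (metadata persisted) but not yet synced to HNSW files
        self.on_disk = 0  # vectors reflected in the on-disk HNSW files

    def add(self, n: int = 1) -> None:
        self.pending += n
        # Sync everything pending once the threshold is reached.
        if self.pending >= self.sync_threshold:
            self.on_disk += self.pending
            self.pending = 0


default, tuned = ToyHnswSync(1000), ToyHnswSync(100)
for _ in range(250):  # e.g. 250 add_drawer calls, one vector each
    default.add()
    tuned.add()
print(default.on_disk, default.pending)  # 0 250
print(tuned.on_disk, tuned.pending)      # 200 50
```

In this model none of the 250 vectors reach disk under the default threshold, while a threshold of 100 never leaves more than 99 outstanding.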

## Expected behavior

After a successful `add_drawer`, **semantic search** should reliably find the new content without requiring manual maintenance, or MemPalace should **document** that operators must run `chromadb-ops` / adjust HNSW settings.
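
One way to make this expectation testable is to query with the text that was just stored and check that its ID comes back. The sketch assumes a Chroma collection with the standard `get(ids=...)` and `query(query_texts=..., n_results=...)` methods, where `query` returns one list of IDs per query text; `is_searchable` is a hypothetical helper, not a MemPalace function. Since this issue concerns the on-disk index, running the check from a fresh process is closer to the reported failure than checking in the process that did the add.

```python
from typing import Any


def is_searchable(collection: Any, doc_id: str, text: str, n_results: int = 5) -> bool:
    """Return True if querying with `text` returns `doc_id` among the top hits.

    `collection` is assumed to be a chromadb Collection (or any object with
    compatible `get` and `query` methods).
    """
    # get() reads the metadata segment: a miss here means the add itself failed.
    if doc_id not in collection.get(ids=[doc_id])["ids"]:
        raise KeyError(f"{doc_id!r} not found via get(); the add did not persist")
    # query() goes through the vector index: a miss here is the symptom in this issue.
    result = collection.query(query_texts=[text], n_results=n_results)
    return doc_id in result["ids"][0]
```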

## Suggested fixes (any one or combination)

1. **Set a lower default `sync_threshold`** (e.g. **100**) when creating or opening the `mempalace_drawers` collection, so HNSW is flushed to disk more often for typical incremental adds.
2. **Expose optional config** (env var or `config.json`) for `hnsw:sync_threshold` / `hnsw:batch_size` so power users can tune persistence vs performance.
3. After bulk adds, optionally call into Chroma’s maintenance path (e.g. WAL commit / HNSW rebuild via supported APIs) where applicable for the pinned `chromadb` version.
4. **Document** the interaction between SQLite metadata and HNSW persistence, and point to `chromadb-ops` (`chops db info`, `chops hnsw rebuild`) for recovery if metadata and vector segments diverge.
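
Suggestions 1 and 2 could be combined roughly as follows. This is a sketch, assuming the metadata-style `hnsw:*` keys accepted by `get_or_create_collection(metadata=...)` in ChromaDB 0.5/0.6; newer releases also accept a `configuration=` argument, so check the pinned version. The environment variable names, the `config.json` keys and path, and the helper names are all hypothetical, and the default of 100 follows suggestion 1. Because the recovery described under Additional context required recreating the collection, assume these values only take effect when the collection is first created.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any, Mapping

DEFAULT_SYNC_THRESHOLD = 100  # suggestion 1; ChromaDB's own default is 1000
DEFAULT_BATCH_SIZE = 100

# Hypothetical knob names (suggestion 2); not existing MemPalace settings.
ENV_SYNC = "MEMPALACE_HNSW_SYNC_THRESHOLD"
ENV_BATCH = "MEMPALACE_HNSW_BATCH_SIZE"


def resolve_hnsw_metadata(
    env: Mapping[str, str] | None = None,
    config: Mapping[str, Any] | None = None,
) -> dict[str, int]:
    """Build hnsw:* metadata: env vars override config.json, which overrides defaults."""
    env = os.environ if env is None else env
    config = config or {}
    sync = int(env.get(ENV_SYNC, config.get("hnsw_sync_threshold", DEFAULT_SYNC_THRESHOLD)))
    batch = int(env.get(ENV_BATCH, config.get("hnsw_batch_size", DEFAULT_BATCH_SIZE)))
    if sync < 1 or batch < 1:
        raise ValueError("HNSW sync_threshold and batch_size must be positive")
    # Conservative sanity check: never batch more vectors than one sync covers.
    if batch > sync:
        raise ValueError(f"batch_size ({batch}) must not exceed sync_threshold ({sync})")
    return {"hnsw:sync_threshold": sync, "hnsw:batch_size": batch}


def load_config(path: str | Path) -> dict[str, Any]:
    """Read an optional config.json; a missing file means 'use defaults'."""
    p = Path(path).expanduser()
    return json.loads(p.read_text()) if p.exists() else {}


def open_drawers_collection(client: Any, metadata: dict[str, int]) -> Any:
    """`client` is assumed to be a chromadb.PersistentClient.

    Merge `metadata` with any other collection metadata MemPalace already sets.
    """
    return client.get_or_create_collection(name="mempalace_drawers", metadata=metadata)


# Usage (not executed here; config path assumed):
#   client = chromadb.PersistentClient(path=str(Path("~/.mempalace/palace").expanduser()))
#   meta = resolve_hnsw_metadata(config=load_config("~/.mempalace/config.json"))
#   drawers = open_drawers_collection(client, meta)
```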

## Environment (example)

- ChromaDB **0.6.x** / **1.x** (depending on MemPalace release)
- `PersistentClient` against `~/.mempalace/palace/`
- Windows and Linux (devcontainer bind mount): reproduced when metadata was updated but the HNSW files stayed stale

## Additional context

We recovered by exporting all drawers, recreating the collection with `sync_threshold=100`, and re-importing. A first-class MemPalace-side default or documented knob would prevent that class of failure.
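
The recovery above could be scripted along these lines. This is a sketch, not a tested tool: it assumes ChromaDB's `get_collection`, `Collection.get(include=[...])`, `Collection.metadata`, `delete_collection`, `create_collection(metadata=...)` and `Collection.add(...)` APIs, assumes every drawer has a document and metadata, re-uses the stored embeddings so nothing is re-embedded, and writes a JSON backup before deleting anything. `rebuild_collection`, `to_jsonable`, and `chunks` are hypothetical helper names, and the chunk size of 500 is an arbitrary choice that should stay below the client's maximum batch size. If MemPalace configures a non-default embedding function, pass the same one to `create_collection`, and try the script on a copy of `~/.mempalace/palace/` first.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator, Sequence


def chunks(n: int, size: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) index pairs covering range(n) in steps of `size`."""
    for start in range(0, n, size):
        yield start, min(start + size, n)


def to_jsonable(embeddings: Sequence[Sequence[float]]) -> list[list[float]]:
    """Convert numpy arrays or nested sequences of floats into plain JSON-safe lists."""
    return [[float(x) for x in row] for row in embeddings]


def rebuild_collection(client: Any, name: str, hnsw_metadata: dict[str, int],
                       backup_path: str | Path, chunk_size: int = 500) -> int:
    """Export, recreate with new HNSW settings, and re-import one collection."""
    old = client.get_collection(name)
    data = old.get(include=["documents", "metadatas", "embeddings"])
    snapshot = {
        "ids": list(data["ids"]),
        "documents": list(data["documents"]),
        "metadatas": list(data["metadatas"]),
        "embeddings": to_jsonable(data["embeddings"]),
    }
    # Write the backup BEFORE deleting anything.
    Path(backup_path).expanduser().write_text(json.dumps(snapshot))

    # Keep existing collection metadata (e.g. hnsw:space), override the HNSW knobs.
    merged = {**(old.metadata or {}), **hnsw_metadata}
    client.delete_collection(name)
    new = client.create_collection(name=name, metadata=merged)

    n = len(snapshot["ids"])
    for start, end in chunks(n, chunk_size):
        new.add(
            ids=snapshot["ids"][start:end],
            documents=snapshot["documents"][start:end],
            metadatas=snapshot["metadatas"][start:end],
            embeddings=snapshot["embeddings"][start:end],
        )
    return n
```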

Thank you for MemPalace — happy to help test a PR if useful.
