_fix_blob_seq_ids() opens live chromadb.sqlite on every client cache miss; post-migration no-op still risks corruption #1090

@jphein

Description

What we think we're seeing

mempalace/backends/chroma.py::_fix_blob_seq_ids() is invoked every time _ChromaClientRegistry.get_or_create() has a cache miss or detects mtime/inode drift on chroma.sqlite3 (lines 468 and 489 on current develop), and also from the deprecated-ish make_client() helper. For a palace that has already been migrated from the 0.6.x BLOB seq_id format, every one of those calls does:

  1. sqlite3.connect(db_path) against a live chromadb 1.5.x sqlite file
  2. two SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob' queries (per-table early-out)
  3. conn.commit() (even though nothing was written on the no-op path)
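For concreteness, here is a minimal sketch of those three steps as a standalone function. This is not the actual mempalace code: the table names, the big-endian decode, and the update logic are all assumptions made for illustration.

```python
import sqlite3

# Assumed table names; the real _fix_blob_seq_ids in
# mempalace/backends/chroma.py may scan a different set.
TABLES = ("embeddings_queue", "max_seq_id")

def fix_blob_seq_ids(db_path: str) -> int:
    """Return the number of BLOB seq_id rows rewritten (0 on the no-op path)."""
    conn = sqlite3.connect(db_path)  # step 1: opens the live chromadb sqlite file
    fixed = 0
    try:
        for table in TABLES:
            # step 2: per-table scan for legacy BLOB-typed seq_ids
            rows = conn.execute(
                f"SELECT rowid, seq_id FROM {table} "
                "WHERE typeof(seq_id) = 'blob'"
            ).fetchall()
            for rowid, blob in rows:
                seq = int.from_bytes(blob, "big")  # assumed encoding
                conn.execute(
                    f"UPDATE {table} SET seq_id = ? WHERE rowid = ?",
                    (seq, rowid),
                )
                fixed += 1
        conn.commit()  # step 3: commits even when fixed == 0
    finally:
        conn.close()
    return fixed
```

The second and later calls against a migrated palace hit the `fixed == 0` path: open, two SELECTs, commit, close, all against a file chromadb considers its own.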

On our fork's 165K-drawer palace, we've observed — across ~400 MCP-server starts and miner invocations — that this pattern correlates with occasional chromadb-Rust-side crashes on the next PersistentClient(...) call after a successful _fix_blob_seq_ids no-op. Not every time, but frequently enough to be a real reliability hit over hours of use. The crash signatures vary (null-pointer in the Rust compactor, "mismatched types" re-emerging, occasional SIGSEGV); the common factor is they happen after a clean _fix_blob_seq_ids return.

Hypothesis

chromadb 1.5.x's Rust layer maintains its own in-memory view of the sqlite file and doesn't expect another connection (even an in-process Python sqlite3.connect) to acquire a lock on the same file mid-session. The post-migration _fix_blob_seq_ids call opens, reads, commits, and closes the file from Python, which appears to desync the Rust side's cached state. On the next PersistentClient init, the Rust compactor runs against that stale view and crashes.

The function has to run the first time a 0.6.x palace is migrated (that's what it was added for). After that first successful migration, re-running it is a no-op that costs two SELECTs — but costs us a non-trivial crash rate on chromadb 1.5.x.

Fork workaround

After _fix_blob_seq_ids completes successfully and finds nothing to update (the "no BLOB rows" early-out), the fork writes <palace_path>/.blob_seq_ids_migrated as a sentinel. _get_client() checks for that sentinel first; if it's present, _fix_blob_seq_ids is skipped entirely. The function still runs on palaces without the sentinel (the "needs migration" case), so 0.6.x → 1.5.x migration still works correctly.
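A sketch of how the fork gates the call. The sentinel filename matches the one above; maybe_fix_blob_seq_ids and the fix_fn callback shape are hypothetical names for illustration, not the fork's actual signatures:

```python
from pathlib import Path
from typing import Callable

SENTINEL = ".blob_seq_ids_migrated"  # sentinel filename used by the fork

def maybe_fix_blob_seq_ids(palace_path: str, fix_fn: Callable[[], int]) -> bool:
    """Run the migration unless the sentinel says it already completed.

    fix_fn is assumed to return the number of rows it rewrote (0 == clean
    no-op). Returns True if fix_fn was actually invoked on this call.
    """
    sentinel = Path(palace_path) / SENTINEL
    if sentinel.exists():
        return False  # post-migration: never touch chroma.sqlite3 again
    fixed = fix_fn()
    if fixed == 0:
        # Clean "no BLOB rows" early-out: safe to skip on every future start.
        sentinel.touch()
    return True
```

Note the sentinel is written only on the clean no-op, so a palace that actually gets migrated runs the fixer once more on the next start, confirms nothing is left, and only then stops opening the file.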

Result on our palace: zero crashes of this flavor since the sentinel landed (2026-04-10 → present, ~11 days). Before the sentinel: roughly 1 crash per 10–20 process starts on the same hardware + chromadb combo.

Asking before filing a PR

I want to flag this before guessing at a fix direction, because I'm not 100% sure of the causation — "crash after a clean _fix_blob_seq_ids return" is circumstantial. Two questions:

  1. Has anyone else on chromadb 1.5.x observed this? If the correlation is specific to our environment (macOS ARM64, Python 3.13, chromadb 1.5.8, 165K-drawer palace), the fork's sentinel is a narrow-value fix. If it reproduces elsewhere, it's a broader reliability patch.
  2. Is the direction "skip after migration" or "don't open sqlite3 from Python at all"? The sentinel is one option; another is detecting the installed chromadb version and skipping on 1.5.x (since the BLOB issue was a 0.6.x → 1.x migration artifact and shouldn't appear on palaces that started life on 1.5.x). Either would get the same result with different semantics.
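For comparison, the version-gating alternative from question 2 could be as small as this; the helper name, the string-parameter shape, and the major-version threshold are all assumptions:

```python
def needs_blob_seq_id_fix(chromadb_version: str) -> bool:
    """True only on chromadb 0.x, where the BLOB seq_id format originates.

    Hypothetical helper: a real patch would obtain the version via
    importlib.metadata.version("chromadb") and skip _fix_blob_seq_ids
    entirely when this returns False.
    """
    major = int(chromadb_version.split(".", 1)[0])
    return major < 1
```

The semantic difference versus the sentinel: the version gate skips based on what chromadb is installed, while the sentinel skips based on what this particular palace has already been through.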

Happy to draft a PR once there's direction. Code for reference: fork's _get_client().
