fix: auto-repair BLOB seq_ids from chromadb 0.6→1.5 migration #664
bensig merged 2 commits into MemPalace:develop from
Conversation
ChromaDB 0.6.x stored seq_id as big-endian 8-byte BLOBs. The 1.5.x auto-migration doesn't convert existing rows, causing the Rust compactor to crash on init. This change detects and repairs BLOB seq_ids in both the embeddings and max_seq_id tables before the PersistentClient is created. Fixes the "mismatched types; Rust type u64 is not compatible with SQL type BLOB" error for users upgrading from 0.6.x.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
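A minimal sketch of the conversion this repair performs, using an in-memory SQLite table as a stand-in (the table name matches the real one, but the schema is simplified and `demo_blob_seq_id_conversion` is a hypothetical helper, not part of the PR):

```python
import sqlite3
import struct

def demo_blob_seq_id_conversion():
    """Show how a 0.6.x big-endian 8-byte BLOB seq_id maps to an INTEGER."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE embeddings (seq_id)")  # simplified stand-in schema
    # 0.6.x stored seq_id 42 as an 8-byte big-endian BLOB:
    conn.execute("INSERT INTO embeddings VALUES (?)", (struct.pack(">Q", 42),))
    (blob,) = conn.execute(
        "SELECT seq_id FROM embeddings WHERE typeof(seq_id) = 'blob'"
    ).fetchone()
    value = int.from_bytes(blob, byteorder="big")
    # Writing the int back makes typeof() report 'integer', which the Rust
    # compactor can then decode as u64:
    conn.execute("UPDATE embeddings SET seq_id = ?", (value,))
    (kind,) = conn.execute("SELECT typeof(seq_id) FROM embeddings").fetchone()
    conn.close()
    return value, kind

print(demo_blob_seq_id_conversion())  # → (42, 'integer')
```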
Pull request overview
Adds a startup-time repair step to handle a known ChromaDB SQLite migration issue where seq_id values remain stored as BLOBs, preventing ChromaDB 1.5.x from initializing successfully.
Changes:
- Introduces `_fix_blob_seq_ids()` to convert big-endian 8-byte BLOB `seq_id` values to INTEGER in `embeddings` and `max_seq_id`.
- Runs the repair automatically before creating `chromadb.PersistentClient`.
- Adds backend tests covering conversion, already-integer no-op, and missing-database no-op.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| mempalace/backends/chroma.py | Adds automatic SQLite repair for BLOB seq_id values prior to Chroma client initialization. |
| tests/test_backends.py | Adds tests validating the repair behavior and ensuring safe no-op conditions. |
```python
conn = sqlite3.connect(db_path)
for table in ("embeddings", "max_seq_id"):
    try:
        rows = conn.execute(
            f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
        ).fetchall()
    except sqlite3.OperationalError:
        continue
    if not rows:
        continue
    updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
    conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
    logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
conn.commit()
conn.close()
```
If any exception occurs after opening the SQLite connection (e.g., during SELECT/update/commit), the connection is never closed because `conn.close()` is only reached on the happy path. This can leave the DB file locked and cause subsequent PersistentClient initialization to fail. Use a context manager (`with sqlite3.connect(...) as conn:`) or a `finally: conn.close()` to guarantee closure (and let the context manager handle commit/rollback correctly).
Suggested change:

```diff
-conn = sqlite3.connect(db_path)
-for table in ("embeddings", "max_seq_id"):
-    try:
-        rows = conn.execute(
-            f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
-        ).fetchall()
-    except sqlite3.OperationalError:
-        continue
-    if not rows:
-        continue
-    updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
-    conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
-    logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
-conn.commit()
-conn.close()
+with sqlite3.connect(db_path) as conn:
+    for table in ("embeddings", "max_seq_id"):
+        try:
+            rows = conn.execute(
+                f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
+            ).fetchall()
+        except sqlite3.OperationalError:
+            continue
+        if not rows:
+            continue
+        updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
+        conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
+        logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
```
Fixed — switched to context manager. 595991e
```python
except Exception as e:
    logger.warning("Could not fix BLOB seq_ids: %s", e)
```
The broad `except Exception as e: logger.warning(...)` drops the traceback, making it hard to diagnose why the repair failed (and the user will likely just see the later Chroma crash). Consider `logger.exception(...)` and include `db_path` in the message so failures are actionable.
Suggested change:

```diff
-except Exception as e:
-    logger.warning("Could not fix BLOB seq_ids: %s", e)
+except Exception:
+    logger.exception("Could not fix BLOB seq_ids in %s", db_path)
```
Fixed — changed to logger.exception() which preserves the full traceback, and included db_path in the message for easier debugging. 595991e
…repair

Addresses Copilot review feedback on MemPalace#664.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Merged. Two notes for follow-up:
Neither is blocking — the fix is valuable as-is for the common 0.6→1.5 migration path.
…emPalace#681-684 new

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The BLOB-seq_id migration shim (PR MemPalace#664) ran int.from_bytes(..., 'big') over every BLOB in max_seq_id, including chromadb 1.5.x's own native format (b'\x11\x11' + 6 ASCII digits). That conversion yields a ~1.23e18 integer that silently suppresses every subsequent embeddings_queue write for the affected segment (queue filter is seq_id > start), causing silent drawer-write drops after a 1.5.x upgrade.

Two-part fix:

1. Shim narrowing (mempalace/backends/chroma.py)
   - Drop max_seq_id from the shim loop. chromadb owns that column's format; we don't reinterpret it.
   - Defense-in-depth: skip rows in embeddings whose seq_id BLOB has the sysdb-10 b'\x11\x11' prefix rather than misconvert.

2. Recovery command (mempalace/repair.py, mempalace/cli.py)
   - mempalace repair --mode max-seq-id [--segment <uuid>] [--from-sidecar <path>] [--dry-run] [--yes] [--no-backup]
   - Detects poisoned rows via threshold (seq_id > 2**53).
   - Default heuristic: MAX(embeddings.seq_id) over the collection owning the poisoned segment. Matches METADATA max exactly; VECTOR segments get a few seq_ids ahead (queue skips an already-indexed window — an acceptable loss vs. resetting to 0 and re-processing everything).
   - --from-sidecar copies clean values from a pre-corruption sqlite db.
   - Backs up chroma.sqlite3, closes chroma handles, atomic UPDATEs, post-repair verification that raises MaxSeqIdVerificationError if any row is still above threshold.

Tests: 8 new in tests/test_repair.py (detection, heuristic, sidecar, dry-run, segment filter, no-op, backup, rollback-on-verify-failure). 3 new in tests/test_backends.py (max_seq_id untouched by shim, sysdb-10 prefix skipped in embeddings, legacy big-endian u64 BLOBs still convert). Full suite: 1103 passed.
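The misconversion and the narrowed shim's guards can be sketched as follows. `SYSDB10_PREFIX`, `POISON_THRESHOLD`, `should_convert`, and `is_poisoned` are illustrative names rather than the PR's actual identifiers; the `2**53` threshold and the `b'\x11\x11'` prefix come from the commit message above:

```python
SYSDB10_PREFIX = b"\x11\x11"   # chromadb 1.5.x native max_seq_id framing
POISON_THRESHOLD = 2 ** 53     # no legitimate seq_id should exceed this

def should_convert(blob: bytes) -> bool:
    """True only for legacy 0.6.x big-endian u64 BLOBs.

    Rows carrying the 1.5.x sysdb-10 prefix are chromadb's own format
    and must not be reinterpreted.
    """
    return len(blob) == 8 and not blob.startswith(SYSDB10_PREFIX)

def is_poisoned(seq_id: int) -> bool:
    """Detect a misconverted native BLOB (the result lands near 1.23e18)."""
    return seq_id > POISON_THRESHOLD

# A native 1.5.x value misconverted by the original shim:
native = SYSDB10_PREFIX + b"000042"
assert not should_convert(native)
assert is_poisoned(int.from_bytes(native, byteorder="big"))

# A legacy 0.6.x value converts to a small, sane integer:
legacy = (42).to_bytes(8, byteorder="big")
assert should_convert(legacy)
assert not is_poisoned(int.from_bytes(legacy, byteorder="big"))
```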
Summary

- Adds `_fix_blob_seq_ids()` to `backends/chroma.py` that detects and repairs BLOB-typed `seq_id` values left behind by the ChromaDB 0.6.x → 1.5.x migration
- Runs the repair before `PersistentClient` is created (the Rust compactor fires on init and crashes on BLOB values)
- Covers both the `embeddings` and `max_seq_id` tables

Problem
Users upgrading from chromadb 0.6.x to 1.5.x hit:
mismatched types; Rust type u64 (as SQL type INTEGER) is not compatible with SQL type BLOB

The auto-migration creates the schema but doesn't convert existing BLOB rows.

Test plan
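The test cases listed in the review (conversion, already-integer no-op, missing-database no-op) can be exercised with a minimal sketch; `fix_blob_seq_ids` here is a hypothetical stand-in for the PR's repair helper, not the actual mempalace code:

```python
import os
import sqlite3
import tempfile

def fix_blob_seq_ids(db_path):
    """Hypothetical stand-in for the PR's repair helper; returns rows fixed."""
    if not os.path.exists(db_path):
        return 0  # missing-database no-op
    conn = sqlite3.connect(db_path)
    try:
        try:
            rows = conn.execute(
                "SELECT rowid, seq_id FROM embeddings WHERE typeof(seq_id) = 'blob'"
            ).fetchall()
        except sqlite3.OperationalError:
            return 0  # missing-table no-op
        updates = [(int.from_bytes(b, "big"), rid) for rid, b in rows]
        conn.executemany("UPDATE embeddings SET seq_id = ? WHERE rowid = ?", updates)
        conn.commit()
        return len(updates)
    finally:
        conn.close()

with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "chroma.sqlite3")
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE embeddings (seq_id)")  # simplified stand-in schema
    conn.execute("INSERT INTO embeddings VALUES (?)", ((7).to_bytes(8, "big"),))
    conn.execute("INSERT INTO embeddings VALUES (9)")  # already INTEGER
    conn.commit()
    conn.close()
    assert fix_blob_seq_ids(db) == 1  # one BLOB row converted
    assert fix_blob_seq_ids(db) == 0  # already-integer rows: no-op
    assert fix_blob_seq_ids(os.path.join(d, "missing.sqlite3")) == 0  # missing db: no-op
```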