fix: auto-repair BLOB seq_ids from chromadb 0.6→1.5 migration#664

Merged
bensig merged 2 commits into MemPalace:develop from jphein:fix/blob-seq-id-migration
Apr 12, 2026
Conversation

@jphein
Collaborator

@jphein jphein commented Apr 12, 2026

Summary

  • Adds _fix_blob_seq_ids() to backends/chroma.py that detects and repairs BLOB-typed seq_id values left behind by the ChromaDB 0.6.x → 1.5.x migration
  • Runs automatically before PersistentClient is created (the Rust compactor fires on init and crashes on BLOB values)
  • Converts big-endian 8-byte BLOBs to INTEGER in both embeddings and max_seq_id tables
  • No-op when database doesn't exist or seq_ids are already integers

Problem

Users upgrading from chromadb 0.6.x to 1.5.x hit the error `mismatched types; Rust type u64 (as SQL type INTEGER) is not compatible with SQL type BLOB`. The auto-migration creates the new schema but doesn't convert existing BLOB rows.
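The mismatch comes down to byte encoding: 0.6.x wrote each seq_id as an 8-byte big-endian blob, while 1.5.x reads the column as a native integer. A standalone illustration of the conversion the repair performs (not project code):

```python
# chromadb 0.6.x stored seq_id as an 8-byte big-endian BLOB; 1.5.x expects INTEGER.
legacy_blob = (42).to_bytes(8, byteorder="big")
print(legacy_blob)  # b'\x00\x00\x00\x00\x00\x00\x00*'

# The repair reinterprets those 8 bytes as the integer they encode.
seq_id = int.from_bytes(legacy_blob, byteorder="big")
print(seq_id)  # 42
```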

Test plan

  • Test: BLOB seq_ids are converted to correct integers
  • Test: no-op when seq_ids are already integers
  • Test: no-op when chroma.sqlite3 doesn't exist
  • All existing backend tests still pass
  • Lint clean (ruff)
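The first three test-plan items can be sketched end to end against a throwaway SQLite file. The repair loop below is inlined from the PR's diff; the one-column `embeddings` schema and file layout are simplified assumptions, not the real Chroma schema:

```python
import os
import sqlite3
import tempfile

def fix_blob_seq_ids(db_path):
    """Repair loop inlined from the PR's diff (simplified sketch)."""
    if not os.path.exists(db_path):
        return  # no-op: database doesn't exist yet
    with sqlite3.connect(db_path) as conn:
        for table in ("embeddings", "max_seq_id"):
            try:
                rows = conn.execute(
                    f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
                ).fetchall()
            except sqlite3.OperationalError:
                continue  # table absent in this schema
            updates = [(int.from_bytes(blob, "big"), rowid) for rowid, blob in rows]
            conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)

with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "chroma.sqlite3")
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE embeddings (seq_id)")  # minimal stand-in schema
    conn.execute("INSERT INTO embeddings VALUES (?)", ((7).to_bytes(8, "big"),))
    conn.commit()
    conn.close()

    fix_blob_seq_ids(db)               # converts the BLOB row
    fix_blob_seq_ids(db)               # second run: already-integer no-op
    fix_blob_seq_ids(db + ".missing")  # missing file: no-op

    conn = sqlite3.connect(db)
    [(value,)] = conn.execute("SELECT seq_id FROM embeddings").fetchall()
    conn.close()
    assert value == 7 and isinstance(value, int)
```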

ChromaDB 0.6.x stored seq_id as big-endian 8-byte BLOBs. The 1.5.x
auto-migration doesn't convert existing rows, causing the Rust
compactor to crash on init. This detects and repairs BLOB seq_ids
in both embeddings and max_seq_id tables before PersistentClient
is created.

Fixes the "mismatched types; Rust type u64 is not compatible with
SQL type BLOB" error for users upgrading from 0.6.x.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings April 12, 2026 03:07
Contributor

Copilot AI left a comment


Pull request overview

Adds a startup-time repair step to handle a known ChromaDB SQLite migration issue where seq_id values remain stored as BLOBs, preventing ChromaDB 1.5.x from initializing successfully.

Changes:

  • Introduces _fix_blob_seq_ids() to convert big-endian 8-byte BLOB seq_id values to INTEGER in embeddings and max_seq_id.
  • Runs the repair automatically before creating chromadb.PersistentClient.
  • Adds backend tests covering conversion, already-integer no-op, and missing-database no-op.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| mempalace/backends/chroma.py | Adds automatic SQLite repair for BLOB seq_id values prior to Chroma client initialization. |
| tests/test_backends.py | Adds tests validating the repair behavior and ensuring safe no-op conditions. |


Comment thread mempalace/backends/chroma.py Outdated
Comment on lines +28 to +42
```python
conn = sqlite3.connect(db_path)
for table in ("embeddings", "max_seq_id"):
    try:
        rows = conn.execute(
            f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
        ).fetchall()
    except sqlite3.OperationalError:
        continue
    if not rows:
        continue
    updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
    conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
    logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
conn.commit()
conn.close()
```

Copilot AI Apr 12, 2026


If any exception occurs after opening the SQLite connection (e.g., during SELECT/update/commit), the connection is never closed because conn.close() is only reached on the happy path. This can leave the DB file locked and cause subsequent PersistentClient initialization to fail. Use a context manager (with sqlite3.connect(...) as conn:) or a finally: conn.close() to guarantee closure (and let the context manager handle commit/rollback correctly).

Suggested change

```diff
-conn = sqlite3.connect(db_path)
-for table in ("embeddings", "max_seq_id"):
-    try:
-        rows = conn.execute(
-            f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
-        ).fetchall()
-    except sqlite3.OperationalError:
-        continue
-    if not rows:
-        continue
-    updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
-    conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
-    logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
-conn.commit()
-conn.close()
+with sqlite3.connect(db_path) as conn:
+    for table in ("embeddings", "max_seq_id"):
+        try:
+            rows = conn.execute(
+                f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
+            ).fetchall()
+        except sqlite3.OperationalError:
+            continue
+        if not rows:
+            continue
+        updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
+        conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
+        logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
```

Collaborator Author


Fixed — switched to context manager. 595991e

Comment thread mempalace/backends/chroma.py Outdated
Comment on lines +43 to +44
```python
except Exception as e:
    logger.warning("Could not fix BLOB seq_ids: %s", e)
```

Copilot AI Apr 12, 2026


The broad except Exception as e: logger.warning(...) drops the traceback, making it hard to diagnose why the repair failed (and the user will likely just see the later Chroma crash). Consider logger.exception(...) and include db_path in the message so failures are actionable.

Suggested change

```diff
-except Exception as e:
-    logger.warning("Could not fix BLOB seq_ids: %s", e)
+except Exception:
+    logger.exception("Could not fix BLOB seq_ids in %s", db_path)
```

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — changed to logger.exception() which preserves the full traceback, and included db_path in the message for easier debugging. 595991e

…repair

Addresses Copilot review feedback on MemPalace#664.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@bensig bensig merged commit abc99f4 into MemPalace:develop Apr 12, 2026
@bensig
Collaborator

bensig commented Apr 12, 2026

Merged. Two notes for follow-up:

  1. If the migration fails mid-way, the exception is silently swallowed and PersistentClient init proceeds with a potentially incomplete repair. Consider returning a boolean or re-raising so callers can abort.

  2. int.from_bytes(blob, byteorder="big") has no length validation — a malformed BLOB of unexpected length could produce wrong seq_id values. Adding an assert len(blob) == 8 guard would catch this.

Neither is blocking — the fix is valuable as-is for the common 0.6→1.5 migration path.
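Both follow-up notes can be sketched in a few lines. The function shapes and names below are illustrative, not the merged code:

```python
# Illustrative hardening per the two follow-up notes (not the merged code):
# (1) validate BLOB length before converting; (2) report success to the caller.

def convert_seq_id(blob: bytes) -> int:
    """Convert a legacy 0.6.x seq_id BLOB, rejecting malformed lengths."""
    if len(blob) != 8:
        raise ValueError(f"unexpected seq_id BLOB length: {len(blob)}")
    return int.from_bytes(blob, byteorder="big")

def fix_blob_seq_ids(db_path) -> bool:
    """Return True only if the repair ran to completion, so callers can abort."""
    try:
        ...  # SELECT/UPDATE loop from the PR, using convert_seq_id() per row
        return True
    except Exception:
        return False
```

A raised `ValueError` on a malformed BLOB would surface the corruption instead of silently writing a wrong seq_id.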

jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
igorls mentioned this pull request Apr 13, 2026
@jphein jphein deleted the fix/blob-seq-id-migration branch April 19, 2026 03:51
sha2fiddy added a commit to sha2fiddy/mempalace that referenced this pull request Apr 23, 2026
The BLOB-seq_id migration shim (PR MemPalace#664) ran int.from_bytes(..., 'big')
over every BLOB in max_seq_id, including chromadb 1.5.x's own native
format (b'\x11\x11' + 6 ASCII digits). That conversion yields a ~1.23e18
integer that silently suppresses every subsequent embeddings_queue write
for the affected segment (queue filter is seq_id > start), causing
silent drawer-write drops after a 1.5.x upgrade.

Two-part fix:

1. Shim narrowing (mempalace/backends/chroma.py)
   - Drop max_seq_id from the shim loop. chromadb owns that column's
     format; we don't reinterpret it.
   - Defense-in-depth: skip rows in embeddings whose seq_id BLOB has the
     sysdb-10 b'\x11\x11' prefix rather than misconvert.

2. Recovery command (mempalace/repair.py, mempalace/cli.py)
   - mempalace repair --mode max-seq-id [--segment <uuid>]
     [--from-sidecar <path>] [--dry-run] [--yes] [--no-backup]
   - Detects poisoned rows via threshold (seq_id > 2**53).
   - Default heuristic: MAX(embeddings.seq_id) over the collection owning
     the poisoned segment. Matches METADATA max exactly; VECTOR segments
     get a few seq_ids ahead (queue skips an already-indexed window — an
     acceptable loss vs. resetting to 0 and re-processing everything).
   - --from-sidecar copies clean values from a pre-corruption sqlite db.
   - Backs up chroma.sqlite3, closes chroma handles, atomic UPDATEs,
     post-repair verification that raises MaxSeqIdVerificationError if
     any row is still above threshold.

Tests: 8 new in tests/test_repair.py (detection, heuristic, sidecar,
dry-run, segment filter, no-op, backup, rollback-on-verify-failure).
3 new in tests/test_backends.py (max_seq_id untouched by shim,
sysdb-10 prefix skipped in embeddings, legacy big-endian u64 BLOBs
still convert). Full suite: 1103 passed.
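The failure mode and the narrowed shim described above hinge on two small predicates; the names and constants below restate the commit message as a sketch, not the actual mempalace code:

```python
# Sketch of the narrowed shim's predicates (illustrative names, per the commit message).
SYSDB_PREFIX = b"\x11\x11"   # chromadb 1.5.x native seq_id: prefix + 6 ASCII digits
POISON_THRESHOLD = 2 ** 53   # heuristic threshold for a misconverted max_seq_id row

def should_convert(blob: bytes) -> bool:
    """Reinterpret only legacy big-endian u64 BLOBs; skip 1.5.x-native ones."""
    return len(blob) == 8 and not blob.startswith(SYSDB_PREFIX)

def is_poisoned(seq_id: int) -> bool:
    return seq_id > POISON_THRESHOLD

native = SYSDB_PREFIX + b"000042"  # also 8 bytes, which is why the old shim misconverted it
print(should_convert((7).to_bytes(8, "big")))      # True: legacy u64 BLOB
print(should_convert(native))                      # False: skipped by the narrowed shim
print(is_poisoned(int.from_bytes(native, "big")))  # True: ~1.23e18, far above 2**53
```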
sha2fiddy added a commit to sha2fiddy/mempalace that referenced this pull request Apr 26, 2026
igorls pushed a commit to sha2fiddy/mempalace that referenced this pull request Apr 27, 2026
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026