fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254) #1288
Merged
…1254) `_compute_heuristic_seq_id` ran `int(row[0])` directly on the result of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), that raises `ValueError: invalid literal for int() with base 10: b'...'` before the dry-run can print, leaving users with no path through the recovery feature added in #1135 — the only documented un-poison route for palaces hit by the original PR #664 shim bug. Decode BLOB return values via `int.from_bytes(val, "big")` and keep the existing `int(val)` path for INTEGER rows. Regression test seeds a BLOB row in `embeddings.seq_id` and asserts the heuristic surfaces the correct integer.
Pull request overview
This PR fixes `mempalace repair --mode max-seq-id` crashing when `MAX(embeddings.seq_id)` returns a BLOB. This happens on palaces where ChromaDB 1.5.x stores `embeddings.seq_id` as an 8-byte big-endian uint64 BLOB. With the fix, the recovery flow can complete, including the dry-run.
Changes:
- Update `_compute_heuristic_seq_id` to decode `bytes`/`bytearray` values via `int.from_bytes(..., "big")` and keep the existing integer path.
- Add a regression test that seeds a BLOB `embeddings.seq_id` and asserts the heuristic produces the correct integer for both VECTOR and METADATA segments.
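As a rough illustration of the first change, the decode step might look like the sketch below. `decode_seq_id` is a hypothetical name, not the actual helper in `mempalace/repair.py`, and the sketch assumes the value is either a big-endian BLOB or something `int()` already accepts.

```python
def decode_seq_id(val):
    """Hypothetical helper: normalize a MAX(seq_id) result to an int.

    Assumption: BLOB values are unsigned big-endian integers, as described
    in this PR; anything else is passed to int() unchanged.
    """
    if isinstance(val, (bytes, bytearray)):
        # BLOB path: interpret the raw bytes as a big-endian unsigned integer.
        return int.from_bytes(val, "big")
    # INTEGER path: the pre-existing behaviour.
    return int(val)
```

A NULL result (for example, `MAX` over an empty table) is not handled here; the sketch leaves that case to the caller.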
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `mempalace/repair.py` | Decode BLOB-typed `MAX(embeddings.seq_id)` results to prevent `int(bytes)` crashes during heuristic computation. |
| `tests/test_repair.py` | Add regression coverage for BLOB-typed `embeddings.seq_id` in the max-seq-id repair heuristic. |
igorls added a commit that referenced this pull request May 1, 2026
Three fixes landed on develop after the initial release-prep cut and were brought in via the develop merge. Document them in the 3.3.4 Bug Fixes section so the release notes reflect what users will actually receive.

- #1287 - HNSW divergence floor scales with hnsw:sync_threshold (resolves a silent-fallback regression introduced by the interaction between #1191 and #1227 in this release)
- #1262 - ChromaBackend get_or_create_collection split, fixing the stop-hook SIGSEGV class on legacy palaces with mismatched stored metadata (#1089)
- #1288 / #1254 - repair --mode max-seq-id heuristic now decodes BLOB-typed embeddings.seq_id, restoring the un-poison path added in #1135 for palaces where chromadb 1.5.x writes seq_ids natively
xcarbo added a commit to xcarbo/mempalace that referenced this pull request May 1, 2026
Catches up xdev-patches with 112 commits from MemPalace/develop, including:

- v3.3.4 release
- MemPalace#1262/MemPalace#1289 ChromaDB collection-reopen crash fix (relevant to long-running MCP server & mempalace-api)
- MemPalace#1287 HNSW divergence floor fix
- MemPalace#1288 BLOB seq_id decode in repair
- MemPalace#1180 cross-wing tunnels by shared topics
- MemPalace#1194 wing-slug normalization for hyphenated dirs

Conflict resolution: hooks_cli.py and mcp_server.py both had local patches (6ef44cb route CC transcripts via convo_miner; 3fad61d allow leading dash) that overlap with upstream fixes (MemPalace#1231, MemPalace#1194). Took upstream entirely on those two files: upstream's version handles separate transcript/project ingest, uses _mempalace_python(), and adds _pin_hnsw_threads. The local config.py regex relaxation auto-merged cleanly and is preserved.

Safety tag: pre-upstream-merge-20260501-091227 (rollback target).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Closes #1254.
`_compute_heuristic_seq_id` calls `int(row[0])` on the result of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), `MAX(...)` returns a `bytes` object and `int(b'...')` raises `ValueError: invalid literal for int() with base 10: b'...'`.

This is raised before the dry-run summary can print, so users have no path through the recovery feature added in #1135, which is the only documented un-poison route for palaces hit by the original PR #664 shim bug.
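The failure can be reproduced outside mempalace with plain `sqlite3`. This is a minimal sketch: the one-column `embeddings` table is a simplified stand-in rather than ChromaDB's actual schema, and `reproduce_crash` is a hypothetical name.

```python
import sqlite3


def reproduce_crash():
    """Return (raw MAX value, error message) for a BLOB-typed seq_id row."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in table; the real schema has more columns.
    conn.execute("CREATE TABLE embeddings (seq_id)")
    # Store 42 as an 8-byte big-endian BLOB, the encoding described above.
    conn.execute(
        "INSERT INTO embeddings (seq_id) VALUES (?)",
        ((42).to_bytes(8, "big"),),
    )
    raw = conn.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()[0]
    conn.close()

    error_message = None
    try:
        int(raw)  # int() treats bytes as a text literal, so this fails
    except ValueError as exc:
        error_message = str(exc)
    return raw, error_message
```

`sqlite3` hands BLOB values back as `bytes`, and `int()` parses `bytes` as a decimal literal rather than as raw binary, which is why the call fails instead of returning 42.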
Fix
Decode BLOB return values via `int.from_bytes(val, "big")` and keep the existing `int(val)` path for INTEGER rows. The existing `_read_sidecar_seq_ids` already explicitly rejects BLOB-typed sidecars (`repair.py:592`); this change brings the heuristic path's tolerance for BLOBs in line with the on-disk reality of chromadb 1.5.x palaces.

Test plan
- `test_max_seq_id_heuristic_decodes_blob_embeddings_seq_id` seeds an 8-byte big-endian BLOB row in `embeddings.seq_id` and asserts the heuristic surfaces the correct integer for both VECTOR and METADATA segments.
- The test fails without the fix (`ValueError: invalid literal for int() with base 10: b'...'`).
- Tests in `tests/test_repair.py` pass with the fix in place.
- `ruff check` + `ruff format --check` clean (CI-pinned 0.4.x).
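For readers without the repository at hand, the idea behind the regression test can be sketched standalone. Everything here is an assumption for illustration: `max_seq_id` stands in for the real heuristic, the table is reduced to a single column, and segment handling (VECTOR vs METADATA) is omitted.

```python
import sqlite3


def max_seq_id(conn):
    """Stand-in for the heuristic: MAX(seq_id) decoded to an int."""
    val = conn.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()[0]
    if isinstance(val, (bytes, bytearray)):
        return int.from_bytes(val, "big")
    return int(val)


def seeded_connection(values, as_blob):
    """Build an in-memory table seeded with BLOB or INTEGER seq_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE embeddings (seq_id)")
    for n in values:
        stored = n.to_bytes(8, "big") if as_blob else n
        conn.execute("INSERT INTO embeddings (seq_id) VALUES (?)", (stored,))
    return conn


def test_blob_seq_id_is_decoded():
    # BLOB rows: MAX must pick 300 and the result must decode to an int.
    assert max_seq_id(seeded_connection([7, 300], as_blob=True)) == 300


def test_integer_seq_id_still_works():
    # INTEGER rows: the original int() path is unchanged.
    assert max_seq_id(seeded_connection([7, 300], as_blob=False)) == 300
```

`MAX` still returns the numerically largest BLOB because SQLite compares BLOBs bytewise, and for fixed-width big-endian values bytewise order matches numeric order.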