fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254)#1288

Merged: igorls merged 1 commit into develop from fix/repair-max-seq-id-blob-heuristic on May 1, 2026


@igorls igorls commented May 1, 2026

Summary

Closes #1254.

_compute_heuristic_seq_id calls int(row[0]) on the result of MAX(e.seq_id). On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), MAX(...) returns a bytes object and int(b'...') raises:

ValueError: invalid literal for int() with base 10: b'\x00\x00\x00\x00\x00\x00\x2D\xAE'

This is raised before the dry-run summary can print, so users have no path through the recovery feature added in #1135 — which is the only documented un-poison route for palaces hit by the original PR #664 shim bug.
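The failure mode can be reproduced outside the repair code. The snippet below is a minimal sketch, not code from repair.py: it reuses the byte string from the traceback above and shows that int() treats bytes as ASCII decimal text, which is why raw binary bytes are rejected.

```python
# The BLOB from the traceback: an 8-byte big-endian uint64.
blob = b"\x00\x00\x00\x00\x00\x00\x2D\xAE"

# int() parses bytes as ASCII decimal text, so digit characters work...
ascii_digits = int(b"42")

# ...but raw binary bytes are not a decimal literal and raise ValueError.
try:
    int(blob)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# int.from_bytes reads the same bytes as a binary integer instead.
decoded = int.from_bytes(blob, "big")
print(decoded)  # 11694 (0x2DAE)
```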

Fix

Decode BLOB return values via int.from_bytes(val, "big") and keep the existing int(val) path for INTEGER rows. The existing _read_sidecar_seq_ids already explicitly rejects BLOB-typed sidecars (repair.py:592); this change brings the heuristic path's tolerance for BLOBs in line with the on-disk reality of chromadb 1.5.x palaces.

val = row[0]
if isinstance(val, (bytes, bytearray)):
    return int.from_bytes(val, "big")
return int(val)
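As a standalone illustration of the two paths, the branch above can be wrapped in a small function and exercised with each input type. The name decode_seq_id is hypothetical and used only for this sketch; the actual change lives inline in _compute_heuristic_seq_id.

```python
def decode_seq_id(val):
    """Hypothetical wrapper around the branch shown above."""
    if isinstance(val, (bytes, bytearray)):
        # BLOB path: 8-byte big-endian value, unsigned by default.
        return int.from_bytes(val, "big")
    # INTEGER path: unchanged from the previous behavior.
    return int(val)


blob = (11694).to_bytes(8, "big")

from_blob = decode_seq_id(blob)                   # bytes input
from_bytearray = decode_seq_id(bytearray(blob))   # bytearray input
from_integer = decode_seq_id(11694)               # plain INTEGER input
largest = decode_seq_id(b"\xff" * 8)              # top of the uint64 range
```

All three representations of 11694 decode to the same integer, and the all-ones BLOB decodes to 2**64 - 1 because int.from_bytes is unsigned unless signed=True is passed.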

Test plan

  • New regression test test_max_seq_id_heuristic_decodes_blob_embeddings_seq_id seeds an 8-byte big-endian BLOB row in embeddings.seq_id and asserts the heuristic surfaces the correct integer for both VECTOR and METADATA segments.
  • Verified the test fails on develop tip without the fix (ValueError: invalid literal for int() with base 10: b'...').
  • All 36 tests in tests/test_repair.py pass with the fix in place.
  • ruff check + ruff format --check clean (CI-pinned 0.4.x).
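For readers without the repository at hand, the shape of such a regression test can be sketched against an in-memory SQLite database. Everything here is an assumption made for illustration: the two-table schema is a simplified stand-in for the real ChromaDB layout, and decode_seq_id, max_seq_id_for_scope, and the test name are hypothetical rather than the helpers used in tests/test_repair.py.

```python
import sqlite3


def decode_seq_id(val):
    """Hypothetical helper mirroring the fix's BLOB/INTEGER branch."""
    if isinstance(val, (bytes, bytearray)):
        return int.from_bytes(val, "big")
    return int(val)


def max_seq_id_for_scope(conn, scope):
    """Return the raw MAX(seq_id) for one segment scope (assumed schema)."""
    row = conn.execute(
        "SELECT MAX(e.seq_id) FROM embeddings e "
        "JOIN segments s ON s.id = e.segment_id WHERE s.scope = ?",
        (scope,),
    ).fetchone()
    return row[0]


def test_blob_seq_id_is_decoded_for_both_segments():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in schema; the real tables have more columns.
    conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, scope TEXT)")
    conn.execute("CREATE TABLE embeddings (segment_id TEXT, seq_id)")
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?)",
        [("seg-vec", "VECTOR"), ("seg-meta", "METADATA")],
    )
    expected = 11694
    blob = expected.to_bytes(8, "big")  # 8-byte big-endian BLOB
    conn.executemany(
        "INSERT INTO embeddings VALUES (?, ?)",
        [("seg-vec", blob), ("seg-meta", blob)],
    )
    for scope in ("VECTOR", "METADATA"):
        raw = max_seq_id_for_scope(conn, scope)
        assert isinstance(raw, bytes)  # sqlite3 returns BLOBs as bytes
        try:
            int(raw)  # the pre-fix code path
        except ValueError:
            pass
        else:
            raise AssertionError("expected int(bytes) to raise ValueError")
        assert decode_seq_id(raw) == expected  # the fixed code path
    conn.close()
```

The loop checks both the old behavior (int() raising) and the new one, which matches the "fails on develop tip, passes with the fix" verification described in the list above.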


`_compute_heuristic_seq_id` ran `int(row[0])` directly on the result
of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing
seq_ids natively (8-byte big-endian uint64 BLOB), that raises
`ValueError: invalid literal for int() with base 10: b'...'` before
the dry-run can print, leaving users with no path through the
recovery feature added in #1135 — the only documented un-poison
route for palaces hit by the original PR #664 shim bug.

Decode BLOB return values via `int.from_bytes(val, "big")` and
keep the existing `int(val)` path for INTEGER rows. Regression
test seeds a BLOB row in `embeddings.seq_id` and asserts the
heuristic surfaces the correct integer.
Copilot AI review requested due to automatic review settings May 1, 2026 01:05

Copilot AI left a comment


Pull request overview

This PR fixes a crash in mempalace repair --mode max-seq-id on palaces where ChromaDB 1.5.x stores embeddings.seq_id as an 8-byte big-endian uint64 BLOB. On those palaces MAX(embeddings.seq_id) returns a BLOB, which the heuristic could not convert with int(). With the fix, the recovery flow can complete, including in dry-run mode.

Changes:

  • Update _compute_heuristic_seq_id to decode bytes/bytearray values via int.from_bytes(..., "big") and keep the existing integer path.
  • Add a regression test that seeds a BLOB embeddings.seq_id and asserts the heuristic produces the correct integer for both VECTOR and METADATA segments.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| mempalace/repair.py | Decode BLOB-typed MAX(embeddings.seq_id) results to prevent int(bytes) crashes during heuristic computation. |
| tests/test_repair.py | Add regression coverage for BLOB-typed embeddings.seq_id in the max-seq-id repair heuristic. |


@igorls igorls merged commit 7bc6090 into develop May 1, 2026
10 checks passed
igorls added a commit that referenced this pull request May 1, 2026
Three fixes landed on develop after the initial release-prep cut and
were brought in via the develop merge. Document them in the 3.3.4
Bug Fixes section so the release notes reflect what users will
actually receive.

- #1287 - HNSW divergence floor scales with hnsw:sync_threshold
  (resolves a silent-fallback regression introduced by the
  interaction between #1191 and #1227 in this release)
- #1262 - ChromaBackend get_or_create_collection split, fixing the
  stop-hook SIGSEGV class on legacy palaces with mismatched stored
  metadata (#1089)
- #1288 / #1254 - repair --mode max-seq-id heuristic now decodes
  BLOB-typed embeddings.seq_id, restoring the un-poison path added
  in #1135 for palaces where chromadb 1.5.x writes seq_ids natively
igorls added a commit that referenced this pull request May 1, 2026
The release was originally cut on 2026-04-27 but did not tag that day.
Three additional bug fixes have been folded in since then (#1262,
#1287, #1288) and the actual tag will happen on 2026-04-30. Update
the header date to match.
xcarbo added a commit to xcarbo/mempalace that referenced this pull request May 1, 2026
Catches up xdev-patches with 112 commits from MemPalace/develop, including:
- v3.3.4 release
- MemPalace#1262/MemPalace#1289 ChromaDB collection-reopen crash fix (relevant to long-running
  MCP server & mempalace-api)
- MemPalace#1287 HNSW divergence floor fix
- MemPalace#1288 BLOB seq_id decode in repair
- MemPalace#1180 cross-wing tunnels by shared topics
- MemPalace#1194 wing-slug normalization for hyphenated dirs

Conflict resolution: hooks_cli.py and mcp_server.py both had local patches
(6ef44cb route CC transcripts via convo_miner; 3fad61d allow leading dash)
that overlap with upstream fixes (MemPalace#1231, MemPalace#1194). Took upstream entirely on
those two files — upstream's version handles separate transcript/project
ingest, uses _mempalace_python(), and adds _pin_hnsw_threads. The local
config.py regex relaxation auto-merged cleanly and is preserved.

Safety tag: pre-upstream-merge-20260501-091227 (rollback target).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Linked issue (closed by this pull request): repair --mode max-seq-id crashes with ValueError on BLOB-typed embeddings.seq_id rows