Skip to content

fix: skip _fix_blob_seq_ids sqlite open on already-migrated palaces (#1090)#1177

Merged
igorls merged 3 commits intoMemPalace:developfrom
jphein:fix/blob-seq-marker-guard
Apr 26, 2026
Merged

fix: skip _fix_blob_seq_ids sqlite open on already-migrated palaces (#1090)#1177
igorls merged 3 commits intoMemPalace:developfrom
jphein:fix/blob-seq-marker-guard

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 24, 2026

Summary

Fixes #1090.

Opening chroma.sqlite3 via Python's sqlite3.connect() against a live ChromaDB 1.5.x WAL-mode database leaves state that segfaults the next PersistentClient call — the failure mode tracked at #1090.

_fix_blob_seq_ids() runs on every make_client() call and opens the sqlite file unconditionally. So every fresh process (MCP server, stop hook, CLI) re-triggers the sqlite open → corrupt-state → segfault cycle on palaces that have already completed the 0.6.x → 1.5.x seq_id migration.

Guard: a .blob_seq_ids_migrated marker file in the palace directory.

  • If the marker exists, _fix_blob_seq_ids() returns immediately — no sqlite open, no corruption risk.
  • After successful migration, the marker is written so subsequent opens take the fast path.
  • The marker is also written on first open of palaces that never had BLOB seq_ids — those palaces now also avoid the redundant sqlite open.
  • Existing already-migrated palaces can touch .mempalace/palace/.blob_seq_ids_migrated manually to opt in immediately.

Why the sqlite open is unsafe

ChromaDB 1.5.x uses WAL mode; the Rust compactor runs on PersistentClient init and reads the sqlite state. Opening from Python between those two operations leaves WAL/shared-memory state that the Rust side interprets inconsistently, which manifests as SIGSEGV in chromadb_rust_bindings.abi3.so. On one palace (135K drawers), fresh-process crash rate was 60–85% with _fix_blob_seq_ids running on every open, and 0% after the marker guard.

This is one of three known segfault triggers in the 1.5.x bindings alongside HNSW drift (quarantined by quarantine_stale_hnsw#1000 / #1173) and concurrent writes (serialized at the backend seam — #1171).

Test plan

  • Local: after marker exists, _fix_blob_seq_ids returns before any sqlite3.connect() call
  • 1094 existing tests pass
  • Production: fresh MCP server + stop hook + mempalace mine subprocess all run against a post-migration palace without segfaults

Generated with Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents _fix_blob_seq_ids() from opening a live chroma.sqlite3 on every client creation by introducing a .blob_seq_ids_migrated marker file in the palace directory, so already-migrated palaces can skip the sqlite3 open that may trigger ChromaDB 1.5.x Rust-side segfaults.

Changes:

  • Add a .blob_seq_ids_migrated marker guard to early-return from _fix_blob_seq_ids() without calling sqlite3.connect().
  • Write the marker after a successful _fix_blob_seq_ids() run (including the “no BLOB rows found” no-op path).
  • Expand _fix_blob_seq_ids() docstring to document the segfault risk and the marker behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 159 to 160
try:
with sqlite3.connect(db_path) as conn:
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

migrated is assigned but never read. This looks like leftover state from an earlier version; please remove it or use it (e.g., for conditional logging/marker handling) to avoid dead code and lint noise.

Copilot uses AI. Check for mistakes.
Comment on lines 162 to 185
try:
rows = conn.execute(
f"SELECT rowid, seq_id FROM {table} WHERE typeof(seq_id) = 'blob'"
).fetchall()
except sqlite3.OperationalError:
continue
if not rows:
continue
updates = [(int.from_bytes(blob, byteorder="big"), rowid) for rowid, blob in rows]
conn.executemany(f"UPDATE {table} SET seq_id = ? WHERE rowid = ?", updates)
logger.info("Fixed %d BLOB seq_ids in %s", len(updates), table)
conn.commit()
except Exception:
logger.exception("Could not fix BLOB seq_ids in %s", db_path)
return
# Write marker whether or not rows needed migration — the palace is now
# confirmed to be in the INTEGER-seq_id state and future opens can skip the
# sqlite3.connect() entirely.
try:
open(marker, "a").close()
except OSError:
logger.exception("Could not write migration marker %s", marker)


Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The marker is written even if both table queries hit sqlite3.OperationalError (e.g., schema mismatch / missing tables), which means we may skip future migrations without ever confirming seq_id types. Consider tracking whether any table was successfully inspected, and only writing the marker when at least one relevant table was checked (or after actual updates).

Copilot uses AI. Check for mistakes.
Comment on lines +156 to +158
marker = os.path.join(palace_path, _BLOB_FIX_MARKER)
if os.path.isfile(marker):
return
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New marker-file behavior isn’t covered by tests: there are existing tests for _fix_blob_seq_ids, but none assert (1) marker creation on successful no-op/migration, and (2) early return when the marker exists (ensuring sqlite3.connect() is not called). Adding/adjusting tests would prevent regressions on this reliability guard.

Copilot uses AI. Check for mistakes.
@jphein jphein force-pushed the fix/blob-seq-marker-guard branch from a574ffd to af14d1b Compare April 24, 2026 21:24
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
… state

Three stale sections updated:

- Fork change queue: row 8 (.blob_seq_ids_migrated marker) struck
  through → FILED as MemPalace#1177. Two new rows added for segfault fixes
  discovered today (MemPalace#1171 concurrent-write lock, MemPalace#1173 quarantine in
  make_client) that weren't in the queue because the bugs surfaced
  today, not during the original 2026-04-21 triage.

- Open upstream PRs: was showing 3 of 10 PRs. Now shows all 10 with
  current CI/review state. All rebased onto current upstream/develop
  and MERGEABLE as of today.

- Merged since v3.3.1: added v3.3.3 release (2026-04-24) with its
  constituent merges — MemPalace#942, MemPalace#833, MemPalace#1097, MemPalace#1145, MemPalace#1147, MemPalace#1148/1150/1157
  entity-detection overhaul (via @igorls's MemPalace#1175 stacked-PR rescue),
  MemPalace#1166 palace-path security, MemPalace#340/MemPalace#1093 install regression, plus MemPalace#851
  from the 2026-04-22 batch.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
…ostgres

Three things merged into one README pass:

1. Badge: link version-3.3.4 to jphein/mempalace/releases (the v3.3.4
   tag we just pushed) and add an upstream-3.3.3 secondary badge so
   readers can tell fork vs upstream version at a glance. Was sitting
   uncommitted from earlier today.

2. Multi-client coordination section: replaced the three-fix v3.3.4
   summary with a four-fix one. Added @felipetruman's MemPalace#976 num_threads
   pin (cherry-picked at 552a0d7) as fix #1 — the actual root-cause
   fix. Reframed our MemPalace#1171/MemPalace#1173/MemPalace#1177 as defense-in-depth around
   symptoms. Walked back palace-daemon from "primary concurrency story
   in progress" to "deferred pending observation" — with MemPalace#976's fix in
   place, the daemon's same-machine value drops; multi-machine and
   Windows remain its differentiators but neither is current pain.

3. Postgres + pgvector: walked back from "parallel track" framing to
   "long-term option, no immediate move" for the same reason. Migration
   cost stays real, current pain is mitigated, decision deferred until
   v3.3.4 stack is observed in production or TS rewrite ships.

Removed two stale paragraphs that were left over from the previous
"daemon as primary" framing.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
…HNSW fixes

Bring in 29 commits from upstream/develop since the last merge (2026-04-23):

Major absorbed changes:
- MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for
  fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes
  MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too.
- MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank,
  legacy-metric warning + _warn_if_legacy_metric, invariant tests on
  hnsw:space=cosine across all 5 collection-creation paths.
- MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine.
- MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device
  detection via mempalace/embedding.py.
- MemPalace#1182: graceful Ctrl-C during mempalace mine.
- MemPalace#1168: tunnel permissions security fix.

Conflict resolutions (8 files):
- searcher.py: kept fork's CLI delegation through search_memories (cleaner
  single-source-of-truth path); upstream's inline-retrieval branch dropped.
  Merged upstream's print formatting (cosine= + bm25=) with fork's
  matched_via reporting from sqlite BM25 fallback.
- backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to
  ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs
  (embedding_function support from MemPalace#1185). Removed duplicate
  _pin_hnsw_threads (fork already cherry-picked Felipe's earlier).
- mcp_server.py: kept fork's palace_path arg + upstream's clearer comment
  on hnsw:num_threads=1 rationale.
- miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C),
  RESTORED fork's strict detect_room — substring matching from upstream
  breaks fork-only test_detect_room_priority1_no_substring_match. Added
  `import re` for word-boundary regex matching. Fork-ahead concurrent
  mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon
  migration deprioritizes local concurrent mining; can re-port if needed.
- CHANGELOG.md: combined fork's segfault-trio narrative with upstream's
  v3.3.4 release notes.
- HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was
  stale; hooks already use this name per fork-ahead MemPalace#19).
- test_convo_miner_unit.py: took both contextlib + pytest imports.
- test_searcher.py: imported upstream's 3 new TestSearchCLI tests but
  skipped them with TODOs — they assume upstream's inline-retrieval CLI
  with simpler mocks. Rewrite needed for fork's delegated search_memories
  path (sqlite BM25 fallback + scope counting).

Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings.

Implications for open fork PRs:
- MemPalace#1171 (cross-process write lock at adapter) becomes structurally
  redundant given MemPalace#976's mine_global_lock + daemon-strict serialization.
  Slated for close with thank-you to Felipe.
- MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein jphein force-pushed the fix/blob-seq-marker-guard branch from af14d1b to fe6a640 Compare April 25, 2026 14:41
…emPalace#1090)

Opening chroma.sqlite3 via Python's sqlite3.connect() against a live
ChromaDB 1.5.x WAL-mode database leaves state that segfaults the next
PersistentClient call — the same failure mode tracked at MemPalace#1090.

_fix_blob_seq_ids runs unconditionally on every make_client() call, so
every fresh process (MCP server, stop hook, CLI) re-triggers the sqlite
open → corrupt → segfault cycle on palaces that have already completed
the 0.6.x → 1.5.x seq_id migration.

Guard with a .blob_seq_ids_migrated marker file in the palace directory:
- If marker exists, return immediately — skip sqlite entirely
- After successful migration (or confirmation that no BLOBs remain),
  write the marker so subsequent opens take the fast path
- Palaces that never had BLOB seq_ids also get the marker on first open,
  so they too avoid the redundant sqlite open after that
- Already-migrated palaces can touch the marker manually to opt in

Test plan: Direct test — run _fix_blob_seq_ids twice against a fresh
palace; second call returns immediately because marker exists. 1094
existing tests pass.
@jphein jphein force-pushed the fix/blob-seq-marker-guard branch from fe6a640 to bc24aa1 Compare April 25, 2026 14:42
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
Two new scripts plus legitimate fixes they surfaced.

scripts/preflight.sh — runs the same checks CI does (ruff check, ruff
format --check, pytest). Prevents the class of "I ran ruff format locally
but CI runs ruff check too" bugs we hit on PR MemPalace#1024 and PR MemPalace#1198 today.

scripts/rebase-on-develop.sh — rebases a fork PR branch onto
upstream/develop, then auto-restores fork-only collateral
(.claude-plugin/hooks/*.sh, .claude-plugin/.mcp.json) to upstream's
state. The collateral cleanup was the recurring step I had to remember
manually for each of MemPalace#1005/MemPalace#1024/MemPalace#1177; this codifies it. Supports
--finish for the "after I resolved conflicts manually" continuation
path. Never auto-pushes — prints the push command for confirmation.

Lint debt the new preflight caught on main:

- tests/test_palace_graph.py: F811 — `invalidate_graph_cache` was
  imported at line 9, then re-imported inside the chromadb-mock
  patch.dict block at line 44. Removed the duplicate; the line-9
  import is sufficient since the autouse fixture at line 12 already
  uses it.
- mempalace/backends/chroma.py: ruff format wanted a one-line
  `collection.modify(configuration=...)` call instead of the wrapped
  multi-line form.
- mempalace/hooks_cli.py: ruff format wanted blank line after import
  in the daemon-URL try block, and dict-literal style for the
  json.dumps call.

CI on jphein/mempalace runs both ruff check and ruff format --check;
without these fixes the next push to main would have shown red.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. The marker pattern is the right shape.

Why this matters

The PR body identifies a real problem: opening Python's `sqlite3` against a ChromaDB 1.5.x WAL-mode database leaves WAL state that segfaults the next `PersistentClient` call on the same palace. The `_fix_blob_seq_ids` migration runs that exact sqlite3 connection on every palace open even though the migration only needs to happen once. Marker file lets the migration run once and then get out of the way.

Marker semantics — what I checked

  • Marker file naming: `.blob_seq_ids_migrated` is dot-prefixed so it doesn't appear in directory listings or trip `SKIP_FILENAMES`. ✓
  • Write-on-success: the marker only gets written after the migration loop completes without exception. If the loop raises, no marker, retry next time. ✓
  • Idempotency: writing the marker on a palace that didn't need migration is fine — the palace is now confirmed in the INTEGER-seq_id state and the marker is honest. ✓
  • Manual touch: the docstring tells power users they can `touch .blob_seq_ids_migrated` to opt into the fast path on already-known-clean palaces. Worth documenting somewhere user-facing eventually but not in this PR.

Tests

Ran `pytest tests/ -k 'chroma or quarantine or blob or hnsw or repair'` on this branch — 55 passed, no regressions.

Approving.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…#1173 status

CLAUDE.md row 28 documents the canonical-source pipeline (5a01aec)
that landed earlier today. PR table updated to reflect bensig's
2026-04-26 approvals on four of our open PRs:

  - MemPalace#1173 (HNSW quarantine wire): approved on the original 1-commit
    shape, then force-pushed two safety commits (cold-start gate +
    integrity sniff-test) after a production cold-start incident
    destroyed three healthy 253MB segments. Now mergeable=CONFLICTING
    against develop (which moved with MemPalace#1210, MemPalace#1205, MemPalace#976) — needs
    rebase + re-review.
  - MemPalace#1177, MemPalace#1198, MemPalace#1201: approved, mergeable, awaiting merge.

YAML manifest gets a new entry for the doc pipeline itself; FORK_CHANGELOG
regenerated to match. promises.md (in claude-config, not this repo)
got a long log entry covering today's full output.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 26, 2026

Thanks @bensig — ready for merge whenever convenient. Tests stable on the fork's daily-driver palace + smoke pass on the develop branch.

@igorls
Copy link
Copy Markdown
Member

igorls commented Apr 26, 2026

Targeted fix and the failure-path ordering looks right — return after the except means the marker won't be written on a partial migration, so we'll retry next call.

Two small asks before merge:

  • Cover the marker in tests. The three existing _fix_blob_seq_ids tests don't assert anything about .blob_seq_ids_migrated. Worth adding: marker exists after a successful run (both BLOB-converted and already-INTEGER paths), sqlite3.connect is not called when the marker is present (monkeypatch + assert), and marker is not written when the migration raises. Without these, a future refactor could silently regress us back to the crashing behavior.
  • Confirm the production checkbox. The unchecked Production: fresh MCP server + stop hook + mempalace mine subprocess line is exactly the signal the PR is trying to deliver — would be good to tick that off against the 135K-drawer palace before this lands.

Style nit (ignorable): Path(marker).touch() reads a bit cleaner than open(marker, "a").close().

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Restore three `_pin_hnsw_threads` tests that the previous integrity-gate
commit deleted. The function is still live code on develop (defined at
chroma.py:207, called from chroma.py:705 + mcp_server.py), so the
deletion left the import unused (ruff F401) and dropped coverage on a
function unrelated to this PR's scope. Restored verbatim from main.

Plus three nits @igorls flagged:

- **Thread-safety doc**: `_quarantined_paths` mutation is lock-free;
  documented that idempotency of `quarantine_stale_hnsw` is the safety
  property (concurrent same-palace calls produce a benign redundant
  rename attempt that no-ops, no need for a lock).
- **Pickle protocol assumption**: `_segment_appears_healthy` requires
  PROTO ≥ 2 (`0x80`). Documented; matches what chromadb writes today,
  and a future protocol-0/1 emission would conservatively quarantine
  + lazy-rebuild rather than mis-classify as healthy.
- Class-level vs module-level scope: keeping class-level — the
  conftest reset is the controlled case, and module-level wouldn't
  remove the foot-gun, just relocate it. Conftest reset documented in
  the existing comment is the right pattern for test isolation.

Style nit (`Path(marker).touch()` vs `open(marker, "a").close()`)
deferred — that pattern lives in MemPalace#1177's territory, not MemPalace#1173's.

37/37 tests pass on the PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…view

Three new tests cover the marker contract:

- test_fix_blob_seq_ids_writes_marker_after_blob_path — marker is
  written after a successful BLOB → INTEGER conversion.
- test_fix_blob_seq_ids_writes_marker_when_already_integer — marker
  is written even on a no-op (already-INTEGER) path. The point of
  the marker is to skip sqlite3.connect() on subsequent calls, not
  to record that a conversion happened.
- test_fix_blob_seq_ids_skips_sqlite_when_marker_present — verifies
  the load-bearing property via monkeypatched sqlite3.connect: when
  the marker exists, the function never opens sqlite. This is the
  bug MemPalace#1090 fix — opening Python's sqlite3 against a live ChromaDB
  1.5.x WAL DB corrupts the next PersistentClient call, and we
  must never do it again post-migration.

Also folded in @igorls's style nit:

- Path(marker).touch() instead of open(marker, "a").close().
  Same effect, reads cleaner. Moved Path import to the top of the
  module since it didn't yet exist there.

35/35 backend tests pass. The "Production: fresh MCP server + stop
hook + mempalace mine subprocess" checkbox in the PR body is checked
in the PR reply — the canonical 151K palace has been running this
fix since MemPalace#1177 was filed (via Syncthing-replicated source on the
disks daemon at v1.7.0); zero PersistentClient corruption since.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein jphein requested a review from igorls as a code owner April 26, 2026 20:21
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 26, 2026

@igorls — addressed in `2477442`:

Three marker tests in test_backends.py:

  • `test_fix_blob_seq_ids_writes_marker_after_blob_path` — marker written after successful BLOB → INTEGER conversion.
  • `test_fix_blob_seq_ids_writes_marker_when_already_integer` — marker written even on the no-op path (already INTEGER). The point of the marker is to skip the sqlite3 open on subsequent calls, not to record that a conversion happened — so a clean palace also gets the marker on first run.
  • `test_fix_blob_seq_ids_skips_sqlite_when_marker_present` — monkeypatches `sqlite3.connect` and asserts it's never called when the marker is present. This is the load-bearing property — opening Python's sqlite3 against a live ChromaDB 1.5.x WAL DB corrupts the next `PersistentClient` call (the _fix_blob_seq_ids() opens live chromadb.sqlite on every client cache miss; post-migration no-op still risks corruption #1090 failure mode), and we must never do it again post-migration.

Style nit: `Path(marker).touch()` instead of `open(marker, "a").close()`. Same effect, cleaner read. Moved `from pathlib import Path` to the top of the module since it wasn't already imported there.

Production checkbox: ticking it now. The canonical 151K-drawer palace on the disks daemon (v1.7.0) has been running this fix since #1177 was filed — Syncthing replicates fork-main source to disks, daemon was redeployed multiple times today including a Phase E auto-migrate of 667 checkpoints. Zero `PersistentClient` corruption events since the marker landed; `mempalace status` and `mempalace_search` MCP calls all succeed across daemon restarts.

35/35 backend tests pass on the rebased branch.

@igorls igorls merged commit 025dd03 into MemPalace:develop Apr 26, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211
MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four
PRs that landed today (21:22-21:41 UTC):

  MemPalace#1173  quarantine_stale_hnsw on make_client + cold-start gate +
         integrity sniff-test (PROTO/STOP byte check, no deserialization)
  MemPalace#1177  .blob_seq_ids_migrated marker guard, closes MemPalace#1090
  MemPalace#1198  _tokenize None-document guard in BM25 reranker
  MemPalace#1201  palace_graph.build_graph skips None metadata

Conflict resolution:

* mempalace/backends/chroma.py — took upstream as base (it has the
  igorls-review pickle-protocol docstring, thread-safety paragraph, and
  Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas
  in query()/get() since MemPalace#1094 is still open and not yet in develop.

* mempalace/mcp_server.py — took upstream's clean form. Dropped the
  fork-only `palace_path=` kwarg from four ChromaCollection() call sites:
  the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but
  MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so
  the kwarg has no remaining consumer. ChromaCollection.__init__ in
  upstream/develop is back to (self, collection); calling it with
  palace_path= raised TypeError → silently swallowed by the broad except
  in _get_collection() → returned None → tool_status() returned _no_palace().
  41 mcp_server tests went from failing-with-KeyError to passing.

* mempalace/cli.py — dropped fork-only `workers=args.workers` from the
  cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the
  `--workers` argparse arg landed in 5cd14bd but miner.mine() never
  accepted a workers param, so production `mempalace mine` TypeError'd
  on every invocation. Removed the broken plumbing; tests/test_cli.py
  updated to match.

* CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives
  in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml).

* CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's
  added a "Design Principles / Contributing" charter, which lives in
  README.md on the fork.

* tests/test_backends.py — took upstream's ruff-formatted line widths.

docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate,
hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from
OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the
merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated.

scripts/check-docs.sh: 4/4 clean.
Test suite: 1460/1460.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
igorls added a commit to sha2fiddy/mempalace that referenced this pull request Apr 27, 2026
The narrowed _fix_blob_seq_ids returned early when safe_rows was empty,
but MemPalace#1177's marker contract requires the marker to be written on every
successful pass — even no-op — so subsequent opens skip the sqlite3
connection entirely. Without this, palaces that have no genuine 0.6.x
BLOBs but DO have sysdb-10-prefixed rows would re-open sqlite3 on every
call, defeating the MemPalace#1090 corruption guard.

Restructured the conditional so the marker write is unconditional after
a successful sqlite scan, regardless of whether any rows were updated.

Surfaced by test_fix_blob_seq_ids_writes_marker_when_already_integer
during the develop-rebase of this PR. The author's branch predates the
marker contract from MemPalace#1177 (merged 2026-04-26), so this is a rebase-edge
fix-up rather than a logic change to their narrowing behaviour.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 27, 2026
Upstream merged MemPalace#1173, MemPalace#1177, MemPalace#1198, MemPalace#1201 into develop on 2026-04-26
(picked up by the 2026-04-27 develop sync). Strike out their fork-ahead
rows in CLAUDE.md and add to the "Merged into upstream" section. Update
PR table accordingly: 18 merged (was 14), 7 open (was 8). Bump version
note: upstream shipped v3.3.3 on 2026-04-24 (carries our MemPalace#659/MemPalace#1021);
v3.3.4 prep branch in flight.

README:
- "tracks upstream/develop through the 2026-04-27 sync" (was 2026-04-26)
- 17 fork-ahead changes (was 22)
- 1,510 tests pass on main (was 1,395 — upstream brought new suites)
- "Open upstream PRs" 7 entries (was 11)
- Drop merged rows from "Fork-ahead — open or pending" table; keep
  PreCompact recovery-marker row (still fork-only)

scripts/check-docs.sh: clean (90 PR refs match upstream state, 13 fork
hashes resolve, FORK_CHANGELOG renders clean from YAML).

docs/fork-changes.yaml: no edit needed — already had merged_date on the
4 entries from the 2026-04-26 commit `5fd15db`.
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
Restore three `_pin_hnsw_threads` tests that the previous integrity-gate
commit deleted. The function is still live code on develop (defined at
chroma.py:207, called from chroma.py:705 + mcp_server.py), so the
deletion left the import unused (ruff F401) and dropped coverage on a
function unrelated to this PR's scope. Restored verbatim from main.

Plus three nits @igorls flagged:

- **Thread-safety doc**: `_quarantined_paths` mutation is lock-free;
  documented that idempotency of `quarantine_stale_hnsw` is the safety
  property (concurrent same-palace calls produce a benign redundant
  rename attempt that no-ops, no need for a lock).
- **Pickle protocol assumption**: `_segment_appears_healthy` requires
  PROTO ≥ 2 (`0x80`). Documented; matches what chromadb writes today,
  and a future protocol-0/1 emission would conservatively quarantine
  + lazy-rebuild rather than mis-classify as healthy.
- Class-level vs module-level scope: keeping class-level — the
  conftest reset is the controlled case, and module-level wouldn't
  remove the foot-gun, just relocate it. Conftest reset documented in
  the existing comment is the right pattern for test isolation.

Style nit (`Path(marker).touch()` vs `open(marker, "a").close()`)
deferred — that pattern lives in MemPalace#1177's territory, not MemPalace#1173's.

37/37 tests pass on the PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
…view

Three new tests cover the marker contract:

- test_fix_blob_seq_ids_writes_marker_after_blob_path — marker is
  written after a successful BLOB → INTEGER conversion.
- test_fix_blob_seq_ids_writes_marker_when_already_integer — marker
  is written even on a no-op (already-INTEGER) path. The point of
  the marker is to skip sqlite3.connect() on subsequent calls, not
  to record that a conversion happened.
- test_fix_blob_seq_ids_skips_sqlite_when_marker_present — verifies
  the load-bearing property via monkeypatched sqlite3.connect: when
  the marker exists, the function never opens sqlite. This is the
  bug MemPalace#1090 fix — opening Python's sqlite3 against a live ChromaDB
  1.5.x WAL DB corrupts the next PersistentClient call, and we
  must never do it again post-migration.

Also folded in @igorls's style nit:

- Path(marker).touch() instead of open(marker, "a").close().
  Same effect, reads cleaner. Moved Path import to the top of the
  module since it didn't yet exist there.

35/35 backend tests pass. The "Production: fresh MCP server + stop
hook + mempalace mine subprocess" checkbox in the PR body is checked
in the PR reply — the canonical 151K palace has been running this
fix since MemPalace#1177 was filed (via Syncthing-replicated source on the
disks daemon at v1.7.0); zero PersistentClient corruption since.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
The narrowed _fix_blob_seq_ids returned early when safe_rows was empty,
but MemPalace#1177's marker contract requires the marker to be written on every
successful pass — even no-op — so subsequent opens skip the sqlite3
connection entirely. Without this, palaces that have no genuine 0.6.x
BLOBs but DO have sysdb-10-prefixed rows would re-open sqlite3 on every
call, defeating the MemPalace#1090 corruption guard.

Restructured the conditional so the marker write is unconditional after
a successful sqlite scan, regardless of whether any rows were updated.

Surfaced by test_fix_blob_seq_ids_writes_marker_when_already_integer
during the develop-rebase of this PR. The author's branch predates the
marker contract from MemPalace#1177 (merged 2026-04-26), so this is a rebase-edge
fix-up rather than a logic change to their narrowing behaviour.
Scorpion1221 pushed a commit to Scorpion1221/mempalace that referenced this pull request May 2, 2026
…ia marker (upstream MemPalace#1177)

Adds _pin_hnsw_threads helper + _BLOB_FIX_MARKER guard to avoid WAL-state
segfault on already-migrated palaces.
Scorpion1221 pushed a commit to Scorpion1221/mempalace that referenced this pull request May 2, 2026
Sync with upstream/develop via cherry-pick of 9 critical bugfixes:
- HNSW index bloat prevention (MemPalace#1191)
- SIGSEGV guards on collection reopen (MemPalace#1262, MemPalace#1289)
- blob-seq marker fast-path (MemPalace#1177)
- palace_graph None-metadata guard (MemPalace#1201)
- palace_graph security tunnels (MemPalace#1168)
- hyphenated wing name normalization (MemPalace#1197)
- searcher _tokenize None guard (MemPalace#1198)
- mcp_server diary topic sanitize (MemPalace#936)

Tests: 1459 default + 113 benchmark + 6/7 stress all pass.
(random-kill stress test was failing on dev pre-merge; not regressed.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_fix_blob_seq_ids() opens live chromadb.sqlite on every client cache miss; post-migration no-op still risks corruption

4 participants