
fix: serialize ChromaCollection writes via flock at the backend seam #1171

Closed

jphein wants to merge 1 commit into MemPalace:develop from jphein:fix/concurrent-write-lock-v2

Conversation

Collaborator

@jphein jphein commented Apr 24, 2026

Summary

Claude Code spawns one mcp_server.py process per open terminal. Stop hooks spawn additional short-lived writers (diary writes, mempalace mine subprocesses). Each of these opens an independent PersistentClient against the same palace directory. ChromaDB has no inter-process write locking — concurrent col.add/upsert/update/delete from N processes corrupts the HNSW segment, causing the next read to SIGSEGV in chromadb_rust_bindings.abi3.so.

Fix: _palace_write_lock(palace_path) — a context manager using fcntl.flock(LOCK_EX) on $palace/.write.lock, sketched below — wraps all four write methods of ChromaCollection:

  • add
  • upsert
  • update
  • delete
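
A minimal sketch of what that context manager looks like (illustrative only; the exact code in the diff may differ slightly):

import os
import fcntl
from contextlib import contextmanager

@contextmanager
def _palace_write_lock(palace_path):
    """Hold an exclusive flock on $palace/.write.lock for the duration of one write."""
    lock_path = os.path.join(palace_path, ".write.lock")
    os.makedirs(palace_path, exist_ok=True)
    with open(lock_path, "a") as lf:
        fcntl.flock(lf.fileno(), fcntl.LOCK_EX)      # blocks until no other process holds the lock
        try:
            yield
        finally:
            fcntl.flock(lf.fileno(), fcntl.LOCK_UN)  # flock is also dropped if the process dies mid-write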

Why the backend seam, not mcp_server

The first version of this PR put the lock in mcp_server.py around its four write call sites. That missed the mempalace mine subprocess the stop hook fires — mining writes go through convo_miner.py / miner.py / palace.py and bypass mcp_server.py entirely. A concurrent mine + MCP write would still corrupt the HNSW segment even with the mcp_server-level lock.

RFC 001 (#995) made ChromaCollection the single boundary for all ChromaDB writes. Putting the lock there means every caller — mcp_server, miner, convo_miner, palace, tests, anything else — goes through it automatically. No caller needs to know the lock exists.

API change

ChromaCollection.__init__ now accepts a palace_path keyword argument so the lock knows where to put .write.lock. ChromaBackend.get_collection and the legacy create_collection shim pass it through; ChromaCollection instances constructed without a path (as in some tests) skip locking — the same behavior as before this PR.
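
A hedged sketch of how the adapter can thread the path through and fall back to the old unlocked behavior, building on the _palace_write_lock sketch above (helper names such as _write_guard are assumptions, not the actual diff):

import contextlib

class ChromaCollection:
    def __init__(self, collection, palace_path=None):
        self._collection = collection
        self._palace_path = palace_path          # None: no cross-process locking (pre-PR behavior)

    def _write_guard(self):
        # Lock only when we know where the palace lives.
        if self._palace_path is None:
            return contextlib.nullcontext()
        return _palace_write_lock(self._palace_path)

    def add(self, *args, **kwargs):
        with self._write_guard():
            return self._collection.add(*args, **kwargs)

    # upsert, update, and delete are wrapped the same way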

Why a stopgap rather than the full fix

Three longer-term solutions make this unnecessary at the architecture level:

  • palace-daemon (@rboarescu) — FastAPI gateway with asyncio read/write/mine semaphores between all clients and the palace. Solves same-machine and multi-machine coordination at the right layer. This flock fix covers only the same-machine case.
  • Backend plugin discovery (RFC 001) — a daemon-aware backend could handle concurrency via the gateway without touching ChromaCollection at all.
  • TS rewrite — per maintainer discussion in Discord, an official TypeScript implementation with a new spec is coming, and the Python will be updated to match. This fix is intentionally minimal so it doesn't complicate that transition.

The intent is to stop active data corruption for users running multiple terminals today, not to pre-empt the architectural work above.

Platform notes

fcntl is Unix-only. On Windows the context manager degrades to a no-op (yield without locking) — same behavior as before this fix. palace-daemon is the recommended path for multi-client setups on any platform.
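
For illustration, the degradation can be wired as a guarded import plus an availability check at the top of the context manager (a sketch under that assumption; _flock_available is a hypothetical helper name):

try:
    import fcntl                  # Unix only
except ModuleNotFoundError:       # Windows ships no fcntl module
    fcntl = None

def _flock_available() -> bool:
    """True when OS-level flock locking can be used at all."""
    return fcntl is not None

# _palace_write_lock() then begins with:
#     if not _flock_available():
#         yield    # no locking; same exposure as before this fix
#         return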

Test plan

  • 1094 tests pass locally against upstream/develop base
  • Manual: open 3+ Claude Code terminals against the same palace, fire stop hooks simultaneously (which triggers both MCP diary_write and a mempalace mine subprocess) — verify no SIGSEGV

Generated with Claude Code

Copilot AI review requested due to automatic review settings April 24, 2026 15:52
Contributor

Copilot AI left a comment


Pull request overview

Adds cross-process serialization for ChromaDB write operations in the MCP server to prevent HNSW index corruption when multiple mcp_server.py processes write concurrently to the same palace directory.

Changes:

  • Introduces _palace_write_lock() context manager using flock(LOCK_EX) on $palace/.write.lock.
  • Wraps ChromaDB write calls in tool_add_drawer, tool_delete_drawer, tool_update_drawer, and tool_diary_write with the new lock.


Comment thread mempalace/mcp_server.py Outdated
Comment on lines +164 to +190
@contextmanager
def _palace_write_lock():
    """Cross-process exclusive write lock for ChromaDB writes.

    Claude Code spawns one mcp_server process per terminal, and stop hooks spawn
    additional short-lived processes — all pointing at the same palace directory.
    ChromaDB's PersistentClient has no inter-process locking, so concurrent writes
    from N processes corrupt the HNSW segment, causing the next read to SIGSEGV
    in chromadb_rust_bindings. Serializing all writes with flock(LOCK_EX) on a
    lock file in the palace directory prevents the corruption.

    flock is safe here: the lock is released automatically if the holding process
    dies mid-write (no permanent deadlock on crash), and it has no overhead when
    uncontested (single-process scenario).
    """
    lock_path = os.path.join(_config.palace_path, ".write.lock")
    try:
        os.makedirs(_config.palace_path, exist_ok=True)
    except OSError:
        pass
    with open(lock_path, "a") as _lf:
        fcntl.flock(_lf.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(_lf.fileno(), fcntl.LOCK_UN)


Copilot AI Apr 24, 2026


_palace_write_lock() currently always attempts fcntl.flock(). If flock isn’t available/supported (Windows, some filesystems), this will raise and break all write tools. To match the intended behavior, handle unsupported platforms/operations inside the context manager (e.g., detect os.name == "nt" or catch AttributeError/OSError from flock) and just yield without locking (or implement a Windows lock).

Comment thread mempalace/mcp_server.py Outdated
Comment on lines +164 to +190


Copilot AI Apr 24, 2026


New behavior (_palace_write_lock() wrapping Chroma write paths) isn’t covered by the existing tests/test_mcp_server.py write-tool tests. Consider adding a unit test that monkeypatches mcp_server.fcntl.flock to assert it’s called for tool_add_drawer/tool_update_drawer/tool_delete_drawer/tool_diary_write, and (after adding a Windows/unsupported-platform fallback) a test that the context manager becomes a no-op when locking isn’t available.
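
One way such a test could look; this sketch exercises the context manager directly rather than each tool, and the _config patching and import path are assumptions about the test harness:

import fcntl

import mempalace.mcp_server as mcp_server

def test_palace_write_lock_calls_flock(monkeypatch, tmp_path):
    ops = []
    monkeypatch.setattr(mcp_server.fcntl, "flock", lambda fd, op: ops.append(op))
    monkeypatch.setattr(mcp_server._config, "palace_path", str(tmp_path), raising=False)

    with mcp_server._palace_write_lock():
        pass                                      # the write itself is irrelevant to the lock

    assert ops == [fcntl.LOCK_EX, fcntl.LOCK_UN]  # acquired, then released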

Comment thread mempalace/mcp_server.py Outdated
Comment on lines +45 to +51
import argparse # noqa: E402 (deferred until after stdio protection above)
import fcntl # noqa: E402
import json # noqa: E402
import logging # noqa: E402
import hashlib # noqa: E402
import time # noqa: E402
from contextlib import contextmanager # noqa: E402

Copilot AI Apr 24, 2026


import fcntl is unconditional, which will raise ModuleNotFoundError on Windows at import time. This contradicts the PR description (“Windows … degrades to a no-op”) and would prevent mcp_server.py from starting on Windows. Consider guarding the import (try/except) and making _palace_write_lock() a no-op (or using msvcrt.locking) when os.name == "nt".

@igorls added the bug (Something isn't working), storage, and area/mcp (MCP server and tools) labels Apr 24, 2026
@jphein force-pushed the fix/concurrent-write-lock-v2 branch 2 times, most recently from 4a3ca8d to daedfb1 on April 24, 2026 at 20:41
@jphein changed the title from "fix: cross-process write lock prevents HNSW corruption from concurrent MCP servers" to "fix: serialize ChromaCollection writes via flock at the backend seam" Apr 24, 2026
Claude Code spawns one mcp_server.py process per open terminal; stop hooks
spawn additional short-lived writers (diary writes, `mempalace mine`
subprocesses). All open independent PersistentClient instances against the
same palace directory. ChromaDB has no inter-process write locking —
concurrent col.add/upsert/update/delete from N processes corrupts the HNSW
segment, causing the next read to SIGSEGV in chromadb_rust_bindings.

Fix: `_palace_write_lock(palace_path)` — a contextmanager using
fcntl.flock(LOCK_EX) on `$palace/.write.lock` — wraps all four write
methods of ChromaCollection (add, upsert, update, delete). Because every
caller (mcp_server, miner, convo_miner, palace) reaches ChromaDB through
this adapter, all writes serialize automatically across processes.

RFC 001 made ChromaCollection the single boundary for all ChromaDB writes,
which is the correct home for concurrency control. No caller needs to
know the lock exists.

ChromaCollection now accepts palace_path in its constructor; passed by
ChromaBackend.get_collection and the legacy create_collection shim.
Adapters constructed without a path (some tests) skip locking, same
behavior as before.

flock auto-releases on process death — a mid-write crash cannot deadlock
future writers.

On Windows, fcntl is unavailable, so the context manager yields without locking. Windows users
running multiple MCP server processes against the same palace remain
exposed to the underlying ChromaDB concurrency issue; palace-daemon,
which provides proper asyncio read/write/mine semaphores, is the
recommended path for multi-client setups on any platform.

Test plan: 1094 tests pass locally.
@jphein force-pushed the fix/concurrent-write-lock-v2 branch from daedfb1 to 5a82018 on April 24, 2026 at 21:24
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
… state

Three stale sections updated:

- Fork change queue: row 8 (.blob_seq_ids_migrated marker) struck
  through → FILED as MemPalace#1177. Two new rows added for segfault fixes
  discovered today (MemPalace#1171 concurrent-write lock, MemPalace#1173 quarantine in
  make_client) that weren't in the queue because the bugs surfaced
  today, not during the original 2026-04-21 triage.

- Open upstream PRs: was showing 3 of 10 PRs. Now shows all 10 with
  current CI/review state. All rebased onto current upstream/develop
  and MERGEABLE as of today.

- Merged since v3.3.1: added v3.3.3 release (2026-04-24) with its
  constituent merges — MemPalace#942, MemPalace#833, MemPalace#1097, MemPalace#1145, MemPalace#1147, MemPalace#1148/1150/1157
  entity-detection overhaul (via @igorls's MemPalace#1175 stacked-PR rescue),
  MemPalace#1166 palace-path security, MemPalace#340/MemPalace#1093 install regression, plus MemPalace#851
  from the 2026-04-22 batch.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
Cherry-picks the critical chroma.py fix from @felipetruman's MemPalace#976
(without the bundled mine_global_lock and precompact-attempt-cap
changes, which we don't need — mine_global_lock is covered by our
MemPalace#1171 backend-seam flock, and our silent_save path bypasses the
block-mode precompact deadlock MemPalace#976's fix #3 addresses).

Root cause: ChromaDB's multi-threaded `ParallelFor` HNSW insert path
races in `repairConnectionsForUpdate` / `addPoint`, corrupting the
graph under any concurrent add (including a single mine's concurrent
batches). Manifests as SIGSEGV (MemPalace#974) or runaway link_lists.bin writes
(MemPalace#965 — 437 GB observed on JP's palace, 1.5 TB on the Nobara install
in MemPalace#976's report).

The num_threads=1 pin disables ParallelFor, serializing inserts within
a single process. MemPalace ingests drawers one at a time anyway, so
this was never a throughput win — only a latent correctness hazard.

Four call sites patched:
- `ChromaBackend.get_collection` (create path): add metadata pin + call
  `_pin_hnsw_threads()` on every return
- `ChromaBackend.create_collection`: add metadata pin
- `mcp_server._get_collection`: add metadata pin + call
  `_pin_hnsw_threads()` on every return (direct chromadb call, bypasses
  the backend adapter)
- `cli.cmd_compact` new_col creation: add metadata pin

`_pin_hnsw_threads()` exists because ChromaDB 1.5.x doesn't persist the
modified HNSW config to disk, so it must re-apply on every open.
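
A rough sketch of the shape this takes; the metadata pin at creation follows the commit text, while collection.modify() as the re-apply mechanism is an assumption about the ChromaDB client API rather than a quote of the cherry-pick:

import chromadb

_HNSW_PIN = {"hnsw:num_threads": 1}      # disable ChromaDB's ParallelFor insert path

def _pin_hnsw_threads(collection):
    """Re-apply the single-thread pin on every open, since ChromaDB does not persist it."""
    collection.modify(metadata={**(collection.metadata or {}), **_HNSW_PIN})

client = chromadb.PersistentClient(path="/path/to/palace")
col = client.get_or_create_collection("drawers", metadata=_HNSW_PIN)  # pin on the create path
_pin_hnsw_threads(col)                                                # and again on every open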

Credits @felipetruman for the root-cause diagnosis and fix design in
MemPalace#976. Posted corroborating reproduction data (437 GB bloat on 135K
drawers, identical failure mode) on that PR earlier today at
MemPalace#976 (comment).
This fork-local cherry-pick gets JP protected now; expect this to
become a no-op when MemPalace#976 lands upstream and we merge develop→main.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
…ostgres

Three things merged into one README pass:

1. Badge: link version-3.3.4 to jphein/mempalace/releases (the v3.3.4
   tag we just pushed) and add an upstream-3.3.3 secondary badge so
   readers can tell fork vs upstream version at a glance. Was sitting
   uncommitted from earlier today.

2. Multi-client coordination section: replaced the three-fix v3.3.4
   summary with a four-fix one. Added @felipetruman's MemPalace#976 num_threads
   pin (cherry-picked at 552a0d7) as fix #1 — the actual root-cause
   fix. Reframed our MemPalace#1171/MemPalace#1173/MemPalace#1177 as defense-in-depth around
   symptoms. Walked back palace-daemon from "primary concurrency story
   in progress" to "deferred pending observation" — with MemPalace#976's fix in
   place, the daemon's same-machine value drops; multi-machine and
   Windows remain its differentiators but neither is current pain.

3. Postgres + pgvector: walked back from "parallel track" framing to
   "long-term option, no immediate move" for the same reason. Migration
   cost stays real, current pain is mitigated, decision deferred until
   v3.3.4 stack is observed in production or TS rewrite ships.

Removed two stale paragraphs that were left over from the previous
"daemon as primary" framing.
Collaborator Author

jphein commented Apr 25, 2026

Closing in favor of #976 (Felipe Truman, merged 2026-04-25) plus daemon-driven serialization at a higher layer.

After living with both approaches in production (134K-drawer palace, multiple Claude Code instances + MCP servers + hooks), we found:

  • mine_global_lock from #976 ("fix: HNSW graph corruption, PreCompact deadlock, mine fan-out", closes #974, #965, #955) correctly covers the fan-out pattern (multiple mempalace mine subprocesses), which is the dominant write-conflict source for typical installs. That's the right layer — the lock is at the writer that actually has fan-out, not below at every individual add/upsert/update/delete.
  • For multi-process MCP + hooks we ended up routing all writes through palace-daemon with FastAPI semaphores; once the daemon is the single writer, an in-process flock at the ChromaCollection adapter is just defense-in-depth that doesn't pay for its complexity.
  • fcntl.flock at the adapter seam as proposed in this PR works correctly but is structurally redundant once either of those patterns is in place. The Unix-only constraint and the extra surface area for tests-without-palace_path made the maintenance cost > value.

Thanks to @felipetruman for the cleaner solution at the right layer, and to everyone who reviewed this PR — feedback shaped the daemon-strict architecture we ended up adopting.

@jphein jphein closed this Apr 25, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
…HNSW fixes

Bring in 29 commits from upstream/develop since the last merge (2026-04-23):

Major absorbed changes:
- MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for
  fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes
  MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too.
- MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank,
  legacy-metric warning + _warn_if_legacy_metric, invariant tests on
  hnsw:space=cosine across all 5 collection-creation paths.
- MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine.
- MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device
  detection via mempalace/embedding.py.
- MemPalace#1182: graceful Ctrl-C during mempalace mine.
- MemPalace#1168: tunnel permissions security fix.

Conflict resolutions (8 files):
- searcher.py: kept fork's CLI delegation through search_memories (cleaner
  single-source-of-truth path); upstream's inline-retrieval branch dropped.
  Merged upstream's print formatting (cosine= + bm25=) with fork's
  matched_via reporting from sqlite BM25 fallback.
- backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to
  ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs
  (embedding_function support from MemPalace#1185). Removed duplicate
  _pin_hnsw_threads (fork already cherry-picked Felipe's earlier).
- mcp_server.py: kept fork's palace_path arg + upstream's clearer comment
  on hnsw:num_threads=1 rationale.
- miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C),
  RESTORED fork's strict detect_room — substring matching from upstream
  breaks fork-only test_detect_room_priority1_no_substring_match. Added
  `import re` for word-boundary regex matching. Fork-ahead concurrent
  mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon
  migration deprioritizes local concurrent mining; can re-port if needed.
- CHANGELOG.md: combined fork's segfault-trio narrative with upstream's
  v3.3.4 release notes.
- HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was
  stale; hooks already use this name per fork-ahead MemPalace#19).
- test_convo_miner_unit.py: took both contextlib + pytest imports.
- test_searcher.py: imported upstream's 3 new TestSearchCLI tests but
  skipped them with TODOs — they assume upstream's inline-retrieval CLI
  with simpler mocks. Rewrite needed for fork's delegated search_memories
  path (sqlite BM25 fallback + scope counting).

Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings.

Implications for open fork PRs:
- MemPalace#1171 (cross-process write lock at adapter) becomes structurally
  redundant given MemPalace#976's mine_global_lock + daemon-strict serialization.
  Slated for close with thank-you to Felipe.
- MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
After today's work:

- Merged upstream/develop (29 commits), absorbing MemPalace#976 (HNSW + mine_global_lock + PreCompact attempt-cap), MemPalace#1179 (CLI BM25), MemPalace#1180/MemPalace#1182/MemPalace#1183/MemPalace#1185 features.
- Filed PR MemPalace#1198 (`_tokenize` None guard) upstream.
- Closed PR MemPalace#1171 (adapter-level flock) — superseded by MemPalace#976 + daemon-strict.
- palace-daemon migration commits (c09582c + 0e97b19) make the daemon the
  single canonical writer; in-tree adapter flock no longer carries its weight.

README:
- Lead paragraph: 2026-04-21 → 2026-04-25 sync date, 165,632 → 151,420
  drawer count, daemon-on-disks topology, listed absorbed upstream PRs.
- Fork change queue: dropped MemPalace#1171 row, added MemPalace#1198 row.
- "Multi-client coordination" section: rewrote — palace-daemon is now the
  fork's primary answer (was "deferred"), MemPalace#1171 retired, narrative shifted
  to defense-in-depth around MemPalace#976 + daemon.
- Two-layer memory model: storage path is now palace-daemon HTTP, not
  ~/.mempalace/palace.
- Ecosystem palace-daemon row: marks integration as complete (commits
  cited).
- Open upstream PRs table: refreshed to MemPalace org URLs, added MemPalace#1198
  row, removed MemPalace#1171, added 2026-04-25 develop sync to merged list, added
  MemPalace#1171 to closed list with rationale.
- Test count 1096 → 1334.

CLAUDE.md:
- Row 20 (MemPalace#1198): "Upstream PR pending" → "Filed as MemPalace#1198 on 2026-04-24".
- Row MemPalace#1171 in PR table: open → closed (with MemPalace#976 supersession).
- Added MemPalace#1198 row to PR table.
- Top-of-section count: "14 merged, 7 open, 9 closed" → "14 merged, 7 open
  (including MemPalace#1198), 10 closed (added MemPalace#1171)" + sync date 2026-04-25.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein jphein deleted the fix/concurrent-write-lock-v2 branch April 25, 2026 14:53
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
Stop-hook auto-save diary entries (topic=checkpoint, text starting
"CHECKPOINT:") were dominating mempalace_search results — they're
short, word-dense, and outrank substantive content under cosine
similarity. The actual decisions/conversations they summarize get
buried below the summary entries.

Validated against JP's 151K-drawer palace via RLM smoke test on
2026-04-25: default search returned 5 CHECKPOINT entries when asked
about "palace-daemon migration architecture," producing a confident
but misleading RLM answer that quoted user-prompt fragments instead
of the actual decision (MemPalace#1171 closed, MemPalace#976 absorbed, daemon-strict
mode added). Filtering checkpoints at retrieval flips this — now
the same query surfaces technical-room drawers with file paths and
code snippets.

Changes:

mempalace/searcher.py
- New `kind` parameter on search_memories / search:
  - "content" (default): excludes topic IN ("checkpoint", "auto-save")
  - "checkpoint": only checkpoints (recovery/audit)
  - "all": disables filter (pre-2026-04-25 behavior)
- build_where_filter accepts the kind argument and composes it with
  wing/room via $and, using ChromaDB's $nin/$in operators (validated as
  supported in RFC 001); see the sketch after this change list.
- Defense-in-depth post-filter on text prefix: catches drawers
  written without topic metadata (legacy data, partial writes,
  metadata=None upstream-MemPalace#1020 territory) so the exclusion is
  bulletproof even when the where clause can't see the field.
- _CHECKPOINT_TOPICS constant: ("checkpoint", "auto-save"). The
  second name is a legacy synonym from older palace-daemon hook.py
  callers; both are filtered until canonical hook clients have
  fully rolled out.
- Result dicts now surface `topic` so consumers (familiar.realm.watch's
  deterministic pipeline, RLM tools, debugging humans) can route
  on it.

mempalace/mcp_server.py
- mempalace_search MCP tool accepts `kind` param with input-schema
  enum ["content", "checkpoint", "all"].

mempalace/layers.py
- Layer2/Layer3 callers pass kind="all" explicitly to preserve their
  pre-existing contract (their tests verify exact where-clause
  shape; revisit whether wake-up paths should default-exclude
  checkpoints in a separate PR).

tests
- TestCheckpointFilter (8 cases): unit tests for build_where_filter
  across all kind values, integration tests for search_memories
  text-prefix defense-in-depth, topic field surfacing, ValueError
  on invalid kind.
- 1345 passed, 0 failed, 0 skipped.
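
For reference, a sketch of the build_where_filter composition described in the searcher.py notes above (the wing/room field names follow the commit text; the rest is illustrative, not the merged code):

_CHECKPOINT_TOPICS = ("checkpoint", "auto-save")

def build_where_filter(wing=None, room=None, kind="content"):
    clauses = []
    if wing:
        clauses.append({"wing": wing})
    if room:
        clauses.append({"room": room})
    if kind == "content":
        clauses.append({"topic": {"$nin": list(_CHECKPOINT_TOPICS)}})   # exclude checkpoints
    elif kind == "checkpoint":
        clauses.append({"topic": {"$in": list(_CHECKPOINT_TOPICS)}})    # checkpoints only
    elif kind != "all":
        raise ValueError(f"unknown kind: {kind!r}")
    if not clauses:
        return None
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}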

Companion change in palace-daemon repo: clients/hook.py wrote
topic="auto-save" (one-off legacy); standardized on
topic=CHECKPOINT_TOPIC ("checkpoint") so all hook paths agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Restructure from feature-list to thesis-shape per the 2026-04-26 evening
conversation. Skeleton at scratch/readme-overhaul-skeleton.md; this is
the prose pass.

Major changes:

- New lede surfaces the empirical result (632/3 → 974/1267 token
  convergence) in the first 200 words instead of burying it in a
  table cell.
- New "The thesis" section names three principles the fork has
  converged on: corpus-shape-eats-algorithm, verbatim-vs-derivative,
  right-to-measure. Each links out to the long-form essay rather than
  re-deriving.
- New "What this fork ships" section organizes work by axis (structural
  retrieval / single-writer / deterministic hooks) instead of bag of
  features.
- Quickstart added near top — was buried at line 426.
- Architectural principles kept (light edits) and now cross-reference
  the new thesis.
- Roadmap reorganized around verbatim-vs-derivative axis. Adds P8
  ("Corpus partitioning by purpose") as first-class item; the
  checkpoint split is its first instance, future transcript-mine
  separation and KG-triple-store moves extend it.
- Active investigations rolled forward: engram-2 critique partially
  closed (one concrete instance fixed empirically); LongMemEval-S E2E
  deferred until chunk-size + embedding-model alignment lands.
- New "Composition with upstream" section makes explicit that the fork
  cherry-picks in-flight upstream PRs, contributes back, and closes
  fork-only attempts when upstream's version is at the right layer
  (MemPalace#1171 → MemPalace#976).
- Fork-change inventory moves from headline to bottom — same content,
  reframed as reference for verifying claims rather than the lede.
- Open upstream PRs table updated for 4 bensig approvals 2026-04-26.
- Test count, palace size, deploy commit, post-migration numbers all
  refreshed against live state.

Net: 480 → 383 lines (~20% shorter), all four check-docs.sh stages
pass clean (15 fork hashes resolve, 84 PR refs match upstream state).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211
MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four
PRs that landed today (21:22-21:41 UTC):

  MemPalace#1173  quarantine_stale_hnsw on make_client + cold-start gate +
         integrity sniff-test (PROTO/STOP byte check, no deserialization)
  MemPalace#1177  .blob_seq_ids_migrated marker guard, closes MemPalace#1090
  MemPalace#1198  _tokenize None-document guard in BM25 reranker
  MemPalace#1201  palace_graph.build_graph skips None metadata

Conflict resolution:

* mempalace/backends/chroma.py — took upstream as base (it has the
  igorls-review pickle-protocol docstring, thread-safety paragraph, and
  Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas
  in query()/get() since MemPalace#1094 is still open and not yet in develop.

* mempalace/mcp_server.py — took upstream's clean form. Dropped the
  fork-only `palace_path=` kwarg from four ChromaCollection() call sites:
  the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but
  MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so
  the kwarg has no remaining consumer. ChromaCollection.__init__ in
  upstream/develop is back to (self, collection); calling it with
  palace_path= raised TypeError → silently swallowed by the broad except
  in _get_collection() → returned None → tool_status() returned _no_palace().
  41 mcp_server tests went from failing-with-KeyError to passing.

* mempalace/cli.py — dropped fork-only `workers=args.workers` from the
  cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the
  `--workers` argparse arg landed in 5cd14bd but miner.mine() never
  accepted a workers param, so production `mempalace mine` TypeError'd
  on every invocation. Removed the broken plumbing; tests/test_cli.py
  updated to match.

* CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives
  in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml).

* CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's
  added a "Design Principles / Contributing" charter, which lives in
  README.md on the fork.

* tests/test_backends.py — took upstream's ruff-formatted line widths.

docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate,
hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from
OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the
merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated.

scripts/check-docs.sh: 4/4 clean.
Test suite: 1460/1460.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Labels

area/mcp (MCP server and tools), bug (Something isn't working), storage
