Conversation
#1262) #1262 split `get_or_create_collection` into `get_collection` + fallback `create_collection` inside `ChromaBackend.get_collection`, fixing the chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection metadata differs from the call-site's `_HNSW_BLOAT_GUARD` payload. The MCP server's `_get_collection(create=True)` carries the same metadata payload at `mcp_server.py:287` and routes through chromadb's Python client directly, bypassing the backend layer. Both `tool_add_drawer` and `tool_diary_write` reach this site on every invocation, and the Stop hook fires `mempalace_diary_write` at session end — which was exactly the crash path #1089 named. Apply the same try/except split here so legacy palaces whose stored metadata predates the bloat-guard expansion no longer crash on the MCP-server reopen path. Regression test patches `get_or_create_collection` at the chromadb client class level (not the instance — chromadb's mtime-change detection rebuilds the client between calls, so an instance-level spy doesn't survive) and asserts the second `_get_collection(create=True)` call never reaches it.
Contributor
There was a problem hiding this comment.
Pull request overview
Follow-up to #1262 that applies the same ChromaDB 1.5.x crash-avoidance pattern in the MCP server: avoid get_or_create_collection(..., metadata=...) on reopen by trying get_collection() first and only creating with metadata when missing.
Changes:
- Update
mcp_server._get_collection(create=True)to useget_collectionwith acreate_collectionfallback on_ChromaNotFoundError. - Add a regression test ensuring reopen does not call
get_or_create_collection(the metadata-mismatch SIGSEGV path).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mempalace/mcp_server.py | Splits get_or_create_collection into get_collection + create_collection fallback to avoid ChromaDB 1.5.x metadata-mismatch segfaults on reopen. |
| tests/test_mcp_server.py | Adds regression coverage to ensure _get_collection(create=True) avoids get_or_create_collection on reopen. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
igorls
added a commit
that referenced
this pull request
May 1, 2026
The same try/except split that #1262 applied at the backend layer (ChromaBackend.get_collection) was needed at the parallel call site in mcp_server._get_collection(create=True), which carries the same metadata payload directly to chromadb's Python client. Both reopen paths in mempalace now bypass get_or_create_collection on existing collections, closing the SIGSEGV class for both tool_add_drawer and tool_diary_write (the Stop hook's path).
xcarbo
added a commit
to xcarbo/mempalace
that referenced
this pull request
May 1, 2026
Catches up xdev-patches with 112 commits from MemPalace/develop, including: - v3.3.4 release - MemPalace#1262/MemPalace#1289 ChromaDB collection-reopen crash fix (relevant to long-running MCP server & mempalace-api) - MemPalace#1287 HNSW divergence floor fix - MemPalace#1288 BLOB seq_id decode in repair - MemPalace#1180 cross-wing tunnels by shared topics - MemPalace#1194 wing-slug normalization for hyphenated dirs Conflict resolution: hooks_cli.py and mcp_server.py both had local patches (6ef44cb route CC transcripts via convo_miner; 3fad61d allow leading dash) that overlap with upstream fixes (MemPalace#1231, MemPalace#1194). Took upstream entirely on those two files — upstream's version handles separate transcript/project ingest, uses _mempalace_python(), and adds _pin_hnsw_threads. The local config.py regex relaxation auto-merged cleanly and is preserved. Safety tag: pre-upstream-merge-20260501-091227 (rollback target). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
5 tasks
mjc
pushed a commit
to mjc/mempalace
that referenced
this pull request
May 2, 2026
…lace#1299) `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per MemPalace#1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to MemPalace#1262 / MemPalace#1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.
Scorpion1221
pushed a commit
to Scorpion1221/mempalace
that referenced
this pull request
May 2, 2026
…SEGV (upstream MemPalace#1289) Mirrors the backend-layer fix from MemPalace#1262 at the MCP server's _get_collection cache rebuild path. Also imports _HNSW_BLOAT_GUARD for correct metadata on fresh collection creation.
Scorpion1221
pushed a commit
to Scorpion1221/mempalace
that referenced
this pull request
May 2, 2026
Sync with upstream/develop via cherry-pick of 9 critical bugfixes: - HNSW index bloat prevention (MemPalace#1191) - SIGSEGV guards on collection reopen (MemPalace#1262, MemPalace#1289) - blob-seq marker fast-path (MemPalace#1177) - palace_graph None-metadata guard (MemPalace#1201) - palace_graph security tunnels (MemPalace#1168) - hyphenated wing name normalization (MemPalace#1197) - searcher _tokenize None guard (MemPalace#1198) - mcp_server diary topic sanitize (MemPalace#936) Tests: 1459 default + 113 benchmark + 6/7 stress all pass. (random-kill stress test was failing on dev pre-merge; not regressed.)
This was referenced May 2, 2026
jphein
added a commit
to jphein/mempalace
that referenced
this pull request
May 3, 2026
…Palace#1289/MemPalace#1303 After the 2026-05-03 develop sync to commit 1888b67: - Row 15 (_get_client get-then-create guard) clears: MemPalace#1262 fixed ChromaBackend; MemPalace#1289 fixed mcp_server's parallel call site; MemPalace#1303 plumbed embedding_function= through the reopen path. - Trailing tracking paragraph updated (sync target, count of merged/ open/closed PRs, list of features brought in). - Upstream-PRs table: MemPalace#1262 → merged, MemPalace#1286 flagged CONFLICTING with rebase queued, MemPalace#1289 + MemPalace#1303 added as merged. Note: fork-only _get_session_recovery_collection still uses the older get_or_create_collection pattern; theoretical SIGSEGV exposure on legacy recovery collections only. Tracked as follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein
added a commit
to jphein/mempalace
that referenced
this pull request
May 3, 2026
…attern to recovery collection `_get_session_recovery_collection` (fork-only, introduced for the checkpoint collection split in row 23) used the older `get_or_create_collection` pattern that MemPalace#1262/MemPalace#1289/MemPalace#1303 just fixed in `_get_collection`. Same theoretical SIGSEGV class on chromadb 1.5.x when stored metadata diverges from the call site, plus the `embedding_function=` omission MemPalace#1303 caught. Both branches now resolve EF via `ChromaBackend._resolve_embedding_function()` and the create-path uses try `get_collection` / except `NotFoundError` → `create_collection`. Recovery collection's metadata is intentionally lighter than the main collection's (no `_HNSW_BLOAT_GUARD`); preserved that asymmetry — recovery is small enough that segment bloat isn't a concern. Existing `test_session_recovery.py` covers happy paths; 83/83 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein
added a commit
to jphein/mempalace
that referenced
this pull request
May 3, 2026
A small improvement to `_get_collection`: log the exception on failure (instead of returning `None` silently) and retry once after clearing the cached client and collection. Behaviour on the happy path is unchanged. ## Why The current shape (`try: ... except Exception: return None`) has two costs in practice on a long-running deployment: 1. Invisible incidents. When a transient ChromaDB error poisons the cached `_client_cache` / `_collection_cache`, every subsequent call returns `None` but the operator never sees the underlying exception in the log. Debugging has to start from the symptoms — 404s, "no palace" responses — without the actual stack. 2. No self-healing. Once the cache is stale, the only way back to a working state is a process restart. A single retry that clears the caches first resolves the most common shape (a transient error left bad state behind) without operator intervention. ## What changes - One `logger.exception(...)` per failed attempt, including the palace path so multi-palace deployments can disambiguate. - One retry after the first failure, with `_client_cache`, `_collection_cache`, `_metadata_cache`, `_metadata_cache_time` reset before the second attempt so the retry rebuilds from scratch rather than reusing a poisoned handle. - Happy path (first attempt succeeds) is unchanged: same return, same caching, no extra log line. ## Why this is still useful after MemPalace#1289 / MemPalace#1303 MemPalace#1289 split `get_or_create_collection` into `get_collection` + `create_collection` (the metadata-mismatch SIGSEGV path); MemPalace#1303 plumbed `embedding_function=` through both reopen branches. Those fix specific *crash* shapes. This PR addresses the *aftermath* — when an error of any class poisons the client/collection cache and every subsequent call returns `None` invisibly. Retry-once self-heals; logging makes the underlying exception visible. The retry loop sits *above* the get-then-create logic, so retries benefit from MemPalace#1289's correct call shape and MemPalace#1303's EF binding. ## Rebased on top of MemPalace#1289 + MemPalace#1303 This PR was filed before MemPalace#1289/MemPalace#1303 merged. The original two commits (2038269 + 757eb7a) wrapped the older `get_or_create` body. Rebased onto the new `try get_collection / except NotFoundError → create_collection` body — the retry/log envelope is unchanged, only the inner body picks up the merged fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Companion to #1262.
#1262 split
get_or_create_collectionintoget_collection+ fallbackcreate_collectioninsideChromaBackend.get_collection, fixing the chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection metadata differs from the call-site's_HNSW_BLOAT_GUARDpayload.The MCP server has the same payload at a parallel call site that #1262 did not touch:
mempalace/mcp_server.py:287
_get_collection(create=True)is reached bytool_add_drawer(line 794) andtool_diary_write(line 1118). The Stop hook firesmempalace_diary_writeat session end — exactly the crash path #1089 was filed for. Without this follow-up, legacy palaces whose stored metadata predates the bloat-guard expansion still SIGSEGV the MCP server on reopen.Fix
Apply the same try/except split here. New behaviour:
_pin_hnsw_threadsstill runs after both branches, preserving the legacy-palace retrofit comment that already lived above this block.Test plan
test_get_collection_create_true_avoids_get_or_create_on_reopenpatchesget_or_create_collectionat the chromadb client class level (not the instance — chromadb's mtime-change detection rebuilds the client between calls, so an instance-level spy doesn't survive) and asserts the second_get_collection(create=True)call never reaches it.assert col2 is not None— None returned because the spy raises and the outer try/except swallows it).tests/test_mcp_server.pypass with the fix.ruff check+ruff format --checkclean (CI-pinned 0.4.x).