mempalace.mcp_server has no single-instance guard — concurrent client spawns race on the palace and trigger HNSW corruption (defense-in-depth for #976) #1229
Claude Desktop's MCP loader can spawn a second python -m mempalace.mcp_server process against the same palace before the previous process has finished its shutdown handshake. Two writer processes then operate concurrently on ~/.mempalace/palace/<collection-uuid>/, which on mempalace 3.3.2 (i.e. without PR #976's hnsw:num_threads=1 pin) reliably triggers the HNSW link_lists.bin corruption pattern from #974.
PR #976 is the correct ChromaDB-side fix and resolves the corruption itself. This issue is the complementary MemPalace-side defense: the server has no single-instance guard, so any client whose lifecycle misbehaves (Desktop today, potentially other harnesses tomorrow) can put two writers on the same palace. With #976 in place, corruption is prevented, but two processes still race on the SQLite metadata layer, double-load the embedding model, and waste RAM/CPU on every overlap. A startup-time PID file + advisory lock would make MemPalace robust against any misbehaving client without requiring fixes upstream of MemPalace.
How this is different from #976 / #974
PR #976 (merged to develop, awaiting v3.3.4): fixes the ChromaDB-side data race by pinning hnsw:num_threads=1, so a single-threaded HNSW writer can't corrupt link_lists.bin even under concurrent access.
This issue: addresses the MemPalace-side trigger condition — there is nothing in mempalace.mcp_server preventing two server processes from attaching to the same palace at the same time. With #976 in place, those two processes won't corrupt the index, but they will still race on SQLite, double-load the embedding model, and waste a couple of GB of RAM each time the client misbehaves. Defense in depth, not a duplicate.
Cross-ref: this is the same family of "concurrent writer" bug as #1202 (Stop hook firing a second mine while one is already running). #1202 added a hook-side lock; this issue proposes the equivalent guard inside mcp_server itself, so any client — Desktop, Code, a third-party MCP host, a future hook variant — gets the same protection for free.
Steps to reproduce
Install mempalace 3.3.2 (the version captured in evidence; behavior is structurally present on all current versions since there's no PID guard in mcp_server).
Configure the MemPalace MCP server in ~/Library/Application Support/Claude/claude_desktop_config.json per the standard install instructions.
Use Claude Desktop normally. The exact trigger from the client side is not yet pinned down — see "What I haven't been able to isolate" below — but in the captured 30-hour window the second-process spawn happens in two distinct patterns:
Pattern A (no shutdown logged): a second Initializing server... line appears ~1h17m after the first, with no Shutting down server... between them.
Pattern B (rapid re-spawn after intentional shutdown): an intentional shutdown is logged and then a new Initializing server... appears ~1m later, followed by another spawn ~18m after that, all without the original process being cleanly torn down.
Important context for reading the log: the closed unexpectedly lines at 04-25 04:07:23 and 04-26 08:05:41 are not server-side crashes — they correspond to manual pkill mempalace invocations during recovery from the runaway link_lists.bin growth. The bug being reported here is the concurrent spawn, not the unexpected close.
Why this is a MemPalace-side problem (and not just a Claude Desktop problem)
I want to be careful with the scope of the claim, since I haven't independently audited Claude Desktop's MCP loader.
What I observed in this environment: the Desktop transport log shows multiple Initializing server... events without intervening Shutting down server... events for the same server tag, which is sufficient evidence that two server processes were live concurrently against the same palace.
What is publicly reported about Claude Desktop's MCP lifecycle: there is an open upstream report (anthropics/claude-code#53134) describing two internal managers (directMcpHost and LocalMcpServerManager) spawning every configured MCP server twice on Windows MSIX builds without coordinating. The Cursor community (forum thread) has reported a related "spawns happen faster than a PID lock can be established" race in their MCP loader. Independent fixes for this class of issue exist (Cresnova/claude-desktop-mcp-fix). My environment is macOS Tahoe, not Windows MSIX, so I'm not asserting it's the same exact bug — only that the pattern of MCP host loaders double-spawning servers is a known, documented class of behavior in the broader ecosystem.
Why this still belongs in MemPalace: even if every MCP host eventually fixes its own lifecycle, MemPalace today has no defense against a misbehaving client. A single-instance guard inside mempalace.mcp_server is the structural fix — any future client (Desktop today, Cursor, Continue, a third-party MCP host, a hook variant) that misbehaves can't damage the palace if the second process refuses to start.
Expected behavior
When python -m mempalace.mcp_server is invoked while another mcp_server process is already attached to the same palace, the second invocation should:
Detect the existing process via PID file + advisory lock.
Refuse to start, emit a clear stderr message naming the holding PID and palace path, and exit non-zero.
The MCP host's transport log will then surface the failure cleanly (closed unexpectedly with a useful stderr trail) instead of silently double-attaching.
Suggested fix
A startup guard inside mempalace/mcp_server/__init__.py (or wherever the entrypoint lives — happy to PR once direction is confirmed):
On startup, compute a palace-keyed lock path, e.g. <palace_root>/.mcp_server.lock.
Open the lock file and attempt fcntl.flock(fd, LOCK_EX | LOCK_NB) (POSIX) / msvcrt.locking (Windows).
If the lock is already held: log "MemPalace MCP server already running for palace <path> (held by PID <pid>) — refusing to start" to stderr and exit 1.
If acquired: write the current PID into the file, register an atexit / signal handler to release on clean shutdown.
Stale lock recovery: if the PID in the file is no longer alive (kernel panic, pkill -9), reclaim the lock — flock releases automatically when the holder dies, but the PID-file content will be stale and the new process should overwrite it.
One caveat from prior art: the Cursor forum report notes that some loaders can spawn processes "faster than a PID lock can be established." flock-based locking is the right primitive here precisely because the kernel atomically arbitrates the contending opens — this is hopefully a non-issue on macOS/Linux, but worth verifying on Windows if MemPalace supports it.
Deferring final implementation choice to maintainer judgment — the structural ask is "make mcp_server refuse to second-start against a palace it doesn't own."
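The stale-lock claim above (the flock dies with its holder, but the PID file content does not) can be sanity-checked with a small standalone POSIX experiment, independent of MemPalace, that SIGKILLs a lock holder and then reclaims the lock:

```python
# Standalone POSIX experiment (not MemPalace code): an flock held by a
# SIGKILLed process is released by the kernel, while the PID written
# into the lock file goes stale and must be overwritten on reclaim.
import fcntl
import os
import signal
import subprocess
import sys
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "mcp_server.lock")

# Holder child: takes the exclusive lock, records its PID, sleeps.
holder_src = f"""
import fcntl, os, time
fd = os.open({lock_path!r}, os.O_RDWR | os.O_CREAT, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
os.write(fd, str(os.getpid()).encode())
print("locked", flush=True)
time.sleep(60)
"""
holder = subprocess.Popen([sys.executable, "-c", holder_src],
                          stdout=subprocess.PIPE)
holder.stdout.readline()  # block until the child confirms it holds the lock

fd = os.open(lock_path, os.O_RDWR)
try:
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    blocked_while_alive = False
except OSError:
    blocked_while_alive = True  # expected: the live holder wins

holder.send_signal(signal.SIGKILL)  # simulate pkill -9 / hard crash
holder.wait()
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds: kernel freed it
stale_pid = open(lock_path).read().strip()  # dead holder's PID remains on disk
print("blocked while alive:", blocked_while_alive, "| stale pid:", stale_pid)
```

This is why the guard should treat the lock itself (not the PID bytes) as the source of truth, and only use the PID content for the diagnostic message.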
What I haven't been able to isolate
Exact client trigger. I can confirm from the Desktop log that two server processes existed concurrently, but I haven't been able to pin down which user-side action causes the second spawn. It is not consistently correlated with Desktop restarts, sleep/wake, or specific MCP tool calls in the captured window. A maintainer fix doesn't depend on knowing the trigger — the lock guard is correct regardless — but I want to flag the gap honestly.
Whether this reproduces on Claude Code. The captured log is Desktop-only (~/Library/Logs/Claude/); Claude Code's MCP traffic logs to ~/.claude/projects/.../*.jsonl instead and I haven't examined those for the same pattern. The lock guard would protect both.
Observed behavior — evidence from a 30-hour Desktop transport log
On 3.3.2 (no hnsw:num_threads pin), this concurrent-spawn overlap is the reproducer for the #974 / #976 corruption — link_lists.bin bloats unboundedly (peaked at 55 GB in our environment for ~50K vectors; expected size ~30 MB).
Concurrent-spawn instances captured in the sanitized Claude Desktop MCP transport log:
Instance 1: Initializing server... at 2026-04-25T02:32:02.572Z; overlapping Initializing server... at 2026-04-25T03:49:08.358Z (1h17m later, no Shutting down server... between them); ended 2026-04-25T04:07:23.637Z (closed unexpectedly — operator pkill during recovery).
Instance 2: Initializing server... at 2026-04-26T06:35:37.955Z; overlapping Initializing server... at 2026-04-26T07:10:47.037Z, then again 2026-04-26T07:29:08.614Z; ended 2026-04-26T08:05:41.878Z (closed unexpectedly — operator pkill during recovery).
Full sanitized log (username/drawer-counts/AAAK content redacted, all timestamps and JSON-RPC events preserved): https://gist.github.com/Seph396/d8f724e58f066201b3cb527d0c7ffcc0
Behavior of a patched build. I haven't run a build containing the #976 fix (0d9929c0), so I don't have empirical evidence of how a num_threads=1 build behaves under the same concurrent-spawn condition. Theory says the corruption is prevented but the resource waste / SQLite contention remains.
Related upstream
#976 — hnsw:num_threads=1 fix (merged to develop, awaiting v3.3.4)
mempalace migrate/status SIGSEGV on chromadb version mismatch (unrelated root cause but adjacent failure mode)
max_elements capacity issue (separate from this)
Environment
Sanitized 30-hour Desktop MCP transport log: https://gist.github.com/Seph396/d8f724e58f066201b3cb527d0c7ffcc0