fix(mcp): lazy per-path KnowledgeGraph cache (#1136)#1160
fix(mcp): lazy per-path KnowledgeGraph cache (#1136)#1160igorls merged 8 commits intoMemPalace:developfrom
Conversation
2f93e6c to
e9dc63f
Compare
|
Hi, tool_reconnect() clears only the ChromaDB client/collection caches and leaves the per-path KnowledgeGraph cache intact, so after external edits/replacement of knowledge_graph.sqlite3 the server can keep using an old open SQLite connection. Severity: action required | Category: reliability How to fix: Invalidate KG cache on reconnect Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Spotted by Qodo code review - free for open-source projects. |
e9dc63f to
0de41d7
Compare
…1136) Swap the module-level KnowledgeGraph singleton for a lazy, per-path cache keyed by the resolved sqlite path. Import no longer creates a sqlite file as a side effect, and MCP servers started with --palace now route KG calls to the correct tenant when MEMPALACE_PALACE_PATH changes between calls, matching the per-call behavior of _get_client() on the ChromaDB side. Default-path behavior is preserved: without --palace at startup, KG stays on DEFAULT_KG_PATH regardless of env var. The "no --palace but env var set" case is MemPalace#540's scope and is not changed here.
Direct module-attribute patching of _kg is obsolete after the lazy cache refactor. Switch test helpers to patch _get_kg instead so the fixture KG replaces the factory rather than a now-missing singleton. - tests/test_mcp_server.py: _patch_mcp_server helper - tests/benchmarks/test_mcp_bench.py: _patch_mcp_config helper - tests/benchmarks/test_memory_profile.py: inline patch in test_tool_status_repeated_calls
TestKGLazyCache covers the scenarios behind the lazy per-path refactor: - test_lazy_init_no_import_side_effect: a fresh subprocess import does not create ~/.mempalace/knowledge_graph.sqlite3 (what closed PR MemPalace#167 was aiming at). - test_get_kg_returns_same_instance: two _get_kg() calls under the same resolved path return the same object, cache has one entry. - test_get_kg_different_paths_different_instances: rotating env var produces distinct KGs. - test_multi_tenant_env_switch: the exact scenario from MemPalace#1136 — write under path A, query under path B returns empty, switching back to A sees the fact. - test_cache_thread_safe: 16 threads racing _get_kg() end up with one shared instance and one cache entry.
Passing a stripped env dict without SYSTEMROOT/WINDIR breaks Python bootstrap on Windows (_Py_HashRandomization_Init). Inherit the parent env and strip MEMPAL* vars instead, then override HOME/USERPROFILE to the tmp dir.
Inline comments referencing MemPalace#1136 and MemPalace#540 add no information the identifiers do not already convey. PR description carries the context; code stays quiet.
tool_reconnect cleared ChromaDB caches but left _kg_by_path entries intact. After an external replacement of knowledge_graph.sqlite3 the server kept serving the old open sqlite3.Connection, returning stale results. Now iterate _kg_by_path under _kg_cache_lock, call close() best-effort, and clear the dict so the next tool call reopens the KG from disk. Two new tests in TestKGLazyCache verify cache invalidation and that a failing close() does not block the clear.
0de41d7 to
0a62658
Compare
Race scenario: a KG tool handler calls _get_kg() and gets a live KnowledgeGraph; another thread fires tool_reconnect() between that return and the handler's kg.add_triple()/kg.query_entity()/etc call. tool_reconnect drains _kg_by_path and closes the underlying sqlite3.Connection; the handler then raises sqlite3.ProgrammingError: 'Cannot operate on a closed database', which surfaces as a -32000 to the MCP client even though the user just asked for a reconnect. New _call_kg(op) helper wraps each handler's kg call in a one-shot retry: catch exactly sqlite3.ProgrammingError, evict the stale entry (only if the cache slot still points at the closed instance — another thread may have already replaced it), and rerun op against a fresh _get_kg(). Beyond one retry give up so a sustained close-stream surfaces clearly instead of looping. All five KG handlers (tool_kg_query, tool_kg_add, tool_kg_invalidate, tool_kg_timeline, tool_kg_stats) now route through _call_kg. Tests pin the contract: * retries with a fresh KG and returns the second result * non-ProgrammingError exceptions propagate without retry * gives up after exactly one retry on sustained close
|
Heads-up: Adds
The body's |
Summary
Replaces the module-level
_kg = KnowledgeGraph(...)singleton inmcp_server.py with a lazy, per-path cache. The KG path is now resolved
on each tool invocation, so a multi-tenant host can rotate
MEMPALACE_PALACE_PATHbetween calls and have each call address thecorrect sqlite file. The ChromaDB side of the server was already
per-call via
_get_client(); this brings the KG side in line.Closes #1136.
Before / After
Before (module scope, import-time):
After:
Follows Option A from the reporter in #1136.
Context: earlier attempt
A previous attempt in #167 introduced a lazy single-instance init. It
was closed on 2026-04-12 with a note saying it had been reworked in
#647 and #667, but neither of those PRs actually touched the singleton
pattern. That is why #1136 landed on current develop with the eager
init still in place. Easy to verify against
upstream/developatmempalace/mcp_server.py:101before this PR.Scope notes
without
--palace, KG keeps writing to~/.mempalace/knowledge_graph.sqlite3regardless ofMEMPALACE_PALACE_PATH. Multi-tenant path rotation applies onlywhen
--palacewas given at startup. The "no --palace but env varset" case is the subject of fix: KG path mismatch between MCP server and CLI #540 and is explicitly out of scope here.
each KnowledgeGraph holds one sqlite connection plus a lock.
Test plan
_patch_mcp_serverhelper patches_get_kginstead of_kgtest_lazy_init_no_import_side_effect: fresh subprocess importunder
HOME=tmpdoes not create the sqlite filetest_get_kg_returns_same_instance: cache hits returnis-equaltest_get_kg_different_paths_different_instances: rotating envgives distinct KGs
test_multi_tenant_env_switch: reporter's scenario (write underA, query under B is empty, query under A again returns the fact)
test_cache_thread_safe: 16 concurrent_get_kg()end up withone shared instance and one cache entry
pytest tests/ --ignore=tests/benchmarks: 1080 passedruff check+ruff format --checkcleanBreaking changes (internal)
Direct attribute access to
mempalace.mcp_server._kgis removed. Thisis an underscore-prefixed private attribute; the only in-tree callers
were three test files, all migrated in this PR.