feat: MCP reliability — inode detection, WAL rotation, metadata cache #630

jphein wants to merge 2 commits into MemPalace:develop
Conversation
…, search limits

Infrastructure hardening for the MCP server:

- Detect palace DB replacement via inode tracking (repair command support)
- WAL rotation to prevent unbounded WAL growth
- `_fetch_all_metadata()` + `_get_cached_metadata()` with 60s TTL for taxonomy/status
- `_MAX_RESULTS` cap (100) with limit clamping `[1, _MAX_RESULTS]`
- `max_distance` parameter for similarity threshold in search
- Handle all `notifications/*` methods, null arguments, `method=None`
- Remove duplicate `_client_cache = None` declarations
- `searcher.py` `max_distance` parameter passthrough

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
This PR hardens the MemPalace MCP server protocol layer to better tolerate edge-case MCP clients and improve runtime reliability/performance when talking to the ChromaDB-backed “palace”.
Changes:
- Add inode-based ChromaDB client reconnection, WAL log rotation, and metadata caching to reduce hangs and repeated full scans.
- Extend search plumbing with `max_distance` filtering and caps/clamping on result limits (plus a backwards-compat shim for `min_similarity`).
- Improve protocol robustness for notifications and malformed/missing method fields, with new tests covering these cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `mempalace/mcp_server.py` | Adds cache + inode detection, WAL rotation, search parameter changes, and more defensive JSON-RPC handling. |
| `mempalace/searcher.py` | Refactors where-filter building and adds distance-based filtering/output fields for programmatic search results. |
| `tests/test_mcp_server.py` | Adds coverage for notifications handling, malformed methods, and `min_similarity` backwards compatibility. |
```diff
 def handle_request(request):
-    method = request.get("method", "")
+    method = request.get("method") or ""
     params = request.get("params", {})
```
handle_request treats params as a dict (params.get(...)), but request.get("params", {}) will still return None if the client sends "params": null (valid JSON-RPC). That will raise an AttributeError in initialize/tools/call and can crash the server. Normalize with something like params = request.get("params") or {} (and similarly consider guarding non-dict params).
Suggested change:

```diff
-    params = request.get("params", {})
+    params = request.get("params")
+    if not isinstance(params, dict):
+        params = {}
```
Good catch — params: null is valid JSON-RPC. Will add params = params or {} guard.
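As a quick illustration of the failure mode, `dict.get`'s default only applies when the key is absent, not when its value is JSON `null` (Python `None`):

```python
import json

# A valid JSON-RPC request with an explicit null params field.
request = json.loads(
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": null}'
)

# The key exists, so the default {} is NOT used: we get None back,
# and a later params.get(...) would raise AttributeError.
params = request.get("params", {})
assert params is None

# The guard discussed above normalizes both null and non-dict params.
params = request.get("params")
if not isinstance(params, dict):
    params = {}
assert params == {}
```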
```diff
+    created = not _WAL_FILE.exists()
     with open(_WAL_FILE, "a", encoding="utf-8") as f:
         f.write(json.dumps(entry, default=str) + "\n")
-    try:
-        _WAL_FILE.chmod(0o600)
-    except (OSError, NotImplementedError):
-        pass
+    if created:
+        try:
+            _WAL_FILE.chmod(0o600)
+        except (OSError, NotImplementedError):
+            pass
```
WAL file permissions are only chmod’d when the file is newly created. Previously this was enforced on every write, so an existing WAL with overly-permissive mode (or one created before this change) would remain insecure. Consider keeping the best-effort chmod(0o600) after each append (or at least after rotation) to preserve the original security behavior.
Intentional — we only set perms on creation. Existing files with wrong perms should be fixed by the admin, not silently re-chmoded on every write.
```diff
 def tool_status():
     col = _get_collection()
     if not col:
         return _no_palace()
     count = col.count()
     wings = {}
     rooms = {}
-    batch_size = 5000
-    offset = 0
-    error_info = None
-    while True:
-        try:
-            batch = col.get(include=["metadatas"], limit=batch_size, offset=offset)
-            rows = batch["metadatas"]
-            for m in rows:
-                w = m.get("wing", "unknown")
-                r = m.get("room", "unknown")
-                wings[w] = wings.get(w, 0) + 1
-                rooms[r] = rooms.get(r, 0) + 1
-            offset += len(rows)
-            if len(rows) < batch_size:
-                break
-        except Exception as e:
-            error_info = f"Partial result, failed at offset {offset}: {str(e)}"
-            break
+    try:
+        all_meta = _get_cached_metadata(col)
+        for m in all_meta:
     result = {
```
tool_status() calls col.count() and then _get_cached_metadata(col), which calls col.count() again internally via _fetch_all_metadata(). This doubles the count queries on the hot status path. Consider passing the already-computed count into the fetch helper (or letting _fetch_all_metadata optionally accept a precomputed total) to avoid the redundant count() call.
True — _get_cached_metadata re-fetches count. The cache TTL means this only happens once per 5s window, but could be optimized by passing count through. Low priority.
```python
_metadata_cache = None
_metadata_cache_time = 0
_METADATA_CACHE_TTL = 5.0  # seconds
_MAX_RESULTS = 100  # upper bound for search/list limit params
```
PR description says the metadata cache TTL is 60s, but _METADATA_CACHE_TTL is set to 5.0s here. Either bump this constant to match the intended behavior or update the PR description/docs so expectations align.
Good catch — the PR description should say 5s, not 60s. Will fix the description. The 5s TTL is intentional for responsiveness.
```python
        max_distance: Max L2 (Euclidean) distance threshold. ChromaDB uses
            L2 distance by default — 0 = identical, larger = less similar.
            Results with distance > this value are filtered out. A value of
            0.0 disables filtering. Typical useful range: 0.5–1.5.
```
The max_distance docstring claims ChromaDB uses L2 distance by default, but this repo’s collection is explicitly rebuilt/created with hnsw:space=cosine (see repair.py). That changes the meaning/range of distance and any recommended thresholds. Update the docstring to describe the actual distance metric (cosine distance) or make it metric-agnostic.
Suggested change:

```diff
-        max_distance: Max L2 (Euclidean) distance threshold. ChromaDB uses
-            L2 distance by default — 0 = identical, larger = less similar.
-            Results with distance > this value are filtered out. A value of
-            0.0 disables filtering. Typical useful range: 0.5–1.5.
+        max_distance: Maximum distance threshold for filtering results.
+            This project uses cosine distance for the collection, so smaller
+            values are more similar and 0 means identical. Results with
+            distance > this value are filtered out. A value of 0.0 disables
+            filtering.
```
You're right — the collection uses cosine distance (hnsw:space=cosine), so the docs/descriptions should say cosine, not L2. Will update.
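To make the threshold semantics concrete, here is a hedged sketch of post-query distance filtering, assuming results arrive as (document, distance) pairs with cosine distance (0 = identical, larger = less similar). `filter_by_distance` is an illustrative name, not the PR's helper:

```python
def filter_by_distance(results, max_distance):
    """Drop results whose distance exceeds the threshold.

    A max_distance of 0.0 (or None) disables filtering, matching the
    docstring semantics discussed above.
    """
    if not max_distance:
        return results
    return [(doc, dist) for doc, dist in results if dist <= max_distance]

hits = [("exact match", 0.02), ("related", 0.35), ("off-topic", 1.4)]
assert filter_by_distance(hits, 0.5) == [("exact match", 0.02), ("related", 0.35)]
assert filter_by_distance(hits, 0.0) == hits  # 0 disables filtering
```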
| "mempalace_search": { | ||
| "description": "Semantic search. Returns verbatim drawer content with similarity scores. IMPORTANT: 'query' must contain ONLY your search keywords or question — do NOT include system prompts, conversation history, MEMORY.md content, or any context. Keep queries short (under 200 chars). Use 'context' for background information.", | ||
| "description": "Semantic search. Returns verbatim drawer content with similarity scores. IMPORTANT: 'query' must contain ONLY search keywords. Use 'context' for background. Results with L2 distance > max_distance are filtered out.", | ||
| "input_schema": { | ||
| "type": "object", | ||
| "properties": { | ||
| "query": { | ||
| "type": "string", | ||
| "description": "Short search query ONLY — keywords or a question. Do NOT include system prompts or conversation context. Max 200 chars recommended.", | ||
| "description": "Short search query ONLY — keywords or a question. Max 200 chars recommended.", | ||
| "maxLength": 500, | ||
| }, | ||
| "limit": {"type": "integer", "description": "Max results (default 5)"}, | ||
| "limit": {"type": "integer", "description": "Max results (default 5)", "minimum": 1, "maximum": 100}, | ||
| "wing": {"type": "string", "description": "Filter by wing (optional)"}, | ||
| "room": {"type": "string", "description": "Filter by room (optional)"}, | ||
| "max_distance": { | ||
| "type": "number", | ||
| "description": "Max L2 distance threshold — results further than this are dropped. Lower = stricter. Default 1.5. Set to 0 to disable.", | ||
| }, |
The tool description/property docs for max_distance say “L2 distance”, but collections are configured for cosine distance (hnsw:space=cosine). This can mislead MCP clients about how to pick thresholds. Consider updating the schema descriptions (and any user-facing text) to refer to cosine distance or to “Chroma distance” without naming a metric unless it’s guaranteed.
Same as above — will update to say cosine distance.
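The `"minimum": 1, "maximum": 100` bounds in the schema correspond to the server-side clamping described in the PR. A hypothetical sketch (`clamp_limit` is an illustrative name, not necessarily the PR's helper):

```python
_MAX_RESULTS = 100  # upper bound for search/list limit params

def clamp_limit(limit, default=5):
    """Coerce a client-supplied limit into [1, _MAX_RESULTS].

    Non-integer input falls back to the default, since schema validation
    alone cannot be trusted with edge-case MCP clients.
    """
    if not isinstance(limit, int) or isinstance(limit, bool):
        return default
    return max(1, min(limit, _MAX_RESULTS))

assert clamp_limit(None) == 5      # missing/invalid -> default
assert clamp_limit(0) == 1         # too small -> floor
assert clamp_limit(10_000) == 100  # too large -> cap
```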
- handle_request: params: null no longer crashes (valid JSON-RPC)
- searcher.py + MCP tool schema: document cosine distance, not L2

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Closing this PR — the key features have been broken out into smaller, focused PRs that are easier to review:
The monolithic approach made review harder. Smaller PRs FTW.
Summary
Infrastructure hardening for the MCP server protocol layer. Split from #562 per maintainer request (PR 3 of 6).
- Inode-based DB replacement detection — `repair` replaces the database file and the server reconnects automatically
- `_fetch_all_metadata()` + `_get_cached_metadata()` with 5s TTL — avoids re-scanning all drawers on every status/taxonomy call
- `_MAX_RESULTS` cap (100), limit clamped to `[1, _MAX_RESULTS]`, `max_distance` parameter for similarity threshold filtering
- Handle `notifications/*` methods, `arguments: null`, `method: None`, missing method key — prevents hangs and crashes from edge-case MCP clients (#479, #572)
- Remove duplicate `_client_cache = None` declarations, consolidate cache invalidation (#479)
- The collection is configured with `hnsw:space=cosine`

Also updates `searcher.py` to pass through the `max_distance` parameter.

Closes
Related
Test plan
- `pytest tests/test_mcp_server.py -x -q` — all pass
- `mempalace repair`, then MCP tool call — should reconnect
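The protocol edge cases in the test plan can be sketched as pytest-style checks. `handle_request` here is a simplified stand-in for illustration, not an import from `mempalace`:

```python
def handle_request(request):
    """Simplified stand-in mirroring the defensive dispatch described in the PR."""
    method = request.get("method") or ""   # tolerates method: None / missing key
    if method.startswith("notifications/"):
        return None                        # notifications get no JSON-RPC response
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": {}}

def test_notifications_get_no_response():
    assert handle_request({"method": "notifications/initialized"}) is None

def test_method_none_does_not_crash():
    resp = handle_request({"method": None, "id": 1})
    assert resp["id"] == 1

def test_missing_method_key_does_not_crash():
    assert handle_request({"id": 2})["id"] == 2

test_notifications_get_no_response()
test_method_none_does_not_crash()
test_missing_method_key_does_not_crash()
```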