BUG: mempalace_search returns stale results after mid-session CLI mine — cached HNSW client not invalidated #608

@sinful1992

Summary

After a live Claude Code session has connected to the MCP server, running mempalace mine from another process (e.g., the CLI) does not make the newly-filed drawers visible to mempalace_search. The MCP server's mempalace_status does reflect the new total_drawers, so the failure is subtle and easy to miss.

Reproduction

  1. Start a Claude Code session with the mempalace plugin active (MCP server running).
  2. Call mempalace_status → note total_drawers (e.g., 18,941).
  3. Call mempalace_search "some query" → note top 5 results.
  4. From a bash tool in the same session, run py -3.10 -m mempalace mine /path/to/new/content --wing X.
  5. Call mempalace_status again → total_drawers is correctly updated (e.g., 19,208). Good.
  6. Call mempalace_search "some query" with the same query → results are unchanged from step 3, even when the newly-mined content should rank in the top-N.

Symptom signature

Comparing MCP search to a fresh chromadb.PersistentClient opened via the CLI against the same palace, for identical queries:

| # | Fresh client (CLI/REPL) | MCP server (cached) |
|---|---|---|
| 1 | 0.291 agent-aa9b45c | 0.291 agent-aa9b45c |
| 2 | 0.271 agent-a6570a… | 0.271 agent-a6570a… |
| 3 | 0.233 agent-af34ef… | 0.233 agent-af34ef… |
| 4 | 0.231 tender-shimmying-glade.md (newly mined) | 0.155 agent-a6570a… |
| 5 | 0.181 tender-shimmying-glade.md (newly mined) | 0.134 agent-a35abd… |

The first N positions (here, N = 3) tend to match because dominant older content wins regardless, but positions N+1 onward quietly return lower-scoring older drawers instead of the correctly ranked new ones. This is easy to miss unless you A/B against a fresh process.
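The A/B comparison above can be automated with a small helper. This is only a sketch: `first_divergence` and the `(score, label)` tuple shape are hypothetical and not part of mempalace, and the sample data is copied from the table, truncated labels included.

```python
from typing import List, Optional, Tuple

Hit = Tuple[float, str]  # (similarity score, drawer/source label); hypothetical shape


def first_divergence(fresh: List[Hit], cached: List[Hit], tol: float = 1e-6) -> Optional[int]:
    """Return the 1-based rank where two rankings first differ, or None if they agree."""
    for rank, (a, b) in enumerate(zip(fresh, cached), start=1):
        if a[1] != b[1] or abs(a[0] - b[0]) > tol:
            return rank
    if len(fresh) != len(cached):
        # One list is a strict prefix of the other.
        return min(len(fresh), len(cached)) + 1
    return None


# Rows copied from the comparison table in this report.
fresh = [
    (0.291, "agent-aa9b45c"),
    (0.271, "agent-a6570a…"),
    (0.233, "agent-af34ef…"),
    (0.231, "tender-shimmying-glade.md"),
    (0.181, "tender-shimmying-glade.md"),
]
cached = [
    (0.291, "agent-aa9b45c"),
    (0.271, "agent-a6570a…"),
    (0.233, "agent-af34ef…"),
    (0.155, "agent-a6570a…"),
    (0.134, "agent-a35abd…"),
]

divergence_rank = first_divergence(fresh, cached)
```

For the table's data, the rankings agree through rank 3 and first differ at rank 4, which is where the newly mined content should appear.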

Root cause

Since #135, mcp_server.py caches the ChromaDB client and collection as module globals, set once on first use:

https://github.com/milla-jovovich/mempalace/blob/main/mempalace/mcp_server.py#L103-L126

_client_cache = None
_collection_cache = None

def _get_client():
    global _client_cache
    if _client_cache is None:
        _client_cache = chromadb.PersistentClient(path=_config.palace_path)
    return _client_cache

Two compounding factors:

  1. _client_cache / _collection_cache are never invalidated.
  2. ChromaDB's PersistentClient additionally holds a per-path SharedSystemClient singleton with an in-memory HNSW index frozen at client creation. Even if you re-instantiate PersistentClient without clearing SharedSystemClient, you get the same stale index back.

Why mempalace_status still looks correct: col.count() reads SQLite directly, and external writes land there. The query path uses the frozen in-memory HNSW index, which external writes never reach. Hence: count moves, results don't.
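The "count moves, results don't" pattern can be reproduced with a toy model that uses only the standard library. This is an illustration of the failure shape, not chromadb's actual internals: `FrozenIndexReader`, `run`, and the `drawers` table are all hypothetical, and a Python list stands in for the in-memory HNSW index.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def run(db_path, sql, params=()):
    """Open a short-lived connection, run one statement, commit, return rows."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(sql, params).fetchall()
        conn.commit()
    return rows


class FrozenIndexReader:
    """Toy reader: count() hits SQLite, query() uses a snapshot taken at creation."""

    def __init__(self, db_path):
        self._db_path = db_path
        # Snapshot taken once; stands in for the in-memory index.
        self._snapshot = [
            r[0] for r in run(db_path, "SELECT name FROM drawers ORDER BY rowid")
        ]

    def count(self):
        # Always reads the database, so external writes are visible here.
        return run(self._db_path, "SELECT COUNT(*) FROM drawers")[0][0]

    def query(self, needle):
        # Only searches the snapshot, so external writes are invisible here.
        return [name for name in self._snapshot if needle in name]


def demo():
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "toy.sqlite3")
        run(db, "CREATE TABLE drawers (name TEXT)")
        run(db, "INSERT INTO drawers VALUES (?)", ("old-note",))
        reader = FrozenIndexReader(db)  # long-lived, like the cached client
        run(db, "INSERT INTO drawers VALUES (?)", ("new-note",))  # external write
        stale = (reader.count(), reader.query("note"))
        fresh = FrozenIndexReader(db).query("note")  # fresh process equivalent
        return stale, fresh
```

In this model the long-lived reader reports two rows but still returns only the old one, while a newly created reader sees both.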

Proposed fix

mtime-triggered cache invalidation on palace/chroma.sqlite3, rate-limited to once every 2s, with a defensive fallback if SharedSystemClient.clear_system_cache ever moves in a future chromadb release. Tested locally against chromadb 0.6.3 on Windows.

import os
import time

_client_cache = None
_collection_cache = None
_cache_sqlite_mtime = 0.0
_cache_last_check = 0.0
_CACHE_CHECK_INTERVAL = 2.0


def _sqlite_mtime():
    try:
        return os.path.getmtime(os.path.join(_config.palace_path, "chroma.sqlite3"))
    except OSError:
        return 0.0


def _maybe_invalidate_cache():
    global _client_cache, _collection_cache, _cache_sqlite_mtime, _cache_last_check
    # Rate limit: stat() the SQLite file at most once per _CACHE_CHECK_INTERVAL.
    now = time.monotonic()
    if now - _cache_last_check < _CACHE_CHECK_INTERVAL:
        return
    _cache_last_check = now
    current = _sqlite_mtime()
    # Cold start: record the baseline mtime without invalidating anything.
    if _cache_sqlite_mtime == 0.0:
        _cache_sqlite_mtime = current
        return
    if current > _cache_sqlite_mtime:
        logger.info("palace mtime changed; clearing chromadb client cache")
        try:
            # Lazy import so a relocated API degrades to a warning, not a crash.
            from chromadb.api.client import SharedSystemClient
            SharedSystemClient.clear_system_cache()
        except Exception as e:
            logger.warning(f"clear_system_cache failed: {e}")
        # Drop the module-level handles so the next call rebuilds them.
        _client_cache = None
        _collection_cache = None
        _cache_sqlite_mtime = current


def _get_client():
    global _client_cache
    _maybe_invalidate_cache()
    if _client_cache is None:
        _client_cache = chromadb.PersistentClient(path=_config.palace_path)
    return _client_cache

Design notes:

  • Rate-limited stat() (2s) to avoid filesystem hammering on every tool call.
  • First call records the initial mtime without invalidating (preserves cold-start behavior).
  • clear_system_cache is imported lazily and wrapped in try/except — a chromadb API change won't crash the MCP server; it'll log a warning and fall through to the current stale-cache behavior.
  • Writes that go through the MCP server itself (_get_collection(create=True)) continue to work because they share the same client.
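To make the rate limiting and cold-start behavior in these notes testable without chromadb, the same logic can be sketched as a small class with an injectable clock and factory. Everything here is hypothetical (`MtimeCache`, `demo`, and the fake clock are not part of mempalace), and it deviates from the proposal in two labeled ways: it uses None sentinels instead of 0.0, and it invalidates on any mtime change (`!=`) rather than only on an increase.

```python
import os
import tempfile
from typing import Any, Callable, Optional


class MtimeCache:
    """Caches one object and rebuilds it when a watched file's mtime changes."""

    def __init__(self, path: str, factory: Callable[[], Any],
                 clock: Callable[[], float], interval: float = 2.0) -> None:
        self._path = path
        self._factory = factory
        self._clock = clock
        self._interval = interval
        self._value: Any = None
        self._seen_mtime: Optional[int] = None
        self._last_check: Optional[float] = None

    def _mtime(self) -> int:
        try:
            return os.stat(self._path).st_mtime_ns
        except OSError:
            return 0

    def get(self) -> Any:
        now = self._clock()
        due = self._last_check is None or now - self._last_check >= self._interval
        if due:  # rate-limited stat()
            self._last_check = now
            current = self._mtime()
            if self._seen_mtime is None:
                self._seen_mtime = current  # cold start: record, do not invalidate
            elif current != self._seen_mtime:
                self._value = None  # external write detected: drop cached object
                self._seen_mtime = current
        if self._value is None:
            self._value = self._factory()
        return self._value


def demo() -> list:
    base_ns = 1_700_000_000 * 10**9  # arbitrary fixed timestamp in nanoseconds
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "chroma.sqlite3")
        open(path, "w").close()
        os.utime(path, ns=(base_ns, base_ns))
        fake_now = [100.0]
        builds = []

        def factory() -> int:
            builds.append(len(builds) + 1)  # each build gets a new id
            return builds[-1]

        cache = MtimeCache(path, factory, clock=lambda: fake_now[0])
        seen = [cache.get()]  # cold start builds object 1
        later_ns = base_ns + 10 * 10**9
        os.utime(path, ns=(later_ns, later_ns))  # simulated external write
        fake_now[0] += 1.0
        seen.append(cache.get())  # inside the 2s window: no stat, still object 1
        fake_now[0] += 2.0
        seen.append(cache.get())  # check is due and mtime changed: object 2
        return seen
```

Injecting the clock keeps the test deterministic: the 2s window is exercised by advancing a fake time value instead of sleeping. In the real server the factory would also need to clear SharedSystemClient before rebuilding, as in the proposed fix.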

Alternatives considered

Environment

  • mempalace 3.1.0 (installed via pip)
  • chromadb 0.6.3
  • Python 3.10, Windows 11
  • Claude Code plugin milla-jovovich/mempalace v3.0.14

Happy to turn this into a PR with tests if the mtime-stat approach is the direction you'd like to take.

Labels: area/mcp (MCP server and tools), area/search (Search and retrieval), bug (Something isn't working)
