HNSW max_elements frozen at 16K while collection grows to 100K+ entries — MCP server segfaults on every tool call #1222

@gounthar

Description

What happened?

After organic palace growth through regular mining and session capture, the MCP server began crashing with a segfault (exit 139) on every tool call, including `mempalace_status`. The palace was completely inaccessible.

Diagnosis revealed a fundamental mismatch: the HNSW vector index (`ba5ad160…` VECTOR segment for `mempalace_drawers`) had `max_elements=16384` but `chroma.sqlite3` held 192,997 embeddings. The HNSW resize mechanism stopped at some point, leaving SQLite and HNSW permanently diverged. Any operation that touched the HNSW triggered SIGSEGV in the Rust layer.

This is distinct from the link_lists.bin sparse-file bloat issues (#344, #1092). No bloated files here — just the SQLite entry count outgrowing `max_elements` with no guard or warning.

What did you expect?

Either ChromaDB auto-resizes the HNSW as the collection grows, or mempalace detects and warns about a SQLite/HNSW count mismatch before the server segfaults.

How to reproduce:

  1. Allow a palace to grow organically past ~16K drawers via `mine` or session hooks
  2. At some point (likely after an interrupted mine or concurrent write), the HNSW stops resizing
  3. SQLite continues accumulating entries; HNSW stays at 16K max_elements
  4. On next MCP server start, every tool call segfaults

The divergence is completely silent — no warning that HNSW is lagging behind SQLite.
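A guard for this divergence only needs the SQLite side, which is safe to read even when the HNSW is corrupt. A minimal sketch (the `embeddings` table name comes from the `chroma.sqlite3` schema; `max_elements` must still be obtained from the HNSW index itself, e.g. via hnswlib's `get_max_elements()`, which this sketch takes as a plain argument):

```python
import sqlite3


def check_hnsw_capacity(db_path: str, max_elements: int) -> tuple[int, bool]:
    """Compare the SQLite embedding count against the HNSW capacity.

    Returns (count, diverged). Reading SQLite never touches the HNSW
    files, so this is safe to run even on a corrupt palace.
    """
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()
    finally:
        conn.close()
    diverged = count > max_elements
    if diverged:
        print(f"WARNING: SQLite holds {count} embeddings but HNSW "
              f"max_elements is {max_elements} -- counts have diverged")
    return count, diverged
```

In the scenario above this would have reported 192,997 embeddings against `max_elements=16384` long before the first segfault.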

Workaround:

```shell
# 1. Export current drawers from SQLite (safe, no HNSW touch)
python3 ~/.mempalace/recover.py export

# 2. Confirm the mismatch
python3 ~/.mempalace/recover.py status

# 3. Delete the undersized HNSW dir (segment ID = VECTOR segment for mempalace_drawers)
rm -rf ~/.mempalace/palace/<segment-id>/

# 4. ChromaDB auto-creates a fresh HNSW — BM25 text search keeps working for all entries.
#    Semantic vector search degrades until new entries organically rebuild the index.
```

Find the right segment ID:

```python
import os
import sqlite3

conn = sqlite3.connect(os.path.expanduser('~/.mempalace/palace/chroma.sqlite3'))
c = conn.cursor()
c.execute("""SELECT s.id FROM segments s
             JOIN collections col ON s.collection = col.id
             WHERE col.name = 'mempalace_drawers' AND s.scope = 'VECTOR'""")
print(c.fetchone()[0])
```

Also worth knowing: `recover.py test-fresh` fails with "attempt to write a readonly database" if the export JSON retains the `chroma:document` reserved metadata key. Strip it before upsert: `{k: v for k, v in meta.items() if k not in {'chroma:document', 'chroma:embedding'}}`.
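The key-stripping one-liner above can be wrapped into a small helper for the export/re-upsert path (the reserved key names are taken straight from the error described above):

```python
# Chroma-reserved metadata keys; re-upserting them triggers
# "attempt to write a readonly database"
RESERVED_KEYS = {"chroma:document", "chroma:embedding"}


def clean_metadata(meta: dict) -> dict:
    """Drop Chroma-reserved keys from an exported metadata dict before upsert."""
    return {k: v for k, v in meta.items() if k not in RESERVED_KEYS}
```

Applied to every record in the export JSON before calling `upsert`, this lets `recover.py test-fresh` complete.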

Suggested fix:

  • `recover.py status` (and ideally the MCP server startup) should compare the SQLite embedding count vs HNSW `cur_element_count` and `max_elements`, and warn loudly when they diverge significantly
  • A SIGSEGV in the HNSW layer shouldn't take the whole MCP server down silently — some form of guard or graceful fallback to BM25-only mode would prevent the "palace completely unreachable" scenario
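Since a SIGSEGV cannot be caught with `try/except` in-process, one way to implement the second suggestion is to probe the HNSW from a child process at startup, so a crash there marks the index unhealthy instead of killing the server. A generic sketch (the actual probe code, e.g. a trivial `col.query(...)` against `mempalace_drawers`, is a hypothetical placeholder supplied by the caller):

```python
import subprocess
import sys


def probe_in_subprocess(probe_code: str) -> bool:
    """Run probe_code in a child Python interpreter.

    Returns True iff the child exits cleanly. If the HNSW probe segfaults,
    the child dies with returncode -11 (SIGSEGV) and the parent survives,
    letting the server fall back to BM25-only mode instead of going down.
    """
    proc = subprocess.run([sys.executable, "-c", probe_code])
    return proc.returncode == 0
```

At startup the server would run the probe once and, on failure, disable vector search while keeping BM25 tools available.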

Environment:

  • OS: Linux 6.6.87 (WSL2 / Debian)
  • Python: 3.12
  • MemPalace: 3.3.3 (pip), 3.3.0 (plugin)
  • chromadb: 1.5.8
