HNSW max_elements frozen at 16K while collection grows to 100K+ entries — MCP server segfaults on every tool call #1222

@gounthar

Description

What happened?

After organic palace growth through regular mining and session capture, the MCP server began crashing with a segfault (exit 139) on every tool call, including `mempalace_status`. The palace was completely inaccessible.

Diagnosis revealed a fundamental mismatch: the HNSW vector index (`ba5ad160…` VECTOR segment for `mempalace_drawers`) had `max_elements=16384` but `chroma.sqlite3` held 192,997 embeddings. The HNSW resize mechanism stopped at some point, leaving SQLite and HNSW permanently diverged. Any operation that touched the HNSW triggered SIGSEGV in the Rust layer.

This is distinct from the link_lists.bin sparse-file bloat issues (#344, #1092). No bloated files here — just the SQLite entry count outgrowing `max_elements` with no guard or warning.

What did you expect?

Either ChromaDB auto-resizes the HNSW as the collection grows, or mempalace detects and warns about a SQLite/HNSW count mismatch before the server segfaults.

How to reproduce:

  1. Allow a palace to grow organically past ~16K drawers via `mine` or session hooks
  2. At some point (likely after an interrupted mine or concurrent write), the HNSW stops resizing
  3. SQLite continues accumulating entries; HNSW stays at 16K max_elements
  4. On next MCP server start, every tool call segfaults

The divergence is completely silent — no warning that HNSW is lagging behind SQLite.
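A guard for this divergence only needs the SQLite side, which is safe to read even when the HNSW is corrupt. A minimal sketch (the `embeddings` table name comes from the `chroma.sqlite3` schema; `max_elements` must still be obtained from the HNSW index itself, e.g. via hnswlib's `get_max_elements()`, which this sketch takes as a plain argument):

```python
import sqlite3


def check_hnsw_capacity(db_path: str, max_elements: int) -> tuple[int, bool]:
    """Compare the SQLite embedding count against the HNSW capacity.

    Returns (count, diverged). Reading SQLite never touches the HNSW
    files, so this is safe to run even on a corrupt palace.
    """
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()
    finally:
        conn.close()
    diverged = count > max_elements
    if diverged:
        print(f"WARNING: SQLite holds {count} embeddings but HNSW "
              f"max_elements is {max_elements} -- counts have diverged")
    return count, diverged
```

In the scenario above this would have reported 192,997 embeddings against `max_elements=16384` long before the first segfault.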

Workaround:

```shell
# 1. Export current drawers from SQLite (safe, no HNSW touch)
python3 ~/.mempalace/recover.py export

# 2. Confirm the mismatch
python3 ~/.mempalace/recover.py status

# 3. Delete the undersized HNSW dir (segment ID = VECTOR segment for mempalace_drawers)
rm -rf ~/.mempalace/palace/<segment-id>/

# 4. ChromaDB auto-creates a fresh HNSW — BM25 text search keeps working for all entries.
#    Semantic vector search degrades until new entries organically rebuild the index.
```

Find the right segment ID:

```python
import os
import sqlite3

conn = sqlite3.connect(os.path.expanduser('~/.mempalace/palace/chroma.sqlite3'))
c = conn.cursor()
c.execute("""SELECT s.id FROM segments s
             JOIN collections col ON s.collection = col.id
             WHERE col.name = 'mempalace_drawers' AND s.scope = 'VECTOR'""")
print(c.fetchone()[0])
```

Also worth knowing: `recover.py test-fresh` fails with "attempt to write a readonly database" if the export JSON retains the `chroma:document` reserved metadata key. Strip it before upsert: `{k: v for k, v in meta.items() if k not in {'chroma:document', 'chroma:embedding'}}`.
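The key-stripping one-liner above can be wrapped into a small helper for the export/re-upsert path (the reserved key names are taken straight from the error described above):

```python
# Chroma-reserved metadata keys; re-upserting them triggers
# "attempt to write a readonly database"
RESERVED_KEYS = {"chroma:document", "chroma:embedding"}


def clean_metadata(meta: dict) -> dict:
    """Drop Chroma-reserved keys from an exported metadata dict before upsert."""
    return {k: v for k, v in meta.items() if k not in RESERVED_KEYS}
```

Applied to every record in the export JSON before calling `upsert`, this lets `recover.py test-fresh` complete.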

Suggested fix:

  • `recover.py status` (and ideally the MCP server startup) should compare the SQLite embedding count vs HNSW `cur_element_count` and `max_elements`, and warn loudly when they diverge significantly
  • A SIGSEGV in the HNSW layer shouldn't take the whole MCP server down silently — some form of guard or graceful fallback to BM25-only mode would prevent the "palace completely unreachable" scenario
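Since a SIGSEGV cannot be caught with `try/except` in-process, one way to implement the second suggestion is to probe the HNSW from a child process at startup, so a crash there marks the index unhealthy instead of killing the server. A generic sketch (the actual probe code, e.g. a trivial `col.query(...)` against `mempalace_drawers`, is a hypothetical placeholder supplied by the caller):

```python
import subprocess
import sys


def probe_in_subprocess(probe_code: str) -> bool:
    """Run probe_code in a child Python interpreter.

    Returns True iff the child exits cleanly. If the HNSW probe segfaults,
    the child dies with returncode -11 (SIGSEGV) and the parent survives,
    letting the server fall back to BM25-only mode instead of going down.
    """
    proc = subprocess.run([sys.executable, "-c", probe_code])
    return proc.returncode == 0
```

At startup the server would run the probe once and, on failure, disable vector search while keeping BM25 tools available.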

Environment:

  • OS: Linux 6.6.87 (WSL2 / Debian)
  • Python: 3.12
  • MemPalace: 3.3.3 (pip), 3.3.0 (plugin)
  • chromadb: 1.5.8
