mempalace migrate/status crash with SIGSEGV on chromadb version mismatch — palace unrecoverable via official tooling #1218

@sonicology

Description

Problem

When chromadb is upgraded and the palace index is on an older format, mempalace migrate (and mempalace status) exit with code 139 (SIGSEGV) and produce zero output.

Root cause: migrate.py (~line 162) calls col.count() to probe whether the palace is readable. This call segfaults inside chromadb's native HNSW layer. SIGSEGV bypasses Python's try/except, so the migration fallback code (lines 166–171: extract from SQLite, rebuild) is never reached — the whole process dies silently.

The same SIGSEGV also kills mempalace status and the MCP server on startup, leaving the palace completely inaccessible through all official tooling.
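The claim that try/except cannot intercept the crash is easy to reproduce without chromadb. The sketch below segfaults a child interpreter via a native NULL dereference (`ctypes.string_at(0)` is a stand-in for the crashing HNSW call); the child's own except block never fires, and the parent only learns of the crash through the exit status:

```python
import subprocess, sys

# Child process: dereference NULL inside native code. The try/except never
# fires because SIGSEGV kills the interpreter outright, just like the
# col.count() crash described above.
child = (
    "import ctypes\n"
    "try:\n"
    "    ctypes.string_at(0)  # native NULL dereference -> SIGSEGV\n"
    "except Exception:\n"
    "    print('caught')  # never reached\n"
)
result = subprocess.run([sys.executable, "-c", child], capture_output=True)
print(result.returncode)  # -11 on POSIX (killed by SIGSEGV); a shell reports 128 + 11 = 139
```

This is why `mempalace migrate` exits 139 with zero output: nothing Python-side ever gets a chance to run after the fault.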

Environment

  • macOS 15, Apple Silicon M-series
  • mempalace installed via pip (pyenv 3.11.6)
  • Palace: 3870 drawers + 139 closets
  • mempalace migrate --dry-run → exit 139, zero output
  • mempalace status → exit 139, zero output
  • mempalace repair → exit 139, zero output

Workaround

Manual recovery by bypassing the col.count() probe entirely:

  1. Extract drawers + closets from chroma.sqlite3 via raw SQL (reuses extract_drawers_from_sqlite from migrate.py)
  2. shutil.copytree backup
  3. ChromaBackend().get_or_create_collection in a temp dir, col.add() in batches (re-embeds locally)
  4. Copy knowledge_graph.sqlite3 sidecars
  5. Swap temp dir into place

Full recovery script:

"""Manual MemPalace recovery — bypasses the segfaulting count() probe."""
from __future__ import annotations
import os, shutil, tempfile
from datetime import datetime

PALACE = os.path.expanduser("~/.mempalace/palace")
DB = os.path.join(PALACE, "chroma.sqlite3")

from mempalace.migrate import extract_drawers_from_sqlite
from mempalace.backends.chroma import ChromaBackend

def extract_collection_from_sqlite(db_path, collection_name):
    import sqlite3
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("""
            SELECT em.id,
                MAX(CASE WHEN em.key = 'chroma:document' THEN em.string_value END) as document
            FROM embedding_metadata em
            JOIN embeddings e ON e.id = em.id
            JOIN segments s ON s.id = e.segment_id
            JOIN collections c ON c.id = s.collection
            WHERE c.name = ? GROUP BY em.id""", (collection_name,)).fetchall()
        results = []
        for row_id, document in rows:
            if not document: continue
            metas = conn.execute("""SELECT em.key, em.string_value, em.int_value,
                em.float_value, em.bool_value FROM embedding_metadata em
                WHERE em.id = ? AND em.key NOT LIKE 'chroma:%'""", (row_id,)).fetchall()
            metadata = {}
            for k, sv, iv, fv, bv in metas:
                if sv is not None: metadata[k] = sv
                elif iv is not None: metadata[k] = iv
                elif fv is not None: metadata[k] = fv
                elif bv is not None: metadata[k] = bool(bv)
            ext_id = conn.execute(
                "SELECT embedding_id FROM embeddings WHERE id = ?", (row_id,)).fetchone()
            if ext_id:
                results.append({"id": ext_id[0], "document": document, "metadata": metadata})
        return results
    finally:
        conn.close()

drawers = extract_drawers_from_sqlite(DB)
closets = extract_collection_from_sqlite(DB, "mempalace_closets")
print(f"Extracted: {len(drawers)} drawers, {len(closets)} closets")

ts = datetime.now().strftime("%Y%m%d_%H%M%S")
backup = f"{PALACE}.pre-migrate.{ts}"
shutil.copytree(PALACE, backup)
print(f"Backup: {backup}")

temp = tempfile.mkdtemp(prefix="mempalace_recover_")
fresh = ChromaBackend()

def reimport(name, items, batch_size=500):
    if not items: return 0
    col = fresh.get_or_create_collection(temp, name)
    done = 0
    for i in range(0, len(items), batch_size):
        batch = items[i:i+batch_size]
        col.add(ids=[d["id"] for d in batch],
                documents=[d["document"] for d in batch],
                metadatas=[d["metadata"] for d in batch])
        done += len(batch)
        print(f"  {name}: {done}/{len(items)}")
    return col.count()

dr_n = reimport("mempalace_drawers", drawers)
cl_n = reimport("mempalace_closets", closets)
print(f"Re-imported: drawers={dr_n}, closets={cl_n}")

for sidecar in ("knowledge_graph.sqlite3", "knowledge_graph.sqlite3-shm", "knowledge_graph.sqlite3-wal"):
    src = os.path.join(PALACE, sidecar)
    if os.path.isfile(src):
        shutil.copy2(src, os.path.join(temp, sidecar))

shutil.rmtree(PALACE)
shutil.move(temp, PALACE)
print("Done.")
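The heart of the extraction is the `MAX(CASE WHEN ...)` pivot that recovers each document from chroma's key/value `embedding_metadata` table. A minimal demonstration against a toy in-memory database (table and column names are assumptions mirroring what the script above reads, not a complete chromadb schema):

```python
import sqlite3

# Toy database with just the tables/columns the recovery script touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collections (id TEXT, name TEXT);
CREATE TABLE segments (id TEXT, collection TEXT);
CREATE TABLE embeddings (id INTEGER, segment_id TEXT, embedding_id TEXT);
CREATE TABLE embedding_metadata (
    id INTEGER, key TEXT, string_value TEXT,
    int_value INTEGER, float_value REAL, bool_value INTEGER);
INSERT INTO collections VALUES ('c1', 'mempalace_closets');
INSERT INTO segments VALUES ('s1', 'c1');
INSERT INTO embeddings VALUES (1, 's1', 'closet-001');
INSERT INTO embedding_metadata VALUES (1, 'chroma:document', 'hello', NULL, NULL, NULL);
INSERT INTO embedding_metadata VALUES (1, 'room', 'attic', NULL, NULL, NULL);
""")

# Same pivot as the recovery script: one row per embedding, with the
# document value lifted out of the key/value rows.
rows = conn.execute("""
    SELECT em.id,
        MAX(CASE WHEN em.key = 'chroma:document' THEN em.string_value END) AS document
    FROM embedding_metadata em
    JOIN embeddings e ON e.id = em.id
    JOIN segments s ON s.id = e.segment_id
    JOIN collections c ON c.id = s.collection
    WHERE c.name = ? GROUP BY em.id""", ("mempalace_closets",)).fetchall()
print(rows)  # [(1, 'hello')]
```

Because the pivot runs entirely in SQLite, it works even when chromadb's native index is unreadable — which is exactly why the data stays recoverable after the segfault.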

Suggested fix

Wrap the col.count() probe in a subprocess so SIGSEGV is contained at the process boundary:

import subprocess, sys

def probe_collection_count(palace_path: str, collection_name: str):
    """Return the collection count, or None if the probe crashes or times out."""
    code = (
        "import sys\n"
        "from mempalace.backends.chroma import ChromaBackend\n"
        "col = ChromaBackend().get_collection(sys.argv[1], sys.argv[2])\n"
        "print(col.count())\n"
    )
    try:
        # Pass the path and collection name as argv rather than interpolating
        # them into the code string, so quoting in either can't break the probe.
        result = subprocess.run(
            [sys.executable, "-c", code, palace_path, collection_name],
            capture_output=True, text=True, timeout=15,
        )
        if result.returncode == 0:
            return int(result.stdout.strip())
    except (subprocess.TimeoutExpired, ValueError, OSError):
        pass
    return None  # crashed or timed out -> proceed with SQL extraction

Same pattern applies to mempalace status and MCP server startup.
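The containment behavior can be verified without chromadb by swapping in stand-in child code (the strings below are hypothetical substitutes for the real ChromaBackend-loading probe):

```python
import subprocess, sys

def probe(code: str, timeout: float = 15.0):
    """Run `code` in a child interpreter; return its stdout as int, or None on crash."""
    try:
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=timeout)
        if result.returncode == 0:
            return int(result.stdout.strip())
    except (subprocess.TimeoutExpired, ValueError, OSError):
        pass
    return None

# Healthy palace: child prints a count and exits 0.
print(probe("print(3870)"))                         # 3870
# Broken palace: child segfaults in native code; the parent just sees None
# and can fall through to extract_drawers_from_sqlite().
print(probe("import ctypes; ctypes.string_at(0)"))  # None
```

With this in place, a crashed probe becomes an ordinary `None` return, so `migrate`, `status`, and the MCP server can all fall back to the SQLite path instead of dying.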

Impact

Without this fix, every chromadb version bump that changes the HNSW index format leaves users with a palace that cannot be migrated, repaired, or read — all official tooling crashes before producing any output. The SQLite data is always intact but unreachable without a manual workaround like the script above.
