mempalace migrate/status crash with SIGSEGV on chromadb version mismatch — palace unrecoverable via official tooling #1218

@sonicology

Description

Problem

When chromadb is upgraded and the palace index is on an older format, mempalace migrate (and mempalace status) exit with code 139 (SIGSEGV) and produce zero output.

Root cause: migrate.py (~line 162) calls col.count() to probe whether the palace is readable. This call segfaults inside chromadb's native HNSW layer. SIGSEGV bypasses Python's try/except, so the migration fallback code (lines 166–171: extract from SQLite, rebuild) is never reached — the whole process dies silently.

The same SIGSEGV also kills mempalace status and the MCP server on startup, leaving the palace completely inaccessible through all official tooling.
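The claim that try/except cannot intercept the crash is easy to reproduce without chromadb. The sketch below segfaults a child interpreter via a native NULL dereference (`ctypes.string_at(0)` is a stand-in for the crashing HNSW call); the child's own except block never fires, and the parent only learns of the crash through the exit status:

```python
import subprocess, sys

# Child process: dereference NULL inside native code. The try/except never
# fires because SIGSEGV kills the interpreter outright, just like the
# col.count() crash described above.
child = (
    "import ctypes\n"
    "try:\n"
    "    ctypes.string_at(0)  # native NULL dereference -> SIGSEGV\n"
    "except Exception:\n"
    "    print('caught')  # never reached\n"
)
result = subprocess.run([sys.executable, "-c", child], capture_output=True)
print(result.returncode)  # -11 on POSIX (killed by SIGSEGV); a shell reports 128 + 11 = 139
```

This is why `mempalace migrate` exits 139 with zero output: nothing Python-side ever gets a chance to run after the fault.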

Environment

  • macOS 15, Apple Silicon M-series
  • mempalace installed via pip (pyenv 3.11.6)
  • Palace: 3870 drawers + 139 closets
  • mempalace migrate --dry-run → exit 139, zero output
  • mempalace status → exit 139, zero output
  • mempalace repair → exit 139, zero output

Workaround

Manual recovery by bypassing the col.count() probe entirely:

  1. Extract drawers + closets from chroma.sqlite3 via raw SQL (reuses extract_drawers_from_sqlite from migrate.py)
  2. shutil.copytree backup
  3. ChromaBackend().get_or_create_collection in a temp dir, col.add() in batches (re-embeds locally)
  4. Copy knowledge_graph.sqlite3 sidecars
  5. Swap temp dir into place

Full recovery script:

"""Manual MemPalace recovery — bypasses the segfaulting count() probe."""
from __future__ import annotations
import os, shutil, tempfile
from datetime import datetime

PALACE = os.path.expanduser("~/.mempalace/palace")
DB = os.path.join(PALACE, "chroma.sqlite3")

from mempalace.migrate import extract_drawers_from_sqlite
from mempalace.backends.chroma import ChromaBackend

def extract_collection_from_sqlite(db_path, collection_name):
    import sqlite3
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("""
            SELECT em.id,
                MAX(CASE WHEN em.key = 'chroma:document' THEN em.string_value END) as document
            FROM embedding_metadata em
            JOIN embeddings e ON e.id = em.id
            JOIN segments s ON s.id = e.segment_id
            JOIN collections c ON c.id = s.collection
            WHERE c.name = ? GROUP BY em.id""", (collection_name,)).fetchall()
        results = []
        for row_id, document in rows:
            if not document: continue
            metas = conn.execute("""SELECT em.key, em.string_value, em.int_value,
                em.float_value, em.bool_value FROM embedding_metadata em
                WHERE em.id = ? AND em.key NOT LIKE 'chroma:%'""", (row_id,)).fetchall()
            metadata = {}
            for k, sv, iv, fv, bv in metas:
                if sv is not None: metadata[k] = sv
                elif iv is not None: metadata[k] = iv
                elif fv is not None: metadata[k] = fv
                elif bv is not None: metadata[k] = bool(bv)
            ext_id = conn.execute(
                "SELECT embedding_id FROM embeddings WHERE id = ?", (row_id,)).fetchone()
            if ext_id:
                results.append({"id": ext_id[0], "document": document, "metadata": metadata})
        return results
    finally:
        conn.close()

drawers = extract_drawers_from_sqlite(DB)
closets = extract_collection_from_sqlite(DB, "mempalace_closets")
print(f"Extracted: {len(drawers)} drawers, {len(closets)} closets")

ts = datetime.now().strftime("%Y%m%d_%H%M%S")
backup = f"{PALACE}.pre-migrate.{ts}"
shutil.copytree(PALACE, backup)
print(f"Backup: {backup}")

temp = tempfile.mkdtemp(prefix="mempalace_recover_")
fresh = ChromaBackend()

def reimport(name, items, batch_size=500):
    if not items: return 0
    col = fresh.get_or_create_collection(temp, name)
    done = 0
    for i in range(0, len(items), batch_size):
        batch = items[i:i+batch_size]
        col.add(ids=[d["id"] for d in batch],
                documents=[d["document"] for d in batch],
                metadatas=[d["metadata"] for d in batch])
        done += len(batch)
        print(f"  {name}: {done}/{len(items)}")
    return col.count()

dr_n = reimport("mempalace_drawers", drawers)
cl_n = reimport("mempalace_closets", closets)
print(f"Re-imported: drawers={dr_n}, closets={cl_n}")

for sidecar in ("knowledge_graph.sqlite3", "knowledge_graph.sqlite3-shm", "knowledge_graph.sqlite3-wal"):
    src = os.path.join(PALACE, sidecar)
    if os.path.isfile(src):
        shutil.copy2(src, os.path.join(temp, sidecar))

shutil.rmtree(PALACE)
shutil.move(temp, PALACE)
print("Done.")
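The heart of the extraction is the `MAX(CASE WHEN ...)` pivot that recovers each document from chroma's key/value `embedding_metadata` table. A minimal demonstration against a toy in-memory database (table and column names are assumptions mirroring what the script above reads, not a complete chromadb schema):

```python
import sqlite3

# Toy database with just the tables/columns the recovery script touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collections (id TEXT, name TEXT);
CREATE TABLE segments (id TEXT, collection TEXT);
CREATE TABLE embeddings (id INTEGER, segment_id TEXT, embedding_id TEXT);
CREATE TABLE embedding_metadata (
    id INTEGER, key TEXT, string_value TEXT,
    int_value INTEGER, float_value REAL, bool_value INTEGER);
INSERT INTO collections VALUES ('c1', 'mempalace_closets');
INSERT INTO segments VALUES ('s1', 'c1');
INSERT INTO embeddings VALUES (1, 's1', 'closet-001');
INSERT INTO embedding_metadata VALUES (1, 'chroma:document', 'hello', NULL, NULL, NULL);
INSERT INTO embedding_metadata VALUES (1, 'room', 'attic', NULL, NULL, NULL);
""")

# Same pivot as the recovery script: one row per embedding, with the
# document value lifted out of the key/value rows.
rows = conn.execute("""
    SELECT em.id,
        MAX(CASE WHEN em.key = 'chroma:document' THEN em.string_value END) AS document
    FROM embedding_metadata em
    JOIN embeddings e ON e.id = em.id
    JOIN segments s ON s.id = e.segment_id
    JOIN collections c ON c.id = s.collection
    WHERE c.name = ? GROUP BY em.id""", ("mempalace_closets",)).fetchall()
print(rows)  # [(1, 'hello')]
```

Because the pivot runs entirely in SQLite, it works even when chromadb's native index is unreadable — which is exactly why the data stays recoverable after the segfault.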

Suggested fix

Wrap the col.count() probe in a subprocess so SIGSEGV is contained at the process boundary:

import subprocess, sys

def probe_collection_count(palace_path: str, collection_name: str):
    """Return the collection count, or None if the probe crashes or times out."""
    code = (
        "import sys\n"
        "from mempalace.backends.chroma import ChromaBackend\n"
        "col = ChromaBackend().get_collection(sys.argv[1], sys.argv[2])\n"
        "print(col.count())\n"
    )
    try:
        # Pass the path and collection name as argv rather than interpolating
        # them into the code string, so quoting in either can't break the probe.
        result = subprocess.run(
            [sys.executable, "-c", code, palace_path, collection_name],
            capture_output=True, text=True, timeout=15,
        )
        if result.returncode == 0:
            return int(result.stdout.strip())
    except (subprocess.TimeoutExpired, ValueError, OSError):
        pass
    return None  # crashed or timed out -> proceed with SQL extraction

Same pattern applies to mempalace status and MCP server startup.
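The containment behavior can be verified without chromadb by swapping in stand-in child code (the strings below are hypothetical substitutes for the real ChromaBackend-loading probe):

```python
import subprocess, sys

def probe(code: str, timeout: float = 15.0):
    """Run `code` in a child interpreter; return its stdout as int, or None on crash."""
    try:
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=timeout)
        if result.returncode == 0:
            return int(result.stdout.strip())
    except (subprocess.TimeoutExpired, ValueError, OSError):
        pass
    return None

# Healthy palace: child prints a count and exits 0.
print(probe("print(3870)"))                         # 3870
# Broken palace: child segfaults in native code; the parent just sees None
# and can fall through to extract_drawers_from_sqlite().
print(probe("import ctypes; ctypes.string_at(0)"))  # None
```

With this in place, a crashed probe becomes an ordinary `None` return, so `migrate`, `status`, and the MCP server can all fall back to the SQLite path instead of dying.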

Impact

Without this fix, every chromadb version bump that changes the HNSW index format leaves users with a palace that cannot be migrated, repaired, or read — all official tooling crashes before producing any output. The SQLite data is always intact but unreachable without a manual workaround like the script above.
