
HNSW link_lists.bin grows to terabytes, causes segfault and APFS orphaned blocks on macOS #525

@swesty

Summary

On macOS (Apple Silicon, APFS), the HNSW index file link_lists.bin can grow to 2.8TB (apparent) / ~1.7TB (allocated), causing a segfault on any search or mine operation. Deleting the corrupted file leaves orphaned APFS block allocations that can only be reclaimed by booting into Recovery Mode and running Disk Utility First Aid.

Environment

  • macOS 15 (Darwin 25.3.0), Apple Silicon
  • mempalace 3.0.0 (the path to corruption was also reproduced on the chromadb version underlying 3.1.0)
  • chromadb 0.6.3, chroma-hnswlib 0.7.6
  • Palace size: 53,222 drawers across 10,000+ rooms
  • Filesystem: APFS

Steps to reproduce

  1. Build a large palace (50K+ drawers) via repeated mempalace mine runs
  2. Over time, link_lists.bin in the HNSW directory grows without bound
  3. ChromaDB logs "Add of existing embedding ID: drawer_*" warnings on every access: it is reconciling (re-adding) existing embeddings into the HNSW graph
  4. Eventually mempalace search segfaults during HNSW index load
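
The warnings in step 3 are consistent with IDs being passed to add when they are already present. A minimal sketch of filtering a batch before adding, assuming a ChromaDB-style collection object that exposes get(ids=...) and add(ids=..., documents=...); the helper name add_missing_only is hypothetical, and this does not claim to reflect how mempalace calls ChromaDB today:

```python
from typing import Any, List, Sequence


def add_missing_only(collection: Any, ids: Sequence[str],
                     documents: Sequence[str]) -> List[str]:
    """Add only documents whose IDs are not yet in the collection.

    Returns the IDs that were skipped (already stored, or repeated in the batch).
    """
    # get(ids=...) returns only records that exist; result["ids"] lists them.
    known = set(collection.get(ids=list(ids))["ids"])
    new_ids: List[str] = []
    new_docs: List[str] = []
    skipped: List[str] = []
    for doc_id, doc in zip(ids, documents):
        if doc_id in known:
            skipped.append(doc_id)
            continue
        known.add(doc_id)  # also guards against duplicates inside the batch
        new_ids.append(doc_id)
        new_docs.append(doc)
    if new_ids:
        collection.add(ids=new_ids, documents=new_docs)
    return skipped
```

A non-empty return value on a re-mine of unchanged sources would show how many drawers are being re-submitted on each run.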

Observed behavior

$ mempalace search "test query"
Add of existing embedding ID: drawer_claude_sessions_technical_e9de31813119608b
Add of existing embedding ID: drawer_claude_sessions_technical_1e43ec1539179dfc
[... ~50 more warnings ...]
[1]    99847 segmentation fault  mempalace search "test query"

File sizes in the palace directory:

-rw-r--r--  2.8T  link_lists.bin      # HNSW index (corrupted/bloated)
-rw-r--r--   84M  data_level0.bin     # HNSW data (normal)
-rw-r--r--  276M  chroma.sqlite3      # All actual data (intact)

Secondary issue: APFS orphaned blocks

After deleting the corrupted HNSW directory and running mempalace repair, the APFS Data volume still reports ~1.9TB consumed while du only accounts for ~200GB. diskutil verifyVolume confirms corruption:

The volume /dev/rdisk3s5 was found to be corrupt and needs to be repaired

Live repair cannot reclaim the orphaned blocks; reclaiming them requires booting into Recovery Mode and running First Aid on the Data volume. This means users who hit this bug lose effective disk space until they do an offline filesystem repair.
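
The mismatch between what the volume reports and what du accounts for can be approximated in a script. This is a rough sketch only: it counts file blocks (not directory entries), and on APFS the figure from shutil.disk_usage may also include snapshots, purgeable space, or other volumes sharing the container, so a large gap is a hint rather than proof of orphaned blocks. The function names are hypothetical:

```python
import os
import shutil


def allocated_under(root: str) -> int:
    """Sum allocated bytes of files under root, counting hard links once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # 512-byte units on macOS and Linux
    return total


def unaccounted_bytes(volume_path: str, root: str) -> int:
    """Bytes the filesystem reports as used minus bytes found under root."""
    used = shutil.disk_usage(volume_path).used
    return used - allocated_under(root)
```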

Workaround

  1. Delete the HNSW index directory (the UUID-named folder inside the palace, NOT chroma.sqlite3)
  2. Delete any partial .backup from a failed mempalace repair
  3. Run mempalace repair to rebuild the index from SQLite
  4. Boot into Recovery Mode → Disk Utility → First Aid on the Data volume to reclaim orphaned APFS blocks
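
Step 1 can be scripted so that chroma.sqlite3 is never touched. A minimal sketch, assuming the HNSW index lives in immediate subdirectories of the palace whose names are canonical UUIDs, as described above; the function names and the palace path passed in are hypothetical, and the default is a dry run that only lists what would be removed:

```python
import shutil
import uuid
from pathlib import Path
from typing import List


def is_uuid_name(name: str) -> bool:
    """True only for canonical hyphenated UUID strings."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


def remove_hnsw_dirs(palace_dir, dry_run: bool = True) -> List[Path]:
    """Find UUID-named subdirectories; delete them unless dry_run is set."""
    palace = Path(palace_dir)
    targets = [
        p for p in palace.iterdir()
        if p.is_dir() and not p.is_symlink() and is_uuid_name(p.name)
    ]
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Calling remove_hnsw_dirs(palace_dir) first and inspecting the returned paths before repeating the call with dry_run=False keeps the deletion reviewable.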

Suggestions

  • Guard against unbounded growth: Check link_lists.bin size relative to drawer count before/after mining. A 53K-drawer palace should have an HNSW index of ~100-200MB, not terabytes.
  • Repair command: Skip the full shutil.copytree backup when disk space is low — the SQLite DB is the source of truth and is untouched by the rebuild. Consider backing up only chroma.sqlite3 instead of the entire palace directory (which includes the bloated HNSW files).
  • upsert vs add: The "Add of existing embedding ID" warnings suggest embeddings are being added when they already exist. Using upsert or checking for existence first would prevent the HNSW graph from accumulating duplicate entries.
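
The first two suggestions could be sketched as follows. The per-drawer budget and the floor are assumptions derived from the figures above (~100-200MB for ~53K drawers is roughly 2-4 KB per drawer), and the function names are hypothetical rather than existing mempalace code. The backup uses the standard library's sqlite3 Connection.backup so that only the database is copied:

```python
import sqlite3
from pathlib import Path

# Assumed budget: 8 KiB per drawer leaves headroom over the ~2-4 KB expected.
MAX_INDEX_BYTES_PER_DRAWER = 8 * 1024
# Assumed floor so small palaces with preallocated index files do not trip it.
MIN_INDEX_BYTES = 64 * 1024 * 1024


def index_looks_bloated(index_dir, drawer_count: int,
                        bytes_per_drawer: int = MAX_INDEX_BYTES_PER_DRAWER,
                        min_bytes: int = MIN_INDEX_BYTES) -> bool:
    """Compare the apparent size of the *.bin index files with a budget."""
    total = sum(f.stat().st_size for f in Path(index_dir).glob("*.bin"))
    return total > max(min_bytes, drawer_count * bytes_per_drawer)


def backup_sqlite_only(db_path, backup_path) -> None:
    """Copy just the SQLite database, skipping the HNSW files entirely."""
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(backup_path))
    try:
        src.backup(dst)  # consistent online copy of the whole database
    finally:
        dst.close()
        src.close()
```

With these defaults a 53,222-drawer palace would be flagged once its index files exceed about 436MB, far below the terabyte range reported here. Checking before and after each mine run would let the tool stop before the disk fills.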
