HNSW link_lists.bin grows to terabytes, causes segfault and APFS orphaned blocks on macOS

## Summary

On macOS (Apple Silicon, APFS), the HNSW index file `link_lists.bin` can grow to **2.8TB** apparent size (**~1.7TB** actually allocated), causing segfaults on any `search` or `mine` operation. Deleting the corrupted file leaves orphaned APFS block allocations that can only be reclaimed by running Disk Utility First Aid from Recovery Mode.

## Environment

- macOS 15 (Darwin 25.3.0), Apple Silicon
- mempalace 3.0.0 (the same path to corruption also reproduces with the chromadb version underlying 3.1.0)
- chromadb 0.6.3, chroma-hnswlib 0.7.6
- Palace size: 53,222 drawers across 10,000+ rooms
- Filesystem: APFS

## Steps to reproduce

1. Build a large palace (50K+ drawers) via repeated `mempalace mine` runs
2. Over time, `link_lists.bin` in the HNSW directory grows unbounded
3. ChromaDB logs `Add of existing embedding ID: drawer_*` warnings on every access, which suggests it is re-adding existing embeddings into the HNSW graph
4. Eventually `mempalace search` segfaults during HNSW index load

## Observed behavior

```
$ mempalace search "test query"
Add of existing embedding ID: drawer_claude_sessions_technical_e9de31813119608b
Add of existing embedding ID: drawer_claude_sessions_technical_1e43ec1539179dfc
[... ~50 more warnings ...]
[1]    99847 segmentation fault  mempalace search "test query"
```

File sizes in the palace directory:
```
-rw-r--r--  2.8T  link_lists.bin      # HNSW index (corrupted/bloated)
-rw-r--r--   84M  data_level0.bin     # HNSW data (normal)
-rw-r--r--  276M  chroma.sqlite3      # All actual data (intact)
```
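
The gap between the 2.8T shown above and the ~1.7TB allocated figure in the summary is consistent with a sparse file, where the logical length exceeds the blocks actually backing it. A minimal sketch for reporting both numbers, assuming a POSIX `os.stat` (the helper names `file_sizes` and `is_sparse` are made up for illustration and are not part of mempalace or chromadb):

```python
import os


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is the logical length of the file.
    # st_blocks counts 512-byte units actually allocated on disk.
    return st.st_size, st.st_blocks * 512


def is_sparse(path):
    """True when fewer bytes are allocated than the file's logical length."""
    apparent, allocated = file_sizes(path)
    return allocated < apparent
```

Running `file_sizes` on `link_lists.bin` before deleting it gives a rough estimate of how much space should come back, which makes it easier to tell afterwards whether blocks were orphaned.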

## Secondary issue: APFS orphaned blocks

After deleting the corrupted HNSW directory and running `mempalace repair`, the APFS Data volume still reports ~1.9TB consumed while `du` only accounts for ~200GB. `diskutil verifyVolume` confirms corruption:

```
The volume /dev/rdisk3s5 was found to be corrupt and needs to be repaired
```

Live repair cannot reclaim the orphaned blocks; reclaiming them requires booting into Recovery Mode and running First Aid on the Data volume. Users who hit this bug therefore lose usable disk space until they perform an offline filesystem repair.

## Workaround

1. Delete the HNSW index directory (the UUID-named folder inside the palace, NOT `chroma.sqlite3`)
2. Delete any partial `.backup` from a failed `mempalace repair`
3. Run `mempalace repair` to rebuild the index from SQLite
4. Boot into Recovery Mode → Disk Utility → First Aid on the Data volume to reclaim orphaned APFS blocks
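
Step 1 is the easiest one to get wrong, so here is a rough sketch for listing the UUID-named folders before deleting anything. It assumes the HNSW directories sit directly inside the palace directory and are named with a canonical hyphenated UUID; `find_hnsw_dirs` is a hypothetical helper, not a mempalace command, and it deliberately deletes nothing:

```python
import uuid
from pathlib import Path


def find_hnsw_dirs(palace_dir):
    """Return UUID-named subdirectories of palace_dir, sorted by name."""
    found = []
    for child in Path(palace_dir).iterdir():
        if not child.is_dir():
            continue  # skips chroma.sqlite3 and any other plain files
        try:
            parsed = uuid.UUID(child.name)
        except ValueError:
            continue  # not a UUID at all
        # uuid.UUID also accepts forms without hyphens; only keep the
        # canonical hyphenated spelling (assumption about the layout).
        if str(parsed) == child.name.lower():
            found.append(child)
    return sorted(found)
```

Printing the result of `find_hnsw_dirs` on the palace path and checking it by eye before removing those folders keeps `chroma.sqlite3` out of harm's way.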

## Suggestions

- **Guard against unbounded growth**: Check `link_lists.bin` size relative to drawer count before/after mining. A 53K-drawer palace should have an HNSW index of ~100-200MB, not terabytes.
- **Repair command**: Skip the full `shutil.copytree` backup when disk space is low. The SQLite DB is the source of truth and is untouched by the rebuild, so consider backing up only `chroma.sqlite3` instead of the entire palace directory (which includes the bloated HNSW files).
- **`upsert` vs `add`**: The "Add of existing embedding ID" warnings suggest embeddings are being `add`ed when they already exist. Using `upsert` or checking for existence first would prevent the HNSW graph from accumulating duplicate entries.
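
The first suggestion could look roughly like the sketch below. It is not existing mempalace code: `IndexBloatError`, `check_link_lists`, and the 16 KiB-per-drawer budget are all assumptions chosen for illustration. The budget sits well above the ~100-200MB expected for 53K drawers and far below terabytes.

```python
import os

# Hypothetical budget: generous compared with the few KB per drawer implied
# by a ~100-200MB index for ~53K drawers.
DEFAULT_MAX_BYTES_PER_DRAWER = 16 * 1024


class IndexBloatError(RuntimeError):
    """Raised when link_lists.bin is implausibly large for the drawer count."""


def check_link_lists(path, drawer_count,
                     max_bytes_per_drawer=DEFAULT_MAX_BYTES_PER_DRAWER):
    """Return the file's apparent size, or raise if it exceeds the budget."""
    apparent = os.stat(path).st_size
    # Treat an empty palace as one drawer so the limit is never zero.
    limit = max(drawer_count, 1) * max_bytes_per_drawer
    if apparent > limit:
        raise IndexBloatError(
            f"{path}: {apparent} bytes for {drawer_count} drawers "
            f"(limit {limit} bytes)"
        )
    return apparent
```

Calling this before and after `mine`, and refusing to continue when it raises, would stop the run long before the file reaches a size that segfaults on load or fills the disk.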

## Related

- #74 (macOS ARM64 segfault)
- #100 (ChromaDB version pinning)
