HNSW index bloat: link_lists.bin grows to 441GB when mining >10K drawers

## Summary

When mining a project with ~56K drawers, ChromaDB's HNSW index file `link_lists.bin` grows to **441GB apparent size (168GB on disk)** instead of the expected ~8MB. This caused two sequential system crashes due to memory and disk exhaustion before the problem was identified. The `mempalace status` command segfaults (exit 139) when attempting to load the corrupted index.

## Environment

- mempalace v3.0.0
- chromadb 1.5.7
- macOS Darwin 25.3.0

## Steps to Reproduce

1. Have a project with ~50K+ text files (or fewer files that chunk into 50K+ drawers)
2. Run `mempalace mine <project_dir>`
3. Observe `link_lists.bin` growing to hundreds of GB
4. `mempalace status` segfaults

Smaller mines (~10K drawers) work fine; the bloat has only been observed at around 50K+ drawers.
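To watch for the bloat in step 3 without waiting for the disk to fill, a small script can compare apparent size with allocated size for each `link_lists.bin`. This is a minimal sketch, assuming a Unix-like system where `os.stat` reports `st_blocks`; the helper names `file_sizes` and `find_link_lists` are hypothetical and not part of mempalace.

```python
import os


def file_sizes(path):
    """Return (apparent_bytes, on_disk_bytes) for a file."""
    st = os.stat(path)
    # st_size is the logical length of the file; st_blocks counts the
    # 512-byte blocks actually allocated (available on Unix-like systems).
    return st.st_size, st.st_blocks * 512


def find_link_lists(palace_dir):
    """Yield (path, apparent_bytes, on_disk_bytes) for each link_lists.bin."""
    for root, _dirs, files in os.walk(palace_dir):
        if "link_lists.bin" in files:
            path = os.path.join(root, "link_lists.bin")
            apparent, on_disk = file_sizes(path)
            yield path, apparent, on_disk
```

Calling `find_link_lists(os.path.expanduser("~/.mempalace/palace"))` during a mine and printing the results should show the gap between apparent and on-disk size opening up if the index is going sparse.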

## Root Cause

ChromaDB creates the HNSW collection with default parameters:
- `hnsw:batch_size = 100`
- `hnsw:sync_threshold = 1000`
- Initial capacity = 1000
- `resize_factor = 1.2`

When mempalace inserts drawers one at a time via `collection.upsert()`, every 1000 inserts triggers a `persistDirty()` call, and the index resizes roughly 23 times at a 1.2x growth factor to accommodate 56K elements (1000 -> 1200 -> 1440 -> ... -> 56000+). The `persistDirty()` function uses relative seek positioning in `link_lists.bin`, and after many resize cycles the seek positions drift, causing macOS to extend the sparse file with zero-filled regions. Each resize+persist cycle compounds the problem.
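The resize and persist counts can be checked with a back-of-envelope calculation. This is a sketch under stated assumptions: capacity grows by exactly 1.2x with integer truncation each time it is exceeded, and one persist happens per `sync_threshold` inserts. The function names are hypothetical, and the code does not model ChromaDB's or hnswlib's actual internals.

```python
def count_resizes(target, initial=1000, num=6, den=5):
    """Count geometric growth steps (factor num/den) until capacity >= target."""
    capacity, resizes = initial, 0
    while capacity < target:
        # Integer arithmetic (x * 6 // 5) avoids float rounding on the 1.2 factor.
        capacity = capacity * num // den
        resizes += 1
    return resizes, capacity


def count_persists(total_inserts, sync_threshold):
    """Number of times the insert count crosses a multiple of sync_threshold."""
    return total_inserts // sync_threshold


resizes, final_capacity = count_resizes(56_000)
print(resizes, final_capacity)          # 23 66169
print(count_persists(56_000, 1_000))    # 56
print(count_persists(56_000, 50_000))   # 1
```

Under these assumptions a 56K-drawer mine goes through 23 resizes and 56 persist calls with the defaults, versus a single persist call with a threshold of 50000.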

Relevant upstream issues:
- chroma-core/chroma#6621 -- proposes `hnsw:initial_capacity` (not yet merged)
- nmslib/hnswlib#633 -- chunked array allocation (not yet merged)
- chroma-core/chroma#5925 -- user with 240K vectors reports failure loading HNSW index

## Impact

- **Disk exhaustion**: 441GB file on a laptop with limited SSD space
- **System crashes**: Two sequential crashes from memory/disk pressure before the cause was identified
- **Data loss risk**: The HNSW index stores the actual embedding vectors; when corrupted, they cannot be recovered from ChromaDB's SQLite (which only stores IDs, metadata, and documents)
- **Segfaults**: `mempalace status` and `mempalace search` crash when trying to load the bloated index

## Workaround

Delete the corrupted index and re-mine:
```bash
rm -rf ~/.mempalace/palace/*/
mempalace mine <project_dir>
```

## Fix

PR incoming -- two changes:
1. Pass `hnsw:batch_size=50000` and `hnsw:sync_threshold=50000` on collection creation to defer persistence until large batches complete
2. Batch per-file drawer upserts instead of inserting one drawer at a time, reducing the number of resize operations
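The two changes could look roughly like the following. This is a minimal sketch, not the PR itself: the `(id, document, metadata)` tuple shape, the helper names, and the default batch size of 1000 are assumptions for illustration, and whether chromadb 1.5.7 honors these metadata keys exactly as listed should be verified against the installed version.

```python
from itertools import islice

# Settings proposed in the fix, intended to be passed as collection
# metadata when the collection is first created.
HNSW_METADATA = {"hnsw:batch_size": 50_000, "hnsw:sync_threshold": 50_000}


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def upsert_drawers(collection, drawers, batch_size=1_000):
    """Upsert (id, document, metadata) tuples in batches; return the call count.

    `collection` is any object exposing an upsert(ids=, documents=, metadatas=)
    method, such as a chromadb Collection.
    """
    calls = 0
    for chunk in batched(drawers, batch_size):
        ids, docs, metas = zip(*chunk)
        collection.upsert(ids=list(ids), documents=list(docs), metadatas=list(metas))
        calls += 1
    return calls


# Hypothetical wiring (collection name is illustrative):
# client = chromadb.PersistentClient(path=palace_path)
# collection = client.get_or_create_collection("drawers", metadata=HNSW_METADATA)
# upsert_drawers(collection, drawers_for_file)
```

Because HNSW settings are applied when a collection is created, an existing palace would most likely need to be re-created (see the workaround above) for the new values to take effect.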
