Skip to content

fix: purge stale drawers before re-mine to avoid hnswlib update-path race#523

Closed
StefanKremen wants to merge 1 commit intoMemPalace:developfrom
StefanKremen:fix/hnswlib-update-path-race
Closed

fix: purge stale drawers before re-mine to avoid hnswlib update-path race#523
StefanKremen wants to merge 1 commit intoMemPalace:developfrom
StefanKremen:fix/hnswlib-update-path-race

Conversation

@StefanKremen
Copy link
Copy Markdown

Summary

Fixes the hnswlib updatePoint / repairConnectionsForUpdate race documented in #521. One-hunk patch in mempalace/miner.py::process_file: delete any drawers for a source_file before re-inserting the fresh chunks, so modified-file re-mines never enter the update path.

Why only miner.py

mcp_server.py::tool_add_drawer uses a content-keyed drawer_id and has an existence pre-check at lines 355-361 that short-circuits before the upsert. Combined with single-item batches per MCP tool call, that path is structurally incapable of reaching updatePoint / repairConnectionsForUpdate under normal operation. No parallel fix needed.

Test plan

  • New regression test: mine a file, os.utime() the file, re-mine, assert no crash and new chunks replace old
  • pytest tests/ -v — full suite passes (534 passed, 106 deselected, 20.78s)
  • ruff check mempalace/miner.py — clean
  • ruff format --check mempalace/miner.py — clean
  • Manual reproduction: mine a directory, touch a file, re-mine — no segfault after patch

Not included

  • No changes to mcp_server.py (analysis in the commit message).
  • No changes to the on-disk index format or migration.
  • No new dependencies.

Fixes #521

…race

Modified-file re-mines previously upserted over existing deterministic
drawer_ids, which pushes ChromaDB through hnswlib.addItems ->
addPoint(existing_internal_id) -> updatePoint ->
repairConnectionsForUpdate. That path has a long-standing thread-safety
bug in nmslib/hnswlib where searchBaseLayer dereferences a neighbor
pointer being mutated by another worker thread. On macOS ARM64 with
chromadb 0.6.3 + Python 3.13 this reliably segfaults the mining
subprocess.

This commit converts modified-file re-mines into a clean
delete-by-source_file + insert, bypassing the update path entirely.

Note: mcp_server.py::tool_add_drawer (line 364) does NOT need the same
patch. Its drawer_id is content-keyed
(sha256(wing+room+content[:100])) and there is a pre-check at lines
355-361 that short-circuits already-existing IDs before reaching
upsert, so it only runs the pure INSERT path.

Fixes MemPalace#521
@web3guru888
Copy link
Copy Markdown

Clean targeted fix for the hnswlib update race. The one-hunk approach (delete-before-upsert in process_file) is the right minimal patch — it doesn't reach into HNSW internals, just eliminates the condition that triggers the update path.

The analysis on tool_add_drawer is correct: content-keyed drawer IDs with an existence pre-check make single-item MCP adds structurally safe. Worth adding that explicitly to the PR description as a note for reviewers who might wonder whether tool_add_drawer needs a parallel fix.

One test gap: the test plan marks the regression test as unchecked ([ ] New regression test: mine a file, os.utime() the file, re-mine, assert no crash and new chunks replace old). This is the most important test — without it the fix could regress silently. I'd want to see that test added before merge.

Also: #522 and #523 overlap on the same miner.py change. Both add delete-before-upsert logic, but #522 does it more comprehensively (all three strategies + CLI + MCP). If #522 merges first, #523 becomes a subset. If #523 merges first, #522 will have a conflict in miner.py. Worth flagging the coordination dependency — these should probably be merged in order or deduplicated before review.


MemPalace-AGI dashboard

milla-jovovich pushed a commit that referenced this pull request Apr 10, 2026
Delete existing drawers for a file before re-inserting fresh chunks.
Converts re-mines from upsert (hnswlib updatePoint path, thread-unsafe
on macOS ARM + chromadb 0.6.3) into delete+insert (safe addPoint path).

Credit: @StefanKremen (#523)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@rusel95
Copy link
Copy Markdown

rusel95 commented Apr 11, 2026

FYI: #251 already implements this purge-before-re-insert pattern in process_file (from @igorls review item #1 — atomic per-file re-mine). It's part of a broader sync command but the core fix for the hnswlib update-path race is the same: delete existing drawers for the source_file before calling add_drawer() with the fresh chunks.

If #251 lands, this race is fixed as a side effect. If #523 lands first, the narrower scope here is fine and #251 just needs a rebase.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:21
@igorls igorls added area/mining File and conversation mining bug Something isn't working labels Apr 14, 2026
@StefanKremen
Copy link
Copy Markdown
Author

Closing as superseded.

The delete-before-upsert pattern proposed here landed upstream via a different PR — #784 (merged 2026-04-13, shipped v3.3.0) wrapped both miners' delete+insert critical sections in per-file mine_lock, with collection.delete(where={"source_file": ...}) before collection.upsert(...). Same pattern, broader coverage (both miner.py and convo_miner.py), plus cross-process safety via flock.

Additional reinforcements since:

Closing without merge — the fix is in main, just not from this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/mining File and conversation mining bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hnswlib updatePoint race on modified-file re-mine: EXC_BAD_ACCESS in repairConnectionsForUpdate (macOS ARM64, Python 3.13, chromadb 0.6.3)

4 participants