feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823) #1000
Conversation
Pull request overview
Adds an operator-facing recovery primitive to mitigate ChromaDB HNSW-on-disk drift vs chroma.sqlite3 by quarantining suspected-stale HNSW segment directories, addressing the read-path crash/staleness failure mode described in #823.
Changes:
- Introduce `quarantine_stale_hnsw(palace_path, stale_seconds=3600)` to rename stale HNSW segment dirs to a `.drift-YYYYMMDD-HHMMSS` suffix based on mtime skew vs `chroma.sqlite3`.
- Add pytest coverage for renaming behavior, no-op behavior on fresh segments, missing-palace safety, and skipping already-quarantined directories.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `mempalace/backends/chroma.py` | Adds the `quarantine_stale_hnsw` helper that quarantines stale HNSW segments via renaming. |
| `tests/test_backends.py` | Adds tests validating the quarantine heuristic and safety/no-op cases. |
```python
def quarantine_stale_hnsw(palace_path: str, stale_seconds: float = 3600.0) -> list:
    """Rename HNSW segment dirs whose files are stale vs. chroma.sqlite3.

    When a ChromaDB 1.5.x PersistentClient opens a palace whose on-disk
    HNSW segment is significantly older than ``chroma.sqlite3``, the Rust
    graph-walk can dereference dangling neighbor pointers for entries that
    exist in the metadata segment but not in the HNSW index, and segfault
    in a background thread on the next ``count()`` or ``query(...)`` call.

    This is the same failure mode reported at #823 (semantic search stale
    after ``add_drawer``), observed at neo-cortex-mcp#2 (SIGSEGV on
    ``count()`` with chromadb 1.5.5), and acknowledged as by-design at
    chroma-core/chroma#2594. On one fork palace (135K drawers), the drift
    caused a 65–85% crash rate on fresh-process opens; fresh-process
    crash rate dropped to 0% after the segment dir was renamed out of the
    way and ChromaDB rebuilt lazily.

    Heuristic: if ``chroma.sqlite3`` is more than ``stale_seconds`` newer
    than the segment's ``data_level0.bin``, the segment is considered
    suspect and renamed to ``<uuid>.drift-<timestamp>``. ChromaDB reopens
    cleanly without it and writes fresh index files on next use. The
    original directory is renamed, not deleted, so recovery remains
    possible if the heuristic misfires.

    The default threshold (1h) is deliberately conservative — ChromaDB's
    HNSW flush cadence means legitimate drift is normally on the order of
    seconds to minutes. A segment that is more than an hour out of date is
    almost certainly in a "crashed mid-write" state.

    Args:
        palace_path: path to the palace directory containing ``chroma.sqlite3``
        stale_seconds: minimum mtime gap to treat a segment as stale

    Returns:
        List of paths that were quarantined (empty if nothing drifted).
    """
    db_path = os.path.join(palace_path, "chroma.sqlite3")
    if not os.path.isfile(db_path):
        return []
    try:
        sqlite_mtime = os.path.getmtime(db_path)
    except OSError:
        return []

    moved: list = []
    try:
```
Type hints here are overly generic: the function returns a list of quarantined path strings, but the signature uses `-> list` and `moved: list = []`. Elsewhere in the repo newer built-in generics are used (e.g. `list[str]`). Consider updating this to `-> list[str]` and `moved: list[str] = []` (`stale_seconds: float` already matches).
Thanks @copilot-pull-request-reviewer — addressed in c6413fd: tightened the signature to `-> list[str]` and `moved: list[str] = []`.
README:
- Bump drawer count 134K → 137K (matches current palace)
- Update "Open upstream PRs" table with actual reviewer status, add MemPalace#999 and MemPalace#1000, drop stale "rebased against MemPalace#863" note since MemPalace#863 is in v3.3.1

CLAUDE.md:
- Rename section header v3.3.0 → v3.3.1 (we merged v3.3.1 earlier today)
- Add fork changes 11-13: None-metadata guards, .blob_seq_ids_migrated marker, quarantine_stale_hnsw
- Replace "Pulled in from upstream v3.3.1" summary: i18n, BCP-47 locales, UTF-8 Path.read_text, non-blocking precompact (MemPalace#863), silent_save honoring (MemPalace#966) — ours still broader
- Update the open-upstream-PRs table to 7 open, include MemPalace#999 + MemPalace#1000, and refresh each row's status line to match what's actually on GitHub (MERGEABLE flags, @bensig ping context, Copilot review addressed, etc.)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Force-pushed from `c6413fd` to `055386e`.
Rebased onto latest `develop` after MemPalace#995 merged.
README:
- Fork-changes table: expand the None-metadata row to cover all 8 sites (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server handlers). Previous row only called out the CLI print path.
- Add a new Search row: warnings + sqlite BM25 top-up contract (the "never silent miss" feature) with pointer to MemPalace#951 + MemPalace#823.
- Open-PR table: expand MemPalace#999 scope line to mention 8 sites + architectural note, update MemPalace#1000 to reflect the post-MemPalace#995 rebase, add MemPalace#1005 with Copilot fixes noted.

CLAUDE.md:
- PR status header: 7 open -> 8 open (adds MemPalace#1005).
- Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005.
- Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25 feature; keep 12 and 13 as-is.

42 README-claim tests still pass.
Concrete evidence:
- New "What this looks like in practice" section right after the status line, showing a real stop-hook systemMessage output and a real mempalace_search response shape (warnings, available_in_scope, matched_via tags, similarity scores). Someone evaluating the fork can see what "running in production" actually looks like.

Headlines box:
- New "Headlines" subsection at the top of Fork Changes with the three differentiators someone should know if they only read one section — silent-save hook, ChromaDB 1.5.x hardening (quarantine + None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/MemPalace#1000/MemPalace#1005 so readers can jump to the work itself.

Citations for comparison table:
- Every row now links to its upstream repo: Hindsight (vectorize-io), Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai), engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since they've added MCP support since we first wrote the row.
- Replaced the "Systems mentioned without captured primary URLs" footnote (now stale since we have them) with a "Verification note" that's honest about the point-in-time nature of these columns and explains why TagMem is absent.

Structure cleanups:
- Removed the upstream MemPalace logo at the top — it's milla-jovovich's asset and using it in a fork README is awkward.
- Renamed "Roadmap" → "Planned work" — the section is decisive with priorities and time estimates; "Roadmap" was underselling it.

No content removed from the document beyond the stale footnote and the upstream logo. All existing sections intact. 42 README-claim tests pass.
Every remaining row in "Still ahead of upstream" now carries a status so the reader can tell at a glance whether each change is being upstreamed, pending a PR, or deliberately fork-local.

Dropped:
- "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was overstated. The memory file for today's debugging notes that the try-get/except-create pattern is defensive code that never reproduced a specific crash (the actual crashes traced to HNSW drift). Leaving it in "Still ahead" implied an upstream-candidate fix, which it isn't. Code stays in place as defensive, but the README no longer claims it as a fork-ahead feature.

Moved to Superseded:
- "Stale HNSW mtime detection + mempalace_reconnect" — upstream took a different approach in MemPalace#757. Our broader inode+mtime detection and the mempalace_reconnect MCP tool remain as fork-local convenience; they're just not "ahead of upstream" anymore.

Statuses now populated:
- Linked PR number for the 7 changes with active upstream PRs (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005).
- "PR pending" for 3 items that are good candidates but unfiled: epsilon mtime comparison, max_distance parameter, tool output mining.
- "fork-only" for 2 items we keep intentionally without pitching upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined (complementary to upstream's MemPalace#784 file-locking).

Legend sentence added above the table explains the three status values. 42 README-claim tests pass.
Dialectician
left a comment
Read the diff. Logic is right: compare data_level0.bin mtime to chroma.sqlite3 mtime, rename the segment directory if the gap exceeds threshold. Rename-not-delete is the right call so recovery is reversible by hand if the heuristic ever misfires. Default stale_seconds=3600.0 is conservative enough that a quick write-then-read cycle will not misclassify a fresh segment.
Tests cover the two main branches (drifted segment renamed, fresh segment untouched) with clean fixtures. Closes #823 credibly.
Two small questions, both follow-ups not blockers:

- Multi-segment palaces: the loop iterates all subdirs with `data_level0.bin`, but if two segments drift, the second rename could collide with a name from the first. Probably safe because the `.drift-<ts>` suffix is unique per call, but worth a quick verification.
- Logging: `logger.exception` on rename failure is right. Might also log the sqlite/hnsw mtime delta when quarantining, so a postmortem grep tells you how stale the segment was.
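The mtime-delta logging the review suggests could be a one-line addition at the rename site. A sketch, where the `log_quarantine` helper name and the message wording are hypothetical:

```python
import logging

logger = logging.getLogger("mempalace.backends.chroma")


def log_quarantine(seg_dir: str, sqlite_mtime: float, hnsw_mtime: float,
                   stale_seconds: float) -> str:
    """Emit a postmortem-friendly line recording how stale the segment was."""
    delta = sqlite_mtime - hnsw_mtime  # staleness in seconds
    msg = (f"quarantining {seg_dir}: data_level0.bin is {delta:.0f}s older "
           f"than chroma.sqlite3 (threshold {stale_seconds:.0f}s)")
    logger.warning(msg)
    return msg
```

A grep for `quarantining` then answers the "how stale was it" question without reconstructing mtimes from the renamed directory.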
Peace B.Sweet
Addressed Copilot type hint nit: `-> list[str]` and `moved: list[str] = []`.
Thanks for reading the diff carefully — the rename-not-delete design was exactly the choice I sweated over most, so good to have it independently confirmed. The 1h default being conservative-by-design is also appreciated; a false positive here (quarantining a healthy segment) would be annoying to recover from, so erring cautious is the right call. Fixed the type-hint nit Copilot flagged in c6413fd.
Addresses Copilot nit on MemPalace#1000. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…review addressed

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add a helper that renames HNSW segment directories whose `data_level0.bin` is significantly older than `chroma.sqlite3`.

Drift between the on-disk HNSW graph and the live embeddings table is the root cause of a segfault class where the Rust graph-walk dereferences dangling neighbor pointers for entries in the metadata segment that no longer exist in the HNSW index, crashing in a background thread on `count()` or `query()`.

Issue MemPalace#823 describes the same drift as a silent-staleness symptom (semantic search returns stale results after `add_drawer` because `data_level0.bin` lags the sqlite metadata under the default `sync_threshold=1000`). Under heavier load or after an interrupted write, the same drift can escalate from "silent stale results" to "SIGSEGV on next open," which is the failure mode observed at neo-cortex-mcp#2 (chromadb 1.5.5, Python 3.12) and acknowledged at chroma-core/chroma#2594.

On one 135K-drawer palace where `index_metadata.pickle` claimed 137,813 elements against 135,464 rows in sqlite (2,349-entry drift), fresh Python processes crashed in `col.count()` 17/20 times; after renaming the segment dir out of the way and letting ChromaDB rebuild lazily, the same 20-run check went to 0 crashes.

The recovery path MemPalace#823 suggests (export / recreate / reimport) is heavy — it re-embeds every drawer. This helper is lighter: rename the segment dir so ChromaDB reopens without it, and the indexer rebuilds lazily on the next write. The original directory is renamed (not deleted) so the operator can recover if the heuristic misfires.

If `chroma.sqlite3` is more than `stale_seconds` (default 3600) newer than the segment's `data_level0.bin`, the segment is considered suspect. One hour is deliberately conservative — normal HNSW flush cadence is seconds to minutes, so an hour of drift implies a crashed mid-write, not routine lag.

- Additive: exposes `quarantine_stale_hnsw(palace_path, stale_seconds)` as a helper. Not wired into `_client()` / startup on this PR — the goal is to land the primitive first so operators and higher layers can opt in. A follow-up could call it automatically on palace open behind an env var or config flag.
- Closes MemPalace#823 by giving operators a first-class recovery path without having to install `chromadb-ops` or re-mine.

Four new tests in `tests/test_backends.py`:
- renames drifted segment, preserves original files under `.drift-TS` suffix
- leaves fresh segments alone
- no-op on missing palace path / missing `chroma.sqlite3`
- skips already-quarantined (`.drift-` suffixed) directories

`pytest tests/test_backends.py` → 11 passed. `ruff check` / `ruff format --check` — clean.
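The first of the four tests might be shaped like this. A self-contained sketch: the `quarantine_stale_hnsw` below is a minimal stand-in reimplementation so the snippet runs in isolation; in the repo the import would come from `mempalace.backends.chroma`, and the real test bodies may differ.

```python
import datetime
import os
import time


def quarantine_stale_hnsw(palace_path: str, stale_seconds: float = 3600.0) -> list:
    # Minimal stand-in so this sketch runs in isolation; the real helper
    # lives in mempalace/backends/chroma.py.
    db = os.path.join(palace_path, "chroma.sqlite3")
    if not os.path.isfile(db):
        return []
    sqlite_mtime = os.path.getmtime(db)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in os.listdir(palace_path):
        seg = os.path.join(palace_path, name)
        level0 = os.path.join(seg, "data_level0.bin")
        if ".drift-" in name or not os.path.isfile(level0):
            continue
        if sqlite_mtime - os.path.getmtime(level0) > stale_seconds:
            os.rename(seg, f"{seg}.drift-{stamp}")
            moved.append(f"{seg}.drift-{stamp}")
    return moved


def test_quarantine_stale_hnsw_renames_drifted_segment(tmp_path):
    (tmp_path / "chroma.sqlite3").write_bytes(b"")
    seg = tmp_path / "0f0e0d0c-aaaa-bbbb-cccc-000000000001"
    seg.mkdir()
    (seg / "data_level0.bin").write_bytes(b"\x00")
    two_hours_ago = time.time() - 7200          # well past the 3600s default
    os.utime(seg / "data_level0.bin", (two_hours_ago, two_hours_ago))

    moved = quarantine_stale_hnsw(str(tmp_path))

    assert len(moved) == 1 and ".drift-" in moved[0]
    assert not seg.exists()  # original name gone
    assert os.path.isfile(os.path.join(moved[0], "data_level0.bin"))  # files preserved
```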
Force-pushed from `055386e` to `0c38dea`.
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code
…MemPalace#1036

Overnight/morning:
- MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)"
- bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard default) — both addressed on their PR branches
- MemPalace#673 needed re-rebase after overnight develop merges; done
- MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Version bumps across pyproject.toml, mempalace/version.py, README badge, uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*). CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop fix + tandem sweeper (#998), None-metadata guards (#999, #1013), chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681), HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
Resolutions:
- `.claude-plugin/.mcp.json`, `plugin.json` — adopt upstream's `mempalace-mcp` console-script command (added via upstream MemPalace#340 for pipx/uv). Run `pip install -e .` in the plugin venv after merge to install the entry point.
- `.claude-plugin/hooks/*.sh`, `hooks/*.sh` — adopt upstream's console-command resolution order (`mempalace` script → `python3 -m mempalace` → `python`). `MEMPAL_PYTHON` override still works inside `hooks_cli.py`.
- `mempalace/hooks_cli.py`, `tests/test_hooks_cli.py` — keep fork's `_mempalace_python()` helper (fork-ahead #4); upstream only had `sys.executable`, which loses the MEMPAL_PYTHON override.
- `mempalace/miner.py` — keep fork's concurrent mining path (fork-ahead #1), apply upstream's unicode-`✓` → ASCII-`+` fix (MemPalace#681) to both paths.
- `mempalace/backends/chroma.py` — take upstream's refined `quarantine_stale_hnsw` docstring (it's the version merged via our own MemPalace#1000).

Brought in: 33 upstream commits including Belarusian/Chinese/German/Spanish/French entity detection, console-script entry points, hook plugin-root space quoting, and the v3.3.2 tag (which contains our MemPalace#681/MemPalace#1000/MemPalace#1023).

Tests: 1096 passed, 106 deselected (benchmarks). Ruff clean.
…#1023 merged, MemPalace#673 rebased

- Bump fork-ahead section header to "after v3.3.2"
- Strike #11 (quarantine_stale_hnsw) and MemPalace#18 (PID file guard) as merged-into-upstream-via-v3.3.2, keep entries for traceability
- Add v3.3.2-shipped items to "Merged into upstream (post-v3.3.1)"
- Rebuild PR table: 10 merged / 7 open / 7 closed; add MemPalace#1024 row, reclassify MemPalace#681/MemPalace#1000/MemPalace#1023 as merged, note MemPalace#673 rebased 2026-04-21
- Annotate MemPalace#661 status with the GitHub review-state machine caveat (CHANGES_REQUESTED persists until the reviewer dismisses it, not an outstanding owe)
- Bump test count 1063 → 1096 post-merge
…unts

- Upstream released v3.3.2 on 2026-04-21 (our MemPalace#681/MemPalace#1000/MemPalace#1023)
- Drawer count 152,682 → 165,632; wings 23 → 28; tests 1063 → 1096
- MemPalace#673 now MERGEABLE after fresh rebase + squash to 1 commit
- MemPalace#661 status clarified: CHANGES_REQUESTED persists until the reviewer dismisses it, not an outstanding owe
- Merged-upstream section: split out v3.3.2 release group
Port quarantine_stale_hnsw from upstream MemPalace#1000 (0c38dea) and wire it into _client() / make_client() so palaces with drifted HNSW segments self-recover on open instead of segfaulting in the Rust graph walk.

Hit on personal-v2 today: 125k-embedding palace, three segment dirs with data_level0.bin 8-9 days older than chroma.sqlite3 after an interrupted write. Every fresh client open SIGSEGV'd in chromadb_rust_bindings at the same offset (KERN_INVALID_ADDRESS at 0x44, dangling neighbor pointer in the HNSW graph walk). Renaming the segment dirs out of the way let chroma rebuild lazily from sqlite with zero data loss.

Differs from upstream by being on-by-default rather than operator-opt-in; opt out with MEMPALACE_HNSW_NO_QUARANTINE=1. The stat-call overhead is microseconds; the cost of skipping is a hard crash on the next query() or count(). Threshold unchanged at 3600s — legitimate HNSW flush lag is seconds to minutes, so an hour of drift implies a crashed mid-write.

Tests: 11/11 in test_backends.py (4 new for the helper).
Summary
Adds a `quarantine_stale_hnsw(palace_path, stale_seconds=3600)` helper that renames HNSW segment directories whose `data_level0.bin` is significantly older than `chroma.sqlite3`. Drift between the on-disk HNSW graph and the live `embeddings` table in sqlite is the root cause of the bug #823 describes — and, under heavier load or after an interrupted write, the same drift escalates from "silent stale results" to a SIGSEGV where the Rust graph-walk dereferences dangling neighbor pointers for entries that exist in the metadata segment but not in the HNSW index.

Closes #823.
The drift, in data
On one 135K-drawer palace where the bug reproduced, `index_metadata.pickle` claimed `total_elements_added: 137813` against `SELECT COUNT(*) FROM embeddings` = 135464 — a 2,349-entry drift. Fresh Python processes crashed in `col.count()` on 17/20 attempts. After renaming the segment dir out of the way and letting ChromaDB rebuild lazily, the same 20-run check went to 0 crashes and queries returned reasonable distances against the same underlying drawers.
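The 20-run fresh-process check can be reproduced with a harness along these lines. A sketch: the palace path and collection name are placeholders, and the `crash_rate` helper is mine, not part of the PR.

```python
import subprocess
import sys

# The child runs in a brand-new interpreter: a SIGSEGV in the Rust
# graph-walk kills the whole child process, which an in-process
# try/except could never observe.
CHILD = """
import chromadb
client = chromadb.PersistentClient(path="/path/to/palace")   # placeholder path
print(client.get_collection("drawers").count())              # placeholder name
"""


def crash_rate(runs: int = 20) -> int:
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)
        if proc.returncode != 0:  # SIGSEGV surfaces as a negative code on POSIX
            crashes += 1
    return crashes


if __name__ == "__main__":
    print(f"crashes: {crash_rate()}/20")
```

Counting non-zero child exits rather than catching exceptions is the point: the failure mode is a process-level crash, not a Python exception.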
The same failure mode is tracked publicly at neo-cortex-mcp#2 (chromadb 1.5.5, Python 3.12, SIGSEGV on `.count()` with the deep-recursion stack signature) and acknowledged at chroma-core/chroma#2594 (HNSW drift is by-design under the default `sync_threshold`).

Why not #823's suggested recovery
#823 suggests "export all drawers, recreate the collection with `sync_threshold=100`, re-import." That works, but it re-embeds every drawer — expensive on a 100K+ palace and requires the collection to be readable in the first place, which a drifted palace sometimes isn't. This helper is lighter:

- The drawer data and metadata in `chroma.sqlite3` are untouched.
- The segment directory is renamed to `<uuid>.drift-YYYYMMDD-HHMMSS` so operators can recover if the heuristic misfires.

Heuristic
If `chroma.sqlite3` is more than `stale_seconds` (default 3600) newer than the segment's `data_level0.bin`, the segment is considered suspect. One hour is deliberately conservative — normal HNSW flush cadence is seconds to minutes, so an hour of drift implies a crashed mid-write, not routine lag.

Scope
- Additive: exposes `quarantine_stale_hnsw()` as a helper. Intentionally not wired into `_client()` / startup on this PR — the goal is to land the primitive first so operators and higher layers can opt in. A natural follow-up is to call it automatically on palace open behind an env var or a config flag; happy to do that in a second PR once this shape is agreed.
- Orthogonal to the `hnsw:num_threads=1` workaround and to "fix: degrade gracefully when filtered Chroma search fails" #951 (graceful fallback when filtered queries fail) — those are sibling failure modes. This PR addresses the READ-path crash on drifted segments, which neither of them touches.
- Tuning `sync_threshold` (the other half of "ChromaDB: HNSW persistence vs add_drawer — default sync_threshold can leave semantic search stale" #823) can still be done separately and remains a reasonable addition.

Tests
Four new tests in `tests/test_backends.py`:

- `test_quarantine_stale_hnsw_renames_drifted_segment` — 2h drift: renamed, original files preserved under the `.drift-TS` suffix.
- `test_quarantine_stale_hnsw_leaves_fresh_segment_alone` — 10s drift: not touched.
- `test_quarantine_stale_hnsw_no_palace` — missing palace / missing `chroma.sqlite3`: returns `[]`, no raise.
- `test_quarantine_stale_hnsw_skips_already_quarantined` — directories already carrying the `.drift-` suffix are never re-renamed.

`pytest tests/test_backends.py` → 11 passed. `ruff check` / `ruff format --check` — clean.

Credit
Diagnosis and fix worked out in jphein/mempalace@fdfacdf while chasing the read-path SIGSEGV on a production palace. The mtime heuristic is a deliberate compromise (no `pickle.load` on `index_metadata.pickle` — linting policy in the downstream didn't allow it); happy to make it authoritative (read `total_elements_added` vs the sqlite count) on the upstream path if you'd prefer.