fix(searcher): guard against None metadata in CLI print path by jphein · Pull Request #999 · MemPalace/mempalace

jphein · 2026-04-18T17:01:15Z

Summary

col.query(...) can return None entries in the inner metadatas list for drawers whose metadata was never set — older palaces, or rows written outside the normal mining path. When that happens, the CLI search() function renders earlier results successfully, then crashes mid-loop with:

AttributeError: 'NoneType' object has no attribute 'get'

at searcher.py:286 (meta.get("source_file", "?")). The user sees partial output followed by a traceback, with no indication of which drawers rendered and which were skipped.

Fix

One-line guard inside the loop: meta = meta or {} so entries with missing metadata fall back to the existing "?" defaults. Matches the way search_memories() assembles hit dicts against the same data (meta.get("wing", "unknown") etc.).

Test

Added a regression test (test_search_handles_none_metadata_without_crash) that mocks a ChromaDB result with a None entry in the middle of the inner metadatas list and asserts both result blocks render. Verified the test fails without the guard (reproduces the crash) and passes with it.

Scope

Independent of — but complementary to — #951 which handles a different search failure mode (ChromaDB filter planner returning Error finding id on migrated palaces). #951 wraps the query call; this one wraps the result rendering.

Testing

pytest tests/test_searcher.py — 20 passed (up from 19)
Regression-test verification: temp-reverted the one-line guard, ran just the new test, confirmed AttributeError
ruff check / ruff format --check — clean

`col.query(...)` can return `None` entries in the inner ``metadatas`` list for drawers whose metadata was never set (older palaces, rows written outside the normal mining path). The CLI `search()` function would render earlier results successfully and then crash mid-loop with: AttributeError: 'NoneType' object has no attribute 'get' at ``searcher.py:286`` — ``meta.get("source_file", "?")``. The user sees partial output followed by a traceback, with no indication of which drawers rendered OK and which were skipped. Guard with ``meta = meta or {}`` inside the loop so entries with missing metadata fall back to the existing ``"?"`` defaults instead of crashing, matching the hit dict assembly in ``search_memories()`` which already uses ``meta.get("wing", "unknown")`` etc. against the same data. Adds a regression test that mocks a ChromaDB result with a ``None`` metadata entry in the middle of the inner list and asserts both result blocks render to stdout.

Copilot

Pull request overview

Guards the CLI search() render loop against None entries in ChromaDB’s returned metadatas, preventing mid-output crashes and adding a regression test.

Changes:

Add a one-line meta = meta or {} guard in mempalace.searcher.search() before accessing metadata keys.
Add a regression test that mocks a ChromaDB result containing a None metadata entry and asserts both result blocks render.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`mempalace/searcher.py`	Prevents `AttributeError` during CLI result rendering when a metadata entry is `None`.
`tests/test_searcher.py`	Adds a regression test ensuring CLI search output doesn’t crash and prints all results when metadata contains `None`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-18T17:03:55Z

    for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
        similarity = round(max(0.0, 1 - dist), 3)
+        meta = meta or {}
        source = Path(meta.get("source_file", "?")).name
        wing_name = meta.get("wing", "?")


The CLI guard handles meta is None, but search_memories() in the same module still calls meta.get(...) (and cmeta.get(...) for closet hits) without guarding against None metadata returned by Chroma. If metadatas contains None, the API path will still raise AttributeError. Consider applying the same meta = meta or {} (or an isinstance(meta, dict) check) in the search_memories() loops as well, and adding a regression test for the API function if this is an expected real-world state.

`status()` walks `col.get(include=["metadatas"])` and buckets each drawer into a `wing_rooms[wing][room]` histogram. The same ChromaDB return shape fixed in the search print path — `None` entries in the `metadatas` list for drawers with no stored metadata — crashes the status command with: AttributeError: 'NoneType' object has no attribute 'get' Applies the matching ``m = m or {}`` guard so None-metadata drawers roll up under the existing `?/?` fallback bucket instead of killing the command mid-tally. Reproduced on a 135K-drawer palace where two drawers had `metadata=None`; both now show under `WING: ? / ROOM: ?` in the tally while the command prints the full histogram as designed. Adds a regression test that feeds `status()` a fake collection whose `get()` returns a `None` in the middle of the metadatas list and asserts both the fallback bucket and the real wing render.

jphein · 2026-04-18T17:26:26Z

Expanded scope in feba7e8 — same AttributeError: 'NoneType' object has no attribute 'get' hit again today in miner.status() on a production palace with 2 None-metadata drawers. Added the matching m = m or {} guard at the histogram loop and a regression test that feeds status a fake collection whose get() returns a None in the metadatas list.

Both spots are the same class of bug — ChromaDB can surface None entries in the metadatas inner list for drawers whose metadata was never stored, and the two read paths (searcher print, status histogram) both ran .get() on the entry unconditionally. Happy to split into two PRs if you'd prefer one-concern-per-PR.

Per Copilot review on the CLI-only PR (MemPalace#999): search_memories() has the same vulnerability in two additional spots, since ChromaDB can return None entries in the inner metadatas list for either the drawer query or the closets query. Without guards, the API path crashes with: AttributeError: 'NoneType' object has no attribute 'get' at either \`cmeta.get("source_file", "")\` in the closet boost lookup or \`meta.get("source_file", "") or ""\` in the drawer scoring loop. Applies the matching \`meta = meta or {}\` / \`cmeta = cmeta or {}\` guard at both sites and adds an API-path regression test that mocks a drawer query result with a None metadata entry and asserts both hits render — the None-metadata hit with the existing \`"unknown"\` sentinel values the scoring loop already writes for missing keys. Verified both the new API test and the existing CLI test fail without the guards (AttributeError) and pass with them.

jphein · 2026-04-18T17:37:13Z

Thanks @copilot-pull-request-reviewer — good catch. Addressed in 7690574: added the same meta = meta or {} / cmeta = cmeta or {} guards to the two call sites in search_memories() (the closet boost lookup at the cmeta.get line and the drawer scoring loop at the meta.get line). Added an API-path regression test (test_search_memories_handles_none_metadata) that mocks a drawer result with a None metadata entry and asserts both hits render, with the None-metadata hit using the existing "unknown" sentinels. Verified it fails without the guard.

@bensig

README: - Bump drawer count 134K → 137K (matches current palace) - Update "Open upstream PRs" table with actual reviewer status, add MemPalace#999 and MemPalace#1000, drop stale "rebased against MemPalace#863" note since MemPalace#863 is in v3.3.1 CLAUDE.md: - Rename section header v3.3.0 → v3.3.1 (we merged v3.3.1 earlier today) - Add fork changes 11-13: None-metadata guards, .blob_seq_ids_migrated marker, quarantine_stale_hnsw - Replace "Pulled in from upstream v3.3.1" summary: i18n, BCP-47 locales, UTF-8 Path.read_text, non-blocking precompact (MemPalace#863), silent_save honoring (MemPalace#966) — ours still broader - Update the open-upstream-PRs table to 7 open, include MemPalace#999 + MemPalace#1000 and refresh each row's status line to match what's actually on GitHub (MERGEABLE flags, @bensig ping context, Copilot review addressed, etc.) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Same class of bug as #1007: ChromaDB's query() can return None in the documents and metadatas arrays when a drawer's HNSW vector entry exists but its metadata/document rows haven't been materialized. The code in Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...), meta.get("room", ...), meta.get("source_file", ...) directly without null safety, so it raises: AttributeError: 'NoneType' object has no attribute 'get' Two-line defensive coercion matching the pattern in #1009 / PR #999 for searcher.py: meta = meta or {}, doc = doc or "". The hit still appears with its real distance; source/wing/room fall back to their fallback values where the metadata row is missing. Frequently hit on chromadb 1.5.x (root cause #1006). Even after the chromadb floor lands (#1010), partial-state results remain possible during interrupted mines and schema upgrade boundaries, so the guard is worth having on its own. Fixes #1011.

Per Copilot review on the CLI-only PR (MemPalace#999): search_memories() has the same vulnerability in two additional spots, since ChromaDB can return None entries in the inner metadatas list for either the drawer query or the closets query. Without guards, the API path crashes with: AttributeError: 'NoneType' object has no attribute 'get' at either \`cmeta.get("source_file", "")\` in the closet boost lookup or \`meta.get("source_file", "") or ""\` in the drawer scoring loop. Applies the matching \`meta = meta or {}\` / \`cmeta = cmeta or {}\` guard at both sites and adds an API-path regression test that mocks a drawer query result with a None metadata entry and asserts both hits render — the None-metadata hit with the existing \`"unknown"\` sentinel values the scoring loop already writes for missing keys. Verified both the new API test and the existing CLI test fail without the guards (AttributeError) and pass with them.

…t None metadata Four more MCP handlers iterate a metadata list and call m.get(...) unconditionally. When the cache contains a None entry (drawers with no metadata, common on older mining paths), the try block catches the AttributeError and marks the response "partial: true" with an error message — visible as {"error": "'NoneType' object has no attribute 'get'", "partial": true} returned from mempalace_status even though the palace data is otherwise fetchable. Same m = m or {} guard we applied to searcher.py (d3a2d22, a51c3c2) and miner.status() (66f08a1). None-metadata drawers now roll up under the existing "unknown" fallback bucket instead of poisoning the response with a misleading partial flag. Regression test: mock the metadata cache with a None in the middle, assert tool_status returns clean counts and no error/partial fields. Verified the test fails without the guard. 998 tests pass.

jphein · 2026-04-18T19:39:24Z

Heads-up on scope drift — this PR now covers 8 spots across 3 files:

mempalace/searcher.py — CLI print loop + search_memories() closet-boost loop + drawer-scoring loop
mempalace/miner.py — status() wing/room histogram loop
mempalace/mcp_server.py — tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy

All the same meta = meta or {} one-liner, all catching ChromaDB get() / query() returning None entries in the inner metadatas list for drawers with no stored metadata.

Whack-a-mole across 8 call sites suggests the cleaner fix probably lives one layer down — either:

_get_cached_metadata / _fetch_all_metadata (in mcp_server.py) strip None entries as they build the cache, OR
ChromaCollection.get() / ChromaCollection.query() in backends/chroma.py coerce None metadatas to {} at the adapter boundary.

Option 2 is more general — every future call site would get the guard for free, and it matches @bensig's observation in #1013 that Layer3.search_raw also needs this (a fourth file we don't touch here). Option 1 only covers the two cache helpers but keeps the backend adapter faithful to what Chroma actually returns.

Happy to convert this to the adapter-level fix if maintainers prefer. Current state (per-site guards) is defensive and ships today; the adapter-level consolidation would be a follow-up that deletes all the individual guards.

README: - Fork-changes table: expand the None-metadata row to cover all 8 sites (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server handlers). Previous row only called out the CLI print path. - Add a new Search row: warnings + sqlite BM25 top-up contract (the "never silent miss" feature) with pointer to MemPalace#951 + MemPalace#823. - Open-PR table: expand MemPalace#999 scope line to mention 8 sites + architectural note, update MemPalace#1000 to reflect post-MemPalace#995 rebase, add MemPalace#1005 with Copilot fixes noted. CLAUDE.md: - PR status header: 7 open -> 8 open (adds MemPalace#1005). - Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005. - Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25 feature; keep 12 and 13 as-is. 42 README-claim tests still pass.

Concrete evidence: - New "What this looks like in practice" section right after the status line, showing a real stop-hook systemMessage output and a real mempalace_search response shape (warnings, available_in_scope, matched_via tags, similarity scores). Someone evaluating the fork can see what "running in production" actually looks like. Headlines box: - New "Headlines" subsection at the top of Fork Changes with the three differentiators someone should know if they only read one section — silent-save hook, ChromaDB 1.5.x hardening (quarantine + None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/ MemPalace#1000/MemPalace#1005 so readers can jump to the work itself. Citations for comparison table: - Every row now links to its upstream repo: Hindsight (vectorize-io), Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai), engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since they've added MCP support since we first wrote the row. - Replaced the "Systems mentioned without captured primary URLs" footnote (now stale since we have them) with a "Verification note" that's honest about the point-in-time nature of these columns and explains why TagMem is absent. Structure cleanups: - Removed the upstream MemPalace logo at the top — it's milla- jovovich's asset and using it in a fork README is awkward. - Renamed "Roadmap" → "Planned work" — the section is decisive with priorities and time estimates, "Roadmap" was underselling it. No content removed from the document beyond the stale footnote and the upstream logo. All existing sections intact. 42 README-claim tests pass.

jphein · 2026-04-18T20:24:57Z

Surfacing a parallel effort for maintainer awareness: @cantenesse's #1019 just filed the same meta or {} / doc or "" guard pattern for searcher.py's three loops. Their scope is narrower (just searcher.py); this PR additionally covers miner.status() and four mcp_server.py handlers (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy) that hit the same class of bug. If #1019 merges first I'm happy to rebase this down to just the miner + mcp_server sites.

Same class of bug as #1007: ChromaDB's query() can return None in the documents and metadatas arrays when a drawer's HNSW vector entry exists but its metadata/document rows haven't been materialized. The code in Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...), meta.get("room", ...), meta.get("source_file", ...) directly without null safety, so it raises: AttributeError: 'NoneType' object has no attribute 'get' Two-line defensive coercion matching the pattern in #1009 / PR #999 for searcher.py: meta = meta or {}, doc = doc or "". The hit still appears with its real distance; source/wing/room fall back to their fallback values where the metadata row is missing. Frequently hit on chromadb 1.5.x (root cause #1006). Even after the chromadb floor lands (#1010), partial-state results remain possible during interrupted mines and schema upgrade boundaries, so the guard is worth having on its own. Fixes #1011.

Every remaining row in "Still ahead of upstream" now carries a status so the reader can tell at a glance whether each change is being upstreamed, pending a PR, or deliberately fork-local. Dropped: - "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was overstated. The memory file for today's debugging notes that the try-get/except-create pattern is defensive code that never reproduced a specific crash (the actual crashes traced to HNSW drift). Leaving it in "Still ahead" implied an upstream-candidate fix, which it isn't. Code stays in place as defensive, but the README no longer claims it as a fork-ahead feature. Moved to Superseded: - "Stale HNSW mtime detection + mempalace_reconnect" — upstream took a different approach in MemPalace#757. Our broader inode+mtime detection and the mempalace_reconnect MCP tool remain as fork-local convenience; they're just not "ahead of upstream" anymore. Statuses now populated: - Linked PR number for the 7 changes with active upstream PRs (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005). - "PR pending" for 3 items that are good candidates but unfiled: epsilon mtime comparison, max_distance parameter, tool output mining. - "fork-only" for 2 items we keep intentionally without pitching upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined (complementary to upstream's MemPalace#784 file-locking). Legend sentence added above the table explains the three status values. 42 README-claim tests pass.

bensig merged fix/searcher-none-metadata at 13:41 PDT (commit 1b89b49), carrying all 8 None-metadata guards (3 searcher.py + 1 miner.status + 4 mcp_server handlers) plus 112 lines of tests into develop. README: move row out of "Still ahead", fold into "Headlines" bullet, add "Merged upstream (post-v3.3.1)" section, drop from Open PRs table, adjust counts to "7 open" + note the merge in the status line. CLAUDE.md: renumber Fork Changes list (drop item 11, now 13 items), add new "Merged into upstream (post-v3.3.1)" subsection above v3.3.0, update PR table to "6 merged, 7 open, 7 closed", mark MemPalace#999 merged. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Dialectician

Read the diff. Defensive meta or {} coercion pattern applied at each of the call sites that could crash on a None metadata entry, and the new tests cover each guarded path (tool_status, miner.status, searcher, and the MCP list handlers per #1005-era conversation). Additive only, no deletions, no behavior change for the happy path.

The failure mode this guards against is real (ChromaDB query() / get() returning None for drawers with no stored metadata, crashing mid-tally on AttributeError) and the class-of-fix matches what is running in production at 137K drawers in jphein/mempalace. Straightforward and well-scoped.

One observation for a follow-up, not for this PR: 8 guard sites is a strong signal that adapter-level coercion (ChromaCollection.get/query returning meta or {}) would be more DRY than whack-a-mole at call sites. You flagged this yourself in the fork CLAUDE.md as a proposed follow-up — agreed.

Peace B.Sweet

Upstream added test_search_handles_none_metadata_without_crash in MemPalace#999; our _count_in_scope calls col.count() so mocks need count.return_value set. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Chroma 1.5.x can return ``None`` inside the ``metadatas`` / ``documents`` lists of a query/get result for partially-flushed rows. The codebase already has a systemic None-guard pattern (merged MemPalace#999, MemPalace#1013, MemPalace#1019) but three call sites were still unguarded: * ``mcp_server.tool_check_duplicate`` (``mcp_server.py:487-488``) — ``meta = results["metadatas"][0][i]`` followed by ``meta.get(...)`` raises ``AttributeError: 'NoneType' object has no attribute 'get'``. The broad ``except Exception`` wrapper (line 504) swallows it and returns an uninformative ``"Duplicate check failed"``. * ``layers.Layer1.generate`` (``layers.py:126``) — iterates ``zip(docs, metas)`` and calls ``meta.get(key)`` in the importance loop. A single None metadata blows up the entire wake-up render. * ``layers.Layer2.retrieve`` (``layers.py:224``) — same pattern, same crash path for the on-demand render. Apply the same ``meta = meta or {}`` / ``doc = doc or ""`` idiom used by the merged guards in the search path. Three-line additions, no behaviour change on well-formed results. Tests added: * ``test_check_duplicate_handles_none_metadata`` — mocks the collection query to return ``None`` for one metadata and document, asserts the call does not crash and the sentinel-rendered entry has wing/room "?" and empty content. * ``test_layer1_handles_none_metadata`` / ``_handles_none_document`` * ``test_layer2_handles_none_metadata`` Relationship to other open PRs: * **MemPalace#1019** guarded ``searcher.py`` loops. This PR extends the same guard to the three call sites MemPalace#1019 did not touch. * **MemPalace#979** fixed ``tool_check_duplicate`` negative similarity but left the None-metadata path unguarded. * Does not overlap **MemPalace#1013** (``Layer3.search_raw``) or **MemPalace#999**.

Version bumps across pyproject.toml, mempalace/version.py, README badge, uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*). CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop fix + tandem sweeper (#998), None-metadata guards (#999, #1013), chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681), HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).

…ndary ChromaDB 1.5.x can return None inside the `metadatas` list even when the write stored a dict. Three prior PRs added per-site `meta = meta or {}` guards to compensate: - MemPalace#999 — guards in searcher.py (CLI + API + closet-boost), miner.status(), and 4 mcp_server.py handlers - MemPalace#1013 — guard in Layer3.search_raw() - Various other paths picked up the same defensive pattern organically (e.g. palace_graph, layers.py L1 pre-filter) That scatter works but violates the typed contract QueryResult and GetResult declare: both advertise `metadatas: list[dict]`, never `list[Optional[dict]]`. Every call site that forgot the guard is a latent `AttributeError: 'NoneType' object has no attribute 'get'`. This PR moves the coercion to the single place the type contract is produced — ChromaCollection.query() and ChromaCollection.get() in backends/chroma.py. Each individual None dict gets coerced to {} at the same boundary where the outer list is already normalized via _none_list_to_empty(). Downstream callers receive a guaranteed list[dict] matching the declared return shape. ## Per-site guards Leaving the existing guards in place as defensive belt-and-suspenders. They become redundant after this boundary coercion but removing them risks regressing any call path that might bypass the typed wrappers. A follow-up cleanup PR can remove them once we're confident nothing else reaches chromadb directly. ## Tests Six new tests in tests/test_backends.py cover the boundary's behavior on the known None-metadata shapes: - query(): mixed None / dict in a single batch - query(): all-None inner list - query(): None dicts across multiple queries in one call - get(): mixed None / dict - get(): padding regression (short metas list → {} pads, never None) - get(): metadatas not requested → empty list passthrough Full test_backends.py suite: 35 passed (6 new). Full repo suite: 1072 passed. Ruff format + check clean.

`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a drawer with `documents` containing `None`, the hybrid-rerank path raises `AttributeError: 'NoneType' object has no attribute 'lower'`. Observed in production daemon log (2026-04-24 21:07:05) during a search that triggered `_hybrid_rank → _bm25_scores → _tokenize`: File "mempalace/searcher.py", line 81, in _bm25_scores tokenized = [_tokenize(d) for d in documents] File "mempalace/searcher.py", line 52, in _tokenize return _TOKEN_RE.findall(text.lower()) AttributeError: 'NoneType' object has no attribute 'lower' Closes the gap left by the upstream None-metadata audit (MemPalace#999), which covered metadata loops but not BM25 helpers. Returns `[]` for falsy input so a None doc gets score 0.0 while the rest of the corpus reranks normally. Three regression tests in TestBM25NoneSafety lock the behavior and reference the production trace. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

palace_graph.build_graph None-metadata guard fix (commit 5fd15db). Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats was 500-ing on a single legacy drawer. Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata audit) and MemPalace#1094 (None-coercion at backend boundary). Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ases A–C) Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B (632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary entries no longer share an index with content drawers. Phase A — scaffolding (no behavior change): - _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection() in palace.py, mirroring get_collection's shape (cosine + thread-pin). - New tests/test_session_recovery.py: 3 tests covering constant, metadata, multi-collection coexistence on one palace. Phase B — write routing (the behavior change): - tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery collection; everything else stays in the main collection. - _get_session_recovery_collection() in mcp_server.py with parallel cache. - conftest._reset_mcp_cache resets the new cache between tests. - 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery, general→main, search regression (checkpoints structurally invisible to mempalace_search now, not merely post-filtered). Phase C — new read path: - tool_session_recovery_read reads from the recovery collection only, with optional filters: session_id, agent, since, until, wing, limit. - session_id added as optional metadata field on tool_diary_write so recovery reads can filter by Claude Code session. - Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family. - Registered in TOOLS dict + documented in website/reference/mcp-tools.md (test_no_undocumented_tools regression caught the missing doc; fixed). - 5 tests in TestSessionRecoveryRead — empty case, session_id filter, agent filter, limit, None-metadata defense. Phases D (migration of existing checkpoints out of main collection) and E (palace-daemon startup integration) explicitly deferred — multi-day work, gated on a separate go-ahead. Full suite 1360/1360 pass; 106 benchmark/stress deselected per default. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…rry-pick of MemPalace#1094) Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999 in eight read paths but didn't have the boundary coercion that closes the issue once for all callers. Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed QueryResult/GetResult contract — both declare `metadatas: list[dict]`, never `list[Optional[dict]]` — is honored at the chromadb adapter boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198 _tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review) all pointed at gaps that would have been impossible if this pattern had been in place. Future None-metadata bugs of the same shape are now catchable at one site instead of N. Per-site guards are kept as belt-and-suspenders for paths that might bypass the typed wrappers (e.g. legacy `import chromadb` direct calls). A future cleanup PR can remove them once we're confident no caller reaches chromadb directly. 6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with fork main's _segment_appears_healthy + _quarantined_paths from earlier today. Full suite 1383/1383. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream #1020 / our PR #1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's #999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a drawer with `documents` containing `None`, the hybrid-rerank path raises `AttributeError: 'NoneType' object has no attribute 'lower'`. Observed in production daemon log (2026-04-24 21:07:05) during a search that triggered `_hybrid_rank → _bm25_scores → _tokenize`: File "mempalace/searcher.py", line 81, in _bm25_scores tokenized = [_tokenize(d) for d in documents] File "mempalace/searcher.py", line 52, in _tokenize return _TOKEN_RE.findall(text.lower()) AttributeError: 'NoneType' object has no attribute 'lower' Closes the gap left by the upstream None-metadata audit (#999), which covered metadata loops but not BM25 helpers. Returns `[]` for falsy input so a None doc gets score 0.0 while the rest of the corpus reranks normally. Three regression tests in TestBM25NoneSafety lock the behavior and reference the production trace. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a drawer with `documents` containing `None`, the hybrid-rerank path raises `AttributeError: 'NoneType' object has no attribute 'lower'`. Observed in production daemon log (2026-04-24 21:07:05) during a search that triggered `_hybrid_rank → _bm25_scores → _tokenize`: File "mempalace/searcher.py", line 81, in _bm25_scores tokenized = [_tokenize(d) for d in documents] File "mempalace/searcher.py", line 52, in _tokenize return _TOKEN_RE.findall(text.lower()) AttributeError: 'NoneType' object has no attribute 'lower' Closes the gap left by the upstream None-metadata audit (MemPalace#999), which covered metadata loops but not BM25 helpers. Returns `[]` for falsy input so a None doc gets score 0.0 while the rest of the corpus reranks normally. Three regression tests in TestBM25NoneSafety lock the behavior and reference the production trace. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Copilot AI review requested due to automatic review settings April 18, 2026 17:01

jphein requested review from bensig, igorls and milla-jovovich as code owners April 18, 2026 17:01

Copilot started reviewing on behalf of jphein April 18, 2026 17:01 View session

Copilot AI reviewed Apr 18, 2026

View reviewed changes

This was referenced Apr 18, 2026

fix: guard searcher loop against None doc/meta from ChromaDB (#1007) #1009

Closed

bug: Layer3.search_raw() in layers.py crashes on None metadata (same pattern as #1007) #1011

Closed

bensig mentioned this pull request Apr 18, 2026

fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011) #1013

Merged

3 tasks

bensig merged commit 1b89b49 into MemPalace:develop Apr 18, 2026
6 checks passed

Dialectician reviewed Apr 19, 2026

View reviewed changes

jphein deleted the fix/searcher-none-metadata branch April 19, 2026 03:51

eldar702 mentioned this pull request Apr 19, 2026

fix: guard None metadata/doc in tool_check_duplicate and Layer1/Layer2 #1030

Merged

jphein mentioned this pull request Apr 19, 2026

perf: paginate miner.status() — fixes SQLITE_MAX_VARIABLE_NUMBER crash past ~32K drawers #1036

Closed

2 tasks

igorls mentioned this pull request Apr 19, 2026

release: v3.3.2 #1041

Merged

8 tasks

bensig mentioned this pull request Apr 23, 2026

bug: 7 issues in v3.0.0→v3.3.0 — data loss in repair/migrate, missing input validation, retry gaps #934

Closed

jphein mentioned this pull request Apr 25, 2026

fix(searcher): tolerate None documents in BM25 reranker #1198

Merged

jphein mentioned this pull request Apr 25, 2026

fix(palace_graph): skip None metadata in build_graph #1201

Merged

This was referenced May 6, 2026

fix(mcp): retry _get_collection once on transient failure (#1286) #1377

Merged

fix(mcp_server): log exception + retry once on _get_collection failure #1286

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(searcher): guard against None metadata in CLI print path#999

fix(searcher): guard against None metadata in CLI print path#999
bensig merged 4 commits intoMemPalace:developfrom
jphein:fix/searcher-none-metadata

jphein commented Apr 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

Uh oh!

Dialectician left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jphein commented Apr 18, 2026

Summary

Fix

Test

Scope

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

jphein commented Apr 18, 2026

Uh oh!

Uh oh!

Dialectician left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants