Skip to content

fix(searcher): guard against None metadata in CLI print path#999

Merged
bensig merged 4 commits intoMemPalace:developfrom
jphein:fix/searcher-none-metadata
Apr 18, 2026
Merged

fix(searcher): guard against None metadata in CLI print path#999
bensig merged 4 commits intoMemPalace:developfrom
jphein:fix/searcher-none-metadata

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 18, 2026

Summary

col.query(...) can return None entries in the inner metadatas list for drawers whose metadata was never set — older palaces, or rows written outside the normal mining path. When that happens, the CLI search() function renders earlier results successfully, then crashes mid-loop with:

AttributeError: 'NoneType' object has no attribute 'get'

at searcher.py:286 (meta.get("source_file", "?")). The user sees partial output followed by a traceback, with no indication of which drawers rendered and which were skipped.

Fix

One-line guard inside the loop: meta = meta or {} so entries with missing metadata fall back to the existing "?" defaults. Matches the way search_memories() assembles hit dicts against the same data (meta.get("wing", "unknown") etc.).

Test

Added a regression test (test_search_handles_none_metadata_without_crash) that mocks a ChromaDB result with a None entry in the middle of the inner metadatas list and asserts both result blocks render. Verified the test fails without the guard (reproduces the crash) and passes with it.

Scope

Independent of — but complementary to — #951 which handles a different search failure mode (ChromaDB filter planner returning Error finding id on migrated palaces). #951 wraps the query call; this one wraps the result rendering.

Testing

  • pytest tests/test_searcher.py — 20 passed (up from 19)
  • Regression-test verification: temp-reverted the one-line guard, ran just the new test, confirmed AttributeError
  • ruff check / ruff format --check — clean

`col.query(...)` can return `None` entries in the inner ``metadatas`` list
for drawers whose metadata was never set (older palaces, rows written
outside the normal mining path). The CLI `search()` function would render
earlier results successfully and then crash mid-loop with:

    AttributeError: 'NoneType' object has no attribute 'get'

at ``searcher.py:286`` — ``meta.get("source_file", "?")``. The user sees
partial output followed by a traceback, with no indication of which
drawers rendered OK and which were skipped.

Guard with ``meta = meta or {}`` inside the loop so entries with missing
metadata fall back to the existing ``"?"`` defaults instead of crashing,
matching the hit dict assembly in ``search_memories()`` which already
uses ``meta.get("wing", "unknown")`` etc. against the same data.

Adds a regression test that mocks a ChromaDB result with a ``None``
metadata entry in the middle of the inner list and asserts both result
blocks render to stdout.
Copilot AI review requested due to automatic review settings April 18, 2026 17:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Guards the CLI search() render loop against None entries in ChromaDB’s returned metadatas, preventing mid-output crashes and adding a regression test.

Changes:

  • Add a one-line meta = meta or {} guard in mempalace.searcher.search() before accessing metadata keys.
  • Add a regression test that mocks a ChromaDB result containing a None metadata entry and asserts both result blocks render.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
mempalace/searcher.py Prevents AttributeError during CLI result rendering when a metadata entry is None.
tests/test_searcher.py Adds a regression test ensuring CLI search output doesn’t crash and prints all results when metadata contains None.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/searcher.py
Comment on lines 284 to 288
for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
similarity = round(max(0.0, 1 - dist), 3)
meta = meta or {}
source = Path(meta.get("source_file", "?")).name
wing_name = meta.get("wing", "?")
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI guard handles meta is None, but search_memories() in the same module still calls meta.get(...) (and cmeta.get(...) for closet hits) without guarding against None metadata returned by Chroma. If metadatas contains None, the API path will still raise AttributeError. Consider applying the same meta = meta or {} (or an isinstance(meta, dict) check) in the search_memories() loops as well, and adding a regression test for the API function if this is an expected real-world state.

Copilot uses AI. Check for mistakes.
`status()` walks `col.get(include=["metadatas"])` and buckets each drawer
into a `wing_rooms[wing][room]` histogram. The same ChromaDB return shape
fixed in the search print path — `None` entries in the `metadatas` list
for drawers with no stored metadata — crashes the status command with:

    AttributeError: 'NoneType' object has no attribute 'get'

Applies the matching ``m = m or {}`` guard so None-metadata drawers roll
up under the existing `?/?` fallback bucket instead of killing the
command mid-tally. Reproduced on a 135K-drawer palace where two drawers
had `metadata=None`; both now show under `WING: ? / ROOM: ?` in the
tally while the command prints the full histogram as designed.

Adds a regression test that feeds `status()` a fake collection whose
`get()` returns a `None` in the middle of the metadatas list and asserts
both the fallback bucket and the real wing render.
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 18, 2026

Expanded scope in feba7e8 — same AttributeError: 'NoneType' object has no attribute 'get' hit again today in miner.status() on a production palace with 2 None-metadata drawers. Added the matching m = m or {} guard at the histogram loop and a regression test that feeds status a fake collection whose get() returns a None in the metadatas list.

Both spots are the same class of bug — ChromaDB can surface None entries in the metadatas inner list for drawers whose metadata was never stored, and the two read paths (searcher print, status histogram) both ran .get() on the entry unconditionally. Happy to split into two PRs if you'd prefer one-concern-per-PR.

Per Copilot review on the CLI-only PR (MemPalace#999): search_memories() has the
same vulnerability in two additional spots, since ChromaDB can return
None entries in the inner metadatas list for either the drawer query or
the closets query. Without guards, the API path crashes with:

    AttributeError: 'NoneType' object has no attribute 'get'

at either \`cmeta.get("source_file", "")\` in the closet boost lookup or
\`meta.get("source_file", "") or ""\` in the drawer scoring loop.

Applies the matching \`meta = meta or {}\` / \`cmeta = cmeta or {}\`
guard at both sites and adds an API-path regression test that mocks a
drawer query result with a None metadata entry and asserts both hits
render — the None-metadata hit with the existing \`"unknown"\` sentinel
values the scoring loop already writes for missing keys.

Verified both the new API test and the existing CLI test fail without
the guards (AttributeError) and pass with them.
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 18, 2026

Thanks @copilot-pull-request-reviewer — good catch. Addressed in 7690574: added the same meta = meta or {} / cmeta = cmeta or {} guards to the two call sites in search_memories() (the closet boost lookup at the cmeta.get line and the drawer scoring loop at the meta.get line). Added an API-path regression test (test_search_memories_handles_none_metadata) that mocks a drawer result with a None metadata entry and asserts both hits render, with the None-metadata hit using the existing "unknown" sentinels. Verified it fails without the guard.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
README:
- Bump drawer count 134K → 137K (matches current palace)
- Update "Open upstream PRs" table with actual reviewer status, add MemPalace#999
  and MemPalace#1000, drop stale "rebased against MemPalace#863" note since MemPalace#863 is in v3.3.1

CLAUDE.md:
- Rename section header v3.3.0 → v3.3.1 (we merged v3.3.1 earlier today)
- Add fork changes 11-13: None-metadata guards, .blob_seq_ids_migrated
  marker, quarantine_stale_hnsw
- Replace "Pulled in from upstream v3.3.1" summary: i18n, BCP-47 locales,
  UTF-8 Path.read_text, non-blocking precompact (MemPalace#863), silent_save
  honoring (MemPalace#966) — ours still broader
- Update the open-upstream-PRs table to 7 open, include MemPalace#999 + MemPalace#1000 and
  refresh each row's status line to match what's actually on GitHub
  (MERGEABLE flags, @bensig ping context, Copilot review addressed, etc.)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
bensig added a commit that referenced this pull request Apr 18, 2026
Same class of bug as #1007: ChromaDB's query() can return None in the
documents and metadatas arrays when a drawer's HNSW vector entry exists
but its metadata/document rows haven't been materialized. The code in
Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...),
meta.get("room", ...), meta.get("source_file", ...) directly without
null safety, so it raises:

  AttributeError: 'NoneType' object has no attribute 'get'

Two-line defensive coercion matching the pattern in #1009 /
PR #999 for searcher.py: meta = meta or {}, doc = doc or "".
The hit still appears with its real distance; source/wing/room
fall back to their fallback values where the metadata row is missing.

Frequently hit on chromadb 1.5.x (root cause #1006). Even after the
chromadb floor lands (#1010), partial-state results remain possible
during interrupted mines and schema upgrade boundaries, so the guard
is worth having on its own.

Fixes #1011.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Per Copilot review on the CLI-only PR (MemPalace#999): search_memories() has the
same vulnerability in two additional spots, since ChromaDB can return
None entries in the inner metadatas list for either the drawer query or
the closets query. Without guards, the API path crashes with:

    AttributeError: 'NoneType' object has no attribute 'get'

at either \`cmeta.get("source_file", "")\` in the closet boost lookup or
\`meta.get("source_file", "") or ""\` in the drawer scoring loop.

Applies the matching \`meta = meta or {}\` / \`cmeta = cmeta or {}\`
guard at both sites and adds an API-path regression test that mocks a
drawer query result with a None metadata entry and asserts both hits
render — the None-metadata hit with the existing \`"unknown"\` sentinel
values the scoring loop already writes for missing keys.

Verified both the new API test and the existing CLI test fail without
the guards (AttributeError) and pass with them.
…t None metadata

Four more MCP handlers iterate a metadata list and call m.get(...)
unconditionally. When the cache contains a None entry (drawers with no
metadata, common on older mining paths), the try block catches the
AttributeError and marks the response "partial: true" with an
error message — visible as {"error": "'NoneType' object has no
attribute 'get'", "partial": true} returned from mempalace_status even
though the palace data is otherwise fetchable.

Same m = m or {} guard we applied to searcher.py (d3a2d22, a51c3c2)
and miner.status() (66f08a1). None-metadata drawers now roll up under
the existing "unknown" fallback bucket instead of poisoning the
response with a misleading partial flag.

Regression test: mock the metadata cache with a None in the middle,
assert tool_status returns clean counts and no error/partial fields.
Verified the test fails without the guard.

998 tests pass.
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 18, 2026

Heads-up on scope drift — this PR now covers 8 spots across 3 files:

  • mempalace/searcher.py — CLI print loop + search_memories() closet-boost loop + drawer-scoring loop
  • mempalace/miner.pystatus() wing/room histogram loop
  • mempalace/mcp_server.pytool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy

All the same meta = meta or {} one-liner, all catching ChromaDB get() / query() returning None entries in the inner metadatas list for drawers with no stored metadata.

Whack-a-mole across 8 call sites suggests the cleaner fix probably lives one layer down — either:

  1. _get_cached_metadata / _fetch_all_metadata (in mcp_server.py) strip None entries as they build the cache, OR
  2. ChromaCollection.get() / ChromaCollection.query() in backends/chroma.py coerce None metadatas to {} at the adapter boundary.

Option 2 is more general — every future call site would get the guard for free, and it matches @bensig's observation in #1013 that Layer3.search_raw also needs this (a fourth file we don't touch here). Option 1 only covers the two cache helpers but keeps the backend adapter faithful to what Chroma actually returns.

Happy to convert this to the adapter-level fix if maintainers prefer. Current state (per-site guards) is defensive and ships today; the adapter-level consolidation would be a follow-up that deletes all the individual guards.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
README:
- Fork-changes table: expand the None-metadata row to cover all 8 sites
  (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server
  handlers). Previous row only called out the CLI print path.
- Add a new Search row: warnings + sqlite BM25 top-up contract (the
  "never silent miss" feature) with pointer to MemPalace#951 + MemPalace#823.
- Open-PR table: expand MemPalace#999 scope line to mention 8 sites + architectural
  note, update MemPalace#1000 to reflect post-MemPalace#995 rebase, add MemPalace#1005 with Copilot
  fixes noted.

CLAUDE.md:
- PR status header: 7 open -> 8 open (adds MemPalace#1005).
- Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005.
- Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter
  consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25
  feature; keep 12 and 13 as-is.

42 README-claim tests still pass.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Concrete evidence:
- New "What this looks like in practice" section right after the
  status line, showing a real stop-hook systemMessage output and a
  real mempalace_search response shape (warnings, available_in_scope,
  matched_via tags, similarity scores). Someone evaluating the fork
  can see what "running in production" actually looks like.

Headlines box:
- New "Headlines" subsection at the top of Fork Changes with the
  three differentiators someone should know if they only read one
  section — silent-save hook, ChromaDB 1.5.x hardening (quarantine +
  None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/
  MemPalace#1000/MemPalace#1005 so readers can jump to the work itself.

Citations for comparison table:
- Every row now links to its upstream repo: Hindsight (vectorize-io),
  Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai),
  engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since
  they've added MCP support since we first wrote the row.
- Replaced the "Systems mentioned without captured primary URLs"
  footnote (now stale since we have them) with a "Verification note"
  that's honest about the point-in-time nature of these columns and
  explains why TagMem is absent.

Structure cleanups:
- Removed the upstream MemPalace logo at the top — it's milla-
  jovovich's asset and using it in a fork README is awkward.
- Renamed "Roadmap" → "Planned work" — the section is decisive with
  priorities and time estimates, "Roadmap" was underselling it.

No content removed from the document beyond the stale footnote and
the upstream logo. All existing sections intact. 42 README-claim
tests pass.
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 18, 2026

Surfacing a parallel effort for maintainer awareness: @cantenesse's #1019 just filed the same meta or {} / doc or "" guard pattern for searcher.py's three loops. Their scope is narrower (just searcher.py); this PR additionally covers miner.status() and four mcp_server.py handlers (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy) that hit the same class of bug. If #1019 merges first I'm happy to rebase this down to just the miner + mcp_server sites.

bensig added a commit that referenced this pull request Apr 18, 2026
Same class of bug as #1007: ChromaDB's query() can return None in the
documents and metadatas arrays when a drawer's HNSW vector entry exists
but its metadata/document rows haven't been materialized. The code in
Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...),
meta.get("room", ...), meta.get("source_file", ...) directly without
null safety, so it raises:

  AttributeError: 'NoneType' object has no attribute 'get'

Two-line defensive coercion matching the pattern in #1009 /
PR #999 for searcher.py: meta = meta or {}, doc = doc or "".
The hit still appears with its real distance; source/wing/room
fall back to their fallback values where the metadata row is missing.

Frequently hit on chromadb 1.5.x (root cause #1006). Even after the
chromadb floor lands (#1010), partial-state results remain possible
during interrupted mines and schema upgrade boundaries, so the guard
is worth having on its own.

Fixes #1011.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Every remaining row in "Still ahead of upstream" now carries a status
so the reader can tell at a glance whether each change is being
upstreamed, pending a PR, or deliberately fork-local.

Dropped:
- "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was
  overstated. The memory file for today's debugging notes that the
  try-get/except-create pattern is defensive code that never
  reproduced a specific crash (the actual crashes traced to HNSW
  drift). Leaving it in "Still ahead" implied an upstream-candidate
  fix, which it isn't. Code stays in place as defensive, but the
  README no longer claims it as a fork-ahead feature.

Moved to Superseded:
- "Stale HNSW mtime detection + mempalace_reconnect" — upstream took
  a different approach in MemPalace#757. Our broader inode+mtime detection
  and the mempalace_reconnect MCP tool remain as fork-local
  convenience; they're just not "ahead of upstream" anymore.

Statuses now populated:
- Linked PR number for the 7 changes with active upstream PRs
  (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005).
- "PR pending" for 3 items that are good candidates but unfiled:
  epsilon mtime comparison, max_distance parameter, tool output
  mining.
- "fork-only" for 2 items we keep intentionally without pitching
  upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined
  (complementary to upstream's MemPalace#784 file-locking).

Legend sentence added above the table explains the three status
values. 42 README-claim tests pass.
@bensig bensig merged commit 1b89b49 into MemPalace:develop Apr 18, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
bensig merged fix/searcher-none-metadata at 13:41 PDT (commit 1b89b49),
carrying all 8 None-metadata guards (3 searcher.py + 1 miner.status +
4 mcp_server handlers) plus 112 lines of tests into develop.

README: move row out of "Still ahead", fold into "Headlines" bullet,
add "Merged upstream (post-v3.3.1)" section, drop from Open PRs table,
adjust counts to "7 open" + note the merge in the status line.

CLAUDE.md: renumber Fork Changes list (drop item 11, now 13 items),
add new "Merged into upstream (post-v3.3.1)" subsection above v3.3.0,
update PR table to "6 merged, 7 open, 7 closed", mark MemPalace#999 merged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown

@Dialectician Dialectician left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the diff. Defensive meta or {} coercion pattern applied at each of the call sites that could crash on a None metadata entry, and the new tests cover each guarded path (tool_status, miner.status, searcher, and the MCP list handlers per #1005-era conversation). Additive only, no deletions, no behavior change for the happy path.

The failure mode this guards against is real (ChromaDB query() / get() returning None for drawers with no stored metadata, crashing mid-tally on AttributeError) and the class-of-fix matches what is running in production at 137K drawers in jphein/mempalace. Straightforward and well-scoped.

One observation for a follow-up, not for this PR: 8 guard sites is a strong signal that adapter-level coercion (ChromaCollection.get/query returning meta or {}) would be more DRY than whack-a-mole at call sites. You flagged this yourself in the fork CLAUDE.md as a proposed follow-up — agreed.

Peace B.Sweet

jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Upstream added test_search_handles_none_metadata_without_crash in MemPalace#999;
our _count_in_scope calls col.count() so mocks need count.return_value set.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@jphein jphein deleted the fix/searcher-none-metadata branch April 19, 2026 03:51
eldar702 added a commit to eldar702/mempalace that referenced this pull request Apr 19, 2026
Chroma 1.5.x can return ``None`` inside the ``metadatas`` / ``documents``
lists of a query/get result for partially-flushed rows. The codebase
already has a systemic None-guard pattern (merged MemPalace#999, MemPalace#1013, MemPalace#1019)
but three call sites were still unguarded:

* ``mcp_server.tool_check_duplicate`` (``mcp_server.py:487-488``) —
  ``meta = results["metadatas"][0][i]`` followed by ``meta.get(...)``
  raises ``AttributeError: 'NoneType' object has no attribute 'get'``.
  The broad ``except Exception`` wrapper (line 504) swallows it and
  returns an uninformative ``"Duplicate check failed"``.

* ``layers.Layer1.generate`` (``layers.py:126``) — iterates
  ``zip(docs, metas)`` and calls ``meta.get(key)`` in the importance
  loop. A single None metadata blows up the entire wake-up render.

* ``layers.Layer2.retrieve`` (``layers.py:224``) — same pattern, same
  crash path for the on-demand render.

Apply the same ``meta = meta or {}`` / ``doc = doc or ""`` idiom used
by the merged guards in the search path. Three-line additions, no
behaviour change on well-formed results.

Tests added:

* ``test_check_duplicate_handles_none_metadata`` — mocks the collection
  query to return ``None`` for one metadata and document, asserts the
  call does not crash and the sentinel-rendered entry has wing/room "?"
  and empty content.
* ``test_layer1_handles_none_metadata`` / ``_handles_none_document``
* ``test_layer2_handles_none_metadata``

Relationship to other open PRs:

* **MemPalace#1019** guarded ``searcher.py`` loops. This PR extends the same
  guard to the three call sites MemPalace#1019 did not touch.
* **MemPalace#979** fixed ``tool_check_duplicate`` negative similarity but left
  the None-metadata path unguarded.
* Does not overlap **MemPalace#1013** (``Layer3.search_raw``) or **MemPalace#999**.
igorls added a commit that referenced this pull request Apr 19, 2026
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).

CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
@igorls igorls mentioned this pull request Apr 19, 2026
8 tasks
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
…ndary

ChromaDB 1.5.x can return None inside the `metadatas` list even when
the write stored a dict. Three prior PRs added per-site `meta = meta
or {}` guards to compensate:

- MemPalace#999 — guards in searcher.py (CLI + API + closet-boost),
  miner.status(), and 4 mcp_server.py handlers
- MemPalace#1013 — guard in Layer3.search_raw()
- Various other paths picked up the same defensive pattern
  organically (e.g. palace_graph, layers.py L1 pre-filter)

That scatter works but violates the typed contract QueryResult and
GetResult declare: both advertise `metadatas: list[dict]`, never
`list[Optional[dict]]`. Every call site that forgot the guard is a
latent `AttributeError: 'NoneType' object has no attribute 'get'`.

This PR moves the coercion to the single place the type contract is
produced — ChromaCollection.query() and ChromaCollection.get() in
backends/chroma.py. Each individual None dict gets coerced to {} at
the same boundary where the outer list is already normalized via
_none_list_to_empty(). Downstream callers receive a guaranteed
list[dict] matching the declared return shape.

## Per-site guards

Leaving the existing guards in place as defensive belt-and-suspenders.
They become redundant after this boundary coercion but removing them
risks regressing any call path that might bypass the typed wrappers.
A follow-up cleanup PR can remove them once we're confident nothing
else reaches chromadb directly.

## Tests

Six new tests in tests/test_backends.py cover the boundary's behavior
on the known None-metadata shapes:
- query(): mixed None / dict in a single batch
- query(): all-None inner list
- query(): None dicts across multiple queries in one call
- get(): mixed None / dict
- get(): padding regression (short metas list → {} pads, never None)
- get(): metadatas not requested → empty list passthrough

Full test_backends.py suite: 35 passed (6 new).
Full repo suite: 1072 passed.
Ruff format + check clean.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a
drawer with `documents` containing `None`, the hybrid-rerank path raises
`AttributeError: 'NoneType' object has no attribute 'lower'`.

Observed in production daemon log (2026-04-24 21:07:05) during a search
that triggered `_hybrid_rank → _bm25_scores → _tokenize`:

    File "mempalace/searcher.py", line 81, in _bm25_scores
        tokenized = [_tokenize(d) for d in documents]
    File "mempalace/searcher.py", line 52, in _tokenize
        return _TOKEN_RE.findall(text.lower())
    AttributeError: 'NoneType' object has no attribute 'lower'

Closes the gap left by the upstream None-metadata audit (MemPalace#999), which
covered metadata loops but not BM25 helpers. Returns `[]` for falsy input
so a None doc gets score 0.0 while the rest of the corpus reranks normally.

Three regression tests in TestBM25NoneSafety lock the behavior and reference
the production trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
palace_graph.build_graph None-metadata guard fix (commit 5fd15db).
Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats
was 500-ing on a single legacy drawer.

Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata
audit) and MemPalace#1094 (None-coercion at backend boundary).

Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…ases A–C)

Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B
(632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary
entries no longer share an index with content drawers.

Phase A — scaffolding (no behavior change):
- _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection()
  in palace.py, mirroring get_collection's shape (cosine + thread-pin).
- New tests/test_session_recovery.py: 3 tests covering constant, metadata,
  multi-collection coexistence on one palace.

Phase B — write routing (the behavior change):
- tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery
  collection; everything else stays in the main collection.
- _get_session_recovery_collection() in mcp_server.py with parallel cache.
- conftest._reset_mcp_cache resets the new cache between tests.
- 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery,
  general→main, search regression (checkpoints structurally invisible to
  mempalace_search now, not merely post-filtered).

Phase C — new read path:
- tool_session_recovery_read reads from the recovery collection only,
  with optional filters: session_id, agent, since, until, wing, limit.
- session_id added as optional metadata field on tool_diary_write so
  recovery reads can filter by Claude Code session.
- Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family.
- Registered in TOOLS dict + documented in website/reference/mcp-tools.md
  (test_no_undocumented_tools regression caught the missing doc; fixed).
- 5 tests in TestSessionRecoveryRead — empty case, session_id filter,
  agent filter, limit, None-metadata defense.

Phases D (migration of existing checkpoints out of main collection) and
E (palace-daemon startup integration) explicitly deferred — multi-day
work, gated on a separate go-ahead.

Full suite 1360/1360 pass; 106 benchmark/stress deselected per default.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…rry-pick of MemPalace#1094)

Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999
in eight read paths but didn't have the boundary coercion that closes
the issue once for all callers.

Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed
QueryResult/GetResult contract — both declare `metadatas: list[dict]`,
never `list[Optional[dict]]` — is honored at the chromadb adapter
boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198
_tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review)
all pointed at gaps that would have been impossible if this pattern
had been in place. Future None-metadata bugs of the same shape are
now catchable at one site instead of N.

Per-site guards are kept as belt-and-suspenders for paths that might
bypass the typed wrappers (e.g. legacy `import chromadb` direct calls).
A future cleanup PR can remove them once we're confident no caller
reaches chromadb directly.

6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with
fork main's _segment_appears_healthy + _quarantined_paths from earlier
today. Full suite 1383/1383.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
igorls pushed a commit that referenced this pull request Apr 26, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream #1020 / our PR #1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
igorls pushed a commit that referenced this pull request Apr 26, 2026
`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a
drawer with `documents` containing `None`, the hybrid-rerank path raises
`AttributeError: 'NoneType' object has no attribute 'lower'`.

Observed in production daemon log (2026-04-24 21:07:05) during a search
that triggered `_hybrid_rank → _bm25_scores → _tokenize`:

    File "mempalace/searcher.py", line 81, in _bm25_scores
        tokenized = [_tokenize(d) for d in documents]
    File "mempalace/searcher.py", line 52, in _tokenize
        return _TOKEN_RE.findall(text.lower())
    AttributeError: 'NoneType' object has no attribute 'lower'

Closes the gap left by the upstream None-metadata audit (#999), which
covered metadata loops but not BM25 helpers. Returns `[]` for falsy input
so a None doc gets score 0.0 while the rest of the corpus reranks normally.

Three regression tests in TestBM25NoneSafety lock the behavior and reference
the production trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
`_tokenize` calls `text.lower()` unconditionally; when ChromaDB returns a
drawer with `documents` containing `None`, the hybrid-rerank path raises
`AttributeError: 'NoneType' object has no attribute 'lower'`.

Observed in production daemon log (2026-04-24 21:07:05) during a search
that triggered `_hybrid_rank → _bm25_scores → _tokenize`:

    File "mempalace/searcher.py", line 81, in _bm25_scores
        tokenized = [_tokenize(d) for d in documents]
    File "mempalace/searcher.py", line 52, in _tokenize
        return _TOKEN_RE.findall(text.lower())
    AttributeError: 'NoneType' object has no attribute 'lower'

Closes the gap left by the upstream None-metadata audit (MemPalace#999), which
covered metadata loops but not BM25 helpers. Returns `[]` for falsy input
so a None doc gets score 0.0 while the rest of the corpus reranks normally.

Three regression tests in TestBM25NoneSafety lock the behavior and reference
the production trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants