fix: guard searcher loop against None doc/meta from ChromaDB (#1007)#1009

Closed
bensig wants to merge 1 commit into `develop` from `fix/searcher-none-metadata-guard`

bensig commented Apr 18, 2026

Summary

Fixes #1007.

`mempalace search` crashes with `AttributeError: 'NoneType' object has no attribute 'get'` when any result has `None` metadata. Happens routinely in partial-flush states (very common on chromadb 1.5.x per #1006, but also possible during interrupted mines or schema upgrades even on fixed versions).

Change

Two-line defensive coercion at the top of the results loop in `mempalace/searcher.py`:

```diff
 for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
+    meta = meta or {}
+    doc = doc or ""
     similarity = round(max(0.0, 1 - dist), 3)
     source = Path(meta.get("source_file", "?")).name
```

Result: the hit still shows with its real distance, source/wing/room fall back to `"?"`, and the document body prints empty. The user sees that a hit exists even if its content row is missing, which is strictly more useful than crashing.
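To make the degraded output concrete, here is a minimal, self-contained sketch of the guarded loop. `format_hits` and the sample data are hypothetical and exist only for illustration; the real loop lives in `mempalace/searcher.py` and prints rather than returning rows.

```python
from pathlib import Path


def format_hits(docs, metas, dists):
    """Build one display row per hit, tolerating None doc/meta entries.

    Hypothetical stand-in for the results loop, not the actual mempalace code.
    """
    rows = []
    for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
        # Coerce missing rows to empty values so .get() and string ops are safe.
        meta = meta or {}
        doc = doc or ""
        similarity = round(max(0.0, 1 - dist), 3)
        source = Path(meta.get("source_file", "?")).name
        rows.append(
            {
                "rank": i,
                "similarity": similarity,
                "source": source,
                "wing": meta.get("wing", "?"),
                "room": meta.get("room", "?"),
                "text": doc,
            }
        )
    return rows


# Second hit simulates a partial-state result: vector exists, rows do not.
hits = format_hits(
    docs=["notes on flushing", None],
    metas=[{"source_file": "/palace/a/notes.md", "wing": "ops", "room": "db"}, None],
    dists=[0.25, 0.4],
)
for row in hits:
    print(row)
```

The second row keeps its rank and similarity while every metadata-derived field reads `"?"` and the text is empty, which matches the behaviour described above.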

Test plan

Related

ChromaDB query() can return None in the documents and metadatas arrays
when a drawer's HNSW vector entry exists but its metadata/document rows
haven't been materialized yet — partial-flush states, mid-delete, or
schema upgrade boundaries. The search reader currently assumes meta is
always a dict and doc always a str, so a None entry raises:

  AttributeError: 'NoneType' object has no attribute 'get'

Two-line defensive coercion: meta = meta or {}, doc = doc or "". The
hit still appears with its real distance and index; source/wing/room
fall back to "?" where content is missing. That is strictly more useful
than crashing the whole search.

Hit frequently today on chromadb 1.5.x (see #1006 root cause). Even
after that pin lands, partial-state results remain possible on
interrupted mines and schema boundaries, so the defensive guard is
worth having on its own.

See #1007 for traceback and reproduction.

bensig requested a review from milla-jovovich as a code owner April 18, 2026 18:13

bensig commented Apr 18, 2026

Code review

Found 1 issue:

  1. A more comprehensive predecessor is already open: PR #999 (fix(searcher): guard against None metadata in CLI print path), opened ~72 minutes before this PR. #999 applies the same `meta = meta or {}` guard in `search()`, extends coverage to `search_memories()` (the MCP-facing path) and `miner.status()`, and adds regression tests for both. A Copilot inline comment on #999 line 288 explicitly flagged the `search_memories()` gap, which #999 addresses but this PR does not. Since `search_memories()` is the production MCP path most likely to be hit, one of these PRs should be closed as superseded in favor of the other. With the narrower scope here, the same `AttributeError` would still reach MCP tool consumers even after this PR merges.

    for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
        # ChromaDB may return None for doc/meta when a drawer's HNSW entry
        # exists but its metadata/document rows haven't been materialized
        # (partial-flush states, mid-delete, schema upgrade boundaries).
        # Degrade gracefully rather than crash; the hit still appears with
        # real distance; storage fields show "?" where content is missing.
        meta = meta or {}
        doc = doc or ""

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

bensig commented Apr 18, 2026

Closing in favor of #999 (opened ~72 min earlier and more comprehensive).

#999 covers the same meta = meta or {} guard in search(), extends it to search_memories() (the MCP-facing path), and also guards miner.status(), with regression tests for both paths. A Copilot inline comment on #999 line 288 explicitly flagged the search_memories() gap — #999 addresses it; this narrower PR does not.

Since search_memories() is the production MCP path, leaving it unguarded would mean the same AttributeError is still reachable from MCP tool consumers even after this PR merges. No reason to duplicate the diff.

Issue #1007 remains open and is covered by #999.
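
As an illustration of why guarding only one path leaves the bug reachable, the sketch below routes two consumers through a single normalizing helper. All names here (`normalize_hit`, `cli_line`, `mcp_result`) are hypothetical stand-ins; this is not how #999 is implemented, only one possible shape for sharing the guard between the CLI and MCP paths.

```python
def normalize_hit(doc, meta):
    """Return (doc, meta) with None replaced by empty values (hypothetical helper)."""
    return (doc or ""), (meta or {})


def cli_line(doc, meta, dist):
    # Stand-in for the CLI print path.
    doc, meta = normalize_hit(doc, meta)
    return f"{meta.get('wing', '?')}/{meta.get('room', '?')} dist={dist:.3f} {doc[:40]}"


def mcp_result(doc, meta, dist):
    # Stand-in for the MCP-facing path; returns a plain dict for tool consumers.
    doc, meta = normalize_hit(doc, meta)
    return {
        "text": doc,
        "wing": meta.get("wing", "?"),
        "room": meta.get("room", "?"),
        "distance": dist,
    }


# Both paths receive the same partial-state hit and neither raises.
print(cli_line(None, None, 0.4))
print(mcp_result(None, None, 0.4))
```

Without the shared helper, each reader of the query results needs its own copy of the two-line guard, and any path that is missed still raises the original `AttributeError`.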

bensig closed this Apr 18, 2026
bensig added a commit that referenced this pull request Apr 18, 2026
Same class of bug as #1007: ChromaDB's query() can return None in the
documents and metadatas arrays when a drawer's HNSW vector entry exists
but its metadata/document rows haven't been materialized. The code in
Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...),
meta.get("room", ...), meta.get("source_file", ...) directly without
null safety, so it raises:

  AttributeError: 'NoneType' object has no attribute 'get'

Two-line defensive coercion matching the pattern in #1009 /
PR #999 for searcher.py: meta = meta or {}, doc = doc or "".
The hit still appears with its real distance; source/wing/room
fall back to their fallback values where the metadata row is missing.

Frequently hit on chromadb 1.5.x (root cause #1006). Even after the
chromadb floor lands (#1010), partial-state results remain possible
during interrupted mines and schema upgrade boundaries, so the guard
is worth having on its own.

Fixes #1011.
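
The same guard can be sketched against the nested-list shape that ChromaDB's `query()` returns (one inner list per query text). `unpack_query`, the `fallback` parameter, and `fake_results` are hypothetical; the commit message does not state which fallback values `Layer3.search_raw` uses, so `"?"` is an assumption here.

```python
def unpack_query(results, fallback="?"):
    """Flatten a single-query result dict into hit dicts, tolerating None rows.

    Hypothetical helper; the fallback value is an assumption, not taken from
    mempalace/layers.py.
    """
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    hits = []
    for doc, meta, dist in zip(docs, metas, dists):
        # Same two-line coercion as in searcher.py.
        meta = meta or {}
        doc = doc or ""
        hits.append(
            {
                "text": doc,
                "wing": meta.get("wing", fallback),
                "room": meta.get("room", fallback),
                "source_file": meta.get("source_file", fallback),
                "distance": dist,
            }
        )
    return hits


# Hand-built dict mimicking a query result with one partial-state entry.
fake_results = {
    "ids": [["drawer-1", "drawer-2"]],
    "documents": [["retry policy notes", None]],
    "metadatas": [[{"wing": "ops", "room": "db", "source_file": "retry.md"}, None]],
    "distances": [[0.12, 0.31]],
}
hits = unpack_query(fake_results)
for hit in hits:
    print(hit)
```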

Development

Successfully merging this pull request may close these issues.

bug: mempalace search crashes with AttributeError: 'NoneType' object has no attribute 'get' on partial-state results (searcher.py:286)
