Skip to content

refactor(backends/chroma): coerce None metadatas to {} at backend boundary (closes #1020)#1094

Open
jphein wants to merge 2 commits intoMemPalace:developfrom
jphein:pr/coerce-none-metadatas-at-boundary
Open

refactor(backends/chroma): coerce None metadatas to {} at backend boundary (closes #1020)#1094
jphein wants to merge 2 commits intoMemPalace:developfrom
jphein:pr/coerce-none-metadatas-at-boundary

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 22, 2026

Summary

Fixes #1020. Moves the None-metadata coercion out of per-read-site defensive guards (meta = meta or {} scattered across searcher.py, layers.py, palace_graph.py, miner.py, etc.) and into the single place the typed contract is produced: ChromaCollection.query() and ChromaCollection.get() in backends/chroma.py.

Why

ChromaDB 1.5.x can return None inside the metadatas list even when the write stored a dict. Three prior PRs already responded to this:

That scatter works but violates the typed contract QueryResult and GetResult declare: both advertise metadatas: list[dict], never list[Optional[dict]]. Every call site that forgets the guard is a latent AttributeError: 'NoneType' object has no attribute 'get'.

A boundary coercion is the right architectural fix: one place to handle the backend's quirk, downstream code gets the shape its type annotations promise.

What changed

  • mempalace/backends/chroma.py: added inner-list None-coercion inside ChromaCollection.query() (alongside the existing _none_list_to_empty outer coercion) and inside ChromaCollection.get() (after the existing {}-padding step). The guarantee becomes: whenever metadatas is requested, every element is a dict — never None.
  • tests/test_backends.py: +6 tests covering the shapes the production code actually sees.

What did NOT change

  • The existing per-site meta = meta or {} guards stay in place as defensive belt-and-suspenders. They're redundant after this coercion but removing them risks regressing any call path that bypasses the typed wrappers. A follow-up cleanup PR can remove them once we're confident nothing else reaches chromadb directly.
  • No behavior change on the write side. Writes still get whatever metadata the caller provides; this PR is strictly read-side.
  • No change to QueryResult / GetResult shape. Same fields, same types — just guaranteed honored at the boundary instead of honored-with-asterisks.

Test plan

  • pytest tests/test_backends.py35 passed (6 new, 29 existing)
  • pytest tests/ full suite — 1072 passed, 106 deselected
  • ruff check on modified files — clean
  • ruff format --check with pinned 0.4.10 — clean

New test cases specifically target:

Test Exercises
test_chroma_collection_query_coerces_none_metadatas_to_empty_dict mixed None/dict in a single-query batch
test_chroma_collection_query_coerces_all_none_inner_metadatas degenerate: every metadata is None
test_chroma_collection_query_coerces_none_across_multiple_queries None dicts spread across two queries in one call
test_chroma_collection_get_coerces_none_metadatas_to_empty_dict mixed None/dict on get() path
test_chroma_collection_get_coerces_padding_remains_dict regression guard: short metas list → {} pads stay {}
test_chroma_collection_get_without_metadatas_requested_stays_empty when metadatas not in include=, don't fabricate

Context

I'd committed to driving this on #1020 on 2026-04-18; #1021's merge this week cleared the blocker. @bensig — happy to split into two PRs (the coercion itself and a separate guard-removal follow-up) if you'd prefer landing them serially, but as written this PR keeps behavior strictly monotonic: everything that worked before still works, plus one class of latent None-metadata bugs can't fire anymore.

Copilot AI review requested due to automatic review settings April 22, 2026 02:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the ChromaDB adapter boundary to guarantee typed QueryResult / GetResult contracts by coercing None entries inside returned metadatas into {}, preventing downstream meta.get(...) crashes caused by ChromaDB 1.5.x.

Changes:

  • Coerce inner-list None metadata dicts to {} in ChromaCollection.query() and ChromaCollection.get().
  • Add backend-level regression tests covering mixed/degenerate None metadata shapes and include= behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
mempalace/backends/chroma.py Normalizes metadatas so callers always receive list[dict] (no inner None).
tests/test_backends.py Adds adapter-level tests validating the new coercion behavior for query() and get().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_backends.py
assert result.metadatas == [[{"wing": "w1"}, {}, {"wing": "w3"}]]
# Ids/docs/distances unaffected — their inner None handling is already
# covered by the outer-list coercion.
assert result.ids == [["a", "b", "c"]]
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s comment says ids/docs/distances are unaffected, but it never asserts result.documents. Adding a documents assertion would make the test actually cover that invariant and catch regressions in the query() return-shaping logic.

Suggested change
assert result.ids == [["a", "b", "c"]]
assert result.ids == [["a", "b", "c"]]
assert result.documents == [["da", "db", "dc"]]

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in bf91b27 — added assert result.documents == [["da", "db", "dc"]] so the comment's claim is actually pinned by the test. 12/12 backend tests pass.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 22, 2026
…emPalace#1094 rows + MemPalace#659 CLEAN

Three open-PR rows were missing (cmd_export, cmd_purge, None-metadata
boundary). MemPalace#659 moved from MERGEABLE to CLEAN after the 2026-04-22
rebase onto current develop cleared its stale-merge blocker. Open-PR
count: 4 → 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@igorls igorls added bug Something isn't working storage labels Apr 24, 2026
…ndary

ChromaDB 1.5.x can return None inside the `metadatas` list even when
the write stored a dict. Three prior PRs added per-site `meta = meta
or {}` guards to compensate:

- MemPalace#999 — guards in searcher.py (CLI + API + closet-boost),
  miner.status(), and 4 mcp_server.py handlers
- MemPalace#1013 — guard in Layer3.search_raw()
- Various other paths picked up the same defensive pattern
  organically (e.g. palace_graph, layers.py L1 pre-filter)

That scatter works but violates the typed contract QueryResult and
GetResult declare: both advertise `metadatas: list[dict]`, never
`list[Optional[dict]]`. Every call site that forgot the guard is a
latent `AttributeError: 'NoneType' object has no attribute 'get'`.

This PR moves the coercion to the single place the type contract is
produced — ChromaCollection.query() and ChromaCollection.get() in
backends/chroma.py. Each individual None dict gets coerced to {} at
the same boundary where the outer list is already normalized via
_none_list_to_empty(). Downstream callers receive a guaranteed
list[dict] matching the declared return shape.

## Per-site guards

Leaving the existing guards in place as defensive belt-and-suspenders.
They become redundant after this boundary coercion but removing them
risks regressing any call path that might bypass the typed wrappers.
A follow-up cleanup PR can remove them once we're confident nothing
else reaches chromadb directly.

## Tests

Six new tests in tests/test_backends.py cover the boundary's behavior
on the known None-metadata shapes:
- query(): mixed None / dict in a single batch
- query(): all-None inner list
- query(): None dicts across multiple queries in one call
- get(): mixed None / dict
- get(): padding regression (short metas list → {} pads, never None)
- get(): metadatas not requested → empty list passthrough

Full test_backends.py suite: 35 passed (6 new).
Full repo suite: 1072 passed.
Ruff format + check clean.
@jphein jphein force-pushed the pr/coerce-none-metadatas-at-boundary branch from 2806d9a to 52e376f Compare April 24, 2026 21:25
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
palace_graph.build_graph None-metadata guard fix (commit 5fd15db).
Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats
was 500-ing on a single legacy drawer.

Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata
audit) and MemPalace#1094 (None-coercion at backend boundary).

Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…ases A–C)

Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B
(632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary
entries no longer share an index with content drawers.

Phase A — scaffolding (no behavior change):
- _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection()
  in palace.py, mirroring get_collection's shape (cosine + thread-pin).
- New tests/test_session_recovery.py: 3 tests covering constant, metadata,
  multi-collection coexistence on one palace.

Phase B — write routing (the behavior change):
- tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery
  collection; everything else stays in the main collection.
- _get_session_recovery_collection() in mcp_server.py with parallel cache.
- conftest._reset_mcp_cache resets the new cache between tests.
- 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery,
  general→main, search regression (checkpoints structurally invisible to
  mempalace_search now, not merely post-filtered).

Phase C — new read path:
- tool_session_recovery_read reads from the recovery collection only,
  with optional filters: session_id, agent, since, until, wing, limit.
- session_id added as optional metadata field on tool_diary_write so
  recovery reads can filter by Claude Code session.
- Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family.
- Registered in TOOLS dict + documented in website/reference/mcp-tools.md
  (test_no_undocumented_tools regression caught the missing doc; fixed).
- 5 tests in TestSessionRecoveryRead — empty case, session_id filter,
  agent filter, limit, None-metadata defense.

Phases D (migration of existing checkpoints out of main collection) and
E (palace-daemon startup integration) explicitly deferred — multi-day
work, gated on a separate go-ahead.

Full suite 1360/1360 pass; 106 benchmark/stress deselected per default.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…rry-pick of MemPalace#1094)

Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999
in eight read paths but didn't have the boundary coercion that closes
the issue once for all callers.

Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed
QueryResult/GetResult contract — both declare `metadatas: list[dict]`,
never `list[Optional[dict]]` — is honored at the chromadb adapter
boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198
_tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review)
all pointed at gaps that would have been impossible if this pattern
had been in place. Future None-metadata bugs of the same shape are
now catchable at one site instead of N.

Per-site guards are kept as belt-and-suspenders for paths that might
bypass the typed wrappers (e.g. legacy `import chromadb` direct calls).
A future cleanup PR can remove them once we're confident no caller
reaches chromadb directly.

6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with
fork main's _segment_appears_healthy + _quarantined_paths from earlier
today. Full suite 1383/1383.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Fork-main now carries:
- MemPalace#1087 rewrite (cmd_purge: collection.delete(where=...) instead of nuke-and-rebuild) — commit 366a9ad
- MemPalace#1094 cherry-pick (coerce None metadatas at chromadb boundary) — commit 43d728d

YAML manifest gains 2 entries; FORK_CHANGELOG regenerated. README test
count bumped 1372 → 1383. CLAUDE.md gets:
- row updated for MemPalace#1087 (rewrite per @igorls's review)
- new "Closed by jphein-with-triage" subsection noting MemPalace#622 closure
  (architectural concern resolved by MemPalace#673; verified before clicking)

Triage perms make audits load-bearing. New feedback memory:
feedback_audit_comments_with_triage_perms.md — verify PR/commit claims
via gh before posting, gh api PATCH supports edits, track in-comment
promises in scratch/promises.md.

scripts/check-docs.sh ran clean across all 4 stages.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
igorls pushed a commit that referenced this pull request Apr 26, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream #1020 / our PR #1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211
MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four
PRs that landed today (21:22-21:41 UTC):

  MemPalace#1173  quarantine_stale_hnsw on make_client + cold-start gate +
         integrity sniff-test (PROTO/STOP byte check, no deserialization)
  MemPalace#1177  .blob_seq_ids_migrated marker guard, closes MemPalace#1090
  MemPalace#1198  _tokenize None-document guard in BM25 reranker
  MemPalace#1201  palace_graph.build_graph skips None metadata

Conflict resolution:

* mempalace/backends/chroma.py — took upstream as base (it has the
  igorls-review pickle-protocol docstring, thread-safety paragraph, and
  Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas
  in query()/get() since MemPalace#1094 is still open and not yet in develop.

* mempalace/mcp_server.py — took upstream's clean form. Dropped the
  fork-only `palace_path=` kwarg from four ChromaCollection() call sites:
  the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but
  MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so
  the kwarg has no remaining consumer. ChromaCollection.__init__ in
  upstream/develop is back to (self, collection); calling it with
  palace_path= raised TypeError → silently swallowed by the broad except
  in _get_collection() → returned None → tool_status() returned _no_palace().
  41 mcp_server tests went from failing-with-KeyError to passing.

* mempalace/cli.py — dropped fork-only `workers=args.workers` from the
  cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the
  `--workers` argparse arg landed in 5cd14bd but miner.mine() never
  accepted a workers param, so production `mempalace mine` TypeError'd
  on every invocation. Removed the broken plumbing; tests/test_cli.py
  updated to match.

* CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives
  in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml).

* CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's
  added a "Design Principles / Contributing" charter, which lives in
  README.md on the fork.

* tests/test_backends.py — took upstream's ruff-formatted line widths.

docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate,
hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from
OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the
merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated.

scripts/check-docs.sh: 4/4 clean.
Test suite: 1460/1460.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Closes the test-coverage hint from @copilot-pull-request-reviewer
on MemPalace#1094: the surrounding comment claimed ids/docs/distances are
unaffected by metadata-coercion, but the test never asserted
\`result.documents\`. Adding it locks in the invariant — a future
regression in the query() return-shaping that incidentally drops
or rewrites documents would now fail this test rather than slip by.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working storage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: coerce None metadatas at ChromaDB backend boundary (supersedes per-site guards from #999, #1013)

3 participants