Skip to content

fix(palace_graph): skip None metadata in build_graph#1201

Merged
igorls merged 1 commit intoMemPalace:developfrom
jphein:fix/palace-graph-none-metadata
Apr 26, 2026
Merged

fix(palace_graph): skip None metadata in build_graph#1201
igorls merged 1 commit intoMemPalace:developfrom
jphein:fix/palace-graph-none-metadata

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 25, 2026

Problem

palace_graph.build_graph at line 95 calls meta.get("room", "") unconditionally on every entry in batch["metadatas"]. ChromaDB returns None for drawers without metadata (legacy data, partial writes — same root cause as the entries fixed in #999 and #1094 in different files), and a single None drawer crashes the entire graph build with AttributeError: 'NoneType' object has no attribute 'get'.

This takes out every consumer of build_graph: graph_stats, find_tunnels, traverse, and any palace-daemon-style HTTP endpoint that calls them. Caught 2026-04-25 by a smoke-test script against a 151K-drawer production palace — /stats was 500-ing on a single bad drawer.

Fix

Skip None entries with a continue early-return inside the loop. The graph build is recoverable: build_graph already filters room and room != "general" and wing immediately downstream, so a missing-metadata drawer was never going to participate in the result anyway. The skip is silent because logging every legacy-data drawer would be log spam on palaces that have many of them.

for meta in batch["metadatas"]:
    if meta is None:
        continue
    room = meta.get("room", "")
    ...

Closes the same gap as

palace_graph.py is a separate read path that the #999 audit didn't reach, and #1094 (when it merges) handles this at the backend layer for new writes — but legacy drawers already in the palace need this defensive check.

Tests

TestBuildGraph::test_none_metadata_does_not_crash constructs a fixture with a None entry mixed between two real drawers and asserts the build completes with the two real drawers processed normally.

🤖 Generated with Claude Code

ChromaDB can return None for drawers without metadata (legacy data,
partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094).
build_graph at line 95 called meta.get("room", "") unconditionally,
which AttributeErrors on None and takes out every consumer of
build_graph for the whole call path: graph_stats, find_tunnels,
traverse, and (most visibly) the daemon's /stats endpoint.

Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test
against the canonical 151K-drawer palace — /stats was 500-ing on a
single None drawer.

Adds `if meta is None: continue` guard. Closes the same gap upstream's
MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py /
miner.status, just in a different file the audit didn't reach. The
graph-build is recoverable: skipping a single None drawer doesn't
distort the graph since build_graph already filters
`room and room != "general" and wing` — a missing-metadata drawer was
never going to participate anyway.

Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None
entry into a 3-drawer fixture and asserts the two real drawers are
processed normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings April 25, 2026 18:07
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
palace_graph.build_graph None-metadata guard fix (commit 5fd15db).
Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats
was 500-ing on a single legacy drawer.

Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata
audit) and MemPalace#1094 (None-coercion at backend boundary).

Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a production crash in palace_graph.build_graph when ChromaDB returns None entries in metadatas (legacy/partial-write drawers), allowing graph-based endpoints like /stats to remain available even with bad rows.

Changes:

  • Skip None metadata entries during build_graph iteration to avoid AttributeError.
  • Add a regression test ensuring build_graph completes and counts valid drawers correctly when metadatas contains None.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
mempalace/palace_graph.py Adds a defensive None check while iterating drawer metadatas to prevent crashes.
tests/test_palace_graph.py Adds regression coverage for mixed {dict, None, dict} metadata batches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…ases A–C)

Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B
(632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary
entries no longer share an index with content drawers.

Phase A — scaffolding (no behavior change):
- _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection()
  in palace.py, mirroring get_collection's shape (cosine + thread-pin).
- New tests/test_session_recovery.py: 3 tests covering constant, metadata,
  multi-collection coexistence on one palace.

Phase B — write routing (the behavior change):
- tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery
  collection; everything else stays in the main collection.
- _get_session_recovery_collection() in mcp_server.py with parallel cache.
- conftest._reset_mcp_cache resets the new cache between tests.
- 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery,
  general→main, search regression (checkpoints structurally invisible to
  mempalace_search now, not merely post-filtered).

Phase C — new read path:
- tool_session_recovery_read reads from the recovery collection only,
  with optional filters: session_id, agent, since, until, wing, limit.
- session_id added as optional metadata field on tool_diary_write so
  recovery reads can filter by Claude Code session.
- Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family.
- Registered in TOOLS dict + documented in website/reference/mcp-tools.md
  (test_no_undocumented_tools regression caught the missing doc; fixed).
- 5 tests in TestSessionRecoveryRead — empty case, session_id filter,
  agent filter, limit, None-metadata defense.

Phases D (migration of existing checkpoints out of main collection) and
E (palace-daemon startup integration) explicitly deferred — multi-day
work, gated on a separate go-ahead.

Full suite 1360/1360 pass; 106 benchmark/stress deselected per default.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Same shape as #1198 — defensive skip of `None` with a real reproduction case (palace-daemon's verify-routes.sh against the canonical 151K palace, 2026-04-25).

What landed

`build_graph` now skips `if meta is None: continue` rather than crashing on `AttributeError`. The comment is the right shape — names the symptom (`/stats` 500-ing, taking out every consumer of `build_graph`), points at the upstream issue family (#1020 / #999 / #1094), and dates the catch.

Tests

`test_none_metadata_does_not_crash` exercises a 3-drawer collection where the middle drawer has `None` metadata and asserts both real drawers still get processed. Specifically asserts `nodes["auth"]["count"] == 2` so a regression that silently drops both real drawers along with the None one would catch. `pytest tests/test_palace_graph.py` — 24 passed on this branch.

Observation, not actionable

This is the third `None` defense in three review sessions (this one + #1198 BM25 tokenize + the diary_read `wing=""` from #1147). The pattern keeps recurring because Chroma's columnar return shape doesn't carry typed nulls, and downstream code keeps assuming non-None where the data layer doesn't promise it. Worth a follow-up at some point: a `@dataclass` adapter at the `ChromaCollection` boundary that fills None metadata with `{}` and None documents with `""` so consumers don't each need their own defensive layer. Out of scope for this PR.

Ship it.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…#1173 status

CLAUDE.md row 28 documents the canonical-source pipeline (5a01aec)
that landed earlier today. PR table updated to reflect bensig's
2026-04-26 approvals on four of our open PRs:

  - MemPalace#1173 (HNSW quarantine wire): approved on the original 1-commit
    shape, then force-pushed two safety commits (cold-start gate +
    integrity sniff-test) after a production cold-start incident
    destroyed three healthy 253MB segments. Now mergeable=CONFLICTING
    against develop (which moved with MemPalace#1210, MemPalace#1205, MemPalace#976) — needs
    rebase + re-review.
  - MemPalace#1177, MemPalace#1198, MemPalace#1201: approved, mergeable, awaiting merge.

YAML manifest gets a new entry for the doc pipeline itself; FORK_CHANGELOG
regenerated to match. promises.md (in claude-config, not this repo)
got a long log entry covering today's full output.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 26, 2026

Thanks @bensig — ready for merge whenever convenient. Tests stable on the fork's daily-driver palace + smoke pass on the develop branch.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
…rry-pick of MemPalace#1094)

Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999
in eight read paths but didn't have the boundary coercion that closes
the issue once for all callers.

Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed
QueryResult/GetResult contract — both declare `metadatas: list[dict]`,
never `list[Optional[dict]]` — is honored at the chromadb adapter
boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198
_tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review)
all pointed at gaps that would have been impossible if this pattern
had been in place. Future None-metadata bugs of the same shape are
now catchable at one site instead of N.

Per-site guards are kept as belt-and-suspenders for paths that might
bypass the typed wrappers (e.g. legacy `import chromadb` direct calls).
A future cleanup PR can remove them once we're confident no caller
reaches chromadb directly.

6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with
fork main's _segment_appears_healthy + _quarantined_paths from earlier
today. Full suite 1383/1383.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@igorls igorls merged commit 30fc0ab into MemPalace:develop Apr 26, 2026
9 of 10 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211
MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four
PRs that landed today (21:22-21:41 UTC):

  MemPalace#1173  quarantine_stale_hnsw on make_client + cold-start gate +
         integrity sniff-test (PROTO/STOP byte check, no deserialization)
  MemPalace#1177  .blob_seq_ids_migrated marker guard, closes MemPalace#1090
  MemPalace#1198  _tokenize None-document guard in BM25 reranker
  MemPalace#1201  palace_graph.build_graph skips None metadata

Conflict resolution:

* mempalace/backends/chroma.py — took upstream as base (it has the
  igorls-review pickle-protocol docstring, thread-safety paragraph, and
  Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas
  in query()/get() since MemPalace#1094 is still open and not yet in develop.

* mempalace/mcp_server.py — took upstream's clean form. Dropped the
  fork-only `palace_path=` kwarg from four ChromaCollection() call sites:
  the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but
  MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so
  the kwarg has no remaining consumer. ChromaCollection.__init__ in
  upstream/develop is back to (self, collection); calling it with
  palace_path= raised TypeError → silently swallowed by the broad except
  in _get_collection() → returned None → tool_status() returned _no_palace().
  41 mcp_server tests went from failing-with-KeyError to passing.

* mempalace/cli.py — dropped fork-only `workers=args.workers` from the
  cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the
  `--workers` argparse arg landed in 5cd14bd but miner.mine() never
  accepted a workers param, so production `mempalace mine` TypeError'd
  on every invocation. Removed the broken plumbing; tests/test_cli.py
  updated to match.

* CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives
  in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml).

* CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's
  added a "Design Principles / Contributing" charter, which lives in
  README.md on the fork.

* tests/test_backends.py — took upstream's ruff-formatted line widths.

docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate,
hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from
OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the
merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated.

scripts/check-docs.sh: 4/4 clean.
Test suite: 1460/1460.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 27, 2026
Upstream merged MemPalace#1173, MemPalace#1177, MemPalace#1198, MemPalace#1201 into develop on 2026-04-26
(picked up by the 2026-04-27 develop sync). Strike out their fork-ahead
rows in CLAUDE.md and add to the "Merged into upstream" section. Update
PR table accordingly: 18 merged (was 14), 7 open (was 8). Bump version
note: upstream shipped v3.3.3 on 2026-04-24 (carries our MemPalace#659/MemPalace#1021);
v3.3.4 prep branch in flight.

README:
- "tracks upstream/develop through the 2026-04-27 sync" (was 2026-04-26)
- 17 fork-ahead changes (was 22)
- 1,510 tests pass on main (was 1,395 — upstream brought new suites)
- "Open upstream PRs" 7 entries (was 11)
- Drop merged rows from "Fork-ahead — open or pending" table; keep
  PreCompact recovery-marker row (still fork-only)

scripts/check-docs.sh: clean (90 PR refs match upstream state, 13 fork
hashes resolve, FORK_CHANGELOG renders clean from YAML).

docs/fork-changes.yaml: no edit needed — already had merged_date on the
4 entries from the 2026-04-26 commit `5fd15db`.
Scorpion1221 pushed a commit to Scorpion1221/mempalace that referenced this pull request May 2, 2026
Scorpion1221 pushed a commit to Scorpion1221/mempalace that referenced this pull request May 2, 2026
Sync with upstream/develop via cherry-pick of 9 critical bugfixes:
- HNSW index bloat prevention (MemPalace#1191)
- SIGSEGV guards on collection reopen (MemPalace#1262, MemPalace#1289)
- blob-seq marker fast-path (MemPalace#1177)
- palace_graph None-metadata guard (MemPalace#1201)
- palace_graph security tunnels (MemPalace#1168)
- hyphenated wing name normalization (MemPalace#1197)
- searcher _tokenize None guard (MemPalace#1198)
- mcp_server diary topic sanitize (MemPalace#936)

Tests: 1459 default + 113 benchmark + 6/7 stress all pass.
(random-kill stress test was failing on dev pre-merge; not regressed.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants