feat: GET /graph — single-shot structural snapshot for SME-style consumers#13
feat: GET /graph — single-shot structural snapshot for SME-style consumers#13jphein wants to merge 1 commit intorboarescu:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new GET /graph endpoint to provide a one-shot structural snapshot (wings, rooms-per-wing, tunnels, and KG data) for SME-style consumers, reducing round-trips and avoiding MCP fan-out bottlenecks by combining MCP “stats” calls with direct read-only SQLite scans.
Changes:
- Add
GET /graphroute that parallelizes MCP stats calls with direct sqlite reads for wings/rooms and KG entities/triples. - Add design/spec documentation for the endpoint and the
list_tunnelsdiscrepancy. - Add an “Unreleased” changelog entry describing the new endpoint and its behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| main.py | Implements /graph plus direct sqlite readers for wings/rooms and KG entities/triples, executed in parallel with MCP stats calls. |
| docs/graph-endpoint.md | Documents /graph, rationale, response shape, and tunnel-source workaround notes. |
| CHANGELOG.md | Adds an Unreleased entry describing the new endpoint and its implementation approach. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| > sections below are kept as historical context and as the canonical | ||
| > reference for the `list_tunnels` bug write-up that `main.py:612` | ||
| > cites. |
There was a problem hiding this comment.
This doc references main.py:612 as a canonical citation target. Line numbers are unstable and will drift as main.py changes; prefer referencing the route/function name (e.g., GET /graph handler) or a section header instead of a specific line number.
There was a problem hiding this comment.
Done — replaced main.py:612 line-ref with GET /graph handler in the doc. Line numbers drift; route+function names don't.
| - **`GET /graph`** — single-shot structural snapshot for SME-style consumers. Mirrors `/stats`'s `asyncio.gather` shape but adds rooms-per-wing fan-out + a direct read-only sqlite read of `knowledge_graph.sqlite3`. Replaces what an adapter would otherwise compose serially over HTTP — on a 151K-drawer palace, `list_wings` alone takes ~30s, so a serial composition costs minutes. | ||
| - Response shape: `{ "wings": {...}, "rooms": [{"wing", "rooms"}], "tunnels": [...], "kg_entities": [...], "kg_triples": [...], "kg_stats": {...} }`. | ||
| - KG read uses URI-mode `?mode=ro` so the daemon can never accidentally write that file. Schema differences across mempalace versions tolerated via per-query `OperationalError` catch. |
There was a problem hiding this comment.
The "Response shape" example isn’t valid JSON / isn’t clear ([{"wing", "rooms"}] lacks keys/values). Consider rewriting it to an unambiguous example (e.g., { "wing": "<name>", "rooms": { ... } }) so changelog readers can understand the structure.
There was a problem hiding this comment.
Fixed — the Response-shape example in CHANGELOG.md is now valid syntax: { "wings": {<name>: <count>, ...}, "rooms": [{"wing": "<name>", "rooms": {<room>: <count>, ...}}, ...], "tunnels": [...], "kg_entities": [...], "kg_triples": [...], "kg_stats": {...} }.
| rooms = [{"wing": w, "rooms": rooms_by_wing.get(w, {})} for w in wings] | ||
| return wings, rooms |
There was a problem hiding this comment.
rooms is constructed by iterating only over wings. If the wings query fails (OperationalError) but the rooms-per-wing query succeeds, this will return rooms=[] and drop otherwise available data. Consider building the rooms list from rooms_by_wing keys (or the union of wings and rooms_by_wing) so partial success still returns rooms.
There was a problem hiding this comment.
Done — _read_wings_rooms_direct at main.py:505-558 now iterates the union of wings + rooms_by_wing keys (lines 552-557): all_wings = set(wings) | set(rooms_by_wing), and a partial-schema-drift (one query OperationalError-s, the other succeeds) no longer silently drops the half that worked. Each conn.execute is in its own try/except so neither failure cascades into the other.
| # MCP path (cheap tools): graph_stats and kg_stats are computed inside | ||
| # mempalace, not walked. Direct sqlite path (no semaphore, ~0.4s on | ||
| # the canonical palace): wings, rooms-per-wing, KG entities + triples. | ||
| graph_stats_task = _call(_mcp("mempalace_graph_stats", {}, 1)) | ||
| kg_stats_task = _call(_mcp("mempalace_kg_stats", {}, 2)) | ||
| wings_rooms_task = asyncio.to_thread(_read_wings_rooms_direct) | ||
| kg_direct_task = asyncio.to_thread(_read_kg_direct) | ||
|
|
There was a problem hiding this comment.
The direct sqlite reads in /graph bypass the daemon semaphores entirely. This breaks the guarantee provided by _exclusive_palace() (used by /repair mode=rebuild) that no daemon work runs during rebuild, and can race with delete/create operations on the underlying sqlite files. Consider gating these reads behind a semaphore that _exclusive_palace() also acquires, or explicitly blocking/503ing /graph when _repair_state["in_progress"] is true.
There was a problem hiding this comment.
Addressed — both direct-sqlite reads now run under _read_sem via the _direct_under_sem helper at main.py:658-663, and the docstring at lines 634-640 explicitly calls out the coordination with _exclusive_palace(). So /graph yields to a /repair mode=rebuild that's holding the exclusive lock and gets a consistent snapshot rather than racing with delete-then-create.
| graph_stats_task = _call(_mcp("mempalace_graph_stats", {}, 1)) | ||
| kg_stats_task = _call(_mcp("mempalace_kg_stats", {}, 2)) | ||
| wings_rooms_task = asyncio.to_thread(_read_wings_rooms_direct) | ||
| kg_direct_task = asyncio.to_thread(_read_kg_direct) | ||
|
|
There was a problem hiding this comment.
asyncio.to_thread(_read_wings_rooms_direct) and asyncio.to_thread(_read_kg_direct) run outside _read_sem, so /graph can spawn unbounded concurrent DB scans under load (2 threads per request). Consider adding a dedicated semaphore/rate limit for these direct sqlite scans (or caching the snapshot briefly) to avoid disk/CPU exhaustion.
There was a problem hiding this comment.
Same fix as the previous comment — _direct_under_sem wraps each asyncio.to_thread in async with _read_sem, so /graph requests now share the same concurrency budget as MCP reads (PALACE_MAX_READ_CONCURRENCY). A flood of /graph calls can't pile up unbounded threads.
| > **Status:** SHIPPED in palace-daemon **1.6.0** (commits `2003e80`, | ||
| > `127bf68`, `7ee7d0c`). This doc captured the original plan; the | ||
| > sections below are kept as historical context and as the canonical | ||
| > reference for the `list_tunnels` bug write-up that `main.py:612` | ||
| > cites. |
There was a problem hiding this comment.
The header claims this doc is "SHIPPED" in palace-daemon 1.6.0 and references specific commits, but this PR only adds an Unreleased changelog entry and main.py still reports VERSION=1.5.1. Please update the status/version language to reflect the current release state (e.g., "Planned" / "Unreleased") or bump VERSION + changelog to 1.6.0 consistently.
There was a problem hiding this comment.
Done — the doc now opens with > **Status:** Proposed (this PR). and the "Verification on a 151K-drawer palace" line is presented as the test result for the proposed implementation, not a shipped claim. Version bump + final "shipped in 1.6.0" framing will land when this PR merges.
beb6e46 to
708f966
Compare
…boarescu#10/rboarescu#12/rboarescu#13 to main The amendments to those open upstream PRs caught real bugs that exist on fork main too — production at disks runs fork main, so these were shipping live until now. ## main.py — /graph endpoint (from PR rboarescu#13) - `_read_wings_rooms_direct()`: build `rooms` from union of `wings.keys()` + `rooms_by_wing.keys()` rather than just wings. Otherwise a partial schema-drift (one query OperationalError-ed but the other succeeded) silently drops the half that worked. Real data-loss bug for the SME / /viz consumers. - `/graph` handler: gate the direct-sqlite reads on `_read_sem` via a `_direct_under_sem` helper. Coordinates with `_exclusive_palace()` used by `/repair mode=rebuild` (so /graph yields rather than racing delete-then-create on chroma.sqlite3) and rate-limits unbounded `asyncio.to_thread` reads under load (was 2 threads/request × N concurrent requests). - `_canonical_topic()` (from PR rboarescu#8): drop the `topic: str` annotation and add a non-string isinstance guard at the top, coerce `None`/numbers/lists to `CHECKPOINT_TOPIC` with a warning. A malformed `{"topic": null}` JSON payload no longer leaks `None` through to `tool_diary_write`. ## scripts/verify-routes.sh (from PR rboarescu#9) - `set -euo pipefail` (was `set -e`) — pipeline failures and unset vars now surface instead of being masked. - `curl -fsS` instead of `-sS` on every probe so HTTP non-2xx (e.g. `/health` 503 when palace is degraded) fails curl directly. Without this, a 503 body with `"daemon":"palace-daemon"` slips through the body-grep as a pass. - Limit-check rewrite: split curl exit code from JSON parse so a connection error or non-JSON response `?`-flags rather than fails-with-traceback (the previous `2>&1` redirect captured Python tracebacks into `$COUNT`, which then triggered the fail path with a confusing message). ## clients/palace-mcp-dispatch.sh (from PR rboarescu#10) - Replace `readlink -f` (GNU-specific, fails on macOS BSD readlink) with `cd "$(dirname …)" && pwd -P`. POSIX-portable; covers the actual common case (parent directory is a symlink under the Claude Code plugin cache). - Add explicit existence check on `$MCP_CLIENT` before exec, with a clear error message instead of a generic Python "can't open file" trace. ## clients/palace-mode (from PR rboarescu#12) - Make `PALACE_API_KEY` optional in `set_mode_remote`. Server-side `_check_auth` only enforces auth when its own `PALACE_API_KEY` is set, so a daemon running without auth accepts unauthenticated clients. The previous fail-fast was overly strict for that case. Warn-but-proceed instead. All five fixes are amendments on the corresponding upstream PR branches (force-pushed earlier today). No behavior change for the healthy path; the changes harden failure modes that were caught in review.
…umers
Adds a single endpoint that returns the palace's structural graph in
one HTTP roundtrip — wings + rooms-per-wing + tunnels + KG entities
+ KG triples + KG stats — instead of forcing every adapter to compose
4-6 MCP calls serially.
Schema-mining-extraction (SME) adapters and similar dashboards want a
single structural snapshot. Without `/graph`, they compose:
```
list_wings → list_rooms × N wings → list_tunnels → kg_stats → …
```
over MCP, serially, through the daemon's 4-slot read semaphore. On a
151K-drawer palace, `list_wings` alone takes ~30s; the full
composition stalls indefinitely under concurrent search load.
`/graph` runs four reads in parallel via `asyncio.gather`:
1. `mempalace_graph_stats` (cheap MCP — computed inside mempalace,
not walked) — gives tunnels via `top_tunnels`
2. `mempalace_kg_stats` (cheap MCP) — entities/triples summary
3. **Direct read-only sqlite scan** of `chroma.sqlite3.embedding_metadata`
for wings + rooms-per-wing (no semaphore, ~0.4s on 151K drawers)
4. **Direct read-only sqlite scan** of `knowledge_graph.sqlite3` for
KG entities + triples (no semaphore)
The two direct sqlite reads use URI-mode `?mode=ro` so the daemon can
never accidentally write either file, and they run in `asyncio.to_thread`
so the loop stays responsive. Schema differences across mempalace
versions (older palaces, in-progress migrations) are tolerated via
per-query `OperationalError` catches — `/graph` degrades to empty
sections rather than 500ing.
`mempalace_list_tunnels` and `mempalace_graph_stats` disagree on
what counts as a tunnel on mempalace 3.3.4 — `list_tunnels` returns
`[]` while `graph_stats.tunnel_rooms` reports the real count. Until
that's reconciled upstream, `/graph` shadows `top_tunnels` so the
response always agrees with `/stats.graph.tunnel_rooms`.
`docs/graph-endpoint.md` Part 2 documents the upstream bug + the
canonical-source data point.
On the 151K-drawer canonical palace under typical load:
- Pre-`/graph` SME composition: 60-120s, occasionally stalled
- `/graph`: ~0.4s (the gather is bottlenecked by the MCP graph_stats
call; wings/rooms direct read finishes in ~80ms, KG direct read
in ~50ms)
`+495 / -0`. Adds:
- `_kg_path()`, `_chroma_path()` — path helpers (~10 lines)
- `_read_wings_rooms_direct()` — chroma sqlite reader (~50 lines)
- `_read_kg_direct()` — KG sqlite reader (~55 lines)
- `@app.get("/graph")` endpoint (~65 lines)
- `docs/graph-endpoint.md` — spec + design notes (~307 lines)
- CHANGELOG entry
No changes to existing routes. No new dependencies. The schema
assumption (chromadb's internal embedding_metadata table layout)
is the brittlest piece; tolerated by graceful degradation.
- `curl -fs http://localhost:8085/graph | jq 'keys'` — expect
`["kg_entities", "kg_stats", "kg_triples", "rooms", "tunnels", "wings"]`
- Compare `/graph.tunnels` count to `/stats.graph.tunnel_rooms` —
expect equal (validates the graph_stats-as-source-of-truth choice)
- On a fresh empty palace: expect `{"wings": {}, "rooms": [], "tunnels": [], "kg_entities": [], "kg_triples": [], "kg_stats": {}}`
- Schema-drift test: rename `embedding_metadata` to `embedding_metadata_x`
and call /graph — expect empty wings/rooms, no 500.
Coordinates with the `multipass-structural-memory-eval` (SME) palace-daemon
adapter, which prefers `/graph` when available and falls back to MCP
composition otherwise.
Originally fork commits `2003e80` (initial), `127bf68` (tunnels-via-graph_stats),
`7ee7d0c` (perf: direct sqlite reads). Squashed for the upstream PR.
708f966 to
bf9816b
Compare
|
Quick housekeeping note: rebased this branch onto Also opened #15 ( Thanks for the patient review on this whole batch. |
…nding queue; trim patch Filed four upstream PRs on 2026-04-30: - rboarescu#15 feat: GET /viz status dashboard (stacks on rboarescu#13) - rboarescu#16 feat: GET /list — query-free metadata browse - rboarescu#17 feat: DELETE /memory + PATCH /memory - rboarescu#18 feat(lifespan): auto-migrate Stop-hook checkpoints on startup Also rebased PR rboarescu#13 onto upstream/main to clear a CHANGELOG conflict left by upstream's b4aee82 (patch sync) — state went CONFLICTING -> MERGEABLE / CLEAN. README: - Open upstream PRs table: four new rows (rboarescu#15-rboarescu#18) plus a 2026-04-30 note covering today's rebase + new PRs in one breath. - Pending PRs queue: now empty. Replaced the four stale rows (event-log-frame and graph-endpoint were already in flight via rboarescu#11 and rboarescu#13; mempal-fast.py was already upstream via the merged PR #4 omnibus; /viz is now PR rboarescu#15) with a brief empty-state note. CLAUDE.md: - Patch description trimmed to reflect that the hnsw:num_threads enforcement landed upstream via _pin_hnsw_threads(); only the log + retry-once slice remains. patches/mcp_server_get_collection.patch: - Regenerated against current mempalace develop. The patch is now just the "log exception + retry once on cache failure" change. Filed upstream as MemPalace/mempalace#1286; once that merges this patch retires. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
README: - Intro paragraph: appended GET /list (PR rboarescu#16), DELETE/PATCH /memory (PR rboarescu#17), and lifespan auto-migrate (PR rboarescu#18) to "what this fork adds" since they're already on fork main even though the PRs are awaiting review upstream. - Links bar: added docs/typescript-port-plan.md. - Added a "Cross-repo coordination" subsection under "Recently landed in upstream" — calls out the local patches/ directory and the in-flight MemPalace/mempalace#1286 that retires it. Also references #1142 (RELEASING.md) for completeness. - Requirements: dropped the stale "kind= searcher filter" justification for recommending the fork (kind= was retired in 1.7.1); replaced with daemon-strict hook mode + warnings/sqlite-fallback search path as the actual current reasons. Added the patches/ re-apply step. - API table: added /list, DELETE /memory/{id}, PATCH /memory/{id} rows so the table reflects what the fork main actually exposes. - Sources: cross-repo PR note for #1286. CHANGELOG: - Added [Unreleased] section: patch trim (Maintenance), TS port plan doc, hook-routing-fix.md status header, and a Docs entry covering today's README updates + PR rboarescu#13 rebase context. docs/hook-routing-fix.md: - Added a Status: SHIPPED header pointing at 62425e3 (2026-04-24 clients/hook.py introduction) and clarifying mempal-fast.py is the simpler successor — same shape as docs/graph-endpoint.md's status header. No daemon behaviour changes.
Eight comments addressed across main.py, static/viz.html, CHANGELOG.md. Comments on docs/graph-endpoint.md and main.py:640 were inherited from PR rboarescu#13's review and are already settled by the 152e428 backport on main; left untouched here. ## Auth (was: /viz served publicly) main.py: /viz now goes through _check_auth on the same code path as every other endpoint. Accepts the key from either the X-Api-Key header (preferred) or the ?key=… query param (bookmarkable shortcut). _check_auth is a no-op when PALACE_API_KEY is unset, so the zero-config local-dev experience is unchanged. ## CDN integrity (was: D3 + Mermaid loaded without SRI, version not pinned) static/viz.html: pinned to [email protected] and [email protected] with SHA-384 SRI hashes via cdn.jsdelivr.net. Added crossorigin="anonymous" (required for SRI on cross-origin scripts) and referrerpolicy="no-referrer". Comment block above the script tags explains the version-bump+hash- together rule for future maintainers. ## Mermaid sanitization (was: label sanitizer didn't strip control chars; securityLevel "loose") static/viz.html: - mermaidSafe() now strips ASCII control chars (\n, \r, \t, etc.) in addition to the existing parser-breaking chars. A label containing raw newlines used to render half a label on each side of the break. - mermaid.initialize({ securityLevel: "strict" }) — "loose" was over-permissive (relaxes label sanitization, enables clickable diagram nodes). The dashboard never needs node click handlers, and our own mermaidSafe() runs before labels reach Mermaid anyway, so strict is the right default. ## Mermaid render-error fallback (was: <div> appended inside <pre>) static/viz.html: target was <pre id="hierarchy">; appending a <div> inside it produced invalid HTML and inconsistent cross-browser rendering. Surface the error as plain text inside the existing element and tag the container with class="err" so CSS can style it. ## ?key= leakage warning static/viz.html: comment on the KEY parsing block calls out that ?key=… leaks into browser history, referer headers, and proxy logs. Same caveat surfaced in the /viz docstring and CHANGELOG entry. ## Docstring drift (was: "cached at module load" but actually lazy) main.py: docstring rewritten to describe the actual lazy-load behaviour ("lazy-loaded on first request and cached in-process thereafter; one disk read per daemon process"). ## CHANGELOG.md (was: two ### Added headings under [Unreleased]) The /viz changelog block and the /graph block both used "### Added"; merged them under one heading. Also folded the new security details (SRI pinning, securityLevel: strict, control-char strip, /viz auth) into the /viz security bullet so the audit trail of what shipped here is complete. No daemon behaviour changes outside /viz. /viz behaviour change is the auth gate — operators with PALACE_API_KEY set will get 401 on unauthenticated /viz requests, which matches every other endpoint.
Summary
Adds
GET /graph— a single endpoint that returns the palace's structural graph in one HTTP roundtrip: wings + rooms-per-wing + tunnels + KG entities + KG triples + KG stats. Replaces what an adapter would otherwise compose by serially callinglist_wings+list_rooms× N +list_tunnels+kg_statsover MCP.Why
Schema-mining / structural-snapshot adapters (e.g.
multipass-structural-memory-eval) want one structural snapshot per refresh. Without/graph, they compose 4-6 MCP calls serially through the daemon's 4-slot read semaphore. On a 151K-drawer palace,list_wingsalone takes ~30s; the full composition stalls under concurrent search load.What it does
/graphruns four reads in parallel viaasyncio.gather:mempalace_graph_stats(cheap MCP — computed inside mempalace) — gives tunnels viatop_tunnelsmempalace_kg_stats(cheap MCP) — entities/triples summarychroma.sqlite3.embedding_metadatafor wings + rooms-per-wing (no semaphore, ~0.4s on 151K drawers)knowledge_graph.sqlite3for KG entities + triples (no semaphore)Both direct sqlite reads use URI-mode
?mode=roso the daemon can never write either file. They run inasyncio.to_threadso the loop stays responsive. Schema differences across mempalace versions are tolerated via per-queryOperationalErrorcatches —/graphdegrades to empty sections rather than 500ing.Tunnels source
Tunnels come from
mempalace_graph_stats.top_tunnelsrather thanmempalace_list_tunnels. The two disagree on what counts as a tunnel on mempalace 3.3.4 —list_tunnelsreturns[],graph_stats.tunnel_roomsreports the real count.docs/graph-endpoint.mdPart 2 documents the upstream-mempalace bug and the canonical-source data point.Performance
On a 151K-drawer canonical palace under typical load:
/graphSME composition: 60-120s, occasionally stalled out/graph: ~0.4s (bottlenecked by the MCP graph_stats call; wings/rooms direct read finishes in ~80ms, KG direct read in ~50ms)Scope
+495 / -0— three files:main.py:_kg_path(),_chroma_path(),_read_wings_rooms_direct(),_read_kg_direct(),@app.get(\"/graph\")(~178 lines)docs/graph-endpoint.md: spec, design notes, thelist_tunnelsdiscrepancy writeup (~307 lines)CHANGELOG.md: "Unreleased" entryNo changes to existing routes. No new dependencies. The schema assumption (chromadb's internal
embedding_metadatatable layout) is the brittlest piece; tolerated by graceful degradation.Test plan
curl -fs http://localhost:8085/graph | jq 'keys'— expect[\"kg_entities\", \"kg_stats\", \"kg_triples\", \"rooms\", \"tunnels\", \"wings\"]/graph.tunnelscount to/stats.graph.tunnel_rooms— expect equalembedding_metadatatable → expect empty wings/rooms, no 500Originally fork commits
2003e80(initial),127bf68(tunnels-via-graph_stats),7ee7d0c(perf: direct sqlite reads). Squashed for review.