JP's production fork of rboarescu/palace-daemon
Fork of rboarescu/palace-daemon, tracking upstream/main through the 2026-04-27 sync (upstream is at v1.5.1; this fork is at v1.7.2 with the additional /graph endpoint, /viz status dashboard, auto-repair-on-startup, and the post-merge deployment tooling). Running in production since 2026-04-24, currently fronting the jphein/mempalace 150,891-drawer canonical palace on disks.jphe.in:8085. The bulk of the v1.5.0 daemon work (cold-start warmup, /repair, /silent-save, themed messages, --palace flag, MCP timeout) was contributed back to upstream as PR #4; rboarescu cherry-picked the contents into upstream main directly as ef6ac03 on 2026-04-25 and closed the PR.
What this fork adds that you won't get from upstream yet:
- A GET /viz status dashboard: a self-contained HTML page that fetches /graph, /repair/status, and /health in parallel and renders five panels (status strip with repair pulse, D3 force-directed knowledge graph, Mermaid wing/room hierarchy, tunnels list, wings bar chart). D3 + Mermaid load via CDN, no static-file deps.
- A GET /graph endpoint: single-shot structural snapshot for SME-style consumers, ~0.4s on the 151K-drawer palace via direct read-only sqlite reads of embedding_metadata and knowledge_graph.sqlite3, vs. ~60-120s for the equivalent serial MCP composition under load.
- GET /list for query-free metadata browse by wing/room (wraps mempalace_list_drawers, the right path when /search would fall back to BM25 and ignore the wing filter).
- DELETE /memory/{id} + PATCH /memory/{id} REST CRUD over mempalace_delete_drawer / mempalace_update_drawer, so curation UIs don't have to talk MCP just to fix a typo.
- Lifespan auto-migrate of pre-3.3.4 Stop-hook checkpoints into mempalace_session_recovery on first restart post-upgrade (idempotent, ImportError-gated, env-overridable via PALACE_AUTO_MIGRATE_CHECKPOINTS=0).
- Auto-repair-on-startup that detects degraded HNSW recall after restart and fires /repair {mode:rebuild} non-blocking in the background (workaround that bought time for the mempalace fork's 645ba20 integrity gate fix to land).
- The limit= parameter actually being honored (earlier versions silently capped at 5 due to a max_results→limit name mismatch the MCP tool's whitelist dropped).
- scripts/deploy.sh, which bundles git push → wait for sync → systemctl restart → /health poll → verify-routes smoke test into one command.
- scripts/verify-routes.sh, a curl-based smoke test for every public route.
- clients/palace-mode, a CLI for one-command local↔remote palace switching.
- clients/palace-mcp-dispatch.sh, which picks daemon vs. in-process MCP based on PALACE_DAEMON_URL.
- clients/mempal-fast.py, a stdlib-only Stop/PreCompact hook handler that POSTs to /silent-save without importing mempalace (so cold hook fires can't trigger ChromaDB's HNSW SIGSEGV class).

Full list below.
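The limit= bug described above is easy to reproduce in miniature. The sketch below is an illustration only, not the daemon's or mempalace's code: `INPUT_SCHEMA`, `whitelist`, and `effective_limit` are hypothetical names, and the only facts taken from the text are that the tool's schema declares `limit`, that undeclared keys are dropped, and that callers ended up capped at 5.

```python
# Hypothetical names throughout; this is not the daemon's or mempalace's code.
INPUT_SCHEMA = {"properties": {"query": {}, "limit": {}, "wing": {}}}
DEFAULT_LIMIT = 5  # the cap callers were silently stuck with, per the text above


def whitelist(arguments: dict, schema: dict) -> dict:
    """Keep only the keys the tool's input schema declares."""
    allowed = schema["properties"].keys()
    return {k: v for k, v in arguments.items() if k in allowed}


def effective_limit(arguments: dict) -> int:
    """Resolve the limit a tool would use after whitelisting its arguments."""
    return whitelist(arguments, INPUT_SCHEMA).get("limit", DEFAULT_LIMIT)


# Before the rename: `max_results` is not declared, so it is dropped.
before = effective_limit({"query": "hnsw", "max_results": 50})
# After the rename: `limit` is declared, so the caller's value binds.
after = effective_limit({"query": "hnsw", "limit": 50})
print(before, after)  # prints "5 50"
```

Because the whitelist drops unknown keys without raising, the mismatch produces no error, only a default. That is why the fix is a rename rather than a change to the filter.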
v1.7.2 release notes · PR #4 — upstream contribution · Discussion #5 — Postgres backend · Discussion #6 — TS rewrite heads-up · docs/event-log-frame.md — daemon-as-view-coordinator architectural frame · docs/typescript-port-plan.md — TS rewrite planning artifact (no commitments, sections marked [OPEN]/[LEANING]/[DECIDED])
Per PR #4 issue comment, rboarescu welcomed post-1.5.0 work as small separate PRs at whatever cadence works.
| PR | Status | Description |
|---|---|---|
| #7 | OPEN, awaiting review | fix: honor limit= on /search and /context — two-line rename max_results → limit so the user-supplied value actually binds (the MCP tool's input_schema declares limit, so max_results was being silently dropped). |
| #8 | OPEN, awaiting review | feat: canonicalize Stop-hook topic at daemon boundary with warning log — _canonical_topic() rewrites legacy synonyms ("auto-save" → "checkpoint") on the /silent-save path and emits a warning so client-side drift is observable. Composes with upstream's 0060190 CHECKPOINT_TOPIC constant. |
| #9 | OPEN, awaiting review | chore(scripts): add verify-routes.sh smoke test — curl-based smoke test for every public read-only route. Universal, no fork-mempalace dependencies. |
| #10 | OPEN, awaiting review | fix(clients): resolve mempalace-mcp.py via readlink, not absolute path — bug fix: dispatcher in clients/palace-mcp-dispatch.sh as shipped in upstream main has a hardcoded /home/jp/Projects/... path (accidentally embedded during PR #4's extraction) and fails on every machine except mine. +6/-1 readlink -f sibling resolution. |
| #11 | OPEN, awaiting review | docs: event-log frame — palace-daemon as materialized-view coordinator — architectural reference doc (191 lines) articulating mempalace as Kleppmann-shaped (log + materialized views), the daemon as the view coordinator. Useful frame ahead of the multi-backend transition. |
| #12 | OPEN, awaiting review | fix(clients): remove embedded API key + URL defaults from palace-mode — clients/palace-mode shipped with a DEFAULT_URL pointing at JP's homelab and a real hex DEFAULT_KEY (rotated, but still in upstream's source). Reads both from env, fails fast in remote mode if either is unset. |
| #13 | OPEN, awaiting review | feat: GET /graph — single-shot structural snapshot for SME-style consumers — single endpoint returns wings + rooms-per-wing + tunnels + KG entities + triples + KG stats in ~0.4s on the canonical 151K palace; replaces the SME-style 60-120s serial MCP composition. Folds in the /graph.tunnels derive-from-graph_stats.top_tunnels fix so the response always agrees with /stats.graph.tunnel_rooms. Includes docs/graph-endpoint.md. +495/-0. |
| #14 | OPEN, awaiting review | chore(clients): add CHECKPOINT_TOPIC constant to mempal-fast.py — mirrors the constant already in clients/hook.py. Symmetry refactor; both client paths now source the canonical topic value from a per-file constant rather than mixing inline + constant. +8/-1. |
| #15 | OPEN, awaiting review | feat: GET /viz — self-contained status dashboard — single HTML page that fetches /graph, /repair/status, and /health and renders five panels (status strip, D3 KG, Mermaid wing/room tree, tunnels, wings bar). D3 + Mermaid via CDN, no new static-file plumbing. Stacks on #13 because the page consumes /graph. |
| #16 | OPEN, awaiting review | feat: GET /list — query-free metadata browse by wing/room — wraps mempalace_list_drawers so consumers can enumerate drawers in a wing without inventing an embeddable query (/search falls back to BM25 and ignores the wing filter when the query is non-embeddable). 34 lines of main.py. |
| #17 | OPEN, awaiting review | feat: DELETE /memory/{id} + PATCH /memory/{id} — REST CRUD over mempalace_delete_drawer / mempalace_update_drawer. Both tools have been in mempalace since 3.x; this just exposes them over HTTP for curation UIs. 29 lines of main.py. |
| #18 | OPEN, awaiting review | feat(lifespan): auto-migrate Stop-hook checkpoints to recovery collection on startup — calls mempalace.migrate.migrate_checkpoints_to_recovery() during lifespan startup so operators don't have to run the manual mempalace repair --mode reorganize after upgrading. ImportError-gated, env-overridable via PALACE_AUTO_MIGRATE_CHECKPOINTS=0. |
Note: PRs #8, #9, #10, #11, #12, and #13 were each amended once on 2026-04-27 (force-pushed) to address Copilot review feedback. The review caught real bugs in several cases: PR #9's /health 503-hiding (curl -sS body grep masked HTTP status), PR #10's GNU-only readlink -f (failed on macOS), and PR #13's rooms-from-wings-only logic bug (silent data loss on partial schema-drift) and _read_sem-bypass concurrency concern. The fixes were also backported to fork main and deployed to disks (152e428). PR #13 was rebased on 2026-04-30 to clear a CHANGELOG.md conflict with upstream's b4aee82 patch sync; PRs #15–#18 followed the same day after the rebase cleared the way.
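The topic canonicalization in PR #8 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the PR's code: the function name `canonical_topic`, the `LEGACY_TOPICS` mapping, and the logger name are hypothetical, and "auto-save" is the only legacy synonym the table names.

```python
import logging

logger = logging.getLogger("palace-daemon")  # logger name is an assumption

CHECKPOINT_TOPIC = "checkpoint"
# Only "auto-save" is named in the PR description; other synonyms are unknown.
LEGACY_TOPICS = {"auto-save": CHECKPOINT_TOPIC}


def canonical_topic(topic: str) -> str:
    """Rewrite a legacy topic to its canonical name, logging when that happens."""
    canonical = LEGACY_TOPICS.get(topic)
    if canonical is None:
        return topic  # already canonical, or a topic this table does not govern
    # The warning is what makes client-side drift observable in the daemon log.
    logger.warning("legacy topic %r rewritten to %r", topic, canonical)
    return canonical
```

Doing the rewrite at the daemon boundary means older clients keep working, while the warning records which ones still send the legacy value.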
- PR #4 (cherry-picked into upstream main as ef6ac03, 2026-04-25, then closed): cold-start warmup, /repair, /silent-save, themed messages, --palace flag, MCP timeout. The bulk of the v1.5.0 daemon work originated here.
The daemon depends on a tiny mempalace patch that's also in flight upstream:
- MemPalace/mempalace#1286: "fix(mcp_server): log exception + retry once on _get_collection failure" (filed 2026-04-30, against develop). Currently applied locally as patches/mcp_server_get_collection.patch via scripts/apply_patches.sh on every pipx upgrade mempalace. Once #1286 merges, the patch retires entirely (delete the file, drop the apply step from the upgrade workflow).
- MemPalace/mempalace#1142: "docs: add RELEASING.md with mempalace-mcp pre-release check" (filed 2026-04-23, against develop). Process doc, no daemon dependency.
Everything the fork has ahead of upstream that hasn't been filed as a PR yet. Ranked from most PR-ready to least.
As of 2026-04-30, the queue is empty — every generalisable change ahead of upstream/main is now an open PR (#7 through #18). The remaining fork-only work is captured below under Needs generalization before PR.
These have working fork-side implementations but bake in JP-specific assumptions (paths, hostnames, install layouts, fork-mempalace symbols) that would fail or surprise other operators. They're held until they can be split into a universally-applicable shape vs. a fork-private layer.
| Area | Change | What needs generalizing | Files |
|---|---|---|---|
| Tooling | scripts/deploy.sh: one-command git push → wait for sync → systemctl restart → /health poll → verify-routes deploy. | Defaults to PALACE_HOST=disks; reads PALACE_API_KEY from ~/.claude/settings.local.json; assumes a Syncthing-mirrored source tree on the deploy host; ssh user paths hardcoded; the post-restart verify hook imports fork-mempalace-only symbols (_segment_appears_healthy, _quarantined_paths, _SESSION_RECOVERY_COLLECTION, migrate_checkpoints_to_recovery) that would fail on upstream-mempalace installs. Likely splits into "universal three-step deploy" + "private verify hook." | scripts/deploy.sh |
| Clients | clients/palace-mode: install/verify subcommands that re-apply plugin-cache customizations after a Claude Code plugin update. The base mode-switching part shipped via PR #12. | The install subcommand assumes the Claude Code plugin cache layout under ~/.claude/plugins/cache/mempalace/.... Needs to be parameterized or removed for the upstream version. | clients/palace-mode |
| Ops | scripts/auto-repair-if-empty.sh: ExecStartPost script that probes /search after the daemon binds, detects the "vector ranked 0" warning, and fires /repair {mode:rebuild} non-blocking in the background. Now safety-net-only since mempalace 645ba20 (integrity gate) shipped; a healthy 151K palace no longer triggers it. | Assumes a systemctl --user unit + a specific service unit shape with ExecStartPost. The probe-and-repair logic itself is generic; the systemd integration is what's JP-shaped. The ~4:48 HNSW-segment-load timeout (PALACE_AUTO_REPAIR_WAIT_SECS=240) is calibrated to the 151K canonical palace; smaller palaces can use the 30s default. | scripts/auto-repair-if-empty.sh, palace-daemon.service |
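The probe-and-repair logic in the Ops row is described as generic, and it can be sketched independently of systemd. The real script is a shell ExecStartPost hook; the Python below is only an illustration. It assumes the "vector ranked 0" warning shows up as a substring of the /search response body, and `probe` and `start_rebuild` are hypothetical callables standing in for the HTTP calls.

```python
import threading
from typing import Callable

DEGRADED_MARKER = "vector ranked 0"  # warning text quoted in the table above


def needs_rebuild(search_response_text: str) -> bool:
    """True when the probe response carries the degraded-recall warning."""
    return DEGRADED_MARKER in search_response_text


def auto_repair(probe: Callable[[], str], start_rebuild: Callable[[], None]) -> bool:
    """Probe once; if recall looks degraded, start a rebuild without blocking.

    Returns True when a rebuild was started, False when nothing was needed.
    """
    if not needs_rebuild(probe()):
        return False
    # Background thread so the caller (and the daemon) stays available.
    threading.Thread(target=start_rebuild, daemon=True).start()
    return True
```

Passing the probe and the rebuild trigger in as callables keeps the decision testable without a running daemon, which is one way the generic part could be separated from the systemd-specific part.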
The fork's /graph endpoint replaces what an SME-style adapter would otherwise compose by serially calling list_wings + list_rooms × N + list_tunnels + kg_stats over MCP:
$ time curl -sS -H "X-Api-Key: $KEY" https://palace.jphe.in/graph | jq '{
wings: (.wings | length),
pairs: ([.rooms[] | .rooms | length] | add),
tunnels: (.tunnels | length),
kg: {entities: (.kg_entities | length), triples: (.kg_triples | length)}
}'
{
"wings": 36,
"pairs": 165,
"tunnels": 9,
"kg": { "entities": 6, "triples": 3 }
}
real 0m0.876s

Deploy is a single command that catches sync-lag footguns (Syncthing-mirrored deployment between dev and prod hosts):
$ scripts/deploy.sh
▸ 1/5 push to origin ✓ pushed 00ec6be → origin/main
▸ 2/5 wait for sync to disks ✓ remote at 00ec6be
▸ 3/5 restart palace-daemon ✓ restart issued
▸ 4/5 wait for daemon health ✓ healthy on v1.7.0 (after 3s)
▸ 5/5 smoke-test routes ✓ all 12 routes verified
✦ deploy complete: 00ec6be on http://disks.jphe.in:8085

Local↔remote palace switching is one command:
$ palace-mode status
Mode: remote (http://disks.jphe.in:8085)
$ palace-mode local
→ local mode
$ palace-mode remote http://staging:8085
→ remote mode (PALACE_DAEMON_URL=http://staging:8085)

A Stop hook fires from any Claude Code session and routes through the daemon without ever loading mempalace locally:
[06:29:17] Daemon silent-save: queued=False count=14 (fast-path)
[06:29:17] Skipping auto-ingest: PALACE_DAEMON_URL set, daemon owns writes
The /viz dashboard is a single bookmark for live state — drawer count, repair pulse, KG, wing/room tree, tunnels:
https://palace.jphe.in/viz?key=$KEY&refresh=15
Auto-repair self-heals after a daemon restart that leaves HNSW empty (the false-positive quarantine cascade — pre-fix shape):
06:56:42 systemd: Starting palace-daemon...
06:56:45 Quarantined 3 stale HNSW segment(s) — ChromaDB will rebuild indexes
06:57:19 [auto-repair] daemon up after 15s
06:57:20 [auto-repair] DETECTED degraded HNSW recall: vector ranked 0
06:57:20 [auto-repair] kicking off /repair {mode:"rebuild"} in background — daemon stays available
After the mempalace-fork integrity-gate fix (645ba20) deployed alongside, the same restart now logs the post-fix shape and the auto-repair script exits no-op:
HNSW mtime gap 11165s on .../f360e835-... exceeds threshold but segment metadata file is intact — flush-lag, not corruption. Leaving in place.
HNSW mtime gap 11165s on .../02660268-... — Leaving in place.
HNSW mtime gap 11166s on .../4697d280-... — Leaving in place.
[auto-repair] HNSW recall looks healthy (no 'vector ranked 0' warning)
The upstream daemon focused on stability — semaphore-coordinated reads/writes, mine isolation, MCP-safe API key auth. JP's fork extended that into production deployment patterns:
- Single-source-of-truth daemon for distributed Claude Code sessions. Multiple Claude Code instances (different projects, different terminals, different machines) all routing through one daemon prevents the kind of concurrent-writer SQLite corruption that took down the canonical palace on 2026-04-24. The fork's daemon-strict mode (in jphein/mempalace) plus this daemon's queue-and-drain plus mempal-fast.py's no-import path together make that single-writer guarantee enforceable.
- Structural snapshots for evaluation frameworks. When SME (multipass-structural-memory-eval) needed a structural view of the palace for diagnostics, composing it serially over MCP timed out at 60-120s. The fork added GET /graph so an evaluator can pull wings, rooms, tunnels, KG entities, and KG triples in one HTTP roundtrip, sub-second on a 151K-drawer palace.
- Operational ergonomics. palace-mode for switching local/remote, deploy.sh for the one-command release, verify-routes.sh for post-restart smoke testing: these are quality-of-life pieces for a daemon that's actually used day-to-day rather than just installed.
The architectural argument for why those pieces survive backend swaps (chroma → pgvector, etc.) is in docs/event-log-frame.md.
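The queue-and-drain piece of the single-writer guarantee can be sketched as an append-only JSONL file that is replayed once a rebuild finishes. This is a minimal illustration, not the daemon's implementation: `PendingQueue`, `silent_save`, and the `rebuilding` flag are hypothetical, and only the palace-daemon-pending.jsonl file name comes from this README.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

PENDING_NAME = "palace-daemon-pending.jsonl"  # file name taken from this README


class PendingQueue:
    """Append-only JSONL queue stored inside the palace directory."""

    def __init__(self, palace_dir: Path) -> None:
        self.path = palace_dir / PENDING_NAME

    def enqueue(self, payload: dict) -> None:
        # One JSON object per line; appending never rewrites earlier entries.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(payload) + "\n")

    def drain(self, save: Callable[[dict], None]) -> int:
        """Replay queued payloads in order, then remove the file."""
        if not self.path.exists():
            return 0
        replayed = 0
        for line in self.path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                save(json.loads(line))
                replayed += 1
        self.path.unlink()
        return replayed


def silent_save(payload: dict, rebuilding: bool, queue: PendingQueue,
                save: Callable[[dict], None]) -> bool:
    """Queue while a rebuild is running, otherwise save directly.

    Returns True when the payload was queued.
    """
    if rebuilding:
        queue.enqueue(payload)
        return True
    save(payload)
    return False


with tempfile.TemporaryDirectory() as tmp:
    stored: list[dict] = []
    queue = PendingQueue(Path(tmp))
    queued_flags = [
        silent_save({"n": 1}, True, queue, stored.append),  # rebuild running
        silent_save({"n": 2}, True, queue, stored.append),
    ]
    replayed = queue.drain(stored.append)  # rebuild finished
    queued_flags.append(silent_save({"n": 3}, False, queue, stored.append))
```

The sketch deletes the file only after every line has been replayed, so a crash mid-drain would replay some entries twice on the next attempt. A real implementation would need idempotent saves or per-entry acknowledgement to avoid that.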
- Single-writer enforced by design. SQLite + Syncthing replication + multiple writers = corruption. The daemon is the only process that writes to the palace; clients route through it via HTTP/MCP. The fork's mempal-fast.py and palace-mcp-dispatch.sh make that property hold even for hooks and MCP servers.
- Direct sqlite reads for structural data. embedding_metadata and knowledge_graph.sqlite3 are read-only via ?mode=ro URI for /graph. Bypasses the MCP read semaphore entirely, ~200× faster than the equivalent fan-out under load. Same pattern, different table, for the KG.
- Themed messages for save/repair lifecycle. messages.py returns user-facing strings in systemMessage so a Claude Code Stop hook surfaces "✦ N memories woven into the palace" without the client knowing the internal save/queue state.
- Coordinated rebuild with queue-and-drain. /repair mode=rebuild holds every read/write/mine semaphore slot during the destructive collection swap; /silent-save queues to <palace>/palace-daemon-pending.jsonl and replays automatically post-rebuild. No saves lost during a rebuild window.
- Deploy and verify are the same command. deploy.sh exits non-zero on sync lag, restart failure, or any verify-routes regression. The default cadence for shipping a daemon change is push + restart + verify; if any step fails the deploy aborts, leaving the previous version running.
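The read-only access pattern behind /graph uses a standard SQLite feature: opening the database through a file: URI with mode=ro. The sketch below shows the mechanism with Python's sqlite3 module on a throwaway database. The `demo_rooms` table and `open_read_only` helper are hypothetical and do not reflect the palace's actual schema or the daemon's code.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that writes are refused by SQLite itself."""
    # as_uri() yields an escaped file: URI; mode=ro forbids writes and creation.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "demo.sqlite3"

    # Build a small database with a normal read-write connection.
    rw = sqlite3.connect(db)
    rw.execute("CREATE TABLE demo_rooms (wing TEXT, room TEXT)")
    rw.execute("INSERT INTO demo_rooms VALUES ('projects', 'daemon')")
    rw.commit()
    rw.close()

    # Reads work on the read-only handle; writes raise OperationalError.
    ro = open_read_only(db)
    rows = ro.execute("SELECT wing, room FROM demo_rooms").fetchall()
    try:
        ro.execute("INSERT INTO demo_rooms VALUES ('x', 'y')")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    ro.close()
```

Because the handle cannot write, a reader built this way cannot violate the single-writer property even though it bypasses the daemon's semaphores.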
- Python 3.12+
- mempalace ≥ 3.3.2: the fork is recommended if you want daemon-strict hook mode (single-writer enforcement) and the warnings/sqlite-fallback search path that aren't yet on MemPalace/mempalace develop. Stock mempalace works for everything else; the fork-only migrate_checkpoints_to_recovery lifespan call is ImportError-gated and degrades cleanly.
- For the local mempalace patch (patches/mcp_server_get_collection.patch, log + retry on _get_collection failure, in flight upstream as #1286): re-apply with scripts/apply_patches.sh after each pipx upgrade mempalace until #1286 merges.
git clone https://github.com/jphein/palace-daemon.git
cd palace-daemon
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Default: port 8085, palace at $PALACE_PATH or ~/.mempalace/palace
python main.py
# Custom palace path + auth
PALACE_API_KEY=$(openssl rand -hex 32) python main.py --palace /mnt/raid/projects/mempalace-data/palace

mkdir -p ~/.config/systemd/user/
cp palace-daemon.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now palace-daemon

Edit the service file to set PALACE_API_KEY, MEMPALACE_PALACE, and any custom args before installing.
Warning
Never install both system AND user services. They'll fight for port 8085 and the second instance will crash-loop. Pick one.
Caution
Don't expose port 8085 without setting PALACE_API_KEY. The /mine endpoint accepts arbitrary filesystem paths.
Use palace-mode install to wire the mempalace plugin cache to talk to this daemon (after pointing PALACE_DAEMON_URL at it):
export PALACE_DAEMON_URL=http://your-host:8085
export PALACE_API_KEY=...
~/Projects/palace-daemon/clients/palace-mode install
~/Projects/palace-daemon/clients/palace-mode verify

This installs mempal-fast.py as the Stop/PreCompact hook handler and palace-mcp-dispatch.sh as the MCP server command in the plugin cache. It is idempotent, so it is safe to re-run after plugin updates.
| Route | Method | Purpose |
|---|---|---|
| /health | GET | Liveness + version |
| /search | GET | Semantic search over mempalace_drawers; limit=N. (Stop-hook checkpoints live in mempalace_session_recovery; read them via the mempalace_session_recovery_read MCP tool.) |
| /context | GET | Same as /search, formatted for LLM prompts |
| /list | GET | Query-free metadata browse; wraps mempalace_list_drawers. wing=…&room=…&limit=N&offset=N, all optional |
| /stats | GET | Aggregate KG + graph + status counts |
| /graph | GET | Single-shot structural snapshot (wings, rooms, tunnels, KG); see docs/graph-endpoint.md |
| /viz | GET | Self-contained HTML status dashboard (D3 + Mermaid). Optional ?refresh=N, ?key=… |
| /repair | POST | Coordinate repair (mode=light\|scan\|prune\|rebuild) |
| /repair/status | GET | Current repair state + pending-writes queue depth |
| /silent-save | POST | Stop-hook save path with queue-and-drain during rebuild |
| /memory/{id} | DELETE | Drop a drawer; wraps mempalace_delete_drawer |
| /memory/{id} | PATCH | Update drawer content / wing / room (all optional in body); wraps mempalace_update_drawer |
| /mine | POST | Bulk import a directory (validated absolute path only) |
| /flush | POST | Force checkpoint of pending writes |
| /reload | POST | Invalidate cached client + collection |
| /backup | POST | SQLite snapshot to a sibling file |
| /mcp | POST | MCP-protocol passthrough |
All endpoints honor X-Api-Key when PALACE_API_KEY is set.
# Smoke-test the running daemon
PALACE_DAEMON_URL=http://localhost:8085 PALACE_API_KEY=... scripts/verify-routes.sh
# One-command deploy (push + sync-wait + restart + verify)
scripts/deploy.sh
# Switch local Claude Code sessions between modes
palace-mode {status,local,remote [URL],install,verify}

- rboarescu/palace-daemon: upstream
- MemPalace/mempalace: the underlying memory system this daemon fronts
- jphein/mempalace: the production fork of mempalace this daemon is paired with
- multipass-structural-memory-eval: the SME framework whose palace-daemon adapter consumes /graph
- Apache AGE: graph extension for postgres, candidate KG view technology if mempalace's KG ever justifies it (currently doesn't)
- pgvector: vector extension for postgres, candidate semantic-search view technology under upstream MemPalace #665
- D3.js + Mermaid: /viz dashboard rendering, both via CDN, no bundler / no static-asset deps
- Upstream PRs that informed /viz: #1022 (D3 KG viz, sangeethkc), #393 (Mermaid in docs, jravas), #431 (CLI stats, MiloszPodsiadly), #256 (sync_status MCP, rusel95), #601 (brief overview, mvanhorn); synthesized, not cherry-picked
- Cross-repo PRs that retire local code paths if/when they merge: MemPalace/mempalace#1286, log + retry on _get_collection failure (would retire patches/mcp_server_get_collection.patch)
MIT — same as upstream.