feat: /repair + /silent-save endpoints, cold-start warmup, minor fixes #4
jphein wants to merge 15 commits into rboarescu:main
Conversation
Adds repair coordination and a queue-safe Stop-hook save path.

- /repair (light/scan/prune/rebuild): coordinates repairs with live traffic. rebuild holds all semaphore slots via _exclusive_palace() to prevent delete_collection/create_collection from racing concurrent writes — those backend ops sit outside the ChromaCollection flock.
- /silent-save: HTTP path for Claude Code Stop-hook checkpoints. Writes normally when the palace is healthy; queues to palace-daemon-pending.jsonl during a rebuild window and drains automatically once the swap completes. Double-checked locking (pre- and post-semaphore) ensures no write slips through between the flag check and semaphore acquisition.
- /repair/status: current repair state + pending queue depth.
- messages.py: single source for all themed user-facing strings (✦ memory ops, ◈ palace ops).
- Hook opt-in via PALACE_DAEMON_URL env var; a missing or unreachable daemon falls through to direct write with no behavior change.
- Lock path now port-scoped (/tmp/palace-daemon-<port>.lock) to allow running a test instance alongside production.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ChromaDB's Rust HNSW binding occasionally segfaults on the very first request when opened cold. Pre-opening the client during lifespan startup (before yield) ensures PersistentClient is fully initialized before any traffic arrives. Warmup failure is non-fatal — daemon still starts.

Remove the custom SIGINT/SIGTERM handlers that called sys.exit() from inside an asyncio callback. That pattern interrupted the event loop mid-coroutine, producing CancelledError/SystemExit tracebacks in the logs and skipping lifespan shutdown (the flush). Uvicorn already installs graceful shutdown handlers — no override needed.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
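The warmup-before-yield pattern can be sketched with a generic async lifespan. This is a simplified stand-in: `_open_client` models the heavy `chromadb.PersistentClient(...)` call, and the `app` mapping stands in for FastAPI's app state.

```python
import asyncio
from contextlib import asynccontextmanager

def _open_client() -> str:
    # Stand-in for chromadb.PersistentClient(path=...); the real call does
    # the heavy HNSW initialization we want finished before any request.
    return "client-ready"

@asynccontextmanager
async def lifespan(app: dict):
    # Warm the client in a worker thread before yield; failure is
    # non-fatal so the daemon still starts on a degraded palace.
    try:
        app["client"] = await asyncio.to_thread(_open_client)
    except Exception:
        app["client"] = None  # degraded, but serving
    yield
    # (flush/shutdown work would run here)
```

With FastAPI the same shape is passed as `FastAPI(lifespan=lifespan)`; because the warmup completes before `yield`, no request can hit a half-initialized client.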
Allows running the offline purge on deployments where the palace is not at ~/.mempalace/palace (e.g., /mnt/raid/projects/mempalace-data/palace on the disks server). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Default 30s urlopen timeout was too short for read tools that walk a multi-GB palace (e.g. `mempalace_status`, `mempalace_list_wings`). On a 2.2GB palace these can take >30s under read-semaphore contention even when the daemon eventually returns 200 OK — observed 2026-04-24 with MCP timeouts client-side while daemon log showed POST /mcp 200 OK. New default 120s gives headroom for slow paths and short waits behind in-flight writes. `PALACE_MCP_TIMEOUT` env override for tuning. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Pull request overview
This PR bundles several production-driven improvements to palace-daemon: cold-start stabilization, new repair/save-coordination endpoints with themed user messages, a configurable palace path for the purge script, and a longer/tunable MCP HTTP timeout.
Changes:
- Warm ChromaDB client during FastAPI lifespan startup and remove custom signal handlers.
- Add `/repair`, `/repair/status`, and `/silent-save` endpoints plus centralized themed messages.
- Add `--palace` flag to `scripts/purge_wings.py` and make MCP proxy timeout env-tunable (default 120s).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| main.py | Implements warmup, repair state, pending-writes queue/drain, and new HTTP endpoints. |
| messages.py | Adds centralized themed system messages for saves/repair lifecycle. |
| clients/mempalace-mcp.py | Bumps HTTP timeout and allows override via PALACE_MCP_TIMEOUT. |
| scripts/purge_wings.py | Adds --palace argument to target non-default palace locations. |
| README.md | Documents the new endpoints and hook wiring flow. |
| CHANGELOG.md | Adds release notes for v1.5.0 features and fixes. |
```python
def _enqueue_pending_write(payload: dict) -> None:
    """Append a silent-save payload to the pending-writes queue."""
    path = _pending_writes_path()
    line = json.dumps({"payload": payload, "enqueued_at": datetime.now().isoformat()})
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```
_enqueue_pending_write() performs synchronous disk I/O directly on the event loop thread. Under load (many concurrent /silent-save calls) this can stall all request handling. Consider moving the append into run_in_executor (or using an async file writer / background task + queue) so /silent-save stays non-blocking even during heavy enqueue bursts.
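One way to implement this suggestion, sketched with `asyncio.to_thread`. The helper names here are illustrative, not the PR's actual code, and the queue path is passed in explicitly to keep the sketch self-contained.

```python
import asyncio
import json
from datetime import datetime

def _append_line(path: str, line: str) -> None:
    # Blocking file append; runs in a worker thread, off the event loop.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")

async def enqueue_pending_write(path: str, payload: dict) -> None:
    line = json.dumps({"payload": payload,
                       "enqueued_at": datetime.now().isoformat()})
    # Dispatch the disk I/O to a thread so /silent-save stays
    # responsive under bursty enqueue load.
    await asyncio.to_thread(_append_line, path, line)
```

`asyncio.to_thread` (3.9+) is the simplest of the options the review lists; a dedicated background task draining an `asyncio.Queue` would additionally serialize appends under extreme concurrency.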
```python
for line in lines:
    try:
        entry = json.loads(line)
        result = await _do_silent_save_write(entry["payload"])
        if result.get("success"):
            count += 1
        else:
            _log.warning("drain: replay failed: %s", result.get("error"))
    except Exception:
        _log.exception("drain: entry replay raised")
os.remove(proc_path)
```
_drain_pending_writes() calls _do_silent_save_write() without holding _write_sem, even though _do_silent_save_write() documents that the caller must hold it. This can bypass the daemon’s concurrency controls and race with other writers. Also, the code deletes the .processing file unconditionally at the end, which drops any entries that failed to replay (or had JSON errors), undermining the “no lost saves” guarantee. Please drain under _write_sem (per entry or for the whole drain) and persist/requeue any failed entries instead of deleting them.
Suggested change:

```python
failed_lines: list[str] = []
for line in lines:
    try:
        entry = json.loads(line)
        payload = entry["payload"]
    except Exception:
        _log.exception("drain: invalid entry; preserving for retry")
        failed_lines.append(line)
        continue
    try:
        async with _write_sem:
            result = await _do_silent_save_write(payload)
        if result.get("success"):
            count += 1
        else:
            _log.warning("drain: replay failed: %s", result.get("error"))
            failed_lines.append(line)
    except Exception:
        _log.exception("drain: entry replay raised; preserving for retry")
        failed_lines.append(line)
if not failed_lines:
    os.remove(proc_path)
    return count
with open(proc_path, "w", encoding="utf-8") as f:
    f.writelines(failed_lines)
try:
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(failed_lines)
    os.remove(proc_path)
except Exception:
    _log.exception("drain: failed to requeue; leaving %s in place", proc_path)
```
```python
# occasionally segfaults on the very first request if opened cold; opening
# it here (before yield) ensures the PersistentClient is fully initialized.
try:
    loop = asyncio.get_event_loop()
```
Inside an async lifespan function, prefer asyncio.get_running_loop() over asyncio.get_event_loop() (which is deprecated in newer Python versions and can behave differently depending on policy). Switching to get_running_loop() here avoids deprecation warnings and makes the intent explicit.
Suggested change:

```python
loop = asyncio.get_running_loop()
```
```python
raise HTTPException(status_code=400, detail="'entry' is required")

themes = body.get("themes") or []
msg_count = int(body.get("message_count") or 1)
```
msg_count = int(body.get("message_count") or 1) can raise ValueError if a client sends a non-integer (e.g., "15" is fine, but "15 msgs" or null-like values can break). Please validate/parse message_count with a try/except and return a 400 on invalid input instead of throwing a 500.
Suggested change:

```python
raw_msg_count = body.get("message_count")
if raw_msg_count:
    try:
        msg_count = int(raw_msg_count)
    except (TypeError, ValueError):
        raise HTTPException(status_code=400, detail="invalid 'message_count'")
else:
    msg_count = 1
```
```python
# 120s headroom for slow read tools (e.g. mempalace_status walking a
# multi-GB palace) plus short waits behind in-flight writes on the
# daemon's read semaphore. Override with PALACE_MCP_TIMEOUT for tuning.
timeout = int(os.getenv("PALACE_MCP_TIMEOUT", "120"))
with urllib.request.urlopen(req, timeout=timeout) as resp:
```
timeout = int(os.getenv("PALACE_MCP_TIMEOUT", "120")) will crash the proxy if the env var is set to a non-integer value. Please catch ValueError (and maybe enforce a sane min/max) so a misconfigured environment yields a clear error message or falls back to the default instead of exiting unexpectedly.
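A hedged sketch of the suggested hardening, with an added clamp. The bounds, helper name, and warning text are arbitrary illustrative choices, not anything the PR specifies.

```python
import os
import sys

def mcp_timeout(default: int = 120, lo: int = 5, hi: int = 600) -> int:
    """Parse PALACE_MCP_TIMEOUT; fall back to the default on junk values."""
    raw = os.getenv("PALACE_MCP_TIMEOUT", "")
    try:
        value = int(raw) if raw else default
    except ValueError:
        print(f"palace-mcp: ignoring non-integer PALACE_MCP_TIMEOUT={raw!r}",
              file=sys.stderr)
        return default
    # Clamp to a sane window so a typo can't produce 0s or hour-long hangs.
    return max(lo, min(hi, value))
```

Warning to stderr rather than stdout keeps the MCP stdio channel clean, which matters for a proxy that speaks a protocol over stdout.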
```python
# Fast-path: repair already in progress, queue immediately.
if _repair_state["in_progress"]:
    _enqueue_pending_write(body)
    return {
```
/silent-save queues whenever _repair_state["in_progress"] is true, but /repair sets in_progress for all modes (light/scan/prune/rebuild). That means silent saves will be deferred even during non-destructive/readonly repairs, which doesn’t match the PR description (“during /repair mode=rebuild”). Consider only enabling queuing for mode == "rebuild" (e.g., track a separate queue_writes flag or check _repair_state["mode"]).
```python
def _pending_writes_path() -> str:
    """Location of the jsonl queue that holds silent-saves during rebuild."""
    palace_path = _mp._config.palace_path
    parent = os.path.dirname(palace_path.rstrip("/")) or "/tmp"
    return os.path.join(parent, "palace-daemon-pending.jsonl")
```
The PR description mentions the pending-writes queue living under ~/.cache/palace-daemon/..., but _pending_writes_path() currently writes to <palace_parent>/palace-daemon-pending.jsonl. Please either update the PR description (and any external docs) or adjust the implementation so the queue location matches the stated behavior.
Three deployment helpers that sit alongside mempalace-mcp.py:

- palace-mode: shell CLI to switch between local (in-process) and remote (daemon) palace modes by toggling PALACE_DAEMON_URL in ~/.claude/settings.local.json. Subcommands: status, local, remote [URL], install (idempotent re-apply of cache+workspace customizations after a plugin update), verify.
- palace-mcp-dispatch.sh: thin shell wrapper invoked by the MCP server command in plugin configs. Dispatches to mempalace-mcp.py --daemon $URL when PALACE_DAEMON_URL is set, otherwise falls back to in-process python -m mempalace.mcp_server.
- mempal-fast.py: stdlib-only Stop/PreCompact hook handler. Counts human messages in the transcript, gates on SAVE_INTERVAL, POSTs to the daemon's /silent-save. No mempalace import, no chromadb load, so cold hook fires can't trigger the chromadb HNSW SIGSEGV that the warmup patch in main.py addresses for the daemon itself.

Together these let a user single-command-flip between an in-process palace and a daemon-backed palace without editing any JSON by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Six surgical fixes pulled from Copilot's review of #4:

main.py:
- Move /silent-save queue gating from `_repair_state['in_progress']` (any mode) to `mode == 'rebuild'` only. Light/scan/prune don't replace the collection, so silent-saves can proceed normally during those modes — matches the original PR intent.
- Make _enqueue_pending_write async, dispatch the disk append via asyncio.to_thread so /silent-save stays non-blocking under bursty enqueue load.
- Drain pending writes under _write_sem (per entry) — the previous version called _do_silent_save_write without holding the semaphore it documents as required, racing other writers.
- Don't unconditionally remove the .processing file in drain — quarantine failed entries to a timestamped sibling so the next drain doesn't replay successful saves.
- Validate body['message_count']: parse via try/except and return 400 on non-integer input rather than letting int() raise a 500.
- asyncio.get_event_loop() → asyncio.get_running_loop() in the lifespan warmup. The former is deprecated in 3.12+ inside async functions.

clients/mempalace-mcp.py:
- Validate PALACE_MCP_TIMEOUT: catch ValueError, warn to stderr, fall back to 120s rather than crashing the proxy on a bad env value.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
palace-mode used to rewrite both ~/.claude/plugins/cache/... (the runtime that Claude Code actually uses) AND ~/Projects/memorypalace/.claude-plugin/ (the plugin source dev clone). The dev-clone rewrite produced a permanently dirty git working tree on every machine that ran `palace-mode remote`.

Drop the workspace install and verify steps. The cache copy is the only one that affects runtime; the dev clone now stays clean for git operations (rebases, status checks, preflight). Existing installs are unaffected — runtime hooks already live in the cache.

Also adds venv/ to .gitignore.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
All hook paths now write topic="checkpoint" so jphein/mempalace's search kind= filter (default kind="content") cleanly excludes them. Before this change, three different hook entry points wrote three different topic values:

- mempalace.hooks_cli (silent_save fallback + daemon POST): topic="checkpoint" ✓
- clients/mempal-fast.py (the active fast-path): topic="checkpoint" ✓
- clients/hook.py (legacy stdlib runner via _post_mcp): topic="auto-save" ✗ fixed to "checkpoint"

Defines a CHECKPOINT_TOPIC constant in both client files so future edits are grep-anchored and a future rename is one-place.

The read-side filter in jphein/mempalace's search_memories (commit 8d02835) handles both "checkpoint" and "auto-save" as legacy synonyms, plus does text-prefix defense-in-depth — so existing palace data with the old topic value, or with no topic metadata at all, is also caught. Future writes from this commit forward will all use the canonical name.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Three fixes, one verify script.
1. /search and /context now accept a `kind=` query param mirroring the
mempalace_search MCP tool's input_schema enum:
- "content" (default) — excludes Stop-hook checkpoints (the
session-summary noise that drowns out real content under vector
similarity)
- "checkpoint" — only checkpoints (recovery / audit)
- "all" — no filter, pre-2026-04-25 behavior
Invalid values return 400. Backed by the read-side filter in
jphein/mempalace commit 8d02835 (search_memories kind= parameter).
2. **Bug fix**: /search and /context were passing `max_results` to the
mempalace_search MCP tool. The tool's input_schema declares `limit`,
not `max_results`, so mempalace.mcp_server.handle_request silently
dropped the param via its schema-property whitelist (line 1677), and
*every* /search response was capped at the default limit=5. Confirmed
on the running v1.5.0 daemon. Renamed to `limit` so the user-supplied
value actually binds.
3. /silent-save now canonicalizes the `topic` field at the daemon
boundary. Drift between client topic values (clients/hook.py wrote
"auto-save"; mempal-fast.py wrote "checkpoint") used to leak into
palace metadata, where the read-side checkpoint filter then had to
keep both as known-synonyms. New _canonical_topic() rewrites
"auto-save" → "checkpoint" with a warning log. Future drift gets the
same treatment.
scripts/verify-routes.sh is a curl-based smoke test that exercises
every public route post-deploy. Not in CI (depends on a live palace);
run manually after `systemctl --user restart palace-daemon`.
Bumps VERSION 1.5.0 → 1.5.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Five steps: git push origin → wait for Syncthing/git mirror → systemctl restart on the deploy host → poll /health until ready → run verify-routes.sh smoke test. Reads PALACE_API_KEY from ~/.claude/settings.local.json env block (palace-mode's source of truth) so it picks up the same key the local Claude Code sessions use. Defaults: PALACE_HOST=disks, PALACE_SYNC_GRACE=3s, PALACE_HEALTH_TIMEOUT=30s, all overridable via env. Replaces the previous manual sequence (push → ssh → restart → curl /health → hand-run smoke tests) with a single command that fails fast on any step. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Documents the v1.5.1 changes:

- README API table updated with kind= column on /search and /context
- New "kind= filter" section under /search explaining the three values (content/checkpoint/all) and why default-excluding checkpoints is the right call (80%+ of search results were checkpoint noise, side-by-side numbers from 2026-04-25 validation)
- CHANGELOG entry for 1.5.1: kind= add, _canonical_topic helper, verify-routes.sh, and the silently-dropped limit= bug fix

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
/stats fans out to three MCP tools — mempalace_kg_stats (cached), mempalace_graph_stats (60s TTL cache via #661), mempalace_status (no cache, walks corpus paginated). On the deployed disks daemon with 151K drawers, cold /stats takes ~32s and warm-cache /stats takes ~30-56s because mempalace_status doesn't cache and the corpus walk is intrinsically slow. The previous 30s curl timeout meant every verify-routes run flagged a false positive on /stats. 90s gives headroom on the production palace.

Companion README update: documents /stats as a heavy admin route, not a chat-turn path.

End-to-end verified post-deploy 2026-04-25: all 11 assertions pass against the deployed v1.5.1 daemon, including the kind= filter (scope 150811 vs 151448 = ~637 checkpoints filtered) and the max_results→limit bug fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
rboarescu left a comment
Thanks for this — tested all new endpoints against a Docker container running the PR branch. /silent-save, /repair (all four modes), the queue/drain during rebuild, and the 409 on overlapping repairs all work correctly.
Three issues to fix before I can merge:
1. API key hardcoded in clients/palace-mode (lines 16–17)

   `DEFAULT_KEY="censored_although_too_late_should_rotate_API_KEY"`

   Personal API key in a public repo. Please remove — users should supply it via PALACE_API_KEY env var. Same for DEFAULT_URL: leave it empty or document it as a required env var.

2. Hardcoded absolute path in clients/palace-mcp-dispatch.sh (line 8)

   `exec "$PYTHON" /home/jp/Projects/palace-daemon/clients/mempalace-mcp.py ...`

   Only works on your machine. palace-mode already resolves this correctly with readlink -f — palace-mcp-dispatch.sh should use the same pattern.
I never knew github could be such an exciting ride... and rabbit hole.
Thanks again!
Single-shot structural snapshot that mirrors /stats's asyncio.gather shape, plus:

- parallel mempalace_list_rooms fan-out per wing
- direct read-only sqlite read of knowledge_graph.sqlite3 (no extra MCP roundtrip)

Replaces what multipass-structural-memory-eval (SME) would otherwise compose by serially calling list_wings + list_rooms × N + list_tunnels + kg_stats over HTTP. On the 151K-drawer canonical palace, list_wings alone takes ~30s — a serial composition costs minutes; this gathers in well under that.

KG read uses URI-mode ?mode=ro so the daemon can't accidentally write. Schema differences tolerated via per-query OperationalError catch. _kg_path() derives location from _mp._config.palace_path so non-default deployments work unchanged.

Spec: docs/graph-endpoint.md. SME adapter prefers /graph once daemon ≥ 1.6.0 and falls back to MCP composition otherwise — no coordinated deploy needed. verify-routes.sh has a /graph probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…nnels

mempalace_list_tunnels and mempalace_graph_stats disagree on what counts as a tunnel (mempalace 3.3.4 — list_tunnels returns [], graph_stats reports 9 on the 151K-drawer canonical palace). The fix in palace-daemon is to shadow graph_stats.top_tunnels for /graph.tunnels so /graph.tunnels and /stats.graph.tunnel_rooms always agree.

Drops one MCP call from the gather (tunnels piggybacks on the graph_stats roundtrip we already need for kg_stats). Spec Part 2 tracks the upstream reconciliation separately.

Live verified against canonical palace post-fix: tunnels has 9 entries including the diary, technical, architecture rooms — matching what /stats has been reporting all along.
Drops the list_wings + list_rooms × N MCP fan-out (which serialized through the 4-slot read semaphore and stalled indefinitely under concurrent search load — observed by SME adapter as 60s+ timeouts).

New shape for /graph internals:

- graph_stats (MCP, cheap, gives tunnels)
- kg_stats (MCP, cheap)
- wings + rooms-per-wing → direct sqlite read of embedding_metadata
- kg_entities + triples → direct sqlite read of knowledge_graph.sqlite3

All four phases gather in parallel. Benchmarked on the canonical 151K-drawer palace: structural read drops from 60-120s under contention to ~0.4s, semaphore-free. The MCP roundtrips (graph_stats + kg_stats) remain the floor; both are typically sub-second.

Safety:

- Read-only via sqlite URI ?mode=ro (cannot acquire write locks)
- Chroma uses WAL — concurrent writers see a consistent snapshot
- Same pattern _read_kg_direct already uses since v1.6.0 (2003e80)
- Per-query try/except OperationalError — schema drift across chroma versions degrades to empty wings/rooms (SME adapter then falls back to its MCP composition path)

Couples to chroma's embedding_metadata internal layout, which is not a documented stable API. Acceptable trade given the 200× speedup and the graceful degradation path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
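The read-only sqlite pattern described under Safety can be sketched as follows; `read_kg_direct` is an illustrative stand-in for the daemon's `_read_kg_direct`, and the degrade-to-empty behavior mirrors the per-query OperationalError catch.

```python
import sqlite3

def read_kg_direct(db_path: str, query: str) -> list:
    """Query a sqlite file opened in read-only URI mode.

    mode=ro means this connection can never acquire a write lock, so
    concurrent writers (Chroma in WAL mode) are never blocked by us.
    """
    uri = f"file:{db_path}?mode=ro"
    with sqlite3.connect(uri, uri=True) as conn:
        try:
            return conn.execute(query).fetchall()
        except sqlite3.OperationalError:
            # Schema drift across chroma versions degrades to an empty
            # result instead of a 500; callers fall back to MCP composition.
            return []
```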
…ity)

Aligns hook.py with the kind= search filter introduced in the /repair + /silent-save PR. Saves written with topic="auto-save" would not be returned by kind=checkpoint queries once that PR lands.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Merged locally — conflicts resolved and pushed directly to main in ef6ac03.
…key, portable paths

Two issues called out by @rboarescu in the PR #4 review:

1. clients/palace-mode embedded a hardcoded PALACE_API_KEY default value at lines 16-17. The author key has been treated as compromised and will be rotated on next daemon restart; this commit stops the bleed by reading both URL and key from the environment with no in-source defaults. `palace-mode remote` now fails fast with a clear error if neither PALACE_DAEMON_URL nor PALACE_API_KEY is set, instead of silently writing empty strings into settings.local.json.

2. clients/palace-mcp-dispatch.sh hardcoded an absolute path to its sibling clients/mempalace-mcp.py, only working on the author's machine. Now resolves siblings via readlink -f / dirname (same pattern palace-mode already used), so the dispatcher works wherever the clients/ directory lives.

No daemon restart needed for these client-side fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Thanks for the careful read — both real, both fixed in
The "exciting ride" line resonates. I only started actually doing this kind of collaboration around my birthday, so we're both fairly fresh at it. There's a real difference between making something work for yourself and making it portable, reviewable, and not actively hostile to other contributors — every time I think I've internalized that, something like the API key leak reminds me there's more.

Re: the merge — thanks for picking up the substance and resolving conflicts directly. Happy to send any of the post-1.5.0 work on the fork (
Pulls in 8 commits we were behind on, most importantly:

- v1.4.4: Dockerfile + docker-compose.yml + HNSW monitoring + granular concurrency env vars (PALACE_MAX_READ_CONCURRENCY / PALACE_MAX_WRITE_CONCURRENCY)
- v1.4.5: time-based saves, session-end force save, state cleanup
- 0060190: use CHECKPOINT_TOPIC constant for diary writes (PR #4 compatibility — composes cleanly with our existing CHECKPOINT_TOPIC + CHECKPOINT_TOPIC_SYNONYMS block)
- a64244c: revert MCP-breaking toast injection, keep REST endpoint toasts
- d0aabb9 v1.5.1: harden _get_collection (silent failures logged, cache self-healing retry, num_threads=1 enforced on every open), /health returns 503 on degraded palace, systemd watchdog via sd_notify, startup warmup opens the collection
- patches/ + scripts/apply_patches.sh: upstream's patch-style backport system

Conflict resolution:

- main.py: kept VERSION at 1.7.2 (was 1.7.1; bumped to reflect the upstream merge). Kept our CHECKPOINT_TOPIC + CHECKPOINT_TOPIC_SYNONYMS block AND adopted upstream's PALACE_MAX_READ/WRITE_CONCURRENCY env var split.
- palace-daemon.service: kept our placeholder (User=user, /home/user/...) and ExecStartPre/ExecStartPost hooks; adopted upstream's Type=notify + NotifyAccess=main for the watchdog.
- README.md: kept fork-shaped narrative wholesale per the fork-README-handling pattern. Queue table needs a follow-up sweep to reflect what's now upstream (watchdog, HNSW threads enforcement, collection cache auto-retry, --force flag) — those rows can be retired.
- CHANGELOG.md: added a v1.7.2 entry at the top listing what came in from upstream's v1.5.1; kept our own v1.5.1 entry below for fork history with a note about the v1.5.1 naming collision and PR #4's closed-not-merged status.

No tests in this repo; smoke test will run via deploy.sh after this commit.
…wording

Reordered the queue so the smallest, most universally-applicable changes sit at the top and the JP-personalized tooling sits at the bottom. The prior order ranked by raw size, which buried the actual best candidates (`limit=` bug fix, `_canonical_topic`, `/graph.tunnels`) under deploy tooling that needs significant generalization before it would be a reasonable PR.

New top of queue (PR-ready, universal):

1. `limit=` bug fix on /search /context (tiny, clear bug)
2. `_canonical_topic()` synonym rewrite (composes with upstream's CHECKPOINT_TOPIC constant from 0060190)
3. `/graph.tunnels` derived from graph_stats (small fix)
4. `scripts/verify-routes.sh` (universal curl smoke test)
5. `GET /graph` (medium, depends on a sqlite-direct read pattern)
6. `GET /viz` (medium, depends on /graph)

Marked **needs generalization** on three rows that have JP-specific assumptions baked in:

- `clients/palace-mode` (Claude Code plugin cache layout)
- `scripts/auto-repair-if-empty.sh` (systemd --user shape)
- `scripts/deploy.sh` (PALACE_HOST=disks, ~/.claude/settings.local.json, Syncthing-mirrored source tree, fork-mempalace import-check that would fail on upstream installs)

Also tightened the front-matter wording on PR #4: rboarescu cherry-picked the contents into upstream main directly (`ef6ac03`) rather than merging the PR via the GitHub UI, so "merged via PR #4" was technically misleading. Updated to reflect the actual merge mechanism.

Bumped version 1.7.0 → 1.7.2 to match the post-upstream-merge state. Drawer count refreshed 150,814 → 150,891.
…eralization

Mirrors the mempalace fork's README pattern: top-of-document section listing PRs we've filed and their state, then a "pending" queue for PR-ready items, then an explicit "needs generalization" table for items that bake in JP-specific assumptions and are held until they can be split into a universal shape.

## What's now where

- **Open upstream PRs** (3): rboarescu#7 limit= fix, rboarescu#8 _canonical_topic, rboarescu#9 verify-routes.sh
- **Recently landed in upstream**: PR #4 (cherry-picked into ef6ac03, closed) — the v1.5.0 daemon work
- **Pending PRs — ready to file** (6): GET /graph (folds in /graph.tunnels fix), GET /viz, palace-mcp-dispatch.sh, mempal-fast.py, event-log-frame.md, graph-endpoint.md
- **Needs generalization before PR** (3): deploy.sh (PALACE_HOST=disks + fork-mempalace import-check), palace-mode (Claude Code plugin cache layout), auto-repair-if-empty.sh (systemd --user shape)

## Why

JP's note: "we can wait if we need to generalize things but make sure that is noted in the readme like we do with memorypalace." Filing PRs that have JP-specific assumptions baked in would either fail in contributor review or land features other operators couldn't use. Better to be explicit about which items are PR-ready, which are queued, and which need generalization work first — and what that work is — so anyone can pick up a generalization-ready item without re-discovering the constraint set.

`/graph.tunnels` was previously listed as its own row but it's structurally a sub-feature of `/graph`; merged into that row to avoid implying it can ship independently.
`clients/palace-mode` shipped with two embedded user-specific defaults
on lines 19-20: a `DEFAULT_URL` pointing at my homelab daemon and a
real hex `DEFAULT_KEY`. Both are leftovers from when the script was
extracted from my own install — the URL is my homelab address, and
the hex string is a real API key I authored (it has since been
rotated, so it doesn't authenticate against my daemon anymore — but
the literal value is still in upstream main's source).
## Fix
Read both from environment, no in-source defaults:
```bash
DEFAULT_URL="${PALACE_DAEMON_URL_DEFAULT:-${PALACE_DAEMON_URL:-}}"
DEFAULT_KEY="${PALACE_API_KEY:-}"
```
Plus fail-fast guards in `set_mode_remote()` so silent empty-string
writes don't end up in `~/.claude/settings.local.json`. If either URL
or key is unset, the script prints a clear error pointing at the right
env vars and exits 2 instead of writing broken config.
## Why now
Three reasons even though the embedded key is no longer live:
1. Trains contributors that hex defaults are normal — anyone reading
the source might assume embedding their own key is fine.
2. Default `palace-mode remote` (no args) writes my homelab URL into
a fresh user's `settings.local.json`, sending their hooks to a
daemon they have no auth for.
3. Reviewable provenance — removing the literal makes future audits
cleaner.
## Scope
+11 / -1. Single file. No new dependencies. No behavior change for
operators who set `PALACE_DAEMON_URL` and `PALACE_API_KEY` in their
shell rc. Operators who relied on the embedded defaults get a clear
error pointing at what they need to set.
Originally part of fork commit `d450cef` ("address upstream review on
PR #4"); the portable-paths half of that commit shipped as PR rboarescu#10.
Sending the embedded-defaults half standalone.
The dispatcher in `clients/palace-mcp-dispatch.sh` has a hardcoded `/home/jp/Projects/palace-daemon/clients/mempalace-mcp.py` in the `--daemon` branch — a leftover from when it was extracted into PR #4. As shipped in `main` today, the dispatcher only works on my machine; on every other operator's machine the `exec` target doesn't exist and the daemon-proxy mode fails immediately.

## Fix

Resolve the sibling client path at runtime:

```bash
HERE="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd -P)"
MCP_CLIENT="$HERE/mempalace-mcp.py"
```

`cd "$(dirname …)" && pwd -P` is POSIX-portable. It resolves symlinks in the directory path via the filesystem, which covers the actual common case for plugin invocations (the script's parent directory may be a symlink under a Claude Code plugin cache). The earlier `readlink -f` approach failed on macOS — BSD `readlink` doesn't accept `-f`, so the dispatcher would silently `command not found` on macOS clients (caught in Copilot review).

Also adds an explicit existence check on `$MCP_CLIENT` before exec, so the failure mode is a clear error message ("missing sibling client at …") rather than a generic Python "can't open file" trace.

Same behavior on my machine; correct + portable behavior everywhere else. The "in-process" branch (`python -m mempalace.mcp_server`) was already portable, no change there.

Originally extracted from a fork-side commit (`d450cef`) that landed during PR #4's review cycle but didn't make it into the final cherry-pick. Sending standalone per the "small separate PRs" preference.
A curl-based smoke test that exercises every public read-only route
against a running daemon. Designed for manual post-deploy validation
(`systemctl restart palace-daemon`-style), not CI — depends on a live
palace.
## What it probes
GET-only routes:
- /health (auth-bypass; always responds)
- /search?q=palace&limit=2
- /context?topic=palace&limit=2
- /stats
- /repair/status
Plus a regression check that `limit=3` returns 3 hits end-to-end. It
fails only if the palace has ≥3 matches but the response is capped at
fewer; on empty or unreachable palaces it flags a warning instead of
failing.
POST routes (/repair, /silent-save, /memory, /mine, /flush, /reload,
/backup) intentionally skipped — they mutate state and belong in a
dedicated integration run against a throwaway palace, not in a manual
smoke against production.
## Usage
PALACE_DAEMON_URL=http://your-host:8085 \
PALACE_API_KEY=... \
scripts/verify-routes.sh
Both env vars optional; defaults to localhost:8085 and no auth header.
## Why this shape
Easier than maintaining a full pytest harness for one-shot deploy
verification, and the actual failure surface (HTTP 500, malformed
JSON, missing routes after merge) is what curl + grep catches well.
The probe helpers (`probe`, `probe_json_field`) keep failure messages
short and actionable.
Originally extracted from a fork-side commit; sending standalone per
the "small separate PRs" preference noted on PR #4.
`clients/palace-mode` shipped with two embedded user-specific defaults
on lines 19-20: a `DEFAULT_URL` pointing at my homelab daemon and a
real hex `DEFAULT_KEY`. Both are leftovers from when the script was
extracted from my own install — the URL is my homelab address, and
the hex string is a real API key I authored (it has since been
rotated, so it doesn't authenticate against my daemon anymore — but
the literal value is still in upstream main's source).
## Fix
Read both from environment, no in-source defaults:
```bash
DEFAULT_URL="${PALACE_DAEMON_URL_DEFAULT:-${PALACE_DAEMON_URL:-}}"
DEFAULT_KEY="${PALACE_API_KEY:-}"
```
Plus fail-fast guards in `set_mode_remote()` so silent empty-string
writes don't end up in `~/.claude/settings.local.json`. If either URL
or key is unset, the script prints a clear error pointing at the right
env vars and exits 2 instead of writing broken config.
## Why now
Three reasons even though the embedded key is no longer live:
1. Trains contributors that hex defaults are normal — anyone reading
the source might assume embedding their own key is fine.
2. Default `palace-mode remote` (no args) writes my homelab URL into
a fresh user's `settings.local.json`, sending their hooks to a
daemon they have no auth for.
3. Reviewable provenance — removing the literal makes future audits
cleaner.
## Scope
+11 / -1. Single file. No new dependencies. No behavior change for
operators who set `PALACE_DAEMON_URL` and `PALACE_API_KEY` in their
shell rc. Operators who relied on the embedded defaults get a clear
error pointing at what they need to set.
Originally part of fork commit `d450cef` ("address upstream review on
PR #4"); the portable-paths half of that commit shipped as PR rboarescu#10.
Sending the embedded-defaults half standalone.
Adds `_canonical_topic()` between `/silent-save`'s payload parse and
`tool_diary_write`. When a topic comes in matching a known legacy
synonym (currently `"auto-save"`), it gets rewritten to the canonical
`CHECKPOINT_TOPIC` ("checkpoint") and a warning is logged.
## Why
The bundled clients (`clients/hook.py`, `clients/mempal-fast.py`)
already write the canonical value, so this is defense-in-depth. It
catches:
1. Future drift in those clients (someone edits one and forgets the
convention)
2. Third-party callers POSTing to `/silent-save` directly with the
older synonym
3. Buggy hook installs that haven't been updated
Without this, drift would silently leak into palace metadata, and
mempalace's `tool_diary_write` topic-routing — which sends matching
topics to the dedicated `mempalace_session_recovery` collection rather
than the searchable main collection — would miss those drawers; they'd
then dominate vector top-N in `mempalace_search` results. The warning
makes drift observable; the rewrite keeps the palace clean.
## Scope
- New module-level constants: `CHECKPOINT_TOPIC`, `CHECKPOINT_TOPIC_SYNONYMS`
- New helper: `_canonical_topic(topic: str) -> str` (10 lines)
- One call-site change in `_do_silent_save_write`
Non-checkpoint topics (e.g. `"musings"`, `"decisions"`) pass through
unchanged — the rewrite only fires on synonyms in the explicit list.
No new dependencies, no behavior change for already-canonical writes,
no breaking change for callers using the default.
Originally extracted from a larger fork-side commit (`b4b39fc`) that
bundled this with a now-retired `kind=` filter; sending it standalone
per the "small separate PRs" preference noted on PR #4.
…nding queue; trim patch Filed four upstream PRs on 2026-04-30: - rboarescu#15 feat: GET /viz status dashboard (stacks on rboarescu#13) - rboarescu#16 feat: GET /list — query-free metadata browse - rboarescu#17 feat: DELETE /memory + PATCH /memory - rboarescu#18 feat(lifespan): auto-migrate Stop-hook checkpoints on startup Also rebased PR rboarescu#13 onto upstream/main to clear a CHANGELOG conflict left by upstream's b4aee82 (patch sync) — state went CONFLICTING -> MERGEABLE / CLEAN. README: - Open upstream PRs table: four new rows (rboarescu#15-rboarescu#18) plus a 2026-04-30 note covering today's rebase + new PRs in one breath. - Pending PRs queue: now empty. Replaced the four stale rows (event-log-frame and graph-endpoint were already in flight via rboarescu#11 and rboarescu#13; mempal-fast.py was already upstream via the merged PR #4 omnibus; /viz is now PR rboarescu#15) with a brief empty-state note. CLAUDE.md: - Patch description trimmed to reflect that the hnsw:num_threads enforcement landed upstream via _pin_hnsw_threads(); only the log + retry-once slice remains. patches/mcp_server_get_collection.patch: - Regenerated against current mempalace develop. The patch is now just the "log exception + retry once on cache failure" change. Filed upstream as MemPalace/mempalace#1286; once that merges this patch retires. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Seven findings, all addressed:

1. **Path translation before is_dir check** (Copilot watcher.py:133). parse_watch_dirs() validated against the daemon filesystem before any path translation, so a PALACE_WATCH_DIRS value written in the client namespace was always rejected as "not a directory". Add a translator= kwarg; main.py now passes _translate_client_path so client-namespace entries reach the daemon's view first.
2. **Drop on_any_event in favor of specific subscriptions** (Copilot watcher.py:160). watchdog 3.x emits opened/closed events on plain file reads on Linux; routing those through the debounce would re-mine the project on every file open. Replaced on_any_event with explicit on_created / on_modified / on_moved / on_deleted.
3. **Check dest_path on FileMovedEvent** (Copilot watcher.py:171). Editor save-via-rename writes a temp file then renames to the real filename. The src_path on the move event is the temp; only dest_path has the real extension. Extracted _has_watchable_extension() helper; on_moved() checks both sides.
4. **Per-target schedule() failure isolation** (Copilot watcher.py:219). A bare for-loop let an exception in observer.schedule() (e.g. inotify watch limit on a large repo) abort the entire service. Wrap each schedule() in try/except; one failed target no longer disables the others. Logged at warning level.
5. **Cancel debounce timers on stop** (Copilot watcher.py:224). stop() tore down the observer but left armed threading.Timer objects to fire mid-teardown. Add cancel_pending() on the handler; service stop() walks all handlers first.
6. **Surface watcher-coroutine errors** (Copilot watcher.py:246). asyncio.run_coroutine_threadsafe() returns a Future the caller must observe; dropping it swallowed exceptions silently. Add a _log_future_exception done-callback.
7. **/watch reports running state accurately** (Copilot main.py:558). app.state.watcher was set unconditionally even when start() returned early (empty env, missing dep, all targets failed). Now only published to app.state when watcher.is_running. GET /watch reports running: false for idle/disabled instead of misleading running: true.

Tests: 28 pass (was 24). New cases:
- test_rename_to_watched_extension_fires (covers #3)
- test_cancel_pending_drops_armed_timer (covers #5)
- test_one_failed_target_doesnt_disable_others (covers #4)
- test_translator_runs_before_is_dir (covers #1)

Existing handler tests retargeted from on_any_event to on_modified since on_any_event no longer exists in this revision.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
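Finding 6 above can be sketched as follows — a minimal illustration of observing the Future returned by `asyncio.run_coroutine_threadsafe()`; the function names are illustrative, not the actual daemon code:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def _log_future_exception(fut):
    # Done-callback: pull the exception (if any) out of the Future so it
    # lands in the logs instead of being silently dropped.
    exc = fut.exception()
    if exc is not None:
        logger.error("watcher coroutine failed: %r", exc)


def submit_from_watcher_thread(loop, coro):
    # run_coroutine_threadsafe returns a concurrent.futures.Future that
    # the caller must observe; attach the callback rather than dropping it.
    fut = asyncio.run_coroutine_threadsafe(coro, loop)
    fut.add_done_callback(_log_future_exception)
    return fut
```

The callback fires on both success and failure; on success `fut.exception()` is `None` and nothing is logged.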
* feat(watcher): file-watcher service for auto-mining on file change

Closes the "automining when files are added or updated" half of JP's ask. The other half (CLI add/remove of mined data) ships in jphein/mempalace#4 as `mempalace mined` + `mempalace purge --source-file`.

Adds a watchdog-based file watcher that runs inside the daemon's lifespan. Configured via the `PALACE_WATCH_DIRS` env var:

    PALACE_WATCH_DIRS="/home/jp/Projects/realmwatch=wing_realmwatch, /home/jp/Projects/oracle=wing_oracle"

Each entry is `path` or `path=wing`. A bare path derives its wing via `mempalace.config.normalize_wing_name(basename)` to match the local-spawn miner default. Non-existent paths are warned-and-skipped, not fatal — a misconfigured env var doesn't kill startup.

Architecture:
* watcher.py — pure-Python module: parse_watch_dirs(), _DebouncedMineHandler (subclass of FileSystemEventHandler with a per-target debounce timer), WatcherService (lifecycle wrapper around watchdog.Observer).
* main.py lifespan — instantiate WatcherService, schedule one recursive watch per target, register stop() on shutdown. Failures are non-fatal; the daemon serves traffic regardless.
* main.py — `_internal_mine()` async closure runs the same `mempalace mine ... --mode projects --wing X` argv as the existing /mine endpoint, gated by the same `_mine_sem`.
* main.py — new `GET /watch` endpoint surfaces the running target list. No POST/DELETE — runtime add/remove requires a daemon restart (the operator edits the systemd unit env and restarts).

Debounce: events on the same target collapse via a 2-second per-target timer. Catches editor write+rename storms and *.swp / *.lock churn. Also pre-filters by an extension allowlist (29 extensions: code, configs, markdown). Inotify fires for everything; the handler discards non-watched extensions before the timer schedules.

Path translation: when a watch path lives in a Syncthing-replicated directory, the daemon-side path is translated through `PALACE_DAEMON_PATH_MAP` before the mine subprocess runs (same mechanism used by /mine).

Tests: 11 new unittest cases in tests/test_watcher.py:
- TestParseWatchDirs (7 cases): empty, single-with-derived-wing, explicit-wing, multiple, skips-nonexistent, skips-files, env-fallback
- TestDebouncedMineHandler (4 cases): single-event-fires, burst-collapses-to-one, skips-directory-events, skips-bad-extensions

Stacked on #1 (path-translation). When that merges, this PR auto-rebases against main; reviewers see only the new commits in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(watcher): address Copilot review on #2

Seven findings, all addressed — the same seven enumerated in the review-fix entry above, with the four new test cases; 28 tests pass (was 24).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
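The per-target debounce described above can be sketched with a `threading.Timer`. The class and method names here are illustrative; the real `_DebouncedMineHandler` also does extension filtering and watchdog event routing:

```python
import threading


class DebouncedTrigger:
    """Collapse a burst of events into one callback per quiet period."""

    def __init__(self, fn, delay: float = 2.0):
        self._fn = fn
        self._delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self) -> None:
        # Each event restarts the quiet-period window; only the last
        # timer armed during a burst survives to fire.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._fn)
            self._timer.daemon = True
            self._timer.start()

    def cancel_pending(self) -> None:
        # Called on service stop so no armed timer fires mid-teardown
        # (finding 5 in the Copilot review above).
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

An editor save storm (write temp, rename, touch lockfile) then costs one mine run instead of three.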
Six changes from running palace-daemon. Happy to split into separate PRs if you'd prefer.

**fix: warm ChromaDB client on startup, remove broken signal handlers** (2012f79)

ChromaDB's Rust HNSW binding can SIGSEGV on the first request when started cold. Pre-warming the client in the FastAPI lifespan (before `yield`) avoids this. Also drops the custom asyncio signal handlers that called `sys.exit()` from inside an asyncio callback — this produced a `CancelledError`/`SystemExit` traceback on stop and skipped the lifespan shutdown flush. Uvicorn's built-in signal handling already does the right thing.

**feat: add /repair, /silent-save endpoints and themed messages** (d2e867c)

`POST /repair` with four modes: `light`, `scan`, `prune`, `rebuild`. Acquires `_exclusive_palace()` (all read/write/mine semaphores) for `rebuild`. `POST /silent-save` with double-checked locking around the repair flag. While a `/repair mode=rebuild` is in progress, saves are queued to a JSONL file at `<palace_parent>/palace-daemon-pending.jsonl` and drained automatically after the repair completes. `messages.py` with themed user-facing strings returned in `systemMessage`.

**feat: add --palace flag to purge_wings.py** (34fee93)

Script hardcoded `~/.mempalace/palace`. `--palace <path>` lets it run against non-default deployments. Default unchanged.

**fix(mcp-client): bump HTTP timeout 30s→120s, env-tunable** (352b25c)

The 30s default in `clients/mempalace-mcp.py` is too short for cold-start `mempalace_status` calls on large palaces (HNSW rebuild on a few hundred thousand drawers can take several minutes). Default raised to 120s, overridable via `PALACE_MCP_TIMEOUT`.

**feat(clients): palace-mode CLI + fast-path hook + MCP dispatcher** (f8e0faa)

Three deployment helpers in `clients/`:
- `palace-mode` — toggles `PALACE_DAEMON_URL` in user settings. Subcommands: `status`, `local`, `remote [URL]`, `install` (idempotent re-apply after plugin updates), `verify`.
- `palace-mcp-dispatch.sh` — runs `mempalace-mcp.py --daemon $URL` when `PALACE_DAEMON_URL` is set, otherwise falls back to in-process `python -m mempalace.mcp_server`.
- `mempal-fast.py` — fast-path Stop-hook that POSTs to `/silent-save`. Imports no `mempalace`, loads no chromadb — so cold hook fires can't trigger the same SIGSEGV the warmup commit fixes for the daemon itself.

Conceptually separable from the daemon changes — happy to split this commit into a follow-up PR if you'd rather review the daemon work first. (Bundled here because it sits in `clients/` alongside the existing `mempalace-mcp.py`.)

**fix: address Copilot review on PR #4 + tighten queue gating** (bd68632)

Pulled from Copilot's review of an earlier push:
- `/silent-save` queue gating now triggers on `mode == "rebuild"` only — `light`/`scan`/`prune` don't replace the collection, so saves can proceed normally during those.
- `_enqueue_pending_write` is now async + uses `asyncio.to_thread` so the disk append doesn't block the event loop under bursty enqueue load.
- `_drain_pending_writes` holds `_write_sem` per entry (matching the documented contract of `_do_silent_save_write`); failed entries are quarantined to a timestamped sibling so the next drain doesn't replay successful saves.
- `body["message_count"]` is parsed via try/except; returns 400 on non-integer input.
- `asyncio.get_event_loop()` → `asyncio.get_running_loop()` (the former is deprecated in 3.12+ inside async functions).
- `PALACE_MCP_TIMEOUT` parse hardened in the proxy client — a bad env value warns and falls back to 120s instead of crashing.
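The double-checked gating around the repair flag can be sketched like this — names, the semaphore width, and the callback shape are illustrative; the real daemon keys the check to the rebuild mode and appends queued payloads to the pending JSONL file:

```python
import asyncio


class SilentSaveGate:
    """Double-checked repair-flag gate: check the flag before and after
    acquiring the write semaphore so no write slips into a rebuild window
    between the flag check and semaphore acquisition."""

    def __init__(self, write_slots: int = 4):
        self.rebuild_in_progress = False
        self._write_sem = asyncio.Semaphore(write_slots)

    async def save(self, payload, do_write, enqueue_pending):
        if self.rebuild_in_progress:        # first check: cheap, pre-semaphore
            await enqueue_pending(payload)
            return "queued"
        async with self._write_sem:
            # Second check: a rebuild may have started while this task
            # was blocked waiting for a semaphore slot.
            if self.rebuild_in_progress:
                await enqueue_pending(payload)
                return "queued"
            return await do_write(payload)
```

A rebuild that flips the flag and then drains the semaphore (the `_exclusive_palace()` pattern) is guaranteed to see no in-flight writes once it holds all slots, and every save arriving afterwards lands in the queue.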