chore(scripts): add verify-routes.sh smoke test#9
chore(scripts): add verify-routes.sh smoke test#9jphein wants to merge 1 commit intorboarescu:mainfrom
Conversation
…eralization Mirrors the mempalace fork's README pattern: top-of-document section listing PRs we've filed and their state, then a "pending" queue for PR-ready items, then an explicit "needs generalization" table for items that bake in JP-specific assumptions and are held until they can be split into a universal shape. ## What's now where - **Open upstream PRs** (3): rboarescu#7 limit= fix, rboarescu#8 _canonical_topic, rboarescu#9 verify-routes.sh - **Recently landed in upstream**: PR #4 (cherry-picked into ef6ac03, closed) — the v1.5.0 daemon work - **Pending PRs — ready to file** (6): GET /graph (folds in /graph.tunnels fix), GET /viz, palace-mcp-dispatch.sh, mempal-fast.py, event-log-frame.md, graph-endpoint.md - **Needs generalization before PR** (3): deploy.sh (PALACE_HOST=disks + fork-mempalace import-check), palace-mode (Claude Code plugin cache layout), auto-repair-if-empty.sh (systemd --user shape) ## Why JP's note: "we can wait if we need to generalize things but make sure that is noted in the readme like we do with memorypalace." Filing PRs that have JP-specific assumptions baked in would either fail in contributor review or land features other operators couldn't use. Better to be explicit about which items are PR-ready, which are queued, and which need generalization work first — and what that work is — so anyone can pick up a generalization-ready item without re-discovering the constraint set. `/graph.tunnels` was previously listed as its own row but it's structurally a sub-feature of `/graph`; merged into that row to avoid implying it can ship independently.
There was a problem hiding this comment.
Pull request overview
Adds a manual, curl-based smoke test script to validate that palace-daemon’s public read-only HTTP routes are reachable and returning expected-shaped responses after a deploy (intended for post-restart checks against a live palace, not CI).
Changes:
- Add
scripts/verify-routes.shto probe/health,/search,/context,/stats, and/repair/status. - Add a small regression check that
limit=3can return 3 hits end-to-end.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| echo "→ palace-daemon at $URL" | ||
| echo | ||
|
|
||
| # /health — no auth, should always respond. | ||
| probe "GET /health" "palace-daemon" "$URL/health" |
There was a problem hiding this comment.
GET /health will pass even when the daemon reports a degraded palace (the endpoint returns HTTP 503 with the same JSON body including "daemon": "palace-daemon"). Since this script is intended as a post-deploy validation, it should assert the HTTP status is 200 and/or validate the JSON status field is ok so it can actually catch degraded deployments.
| echo "→ palace-daemon at $URL" | |
| echo | |
| # /health — no auth, should always respond. | |
| probe "GET /health" "palace-daemon" "$URL/health" | |
| probe_health() { | |
| local label="$1" | |
| shift | |
| local resp http_code body summary | |
| resp=$(curl -sS --max-time 90 -w $'\n%{http_code}' "${H_AUTH[@]}" "$@" 2>&1) || fail "$label — curl error" | |
| http_code="${resp##*$'\n'}" | |
| body="${resp%$'\n'*}" | |
| if [ "$http_code" != "200" ]; then | |
| fail "$label — expected HTTP 200, got $http_code: ${body:0:200}" | |
| fi | |
| summary=$(printf '%s' "$body" | python3 -c " | |
| import json, sys | |
| try: | |
| d = json.load(sys.stdin) | |
| print(f\"{d.get('daemon', '')}\t{d.get('status', '')}\") | |
| except Exception as e: | |
| print(f'PARSE-ERROR:{e}', file=sys.stderr) | |
| " 2>&1) | |
| if [ -z "$summary" ] || [ "${summary:0:13}" = "PARSE-ERROR:" ]; then | |
| fail "$label — bad JSON response: $summary" | |
| fi | |
| local daemon status | |
| IFS=$'\t' read -r daemon status <<< "$summary" | |
| if [ "$daemon" = "palace-daemon" ] && [ "$status" = "ok" ]; then | |
| pass "$label (daemon=$daemon, status=$status, HTTP $http_code)" | |
| else | |
| fail "$label — expected daemon=palace-daemon and status=ok, got daemon=${daemon:-<missing>} status=${status:-<missing>}" | |
| fi | |
| } | |
| echo "→ palace-daemon at $URL" | |
| echo | |
| # /health — no auth, should always respond healthy. | |
| probe_health "GET /health" "$URL/health" |
There was a problem hiding this comment.
Addressed — every curl call now uses curl -fsS so /health 503 fails the script directly instead of letting a 503 body containing the expected substring slip through as a pass. Comment block at scripts/verify-routes.sh:28-30 documents this contract.
| COUNT=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" \ | ||
| | python3 -c "import json, sys; print(len(json.load(sys.stdin).get('results', [])))" 2>&1) | ||
| if [ "$COUNT" = "3" ]; then | ||
| pass "limit=3 returns 3 hits" | ||
| elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then | ||
| echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm" | ||
| else | ||
| fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)" |
There was a problem hiding this comment.
The comment claims this check will ?-flag (not fail) when the palace is empty/unreachable, but the current logic only treats COUNT==0 or empty as non-fatal. Connection errors or JSON parse errors will produce a non-empty $COUNT and trigger the failure path. Either treat curl/parse errors as a ? outcome, or update the comment to match the behavior.
| COUNT=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" \ | |
| | python3 -c "import json, sys; print(len(json.load(sys.stdin).get('results', [])))" 2>&1) | |
| if [ "$COUNT" = "3" ]; then | |
| pass "limit=3 returns 3 hits" | |
| elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then | |
| echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm" | |
| else | |
| fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)" | |
| SEARCH_RESP=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" 2>&1) || SEARCH_RESP="CURL-ERROR:$SEARCH_RESP" | |
| if [ "${SEARCH_RESP#CURL-ERROR:}" != "$SEARCH_RESP" ]; then | |
| echo " ? limit=3 could not be verified — palace may be empty or unreachable: ${SEARCH_RESP#CURL-ERROR:}" | |
| else | |
| COUNT=$(printf '%s' "$SEARCH_RESP" | python3 -c " | |
| import json, sys | |
| try: | |
| print(len(json.load(sys.stdin).get('results', []))) | |
| except Exception as e: | |
| print(f'PARSE-ERROR:{e}') | |
| ") | |
| if [ "$COUNT" = "3" ]; then | |
| pass "limit=3 returns 3 hits" | |
| elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then | |
| echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm" | |
| elif [ "${COUNT:0:12}" = "PARSE-ERROR:" ]; then | |
| echo " ? limit=3 could not be verified — palace may be empty or unreachable: $COUNT" | |
| else | |
| fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)" | |
| fi |
There was a problem hiding this comment.
Resolved by the curl -fsS change above — connection errors and JSON parse errors now fail the script via curl's error path rather than reaching the COUNT==0 detection, so unexpected failure modes don't silently no-op. The ?-flag-on-empty path is now narrow to the documented case (palace is open + healthy + has zero drawers).
| # PALACE_API_KEY=... \ | ||
| # scripts/verify-routes.sh | ||
|
|
||
| set -e |
There was a problem hiding this comment.
This script uses set -e only; in this repo other bash scripts use strict mode (set -euo pipefail, e.g. scripts/apply_patches.sh:9). Without -u/pipefail, unset vars and pipeline failures (notably the curl | python3 pipelines below) can be masked, making the smoke test less reliable. Consider switching to set -euo pipefail (and adjusting any lines that rely on unset vars).
There was a problem hiding this comment.
Done — set -euo pipefail at line 18 (was set -e only). Mirrors scripts/apply_patches.sh:9's convention, so unset variables and pipeline failures (not just final-command exit codes) now fail fast and visibly.
A curl-based smoke test that exercises every public read-only route
against a running daemon. Designed for manual post-deploy validation
(`systemctl restart palace-daemon`-style), not CI — depends on a live
palace.
## What it probes
GET-only routes:
- /health (auth-bypass; always responds)
- /search?q=palace&limit=2
- /context?topic=palace&limit=2
- /stats
- /repair/status
Plus a regression check on `limit=3` returning 3 hits end-to-end (only
fails if the palace had ≥3 matches and the response was capped at fewer;
flags rather than fails on empty/unreachable palaces).
POST routes (/repair, /silent-save, /memory, /mine, /flush, /reload,
/backup) intentionally skipped — they mutate state and belong in a
dedicated integration run against a throwaway palace, not in a manual
smoke against production.
## Usage
PALACE_DAEMON_URL=http://your-host:8085 \
PALACE_API_KEY=... \
scripts/verify-routes.sh
Both env vars optional; defaults to localhost:8085 and no auth header.
## Why this shape
Easier than maintaining a full pytest harness for one-shot deploy
verification, and the actual failure surface (HTTP 500, malformed
JSON, missing routes after merge) is what curl + grep catches well.
The probe helpers (`probe`, `probe_json_field`) keep failure messages
short and actionable.
Originally extracted from a fork-side commit; sending standalone per
the "small separate PRs" preference noted on PR #4.
c81a457 to
5064167
Compare
…boarescu#10/rboarescu#12/rboarescu#13 to main The amendments to those open upstream PRs caught real bugs that exist on fork main too — production at disks runs fork main, so these were shipping live until now. ## main.py — /graph endpoint (from PR rboarescu#13) - `_read_wings_rooms_direct()`: build `rooms` from union of `wings.keys()` + `rooms_by_wing.keys()` rather than just wings. Otherwise a partial schema-drift (one query OperationalError-ed but the other succeeded) silently drops the half that worked. Real data-loss bug for the SME / /viz consumers. - `/graph` handler: gate the direct-sqlite reads on `_read_sem` via a `_direct_under_sem` helper. Coordinates with `_exclusive_palace()` used by `/repair mode=rebuild` (so /graph yields rather than racing delete-then-create on chroma.sqlite3) and rate-limits unbounded `asyncio.to_thread` reads under load (was 2 threads/request × N concurrent requests). - `_canonical_topic()` (from PR rboarescu#8): drop the `topic: str` annotation and add a non-string isinstance guard at the top, coerce `None`/numbers/lists to `CHECKPOINT_TOPIC` with a warning. A malformed `{"topic": null}` JSON payload no longer leaks `None` through to `tool_diary_write`. ## scripts/verify-routes.sh (from PR rboarescu#9) - `set -euo pipefail` (was `set -e`) — pipeline failures and unset vars now surface instead of being masked. - `curl -fsS` instead of `-sS` on every probe so HTTP non-2xx (e.g. `/health` 503 when palace is degraded) fails curl directly. Without this, a 503 body with `"daemon":"palace-daemon"` slips through the body-grep as a pass. - Limit-check rewrite: split curl exit code from JSON parse so a connection error or non-JSON response `?`-flags rather than fails-with-traceback (the previous `2>&1` redirect captured Python tracebacks into `$COUNT`, which then triggered the fail path with a confusing message). ## clients/palace-mcp-dispatch.sh (from PR rboarescu#10) - Replace `readlink -f` (GNU-specific, fails on macOS BSD readlink) with `cd "$(dirname …)" && pwd -P`. POSIX-portable; covers the actual common case (parent directory is a symlink under the Claude Code plugin cache). - Add explicit existence check on `$MCP_CLIENT` before exec, with a clear error message instead of a generic Python "can't open file" trace. ## clients/palace-mode (from PR rboarescu#12) - Make `PALACE_API_KEY` optional in `set_mode_remote`. Server-side `_check_auth` only enforces auth when its own `PALACE_API_KEY` is set, so a daemon running without auth accepts unauthenticated clients. The previous fail-fast was overly strict for that case. Warn-but-proceed instead. All five fixes are amendments on the corresponding upstream PR branches (force-pushed earlier today). No behavior change for the healthy path; the changes harden failure modes that were caught in review.
Summary
Adds
scripts/verify-routes.sh— a curl-based smoke test for every public read-only route. Designed for manual post-deploy validation (systemctl restart palace-daemon-style), not CI: depends on a live palace.What it probes
GET-only routes:
/health(auth-bypass; always responds)/search?q=palace&limit=2/context?topic=palace&limit=2/stats/repair/statusPlus a regression check on
limit=3returning 3 hits end-to-end (only fails if the palace had ≥3 matches and the response was capped at fewer; flags rather than fails on empty/unreachable palaces).POST routes (
/repair,/silent-save,/memory,/mine,/flush,/reload,/backup) intentionally skipped — they mutate state and belong in a dedicated integration run against a throwaway palace, not in a manual smoke against production.Usage
PALACE_DAEMON_URL=http://your-host:8085 \ PALACE_API_KEY=... \ scripts/verify-routes.shBoth env vars optional; defaults to
localhost:8085and no auth header.Why
Easier than maintaining a full pytest harness for one-shot deploy verification, and the actual failure surface (HTTP 500, malformed JSON, missing routes after a merge) is what
curl + grepcatches well. The probe helpers (probe,probe_json_field) keep failure messages short and actionable.Scope
+93 / -0. Single new shell script underscripts/. No changes tomain.pyor anything else. No new dependencies (curl + python3 stdlib only).Originally extracted from a fork-side commit; sending standalone per the "small separate PRs" preference noted on PR #4.