Skip to content

chore(scripts): add verify-routes.sh smoke test#9

Open
jphein wants to merge 1 commit intorboarescu:mainfrom
jphein:chore/verify-routes-smoke-script
Open

chore(scripts): add verify-routes.sh smoke test#9
jphein wants to merge 1 commit intorboarescu:mainfrom
jphein:chore/verify-routes-smoke-script

Conversation

@jphein
Copy link
Copy Markdown
Contributor

@jphein jphein commented Apr 27, 2026

Summary

Adds scripts/verify-routes.sh — a curl-based smoke test for every public read-only route. Designed for manual post-deploy validation (systemctl restart palace-daemon-style), not CI: depends on a live palace.

What it probes

GET-only routes:

  • /health (auth-bypass; always responds)
  • /search?q=palace&limit=2
  • /context?topic=palace&limit=2
  • /stats
  • /repair/status

Plus a regression check on limit=3 returning 3 hits end-to-end (only fails if the palace had ≥3 matches and the response was capped at fewer; flags rather than fails on empty/unreachable palaces).

POST routes (/repair, /silent-save, /memory, /mine, /flush, /reload, /backup) intentionally skipped — they mutate state and belong in a dedicated integration run against a throwaway palace, not in a manual smoke against production.

Usage

PALACE_DAEMON_URL=http://your-host:8085 \
PALACE_API_KEY=... \
    scripts/verify-routes.sh

Both env vars optional; defaults to localhost:8085 and no auth header.

Why

Easier than maintaining a full pytest harness for one-shot deploy verification, and the actual failure surface (HTTP 500, malformed JSON, missing routes after a merge) is what curl + grep catches well. The probe helpers (probe, probe_json_field) keep failure messages short and actionable.

Scope

+93 / -0. Single new shell script under scripts/. No changes to main.py or anything else. No new dependencies (curl + python3 stdlib only).

Originally extracted from a fork-side commit; sending standalone per the "small separate PRs" preference noted on PR #4.

Copilot AI review requested due to automatic review settings April 27, 2026 16:19
jphein added a commit to jphein/palace-daemon that referenced this pull request Apr 27, 2026
…eralization

Mirrors the mempalace fork's README pattern: top-of-document section
listing PRs we've filed and their state, then a "pending" queue for
PR-ready items, then an explicit "needs generalization" table for
items that bake in JP-specific assumptions and are held until they
can be split into a universal shape.

## What's now where

- **Open upstream PRs** (3): rboarescu#7 limit= fix, rboarescu#8 _canonical_topic, rboarescu#9
  verify-routes.sh
- **Recently landed in upstream**: PR #4 (cherry-picked into ef6ac03,
  closed) — the v1.5.0 daemon work
- **Pending PRs — ready to file** (6): GET /graph (folds in
  /graph.tunnels fix), GET /viz, palace-mcp-dispatch.sh, mempal-fast.py,
  event-log-frame.md, graph-endpoint.md
- **Needs generalization before PR** (3): deploy.sh (PALACE_HOST=disks
  + fork-mempalace import-check), palace-mode (Claude Code plugin
  cache layout), auto-repair-if-empty.sh (systemd --user shape)

## Why

JP's note: "we can wait if we need to generalize things but make sure
that is noted in the readme like we do with memorypalace." Filing PRs
that have JP-specific assumptions baked in would either fail in
contributor review or land features other operators couldn't use.
Better to be explicit about which items are PR-ready, which are queued,
and which need generalization work first — and what that work is —
so anyone can pick up a generalization-ready item without re-discovering
the constraint set.

`/graph.tunnels` was previously listed as its own row but it's
structurally a sub-feature of `/graph`; merged into that row to
avoid implying it can ship independently.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a manual, curl-based smoke test script to validate that palace-daemon’s public read-only HTTP routes are reachable and returning expected-shaped responses after a deploy (intended for post-restart checks against a live palace, not CI).

Changes:

  • Add scripts/verify-routes.sh to probe /health, /search, /context, /stats, and /repair/status.
  • Add a small regression check that limit=3 can return 3 hits end-to-end.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/verify-routes.sh
Comment on lines +61 to +65
echo "→ palace-daemon at $URL"
echo

# /health — no auth, should always respond.
probe "GET /health" "palace-daemon" "$URL/health"
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET /health will pass even when the daemon reports a degraded palace (the endpoint returns HTTP 503 with the same JSON body including "daemon": "palace-daemon"). Since this script is intended as a post-deploy validation, it should assert the HTTP status is 200 and/or validate the JSON status field is ok so it can actually catch degraded deployments.

Suggested change
echo "→ palace-daemon at $URL"
echo
# /health — no auth, should always respond.
probe "GET /health" "palace-daemon" "$URL/health"
probe_health() {
local label="$1"
shift
local resp http_code body summary
resp=$(curl -sS --max-time 90 -w $'\n%{http_code}' "${H_AUTH[@]}" "$@" 2>&1) || fail "$label — curl error"
http_code="${resp##*$'\n'}"
body="${resp%$'\n'*}"
if [ "$http_code" != "200" ]; then
fail "$label — expected HTTP 200, got $http_code: ${body:0:200}"
fi
summary=$(printf '%s' "$body" | python3 -c "
import json, sys
try:
d = json.load(sys.stdin)
print(f\"{d.get('daemon', '')}\t{d.get('status', '')}\")
except Exception as e:
print(f'PARSE-ERROR:{e}', file=sys.stderr)
" 2>&1)
if [ -z "$summary" ] || [ "${summary:0:13}" = "PARSE-ERROR:" ]; then
fail "$label — bad JSON response: $summary"
fi
local daemon status
IFS=$'\t' read -r daemon status <<< "$summary"
if [ "$daemon" = "palace-daemon" ] && [ "$status" = "ok" ]; then
pass "$label (daemon=$daemon, status=$status, HTTP $http_code)"
else
fail "$label — expected daemon=palace-daemon and status=ok, got daemon=${daemon:-<missing>} status=${status:-<missing>}"
fi
}
echo "→ palace-daemon at $URL"
echo
# /health — no auth, should always respond healthy.
probe_health "GET /health" "$URL/health"

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — every curl call now uses curl -fsS so /health 503 fails the script directly instead of letting a 503 body containing the expected substring slip through as a pass. Comment block at scripts/verify-routes.sh:28-30 documents this contract.

Comment thread scripts/verify-routes.sh Outdated
Comment on lines +82 to +89
COUNT=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" \
| python3 -c "import json, sys; print(len(json.load(sys.stdin).get('results', [])))" 2>&1)
if [ "$COUNT" = "3" ]; then
pass "limit=3 returns 3 hits"
elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then
echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm"
else
fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)"
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claims this check will ?-flag (not fail) when the palace is empty/unreachable, but the current logic only treats COUNT==0 or empty as non-fatal. Connection errors or JSON parse errors will produce a non-empty $COUNT and trigger the failure path. Either treat curl/parse errors as a ? outcome, or update the comment to match the behavior.

Suggested change
COUNT=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" \
| python3 -c "import json, sys; print(len(json.load(sys.stdin).get('results', [])))" 2>&1)
if [ "$COUNT" = "3" ]; then
pass "limit=3 returns 3 hits"
elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then
echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm"
else
fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)"
SEARCH_RESP=$(curl -sS --max-time 90 "${H_AUTH[@]}" "$URL/search?q=palace&limit=3" 2>&1) || SEARCH_RESP="CURL-ERROR:$SEARCH_RESP"
if [ "${SEARCH_RESP#CURL-ERROR:}" != "$SEARCH_RESP" ]; then
echo " ? limit=3 could not be verified — palace may be empty or unreachable: ${SEARCH_RESP#CURL-ERROR:}"
else
COUNT=$(printf '%s' "$SEARCH_RESP" | python3 -c "
import json, sys
try:
print(len(json.load(sys.stdin).get('results', [])))
except Exception as e:
print(f'PARSE-ERROR:{e}')
")
if [ "$COUNT" = "3" ]; then
pass "limit=3 returns 3 hits"
elif [ "$COUNT" = "0" ] || [ -z "$COUNT" ]; then
echo " ? limit=3 returned 0 — palace may be empty or unreachable, can't confirm"
elif [ "${COUNT:0:12}" = "PARSE-ERROR:" ]; then
echo " ? limit=3 could not be verified — palace may be empty or unreachable: $COUNT"
else
fail "limit=3 returned $COUNT hits — expected 3 (or 0 on empty palace)"
fi

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by the curl -fsS change above — connection errors and JSON parse errors now fail the script via curl's error path rather than reaching the COUNT==0 detection, so unexpected failure modes don't silently no-op. The ?-flag-on-empty path is now narrow to the documented case (palace is open + healthy + has zero drawers).

Comment thread scripts/verify-routes.sh Outdated
# PALACE_API_KEY=... \
# scripts/verify-routes.sh

set -e
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script uses set -e only; in this repo other bash scripts use strict mode (set -euo pipefail, e.g. scripts/apply_patches.sh:9). Without -u/pipefail, unset vars and pipeline failures (notably the curl | python3 pipelines below) can be masked, making the smoke test less reliable. Consider switching to set -euo pipefail (and adjusting any lines that rely on unset vars).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — set -euo pipefail at line 18 (was set -e only). Mirrors scripts/apply_patches.sh:9's convention, so unset variables and pipeline failures (not just final-command exit codes) now fail fast and visibly.

A curl-based smoke test that exercises every public read-only route
against a running daemon. Designed for manual post-deploy validation
(`systemctl restart palace-daemon`-style), not CI — depends on a live
palace.

## What it probes

GET-only routes:
- /health (auth-bypass; always responds)
- /search?q=palace&limit=2
- /context?topic=palace&limit=2
- /stats
- /repair/status

Plus a regression check on `limit=3` returning 3 hits end-to-end (only
fails if the palace had ≥3 matches and the response was capped at fewer;
flags rather than fails on empty/unreachable palaces).

POST routes (/repair, /silent-save, /memory, /mine, /flush, /reload,
/backup) intentionally skipped — they mutate state and belong in a
dedicated integration run against a throwaway palace, not in a manual
smoke against production.

## Usage

    PALACE_DAEMON_URL=http://your-host:8085 \
    PALACE_API_KEY=... \
        scripts/verify-routes.sh

Both env vars optional; defaults to localhost:8085 and no auth header.

## Why this shape

Easier than maintaining a full pytest harness for one-shot deploy
verification, and the actual failure surface (HTTP 500, malformed
JSON, missing routes after merge) is what curl + grep catches well.
The probe helpers (`probe`, `probe_json_field`) keep failure messages
short and actionable.

Originally extracted from a fork-side commit; sending standalone per
the "small separate PRs" preference noted on PR #4.
@jphein jphein force-pushed the chore/verify-routes-smoke-script branch from c81a457 to 5064167 Compare April 27, 2026 17:50
jphein added a commit to jphein/palace-daemon that referenced this pull request Apr 27, 2026
…boarescu#10/rboarescu#12/rboarescu#13 to main

The amendments to those open upstream PRs caught real bugs that exist
on fork main too — production at disks runs fork main, so these were
shipping live until now.

## main.py — /graph endpoint (from PR rboarescu#13)

- `_read_wings_rooms_direct()`: build `rooms` from union of
  `wings.keys()` + `rooms_by_wing.keys()` rather than just wings.
  Otherwise a partial schema-drift (one query OperationalError-ed
  but the other succeeded) silently drops the half that worked.
  Real data-loss bug for the SME / /viz consumers.
- `/graph` handler: gate the direct-sqlite reads on `_read_sem` via
  a `_direct_under_sem` helper. Coordinates with `_exclusive_palace()`
  used by `/repair mode=rebuild` (so /graph yields rather than racing
  delete-then-create on chroma.sqlite3) and rate-limits unbounded
  `asyncio.to_thread` reads under load (was 2 threads/request × N
  concurrent requests).
- `_canonical_topic()` (from PR rboarescu#8): drop the `topic: str` annotation
  and add a non-string isinstance guard at the top, coerce
  `None`/numbers/lists to `CHECKPOINT_TOPIC` with a warning. A
  malformed `{"topic": null}` JSON payload no longer leaks `None`
  through to `tool_diary_write`.

## scripts/verify-routes.sh (from PR rboarescu#9)

- `set -euo pipefail` (was `set -e`) — pipeline failures and unset
  vars now surface instead of being masked.
- `curl -fsS` instead of `-sS` on every probe so HTTP non-2xx
  (e.g. `/health` 503 when palace is degraded) fails curl directly.
  Without this, a 503 body with `"daemon":"palace-daemon"` slips
  through the body-grep as a pass.
- Limit-check rewrite: split curl exit code from JSON parse so a
  connection error or non-JSON response `?`-flags rather than
  fails-with-traceback (the previous `2>&1` redirect captured
  Python tracebacks into `$COUNT`, which then triggered the fail
  path with a confusing message).

## clients/palace-mcp-dispatch.sh (from PR rboarescu#10)

- Replace `readlink -f` (GNU-specific, fails on macOS BSD readlink)
  with `cd "$(dirname …)" && pwd -P`. POSIX-portable; covers the
  actual common case (parent directory is a symlink under the
  Claude Code plugin cache).
- Add explicit existence check on `$MCP_CLIENT` before exec, with
  a clear error message instead of a generic Python "can't open
  file" trace.

## clients/palace-mode (from PR rboarescu#12)

- Make `PALACE_API_KEY` optional in `set_mode_remote`. Server-side
  `_check_auth` only enforces auth when its own `PALACE_API_KEY` is
  set, so a daemon running without auth accepts unauthenticated
  clients. The previous fail-fast was overly strict for that case.
  Warn-but-proceed instead.

All five fixes are amendments on the corresponding upstream PR
branches (force-pushed earlier today). No behavior change for the
healthy path; the changes harden failure modes that were caught in
review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants