
fix(rocky-server): require auth + restrict CORS + spawn_blocking on sync redb opens #291

Merged
hugocorreia90 merged 1 commit into main from fix/rocky-server-hardening
Apr 29, 2026

Conversation

@hugocorreia90
Contributor

Summary

rocky serve previously bound 0.0.0.0 with CorsLayer::permissive() and zero auth — model SQL, file paths, the DAG, and run history all leaked to anyone on the same network, and POST /api/v1/compile was CSRFable. The HTTP runtime also ran sync redb opens directly on the async executor, so heavy state reads could stall HTTP handlers.

This PR closes both classes of issue:

Auth + bind hardening

  • Default bind moves to 127.0.0.1:8080. New --host flag opts into a non-loopback bind.
  • Non-loopback bind requires a Bearer token; serve() refuses to start otherwise.
  • New Bearer-token middleware on every route except /api/v1/health (kept exempt so liveness probes don't need the secret). Token sources, in priority order: --token <secret> flag, then ROCKY_SERVE_TOKEN env var. Token comparison is constant-time.
  • CorsLayer::permissive() is replaced by an explicit allowlist: empty by default (same-origin only), populated via --allowed-origin <ORIGIN> (repeatable). Methods restricted to GET/POST/OPTIONS; headers to Authorization/Content-Type.
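The startup guard and the per-request check described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `ServeConfig`, `resolve_token`, `check_bind`, `constant_time_eq`, and `is_authorized` are hypothetical names, and letting requests through when no token is configured on a loopback bind is an assumption inferred from the "loopback-without-token" test case.

```rust
use std::net::IpAddr;

/// Hypothetical config holder; field names are illustrative only.
pub struct ServeConfig {
    pub host: IpAddr,
    pub token: Option<String>,
}

/// Token priority: the `--token` flag value wins, then the env value.
pub fn resolve_token(flag: Option<String>, env: Option<String>) -> Option<String> {
    flag.or(env)
}

/// Startup guard: a non-loopback bind is refused unless a token is set.
pub fn check_bind(cfg: &ServeConfig) -> Result<(), String> {
    if cfg.host.is_loopback() || cfg.token.is_some() {
        Ok(())
    } else {
        Err(format!("refusing to bind {} without a bearer token", cfg.host))
    }
}

/// Compares two byte strings without returning at the first mismatch.
/// The length check is an early exit; length is not treated as secret here.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences instead of branching
    }
    diff == 0
}

/// Per-request decision: the health route is exempt, everything else
/// needs `Authorization: Bearer <token>` when a token is configured.
pub fn is_authorized(path: &str, auth_header: Option<&str>, expected: Option<&str>) -> bool {
    if path == "/api/v1/health" {
        return true;
    }
    let expected = match expected {
        Some(t) => t,
        None => return true, // assumption: no token configured, loopback only
    };
    match auth_header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => constant_time_eq(presented.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```

A hand-rolled comparison like this only shows the idea; an optimizer is free to rewrite the loop, so production code would more likely rely on a vetted crate for constant-time equality.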

Async runtime hygiene

  • Wraps the following sync sites (two in state, three API handlers) in tokio::task::spawn_blocking so they don't starve the executor:
    • state::recompile — the CPU-heavy compile pass
    • state::load_cached_source_schemas — redb open + scan
    • api::list_runs, api::model_history, api::model_metrics — three handlers that opened redb on the async runtime
  • Mirrors the LSP pattern landed in fix(engine/rocky-server): move LSP schema-cache redb work onto blocking pool #263.
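The idea behind moving these calls off the executor can be shown with a std-only stand-in. This sketch deliberately swaps `tokio::task::spawn_blocking` for `std::thread` plus a channel so it runs without dependencies; `offload` and `slow_scan` are hypothetical names, and `slow_scan` only simulates a slow synchronous store read.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

/// Runs `work` on a separate thread and hands back a receiver for the
/// result, so the calling thread is free until it asks for the value.
pub fn offload<T, F>(work: F) -> Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        // A send error only means the caller dropped the receiver.
        let _ = tx.send(work());
    });
    rx
}

/// Stand-in for a slow synchronous call (for example an open + scan).
pub fn slow_scan(rows: u64) -> u64 {
    thread::sleep(Duration::from_millis(50));
    (0..rows).sum()
}
```

In the tokio version, `spawn_blocking(f).await` plays the role of `offload(f)` followed by `recv()`, with the important difference that awaiting yields to the executor instead of parking a thread, and the work runs on tokio's dedicated blocking pool rather than a freshly spawned thread.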

Notes for reviewers

  • The dashboard at / and /dashboard is inside the auth middleware. With a token configured, plain browser navigation returns 401. That's intentional, since the dashboard renders model SQL. For browser use, stick to the loopback bind, which is now the default.
  • TOML [serve] allowed_origins = [...] is deliberately deferred. The CLI flag + env-var pair is sufficient for closing the launch-blocker class of bug; a future PR can fold these into RockyConfig if multi-machine deployments demand persistent config.

Test plan

  • cd engine && cargo test -p rocky-server (47 tests pass — six new ones cover the auth happy path, missing/wrong-token rejection, the health-endpoint exemption, the non-loopback refusal, and the loopback-without-token sanity case)
  • cd engine && cargo clippy -p rocky-server --all-targets -- -D warnings
  • cd engine && cargo fmt --check
  • cd engine && cargo build -p rocky-lsp (the slim adapter-free LSP binary that links rocky-server still compiles)
  • cd engine && cargo clippy -p rocky-cli -p rocky --all-targets -- -D warnings (CLI + main binary integrate with the new serve() signature)

…ync redb opens

`rocky serve` previously bound `0.0.0.0` with `CorsLayer::permissive()` and
zero auth, leaking model SQL, file paths, the DAG, and run history to the
LAN; `POST /api/v1/compile` was CSRFable.

- Bind defaults to `127.0.0.1`. A non-loopback host (e.g. `--host 0.0.0.0`)
  requires `--token <secret>` (or `ROCKY_SERVE_TOKEN`) — `serve()` returns
  an error otherwise.
- Bearer-token middleware on every `/api/v1/*` route (and the dashboard);
  `/api/v1/health` stays auth-exempt for liveness probes. Token comparison
  is constant-time.
- `CorsLayer::permissive()` is gone. Default allowlist is empty (same-origin
  only); cross-origin clients must be listed via `--allowed-origin <ORIGIN>`
  (the flag is repeatable). Methods restricted to GET/POST/OPTIONS, headers to
  Authorization/Content-Type.
- The sync redb / compile sites now run on `tokio::task::spawn_blocking`
  so they don't stall the async runtime: `state::recompile`'s compile pass,
  `state::load_cached_source_schemas`, and the three handlers `list_runs`,
  `model_history`, `model_metrics`. Mirrors the LSP pattern from #263.
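
The CORS rules in the list above amount to three membership checks. The sketch below models them with plain std types rather than the `CorsLayer` configuration the PR actually uses; `CorsPolicy` and its methods are hypothetical, and exact string matching of origins is an assumption.

```rust
/// Hypothetical model of the allowlist; not the tower-http type.
pub struct CorsPolicy {
    allowed_origins: Vec<String>,
}

impl CorsPolicy {
    /// Built from the repeatable `--allowed-origin` values; an empty
    /// list means no cross-origin caller is accepted.
    pub fn new<I, S>(origins: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        CorsPolicy {
            allowed_origins: origins.into_iter().map(Into::into).collect(),
        }
    }

    /// Exact-match check of the request's `Origin` value.
    pub fn origin_allowed(&self, origin: &str) -> bool {
        self.allowed_origins.iter().any(|o| o == origin)
    }

    /// Only GET, POST, and OPTIONS are permitted cross-origin.
    pub fn method_allowed(method: &str) -> bool {
        matches!(method, "GET" | "POST" | "OPTIONS")
    }

    /// Only Authorization and Content-Type may be sent cross-origin.
    /// Header names are case-insensitive.
    pub fn header_allowed(name: &str) -> bool {
        name.eq_ignore_ascii_case("authorization") || name.eq_ignore_ascii_case("content-type")
    }

    /// Value to echo in `Access-Control-Allow-Origin`, if any.
    pub fn allow_origin_header(&self, origin: &str) -> Option<String> {
        if self.origin_allowed(origin) {
            Some(origin.to_string())
        } else {
            None
        }
    }
}
```

Echoing back only a listed origin, rather than `*`, is what keeps a page on an unlisted site from reading responses even if the user's browser can reach the server.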

Six new tests cover the auth happy path, missing/wrong tokens, the
health-endpoint exemption, the loopback-without-token sanity case, and the
non-loopback refusal.
@hugocorreia90 hugocorreia90 merged commit e90c242 into main Apr 29, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/rocky-server-hardening branch April 29, 2026 18:16
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285#287, #290#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.