docs: cover rocky cost + rocky branch compare + Arc 1 wave 2 cascade by hugocorreia90 · Pull Request #204 · rocky-data/rocky

hugocorreia90 · 2026-04-21T16:11:38Z

Summary

Documents the commands that landed in PRs #199-#203 and populates the CHANGELOG's [Unreleased] section for the next coordinated release cut (engine-v1.12.0 / dagster-v1.8.0 / vscode-v1.6.2).

What's added

- `rocky cost <run_id|latest>`: full reference section in administration.md with JSON + table examples plus adapter-coverage notes (Databricks/Snowflake duration-based; BigQuery bytes; DuckDB zero; discovery adapters skipped). Cross-linked from `rocky trace`'s related-commands list so the three sibling RunRecord readers (replay/trace/cost) all reference each other.
- `rocky branch compare <name>`: added to the branch subcommand header in core-pipeline.md with a short usage example explaining the `ShadowConfig.schema_override` mechanism it shares with `rocky run --branch`.
- all-features.md: added `cost` to the administration command roster and a bullet under Observability for the historical-rollup surface.
- engine/CHANGELOG.md: populated [Unreleased] with all five PRs from 2026-04-21 (SIGPIPE, branch compare, POC portability cleanup, rocky cost, record_run wiring) plus the Pydantic soft-swap + fixture normalizer internals.
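The adapter-coverage notes above can be read as a small lookup from adapter to cost basis. The sketch below is illustrative only: the `CostBasis` and `Adapter` enums and the `cost_basis` function are hypothetical names invented for this example, not types from the engine.

```rust
/// How a per-model cost figure is derived for an adapter.
/// Hypothetical type for illustration; not the engine's definition.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CostBasis {
    /// Cost scales with execution time (Databricks, Snowflake).
    Duration,
    /// Cost scales with bytes processed (BigQuery).
    Bytes,
    /// Reported as zero cost (DuckDB).
    Zero,
    /// No cost figure is produced (discovery adapters).
    Skipped,
}

/// Hypothetical adapter categories; the variant names are assumptions.
#[derive(Debug, Clone, Copy)]
enum Adapter {
    Databricks,
    Snowflake,
    BigQuery,
    DuckDb,
    Discovery,
}

/// Maps each adapter to the basis described in the coverage notes.
fn cost_basis(adapter: Adapter) -> CostBasis {
    match adapter {
        Adapter::Databricks | Adapter::Snowflake => CostBasis::Duration,
        Adapter::BigQuery => CostBasis::Bytes,
        Adapter::DuckDb => CostBasis::Zero,
        Adapter::Discovery => CostBasis::Skipped,
    }
}

fn main() {
    let adapters = [
        Adapter::Databricks,
        Adapter::Snowflake,
        Adapter::BigQuery,
        Adapter::DuckDb,
        Adapter::Discovery,
    ];
    for adapter in adapters {
        println!("{:?} -> {:?}", adapter, cost_basis(adapter));
    }
}
```

An exhaustive `match` means that adding an adapter variant without deciding its cost basis fails to compile.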

What's not changed

- No changes to the existing `rocky replay` / `rocky trace` sections. They already described the intended behaviour; PR #203 (feat(engine): Trust-system Arc 1 wave 2, persist RunRecord on rocky run) made them actually work without the docs needing to move. Those pages now read as an accurate description of shipped behaviour.
- No dagster integration docs changes for `rocky cost`. PR #202 (feat(engine): Trust-system Arc 2 wave 2, rocky cost <run_id|latest>) explicitly deferred the `RockyResource.cost()` wiring to a follow-up. When that lands, the dagster/resource docs get the method reference.

Test plan

- `cd docs && npm run build`: 69 pages built, no warnings, no broken links.
- Reviewer sanity-check on the `rocky cost` example JSON: the Databricks numbers shown ($0.101 total, stg_orders at $0.034, fct_revenue at $0.067) are illustrative of the duration × DBU × $/DBU math, not captured from a real run. Flag if they look unrealistic.
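For reviewers checking the arithmetic, here is a minimal sketch of the duration × DBU × $/DBU calculation and the per-model roll-up. The function names and the cluster figures (45 s, 8 DBU/hour, $0.50 per DBU) are assumptions made up for this example; only the two per-model dollar amounts come from the example JSON.

```rust
/// Duration-based cost in USD: hours run x DBUs per hour x price per DBU.
/// Hypothetical helper; the engine's actual function may differ.
fn duration_cost_usd(duration_secs: f64, dbu_per_hour: f64, usd_per_dbu: f64) -> f64 {
    (duration_secs / 3600.0) * dbu_per_hour * usd_per_dbu
}

/// The run total is the sum of the per-model figures.
fn run_total_usd(per_model: &[(&str, f64)]) -> f64 {
    per_model.iter().map(|(_, usd)| usd).sum()
}

fn main() {
    // Assumed inputs: a 45 s model on an 8 DBU/hour cluster at $0.50 per DBU.
    let one_model = duration_cost_usd(45.0, 8.0, 0.50);
    println!("one model: ${:.3}", one_model);

    // Per-model figures quoted in the example JSON.
    let per_model = [("stg_orders", 0.034), ("fct_revenue", 0.067)];
    println!("run total: ${:.3}", run_total_usd(&per_model));
}
```

The two quoted per-model figures do add up to the quoted $0.101 total, so the example is at least internally consistent.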

Adds reference + features-page coverage for the commands that landed in PRs #199-#203, and populates the CHANGELOG's [Unreleased] section for the upcoming coordinated release cut.

- `docs/reference/commands/administration.md`: new `rocky cost <run_id|latest>` section with JSON + table examples and adapter coverage notes (Databricks/Snowflake duration-based; BigQuery bytes; DuckDB zero; discovery adapters skipped). Added to the related-commands list on `rocky trace` so the three sibling readers (replay/trace/cost) cross-link.
- `docs/reference/commands/core-pipeline.md`: added `rocky branch compare <name>` to the branch subcommand header and a short usage example explaining the `ShadowConfig.schema_override` mechanism it shares with `rocky run --branch`.
- `docs/features/all-features.md`: added `cost` to the administration command roster and a bullet under Observability for the historical-rollup surface.
- `engine/CHANGELOG.md`: populated [Unreleased] with the five PRs shipped 2026-04-21 (SIGPIPE, branch compare, POC portability cleanup, rocky cost, record_run wiring) plus the Pydantic soft-swap + fixture normalizer internals. Ready for the next coordinated release cut.

Docs build: `cd docs && npm run build`, 69 pages built, no warnings.

* chore(engine): release 1.12.0

  Arc 1 wave 2 + cleanup cascade. Eight PRs since v1.11.0:

  - #199 SIGPIPE handler
  - #200 rocky branch compare
  - #201 POC target_dialect cleanup
  - #202 rocky cost <run_id|latest> (Arc 2 wave 2 first PR)
  - #203 rocky run persists RunRecord (Arc 1 wave 2 load-bearing fix)
  - #204 docs + CHANGELOG [Unreleased] cascade
  - #205 demo-branches-replay.gif refresh
  - #206 real per-model started_at on MaterializationOutput

  rocky history / replay / trace / cost now return real data end-to-end for the first time. Full notes in CHANGELOG.

* feat(state): configurable transfer timeout + tracing span + Valkey wrap

  - `StateConfig.transfer_timeout_seconds` (default 300s) replaces the hard-coded `STATE_TRANSFER_TIMEOUT`. Operators can now tune the wall-clock budget in `rocky.toml` for very large state or slow networks without recompiling. `StateConfig` gains a manual `Default` impl so `StateConfig::default()` yields 300s (not u64's zero).
  - `state.upload` / `state.download` tracing spans wrap every transfer carrying `backend`, `bucket`, and `size_bytes`. The in-elapse warn event inherits those fields automatically, so hung transfers are diagnosable from stderr logs alone (which dagster-rocky streams into the Dagster run viewer).
  - Structured `warn!` on timeout elapse ("state transfer exceeded timeout budget") with a `duration_ms` field; replaces silent `Timeout(_)`.
  - Valkey read/write paths audited and closed: `redis::Client::get_connection` + `redis::cmd(...).query()` are sync and blocked the tokio runtime thread; no outer `tokio::time::timeout` could rescue them. Both `upload_to_valkey` and `download_from_valkey` now run under `tokio::task::spawn_blocking` inside `with_transfer_timeout`, closing the same class of hang the object-store paths were already protected against.
  - `default_client_options()` in `object_store.rs` honours the standard `object_store`-crate env vars `AWS_ALLOW_HTTP` / `AZURE_ALLOW_HTTP` / `GOOGLE_STORAGE_ALLOW_HTTP`. Always off in production; the new integration test uses it to front-end the S3 SDK with a plain-HTTP wiremock server without bypassing the credential chain.
  - New `tests/state_sync_timeout_test.rs` integration test: a wiremock S3 endpoint that holds PutObject for 1h proves `upload_state` returns `StateSyncError::Timeout` within the configured 2s budget (+grace). A prompt-endpoint negative control guards against regressions.
  - CHANGELOG entries added under [1.12.0]. Example config in `engine/examples/dagster-integration/rocky.toml` surfaces the new key.

  cargo fmt clean; `clippy --workspace --all-targets -- -D warnings` clean; all 977 rocky-core unit tests + 30 e2e + 20 integration + the 2 new timeout tests pass.

* chore(codegen): regenerate schemas + pydantic types for StateConfig.transfer_timeout_seconds

* chore(fixtures): regenerate dagster test fixtures for 1.12.0

  `just regen-fixtures`: version string bump only (1.11.0 → 1.12.0) across 35 captured fixtures under integrations/dagster/tests/fixtures_generated/.
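Two points from the feat(state) commit lend themselves to a short sketch: why `StateConfig` needs a manual `Default` impl, and why a blocking call has to be moved off the waiting thread before a timeout can fire. The code below is a std-only approximation under stated assumptions: `std::thread` plus `mpsc::Receiver::recv_timeout` stand in for `tokio::task::spawn_blocking` plus `tokio::time::timeout`. The struct has only the one field discussed here, and `StateSyncError::Timeout` is simplified to a unit variant. It is not the engine's implementation.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Simplified stand-in for the real config struct (one field only).
#[derive(Debug, Clone)]
struct StateConfig {
    transfer_timeout_seconds: u64,
}

// Manual impl: `#[derive(Default)]` would give the u64 field a value of 0,
// meaning a zero-second budget instead of the documented 300s.
impl Default for StateConfig {
    fn default() -> Self {
        Self { transfer_timeout_seconds: 300 }
    }
}

/// Simplified error type; the real variant carries a payload.
#[derive(Debug, PartialEq)]
enum StateSyncError {
    Timeout,
}

/// Runs a blocking transfer on its own thread and waits with a deadline.
/// Std-only analogue of `spawn_blocking` wrapped in an async timeout.
fn with_transfer_timeout<T, F>(budget: Duration, transfer: F) -> Result<T, StateSyncError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver is dropped after a timeout, so a failed send is expected.
        let _ = tx.send(transfer());
    });
    // In this sketch a panicked worker (disconnected channel) also maps to Timeout.
    rx.recv_timeout(budget).map_err(|_| StateSyncError::Timeout)
}

fn main() {
    let cfg = StateConfig::default();
    println!("default budget: {}s", cfg.transfer_timeout_seconds);

    let fast = with_transfer_timeout(Duration::from_millis(500), || 42u32);
    let slow = with_transfer_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(400)); // simulated hung transfer
        42u32
    });
    println!("fast = {:?}, slow = {:?}", fast, slow);
}
```

The caller only ever waits on the channel, never on the blocking work itself, so the deadline is honoured even while the transfer is stuck. As with a timed-out `spawn_blocking` task, the worker thread is left to finish on its own rather than being cancelled.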

Closes two silent-drift classes that hit three times in a single day (2026-04-21) during the Arc 1 wave 2 cascade:

## Fix 1: codegen-vscode auto-installs deps

Before: `just codegen-vscode` assumed `editors/vscode/node_modules` was already populated. When it wasn't (fresh worktree, CI runner, release PR checkout), the npx json2ts step failed with "could not determine executable to run", but `codegen-rust` + `codegen-dagster` had already written their outputs, so half the pipeline had run. If the dev didn't notice the error scroll and committed anyway, the partial state landed on main and codegen-drift.yml fired on the next PR. Happened to me on PR #204, and to whoever ran codegen for the v1.12.0 release PR (#207 → #210 hotfix).

After: the recipe checks for node_modules and runs `npm install` automatically. ~15s cost on a cold cache, skipped when deps are present. Same self-heal pattern codegen-dagster already uses via `uv run`.

## Fix 2: new `codegen-all` recipe bundles regen-fixtures

Before: `just codegen` regenerated schemas + bindings but not fixtures. Engine version bumps cascaded every fixture's `"version"` field out of sync (hit on PR #207 release). The muscle memory was "run codegen and you're done", which was wrong for anything touching output shapes or version strings.

After: `just codegen-all` runs codegen + regen-fixtures together. Kept `codegen` as the fast dev loop (~30s) and `codegen-all` as the slower release loop (~1-2 min of extra fixture regen). Docstring on `codegen` now points at `codegen-all` for the cases where it's the right tool.

## Verification

- [x] Deleted `editors/vscode/node_modules`, ran `just codegen`: self-heal kicked in, npm install ran, codegen completed cleanly, `git status` shows only `justfile` modified (idempotent against main).
- [x] `just --list` shows both `codegen` and `codegen-all`.

## Not in scope (noted for follow-up if needed)

- Upfront preflight check (Fix B from the discussion): decided against, because the self-heal in Fix 1 makes it redundant. Both dagster (uv-managed) and vscode (npm-managed) codegen steps now recover from missing deps on their own.
- Pre-commit hook additions: `.git-hooks/pre-commit` already runs codegen drift detection when schema files are staged; opt-in via `just install-hooks`. Making it default-on is a separate discussion about solo-project friction.

hugocorreia90 mentioned this pull request Apr 21, 2026

docs(public): refresh demo-branches-replay.gif with replay + cost steps #205

Merged

3 tasks

hugocorreia90 merged commit 6262dca into main Apr 21, 2026
12 checks passed

hugocorreia90 deleted the docs/arc1-wave2-cascade branch April 21, 2026 16:21

hugocorreia90 mentioned this pull request Apr 21, 2026

chore(justfile): self-heal codegen-vscode + add codegen-all recipe #212

Merged

2 tasks

hugocorreia90 merged 1 commit into main from docs/arc1-wave2-cascade
