feat: rocky preview — PR-time preview environments + data diff + cost delta#279
Merged
hugocorreia90 merged 4 commits into main on Apr 29, 2026
Phase 0 of the PR-preview bundle: introduces `rocky preview <create|diff|cost>` subcommands with output structs (PreviewCreateOutput, PreviewDiffOutput, PreviewCostOutput) wired through the codegen cascade — JSON schemas exported to schemas/, Pydantic models regenerated for dagster, TypeScript interfaces regenerated for vscode. Command handlers bail with a Phase-1-pointer message; subsequent commits in this PR fill them in. Output structs cover: prune-set + copy-set + skipped-set tracking on create; structural + sampled row diff with explicit sampling-window correctness hedge on diff; per-model bytes/duration/USD delta with copy-savings rollup on cost. Schema-registration + 7 unit tests passing.
Phase 1 of the PR-preview bundle. End-to-end planning of the prune-and-copy decision against the working DAG:
- Git plumbing: three-dot `base...HEAD` diff (two-dot fallback for shallow clones); filters to .sql/.rocky/.toml under models_dir.
- Working DAG: scan models_dir non-recursively, parse sidecar depends_on into a forward-edge graph; tolerant of malformed TOML.
- Prune set: changed models + transitive downstream via reverse-edge closure. Topological ordering via Kahn's algorithm (alphabetical tie-break).
- Copy set: every working-DAG model not in the prune set, emitted with copy_strategy="planned"; Phase 1.5 lifts to adapter-driven CTAS / SHALLOW CLONE / zero-copy CLONE.
- Branch register: reuses run_branch_create; default branch name derived from current git branch (`pr-preview-<slug>`) or timestamp.
- Output: PreviewCreateOutput with run_status="planned" so the scope is honest.

11 new unit tests covering DAG topological order, transitive downstream traversal, malformed sidecar tolerance, models-dir filter, git path parsing.
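The prune-and-copy planning described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `Dag` alias, the function names (`prune_set`, `topo_order`, `copy_set`, `example_dag`), and the dependency edges between the five POC models are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Forward edges: model name -> the models it depends on (hypothetical shape).
type Dag = BTreeMap<String, Vec<String>>;

/// Changed models plus everything transitively downstream of them.
fn prune_set(dag: &Dag, changed: &[&str]) -> BTreeSet<String> {
    // Reverse the forward edges so each model maps to its direct dependents.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (model, deps) in dag {
        for dep in deps {
            dependents.entry(dep.as_str()).or_default().push(model.as_str());
        }
    }
    // Seed with the changed models that exist in the DAG; unknown names are ignored.
    let mut stack: Vec<&str> = changed
        .iter()
        .filter_map(|m| dag.get_key_value(*m).map(|(k, _)| k.as_str()))
        .collect();
    // Depth-first closure over the reverse edges.
    let mut pruned = BTreeSet::new();
    while let Some(model) = stack.pop() {
        if pruned.insert(model.to_string()) {
            if let Some(children) = dependents.get(model) {
                stack.extend(children.iter().copied());
            }
        }
    }
    pruned
}

/// Kahn's algorithm restricted to `subset`; ties go to the alphabetically first model.
/// Models caught in a dependency cycle never become ready and are left out.
fn topo_order(dag: &Dag, subset: &BTreeSet<String>) -> Vec<String> {
    // In-degree counts only dependencies that are themselves inside the subset.
    let mut indegree: BTreeMap<&str, usize> = subset
        .iter()
        .map(|m| {
            let n = dag.get(m).map_or(0, |deps| deps.iter().filter(|d| subset.contains(*d)).count());
            (m.as_str(), n)
        })
        .collect();
    // A BTreeSet acts as an alphabetically ordered ready queue.
    let mut ready: BTreeSet<&str> =
        indegree.iter().filter(|(_, n)| **n == 0).map(|(m, _)| *m).collect();
    let mut order = Vec::new();
    while let Some(model) = ready.pop_first() {
        order.push(model.to_string());
        for other in subset {
            // How many of `other`'s dependency entries point at the model just emitted.
            let hits = dag.get(other).map_or(0, |deps| deps.iter().filter(|d| d.as_str() == model).count());
            if hits > 0 {
                let n = indegree.get_mut(other.as_str()).expect("subset member");
                *n -= hits;
                if *n == 0 {
                    ready.insert(other.as_str());
                }
            }
        }
    }
    order
}

/// Every working-DAG model that is not in the prune set.
fn copy_set(dag: &Dag, pruned: &BTreeSet<String>) -> Vec<String> {
    dag.keys().filter(|m| !pruned.contains(*m)).cloned().collect()
}

/// The five POC model names with ASSUMED edges (the PR does not list them).
fn example_dag() -> Dag {
    let edges = vec![
        ("raw_orders", vec![]),
        ("raw_customers", vec![]),
        ("stg_orders", vec!["raw_orders"]),
        ("dim_customers", vec!["raw_customers"]),
        ("fct_revenue", vec!["stg_orders", "dim_customers"]),
    ];
    edges
        .into_iter()
        .map(|(m, deps)| (m.to_string(), deps.into_iter().map(String::from).collect::<Vec<String>>()))
        .collect()
}

fn main() {
    let dag = example_dag();
    let pruned = prune_set(&dag, &["stg_orders"]);
    println!("prune set (run order): {:?}", topo_order(&dag, &pruned));
    println!("copy set: {:?}", copy_set(&dag, &pruned));
}
```

Under the assumed edges, changing `stg_orders` puts `stg_orders` and then `fct_revenue` in the prune set and leaves the other three models in the copy set. Using ordered collections keeps the output deterministic, which is one plausible way to get the alphabetical tie-break the commit message mentions.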
Three new doc surfaces:
- docs/src/content/docs/concepts/preview-internals.md: explains the prune-and-copy substrate, comparison to Fivetran's Smart Run, and the sampling correctness ceiling.
- docs/src/content/docs/guides/preview-a-pr.md: how-to walking through preview create / diff / cost on a feature branch.
- docs/src/content/docs/reference/commands/modeling.md: appended rocky preview subcommand reference; cross-link from rocky ci-diff.

POC scaffolding at examples/playground/pocs/06-developer-experience/10-pr-preview-and-data-diff/: a 5-model DuckDB transformation pipeline with a synthetic-PR variant (fct_revenue.sql.changed) and aspirational expected outputs matching the wire schemas. ./run.sh exercises the full create+diff+cost flow today and surfaces stub responses without failing.

Pivot note in the POC README: models are .sql + .toml (not .rocky DSL) because the existing playground had no working precedent for transformation+.rocky+rocky run. The preview workflow is surface-agnostic, so this doesn't degrade what the POC demonstrates.

examples/playground/.gitignore: gitignore .rocky_state* directories; whitelist expected/*.example.json so aspirational fixtures stay tracked.
Phase 0b of the PR-preview bundle:
- Dagster: re-export PreviewCreateOutput / PreviewDiffOutput / PreviewCostOutput from types_generated/ via top-level types.py, matching the soft-swap pattern used for run/cost/compliance.
- VS Code: register three new commands (rocky.previewCreate, rocky.previewDiff, rocky.previewCost) with command-palette entries and runRockyCli helpers; new src/commands/preview.ts module.
- Cleanup: clippy-clean preview module (str::to_string, doc-list indent, allow too-many-arguments on PreviewCreateOutput::new).
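The git plumbing step from the Phase 1 commit message (parse the diff, keep only .sql/.rocky/.toml files under models_dir) might look something like the sketch below. It assumes the tab-separated line format that `git diff --name-status` prints, and it assumes a model's name is its file stem; the function name `changed_models` is hypothetical, and the sketch does not treat deleted files differently from modified ones.

```rust
use std::collections::BTreeSet;
use std::path::Path;

/// Extract changed model names from `git diff --name-status` output.
/// ASSUMPTION: a model is named after its file stem (e.g. `fct_revenue.sql` -> `fct_revenue`).
fn changed_models(name_status: &str, models_dir: &str) -> BTreeSet<String> {
    let mut changed = BTreeSet::new();
    for line in name_status.lines() {
        // Fields are tab-separated: status, path, and for renames/copies a second path.
        let mut fields = line.split('\t');
        let Some(_status) = fields.next() else { continue };
        // For renames and copies the last field is the new path.
        let Some(path) = fields.last() else { continue };
        let path = Path::new(path);
        // Component-wise prefix check, so `models_old/` does not match `models`.
        if !path.starts_with(models_dir) {
            continue;
        }
        let ext_ok = matches!(
            path.extension().and_then(|e| e.to_str()),
            Some("sql" | "rocky" | "toml")
        );
        if !ext_ok {
            continue;
        }
        if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
            // A model's .sql file and its .toml sidecar collapse to one entry.
            changed.insert(stem.to_string());
        }
    }
    changed
}

fn main() {
    let diff = "M\tmodels/fct_revenue.sql\nM\tmodels/fct_revenue.toml\nM\tREADME.md\n";
    println!("{:?}", changed_models(diff, "models"));
}
```

Collecting into a set means a change to a model's SQL and to its sidecar counts once, which fits the idea of feeding a list of changed models into the prune-set computation.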
hugocorreia90 added a commit that referenced this pull request on Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280, #281, #282), the [budget].max_bytes_scanned threshold (#288), the audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server auth + CORS gate (#291). Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky preview surface and ships the P1 cluster (#289) + FR-014 follow-on (#284). VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and RunCostSummary.total_bytes_scanned. See per-artifact CHANGELOG entries for the full breakdown.
Summary
Introduces `rocky preview <create|diff|cost>`, a PR-time preview workflow built on Rocky's branches + state-store + cost-attribution primitives. Closest commercial analogue is Fivetran Smart Run; Rocky's compiler-IR + branch substrate are structural advantages that workflow can't match.

This PR delivers the vertical slice scoped during planning: Phases 0a + 0b + 1 + 2 + 3 + POC + docs, all CI-green-verifiable on a fresh PR. Phases 4 (GitHub Action + auto PR comment) and 5 (adapter parity beyond DuckDB + warehouse-native clones) ship as follow-up PRs after this lands.
What's in this PR
- Phase 0a (engine codegen, f4f543c): new `PreviewCreateOutput` / `PreviewDiffOutput` / `PreviewCostOutput` structs in `crates/rocky-cli/src/output.rs`, registered in `export_schemas`, with JSON schemas + Pydantic + TypeScript bindings regenerated end-to-end via `just codegen`.
- Phase 0b (Dagster + VS Code wiring, fc1071e): Dagster `types.py` re-exports the 15 new symbols; VS Code registers `rocky.previewCreate` / `rocky.previewDiff` / `rocky.previewCost` commands with palette entries and a new `commands/preview.ts` handler module.
- Phase 1 (`rocky preview create`, 79e857a): working planning kernel that:
  - Finds changed files with `git diff --name-status base...HEAD` (two-dot fallback for shallow clones).
  - Builds the working DAG from sidecar `depends_on` arrays.
  - Registers the branch via `run_branch_create` (default name slugged from current git branch).
  - Emits `PreviewCreateOutput` with `run_status = "planned"`. Phase 1.5 lifts the copy step to adapter-driven CTAS / `SHALLOW CLONE` / zero-copy `CLONE` execution; until then the user invokes `rocky run --branch <name> --model <name>` per prune-set model.
- Phase 2 (`rocky preview diff`): command + JSON wire contract shipped; emits a valid `PreviewDiffOutput` with a deterministic Markdown stub. Substance (extending the `rocky_core::compare` shadow kernel with sampled row-level deltas + the explicit `coverage_warning` sampling-window field) is the Phase 2 follow-up; the schema is ready.
- Phase 3 (`rocky preview cost`): command + JSON wire contract shipped; emits a valid `PreviewCostOutput` with a deterministic Markdown stub. Substance (`RunRecord`-based per-model cost-delta math over the existing `rocky cost latest` infrastructure from Arc 2 wave 2 / PR #202) is the Phase 3 follow-up.
- POC (bc1da8c): new `examples/playground/pocs/06-developer-experience/10-pr-preview-and-data-diff/`, a 5-model DuckDB transformation pipeline (raw_orders, raw_customers, stg_orders, dim_customers, fct_revenue) with a synthetic-PR variant (`fct_revenue.sql.changed`). `./run.sh` exercises the full create+diff+cost flow end-to-end, surfaces stub responses without failing, exits 0 in ~0.3s.
- Docs (bc1da8c): concept page (`concepts/preview-internals.md`), how-to (`guides/preview-a-pr.md`), CLI reference (`reference/commands/modeling.md`).

Comparison to Fivetran Smart Run
The Fivetran blog post describes "Smart Run": git-diff models, `COPY` unchanged upstream from prod into dev, re-run only changed + downstream. Rocky's primitives strictly dominate that technique:
- Copy step: `SHALLOW CLONE` / Snowflake zero-copy `CLONE` (metadata-only), strictly cheaper than `COPY`.

Out of scope (follow-up PRs)
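- Phase 5: adapter parity beyond DuckDB with warehouse-native clones (`SHALLOW CLONE`, Snowflake zero-copy `CLONE`, BigQuery table copy). Adapter conformance suite runs in `engine-weekly.yml`, not per-PR; merging would ship un-CI-verified adapter code.
- Phase 2 substance: sampled row-level deltas and the explicit `coverage_warning` field. Schema is shipped; math lands next.

Test plan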
- `just codegen` is idempotent, so `codegen-drift.yml` will pass.
- Schema-registration test (`registered_schemas_match_committed_files`) green.
- `cargo test -p rocky-cli` (284 tests) passes.
- `cargo clippy -p rocky-cli --all-targets -- -D warnings` clean.
- `uv run pytest` (458 tests) passes.
- `uv run ruff check` clean.
- `npm run compile` clean.
- `npm run test:unit` (20 tests) passes.
- `npm run lint` clean.
- `./run.sh` exits 0 (stubs surface gracefully).