
feat: rocky preview create executes copy-from-base — Phase 1.5 #282

Merged
hugocorreia90 merged 1 commit into main from
feat/rocky-preview-1-5
Apr 29, 2026

@hugocorreia90
Contributor

Summary

Lifts `rocky preview create` from planning-only output to actual copy execution. For each copy-set model, it resolves the source schema from the model sidecar and issues a CTAS statement (`CREATE OR REPLACE TABLE <branch_schema>.<model> AS SELECT * FROM <source_schema>.<model>`) through the configured warehouse adapter.
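As a rough illustration of the statement shape described above, the CTAS text could be assembled as follows. This is a minimal sketch, not the code in this PR: the helper name `ctas_statement` is hypothetical, and identifier quoting and validation are left out.

```rust
/// Hypothetical helper (not part of the PR): builds the copy-from-base CTAS
/// statement for a single model. Identifiers are interpolated verbatim, so a
/// real implementation would still need to quote or validate them.
fn ctas_statement(branch_schema: &str, source_schema: &str, model: &str) -> String {
    format!(
        "CREATE OR REPLACE TABLE {branch_schema}.{model} AS SELECT * FROM {source_schema}.{model}"
    )
}
```

Keeping statement construction separate from execution makes the SQL easy to assert on in unit tests without a live warehouse.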

This is the natural follow-up to PRs #279 and #280: the planning kernel and the diff/cost math had already shipped, and this PR makes the prune-and-copy substrate concrete.

What's in this PR

  • `execute_copy_from_base(config, models_dir, branch_schema, copy_names)` — pure builder that:
    1. Loads `rocky.toml` and builds the `AdapterRegistry`.
    2. Picks the `default` warehouse adapter (or first-declared, mirroring `rocky cost`).
    3. Loads model sidecars to resolve each copy-set model's source schema.
    4. Issues `CREATE SCHEMA IF NOT EXISTS branch_schema` once.
    5. For each model, runs CTAS via `WarehouseAdapter::execute_statement`.
    6. Reports per-model failures (missing-in-base, schema mismatch) as `copy_strategy = "failed"` without aborting the rest of the copy set.
  • Registry-build failures fall back to planning-only output — the prune-set decision still ships even if the adapter is unconfigured.
  • No new trait method. `WarehouseAdapter::execute_statement` already covers what we need. Introducing `clone_table_from` on the trait would block Phase 5's native-clone landing on a wider refactor; that trait method lands with Phase 5 (`SHALLOW CLONE` / zero-copy `CLONE`) using CTAS as the default impl.
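The control flow in steps 3 to 6 can be sketched roughly as below. This is a simplified, synchronous illustration under stated assumptions, not the PR's implementation: the trait is a one-method stand-in for the real `WarehouseAdapter` (which may well be async), the function name `copy_from_base`, the `CopyResult` struct, and the `RecordingAdapter` test double are hypothetical, and sidecar loading is reduced to a pre-built model-to-schema map. How the real code handles a failed `CREATE SCHEMA` is not stated in the PR; the sketch simply propagates it.

```rust
use std::collections::HashMap;

/// Stand-in for the real adapter trait; only the method the PR relies on.
trait WarehouseAdapter {
    fn execute_statement(&mut self, sql: &str) -> Result<(), String>;
}

/// Hypothetical per-model outcome; `copy_strategy` uses the values named in the PR.
#[derive(Debug, PartialEq)]
struct CopyResult {
    model: String,
    copy_strategy: &'static str,
    error: Option<String>,
}

/// Copies every model in `copy_names` into `branch_schema` via CTAS.
/// `source_schemas` plays the role of the loaded model sidecars.
fn copy_from_base(
    adapter: &mut dyn WarehouseAdapter,
    source_schemas: &HashMap<String, String>,
    branch_schema: &str,
    copy_names: &[&str],
) -> Result<Vec<CopyResult>, String> {
    // Empty copy set: return before touching the adapter at all.
    if copy_names.is_empty() {
        return Ok(Vec::new());
    }
    // The branch schema is created once, ahead of the per-model loop.
    adapter.execute_statement(&format!("CREATE SCHEMA IF NOT EXISTS {branch_schema}"))?;

    let mut results = Vec::with_capacity(copy_names.len());
    for &model in copy_names {
        let outcome = match source_schemas.get(model) {
            None => Err(format!("no sidecar entry for model `{model}`")),
            Some(src) => adapter.execute_statement(&format!(
                "CREATE OR REPLACE TABLE {branch_schema}.{model} AS SELECT * FROM {src}.{model}"
            )),
        };
        // A failing model is recorded and the loop continues with the next one.
        results.push(match outcome {
            Ok(()) => CopyResult { model: model.to_string(), copy_strategy: "ctas", error: None },
            Err(e) => CopyResult { model: model.to_string(), copy_strategy: "failed", error: Some(e) },
        });
    }
    Ok(results)
}

/// Test double: records every statement and fails any that mentions `fail_on`.
struct RecordingAdapter {
    statements: Vec<String>,
    fail_on: &'static str,
}

impl WarehouseAdapter for RecordingAdapter {
    fn execute_statement(&mut self, sql: &str) -> Result<(), String> {
        self.statements.push(sql.to_string());
        if !self.fail_on.is_empty() && sql.contains(self.fail_on) {
            Err(format!("statement failed: {sql}"))
        } else {
            Ok(())
        }
    }
}
```

Collecting a result per model, rather than returning on the first error, is what lets one missing-in-base model show up as `failed` while the remaining copies still run.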

What's NOT in this PR (deliberate)

  • Phase 5 — adapter parity (Databricks `SHALLOW CLONE`, Snowflake zero-copy `CLONE`, BigQuery copy). CTAS works on every adapter today; Phase 5 lifts to native clones for metadata-only cost.
  • Auto-invocation of `rocky run --branch ` for the prune set. This is still left to the caller (the Phase 4 GitHub Action orchestrates it in a separate PR, #281: "feat(ci): rocky preview GitHub Action + auto PR comment").
  • Phase 2.5 — checksum-bisection sampling. Independent.
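For the deferred Phase 5 item, the idea of a `clone_table_from` trait method with CTAS as the default implementation might look roughly like this. It is a speculative sketch only: the method signature, the returned strategy labels, and both adapter structs are assumptions made for illustration, and the trait is a simplified synchronous stand-in for the real `WarehouseAdapter`. The `SHALLOW CLONE` statement follows Databricks' `CREATE TABLE ... SHALLOW CLONE` syntax.

```rust
/// Simplified stand-in for the real adapter trait.
trait WarehouseAdapter {
    fn execute_statement(&mut self, sql: &str) -> Result<(), String>;

    /// Hypothetical Phase 5 method. The default body falls back to CTAS, so
    /// adapters without a native clone need no extra code.
    /// Returns a label for the strategy that was used.
    fn clone_table_from(&mut self, source: &str, target: &str) -> Result<&'static str, String> {
        self.execute_statement(&format!(
            "CREATE OR REPLACE TABLE {target} AS SELECT * FROM {source}"
        ))?;
        Ok("ctas")
    }
}

/// Illustrative adapter that relies on the default CTAS implementation.
struct CtasOnlyAdapter {
    log: Vec<String>,
}

impl WarehouseAdapter for CtasOnlyAdapter {
    fn execute_statement(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }
}

/// Illustrative adapter that overrides the default with a metadata-only clone.
struct ShallowCloneAdapter {
    log: Vec<String>,
}

impl WarehouseAdapter for ShallowCloneAdapter {
    fn execute_statement(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }

    fn clone_table_from(&mut self, source: &str, target: &str) -> Result<&'static str, String> {
        self.execute_statement(&format!(
            "CREATE OR REPLACE TABLE {target} SHALLOW CLONE {source}"
        ))?;
        Ok("shallow_clone")
    }
}
```

With a default method, callers would invoke `clone_table_from` uniformly and each adapter would opt in to a native clone by overriding it, which matches the stated plan of landing the trait method together with the native implementations.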

Test plan

  • 2 new unit tests (26 total in `preview::tests`):
    • `copy_from_base_executes_ctas_on_duckdb` — full pipeline through `AdapterRegistry` on an in-memory DuckDB. Asserts `copy_strategy="ctas"` for the happy model, `copy_strategy="failed"` for a missing-in-base model, and verifies the actual table landed in the branch schema by reading it back.
    • `copy_from_base_empty_copy_set_short_circuits` — empty copy set returns immediately without touching the adapter.
  • `cargo test -p rocky-cli --lib` — 296 tests pass.
  • `cargo clippy -p rocky-cli --all-targets -- -D warnings` clean.
  • No schema changes — no codegen-drift risk.

Lifts run_preview_create from planning-only to actual copy execution.
For each copy-set model, builds the AdapterRegistry from rocky.toml,
resolves the source schema from the model sidecar, then issues:

  CREATE SCHEMA IF NOT EXISTS branch_schema
  CREATE OR REPLACE TABLE branch_schema.<model> AS
    SELECT * FROM source_schema.<model>

through WarehouseAdapter::execute_statement. Per-model failures (model
missing in base, schema mismatch) surface as copy_strategy = "failed"
without aborting — the rest of the copy set still runs so the PR-comment
surface stays informative. Registry-build failures fall back to
planning_only entries (the prune-set decision still ships).

Why no new trait method: WarehouseAdapter::execute_statement already
covers the CTAS surface. Introducing clone_table_from on the trait
would block the Phase 5 native-clone landing on a wider refactor.
That trait method lands when Phase 5 ships SHALLOW CLONE / zero-copy
CLONE adapter implementations with CTAS as the default impl.

2 new tests:
- copy_from_base_executes_ctas_on_duckdb: full pipeline through
  AdapterRegistry; asserts copy_strategy="ctas" on the happy model,
  copy_strategy="failed" on a missing-in-base model, and verifies
  the actual table landed in the branch schema.
- copy_from_base_empty_copy_set_short_circuits: empty copy set
  returns immediately without touching the adapter (matters when every
  working-DAG model is changed).

Total preview unit tests: 26 (was 24).
@hugocorreia90 hugocorreia90 merged commit 7161eac into main Apr 29, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/rocky-preview-1-5 branch April 29, 2026 12:50
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285#287, #290#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.