refactor(rocky-core): move is_safe_type_widening + alter_column_type_sql onto SqlDialect trait #332
Merged
hugocorreia90 merged 1 commit into main on May 1, 2026
Conversation
refactor(rocky-core): move is_safe_type_widening + alter_column_type_sql onto SqlDialect trait (#332)

Pre-flight against the BigQuery sandbox surfaced two structural mismatches: BQ rejects the existing `ALTER TABLE x ALTER COLUMN y TYPE z` emit (it requires `SET DATA TYPE`), and `is_safe_type_widening` in `drift.rs` lists Databricks/Spark-flavored type names — none of BQ's canonical names (`INT64`, `NUMERIC`, `FLOAT64`, `BIGNUMERIC`) appear. Bolting BQ types onto the global allowlist would lock in a structural compromise: e.g. `BIGINT → STRING` and `INT64 → STRING` coexisting as duplicate-meaning entries because the two adapters spell the same logical conversion differently.

This PR moves the two surfaces onto the `SqlDialect` trait so each dialect can declare its own widening semantics:

- `SqlDialect::is_safe_type_widening(&str, &str) -> bool` with a default impl that preserves the current Databricks/Spark allowlist (INT/BIGINT/FLOAT/DOUBLE/DECIMAL/VARCHAR widenings + numeric-to-STRING).
- `SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String>` with a default impl emitting the existing ANSI form. Validation (identifier + type allowlist) lives inside the default impl so dialect overrides inherit it for free.
- `drift::detect_drift` now takes `&dyn SqlDialect` and routes the widening check through the trait method. `drift::generate_alter_column_sql` delegates the SQL emit to the trait method.
- Free function `is_safe_type_widening` in `drift.rs` becomes `default_is_safe_type_widening`, `pub(crate)` so the default trait impl can call it without exposing a public alias.

No behavior change today — the default impls preserve every existing allowlist entry and emit the same SQL string. This is a refactor PR; the receipt is the regression suite (1129 unit tests + 30 e2e + 20 integration all pass without modification beyond threading the new parameter). The follow-up PR adds the BigQuery overrides (`ALTER COLUMN ... SET DATA TYPE` shape; BQ-specific widening allowlist), wires the runtime to actually emit the `AlterColumnTypes` branch, and ships a live smoke test.
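For orientation, a minimal sketch of how the widening check now routes through the trait. The trait, enum, and function below are illustrative stand-ins rather than rocky-core's actual definitions; in particular, the argument order of `is_safe_type_widening` is an assumption:

```rust
// Illustrative stand-ins for rocky-core's types; only the routing matters here.
trait SqlDialect {
    // Assumed argument order: (from_type, to_type).
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool;
}

enum DriftAction {
    NoChange, // hypothetical variant name
    AlterColumnTypes,
    DropAndRecreate,
}

// detect_drift (simplified): the dialect, not a global allowlist, decides
// whether a column-type change counts as a safe widening.
fn classify_type_change(dialect: &dyn SqlDialect, from: &str, to: &str) -> DriftAction {
    if from == to {
        DriftAction::NoChange
    } else if dialect.is_safe_type_widening(from, to) {
        DriftAction::AlterColumnTypes // evolve the column in place
    } else {
        DriftAction::DropAndRecreate // structural break: rebuild the table
    }
}
```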
hugocorreia90 added a commit that referenced this pull request on May 1, 2026
…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted `is_safe_type_widening` + `alter_column_type_sql` onto the `SqlDialect` trait with default impls preserving Databricks/Spark semantics. This PR adds the BigQuery-specific overrides plus the runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The default `ALTER COLUMN x TYPE y` shape returns `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict BQ-specific allowlist:
  - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
  - `INT64 → BIGNUMERIC` (lossless)
  - `NUMERIC → BIGNUMERIC` (strict precision widening)

  Excluded by design:
  - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery accepts it via SET DATA TYPE, but Rocky's "safe" contract is strict (matching the default Databricks/Spark allowlist, which omits `INT → FLOAT`).
  - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects these with `existing column type X is not assignable to STRING`, even though STRING is lossless at the value level. The default allowlist's "any numeric → STRING" pattern doesn't transfer to BQ. Discovered via live verification — the initial draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes` branch between `DropAndRecreate` and the existing `added_columns` branch. It emits via `drift::generate_alter_column_sql` (which now routes through `dialect.alter_column_type_sql`) and surfaces as `action: "alter_column_types"` in the run output. If the same drift round also surfaced added columns, both apply before the INSERT continues.

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN region → `add_columns` action
3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate`
4. DROP+CREATE source with score INT64→NUMERIC → `alter_column_types`; target's score column is now NUMERIC
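A sketch of the plausible shape of those overrides, assuming simplified signatures (`Result<String, String>` stands in for `AdapterResult<String>`; the real method bodies may differ):

```rust
trait SqlDialect {
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool;
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str)
        -> Result<String, String>;
}

struct BigQueryDialect;

impl SqlDialect for BigQueryDialect {
    // Strict BQ allowlist: only the three provably lossless widenings.
    // FLOAT64 and STRING targets are excluded for the reasons above.
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
        )
    }

    // BigQuery requires SET DATA TYPE; the ANSI `ALTER COLUMN x TYPE y`
    // form is rejected with "Expected keyword DROP or keyword SET".
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str)
        -> Result<String, String> {
        Ok(format!(
            "ALTER TABLE {table} ALTER COLUMN {column} SET DATA TYPE {new_type}"
        ))
    }
}
```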
First of two PRs enabling per-dialect drift evolution. This one is purely a trait-surface refactor with default impls preserving every existing behavior; the follow-up adds the BigQuery override + runtime wiring + live smoke.
Why a refactor
Pre-flight against BigQuery surfaced two structural mismatches that ruled out a narrower fix:
- BigQuery rejects `ALTER TABLE x ALTER COLUMN y TYPE z` (the existing engine emit). It requires `ALTER COLUMN y SET DATA TYPE z`.
- `is_safe_type_widening` in `drift.rs` lists Databricks/Spark-flavored names (INT/BIGINT/SMALLINT/FLOAT/DOUBLE/DECIMAL/VARCHAR). None of BigQuery's canonical names (INT64/NUMERIC/FLOAT64/BIGNUMERIC) appear.

Bolting BQ types onto the global allowlist would lock in a structural compromise: `BIGINT → STRING` and `INT64 → STRING` would coexist as duplicate-meaning entries because the two adapters spell the same logical conversion differently. Once that ships, every future adapter inherits the same asymmetry.
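To make that compromise concrete, here is roughly what the rejected single-global-allowlist fix would have looked like (hypothetical code, nothing that shipped):

```rust
// One flat allowlist serving every adapter: the same logical conversion
// ("big integer widens to string") appears once per dialect's spelling.
fn is_safe_type_widening(from: &str, to: &str) -> bool {
    matches!(
        (from, to),
        ("BIGINT", "STRING")  // Databricks/Spark spelling
        | ("INT64", "STRING") // BigQuery spelling of the same conversion
        // ...and every future adapter appends its own duplicates
    )
}
```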
What this changes

- `SqlDialect::is_safe_type_widening(&str, &str) -> bool` with a default impl that preserves the current Databricks/Spark allowlist verbatim (sketched below).
- `SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String>` with a default impl emitting the existing ANSI form. Validation (identifier check + type allowlist) lives inside the default impl so any future override inherits it for free.
- `drift::detect_drift` takes `&dyn SqlDialect` and routes the widening check through the trait method. `drift::generate_alter_column_sql` delegates the SQL emit to the trait method.
- `is_safe_type_widening` becomes `default_is_safe_type_widening` (`pub(crate)`) so the default trait impl can call it without exposing a public alias.
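A condensed sketch of the resulting trait surface, with a simplified error type and stubbed validation helpers standing in for rocky-core's real ones:

```rust
type AdapterResult<T> = Result<T, String>; // stand-in for rocky-core's alias

// Stubs standing in for the real identifier/type validation.
fn validate_identifier(_ident: &str) -> AdapterResult<()> { Ok(()) }
fn validate_type(_ty: &str) -> AdapterResult<()> { Ok(()) }

// Former free function, now pub(crate) so the default impl can reach it.
pub(crate) fn default_is_safe_type_widening(from: &str, to: &str) -> bool {
    // Databricks/Spark allowlist preserved verbatim (mostly elided here).
    matches!((from, to), ("INT", "BIGINT") | ("FLOAT", "DOUBLE") /* ... */)
}

pub trait SqlDialect {
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        default_is_safe_type_widening(from, to)
    }

    fn alter_column_type_sql(
        &self,
        table: &str,
        column: &str,
        new_type: &str,
    ) -> AdapterResult<String> {
        // Validation sits in the default impl, ahead of the emit.
        validate_identifier(column)?;
        validate_type(new_type)?;
        Ok(format!("ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}"))
    }
}
```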
What this does NOT change

- `is_experimental` status — still flagged.

Receipt
This is a refactor; the receipt is the regression suite, not a new live test:
- `cargo test -p rocky-core --lib` — 1129 passed
- `cargo test -p rocky-core --test e2e` — 30 passed
- `cargo test -p rocky-core --test integration` — 20 passed
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --all --check` clean

The live drift smoke test (`live/drift/run.sh`) still passes with no changes — verified end-to-end against the BQ sandbox.

Next PR
Adds the BigQuery dialect override:
- `BigQueryDialect::alter_column_type_sql` — emits BQ's required `ALTER COLUMN ... SET DATA TYPE` form
- `BigQueryDialect::is_safe_type_widening` — declares BQ's widening allowlist (`INT64 → NUMERIC`, `INT64 → BIGNUMERIC`, `NUMERIC → BIGNUMERIC`, plus numeric-to-`STRING` for BQ types)
- `run.rs` currently only handles `DropAndRecreate`; the next PR adds the `AlterColumnTypes` branch that calls `generate_alter_column_sql`.
- `live/drift/run.sh` gains a fourth stage exercising a safe widening (e.g. `INT64 → NUMERIC`) end-to-end.

Open question to settle before that PR: is `INT64 → FLOAT64` a "safe widening" under Rocky's definition? It is lossy for values > 2^53.
Test plan

- `rocky-core` tests pass with the new signature
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --all --check` clean
- `live/drift/run.sh` smoke test against the sandbox still passes (3-stage flow unchanged)