
refactor(rocky-core): move is_safe_type_widening + alter_column_type_sql onto SqlDialect trait#332

Merged
hugocorreia90 merged 1 commit into main from fix/drift-alter-column-types
May 1, 2026

Conversation

@hugocorreia90
Contributor

First of two PRs enabling per-dialect drift evolution. This one is purely a trait-surface refactor with default impls preserving every existing behavior; the follow-up adds the BigQuery override + runtime wiring + live smoke.

Why a refactor

Pre-flight against BigQuery surfaced two structural mismatches that ruled out a narrower fix:

  • BQ rejects ALTER TABLE x ALTER COLUMN y TYPE z (the existing engine emit). It requires ALTER COLUMN y SET DATA TYPE z.
  • is_safe_type_widening in drift.rs lists Databricks/Spark-flavored names (INT/BIGINT/SMALLINT/FLOAT/DOUBLE/DECIMAL/VARCHAR). None of BigQuery's canonical names (INT64/NUMERIC/FLOAT64/BIGNUMERIC) appear.

Bolting BQ types onto the global allowlist would lock in a structural compromise: BIGINT → STRING and INT64 → STRING would coexist as duplicate-meaning entries because the two adapters spell the same logical conversion differently. Once that ships, every future adapter inherits the same asymmetry.

What this changes

  • SqlDialect::is_safe_type_widening(&str, &str) -> bool with a default impl that preserves the current Databricks/Spark allowlist verbatim.
  • SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String> with a default impl emitting the existing ANSI form. Validation (identifier check + type allowlist) lives inside the default impl so any future override inherits it for free.
  • drift::detect_drift takes &dyn SqlDialect and routes the widening check through the trait method.
  • drift::generate_alter_column_sql delegates the SQL emit to the trait method.
  • Free function is_safe_type_widening becomes default_is_safe_type_widening (pub(crate)) so the default trait impl can call it without exposing a public alias.
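The trait surface above can be sketched as follows. This is a minimal illustration, not the actual rocky-core code: `AdapterResult` is stubbed as a plain `Result`, the allowlist is abbreviated to a few entries, and the identifier/type validation the PR mentions is elided to a comment.

```rust
// Sketch of the two methods moved onto SqlDialect, with default impls.
// AdapterResult is a stand-in; the real default widening check delegates
// to default_is_safe_type_widening and the full allowlist is larger.
type AdapterResult<T> = Result<T, String>;

trait SqlDialect {
    /// Default preserves the Databricks/Spark-flavored allowlist (abbreviated here).
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT", "BIGINT") | ("FLOAT", "DOUBLE") | ("INT", "STRING") | ("BIGINT", "STRING")
        )
    }

    /// Default emits the existing ANSI form.
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String> {
        // The real default impl validates identifiers + the type allowlist
        // here, so dialect overrides inherit the validation for free.
        Ok(format!("ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}"))
    }
}

// A dialect that takes both defaults keeps yesterday's behavior unchanged.
struct DatabricksDialect;
impl SqlDialect for DatabricksDialect {}

fn main() {
    let d = DatabricksDialect;
    assert!(d.is_safe_type_widening("INT", "BIGINT"));
    assert!(!d.is_safe_type_widening("BIGINT", "INT")); // narrowing is never safe
    println!("{}", d.alter_column_type_sql("t", "c", "BIGINT").unwrap());
    // prints: ALTER TABLE t ALTER COLUMN c TYPE BIGINT
}
```

Because both methods have default bodies, every existing adapter satisfies the new trait surface with an empty `impl` block, which is what makes this a pure refactor.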

What this does NOT change

  • Any existing behavior. The default impls preserve every allowlist entry and emit the SQL string verbatim.
  • BigQuery's is_experimental status — still flagged.
  • Any adapter implementation. The defaults mean Databricks/Snowflake/DuckDB/BigQuery all keep the same drift semantics they had yesterday.

Receipt

This is a refactor; the receipt is the regression suite, not a new live test:

  • cargo test -p rocky-core --lib — 1129 passed
  • cargo test -p rocky-core --test e2e — 30 passed
  • cargo test -p rocky-core --test integration — 20 passed
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean

The live drift smoke test (live/drift/run.sh) still passes with no changes — verified end-to-end against the BQ sandbox.

Next PR

Adds the BigQuery dialect overrides:

  • BigQueryDialect::alter_column_type_sql — emits BQ's required ALTER COLUMN ... SET DATA TYPE form
  • BigQueryDialect::is_safe_type_widening — declares BQ's widening allowlist (INT64 → NUMERIC, INT64 → BIGNUMERIC, NUMERIC → BIGNUMERIC, plus numeric-to-STRING for BQ types)
  • Runtime wiring: run.rs currently handles only DropAndRecreate; adds the AlterColumnTypes branch that calls generate_alter_column_sql.
  • Live smoke: live/drift/run.sh gains a fourth stage exercising a safe widening (e.g. INT64 → NUMERIC) end-to-end.

Open question to settle before that PR: is INT64 → FLOAT64 "safe widening" under Rocky's definition? Lossy for values > 2^53.
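The lossiness behind that question is easy to demonstrate: an f64 has a 53-bit significand, so not every INT64 value round-trips through FLOAT64. A standalone check:

```rust
// Why INT64 → FLOAT64 is questionable as "safe widening": integers
// above 2^53 are not all representable in an f64's 53-bit significand.
fn main() {
    let exact: i64 = (1_i64 << 53) + 1; // 9007199254740993
    let via_f64 = exact as f64 as i64;  // rounds to the nearest representable value
    assert_ne!(exact, via_f64);         // 9007199254740993 != 9007199254740992
    println!("{exact} round-trips to {via_f64}");
}
```

If Rocky's "safe" contract is strict losslessness, this argues for excluding the conversion, consistent with the default allowlist omitting INT → FLOAT.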

Test plan

  • All rocky-core tests pass with the new signature
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean
  • Existing live/drift/run.sh smoke test against the sandbox still passes (3-stage flow unchanged)

…sql onto SqlDialect trait

Pre-flight against the BigQuery sandbox surfaced two structural
mismatches: BQ rejects the existing `ALTER TABLE x ALTER COLUMN y
TYPE z` emit (it requires `SET DATA TYPE`), and `is_safe_type_widening`
in `drift.rs` lists Databricks/Spark-flavored type names — none of
BQ's canonical names (`INT64`, `NUMERIC`, `FLOAT64`, `BIGNUMERIC`)
appear. Bolting BQ types onto the global allowlist would lock in a
structural compromise: e.g. `BIGINT → STRING` and `INT64 → STRING`
coexisting as duplicate-meaning entries because the two adapters
spell the same logical conversion differently.

This PR moves the two surfaces onto the `SqlDialect` trait so each
dialect can declare its own widening semantics:

- `SqlDialect::is_safe_type_widening(&str, &str) -> bool` with a
  default impl that preserves the current Databricks/Spark allowlist
  (INT/BIGINT/FLOAT/DOUBLE/DECIMAL/VARCHAR widenings + numeric-to-STRING).
- `SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String>`
  with a default impl emitting the existing ANSI form. Validation
  (identifier + type allowlist) lives inside the default impl so
  dialect overrides inherit it for free.
- `drift::detect_drift` now takes `&dyn SqlDialect` and routes the
  widening check through the trait method. `drift::generate_alter_column_sql`
  delegates the SQL emit to the trait method.

Free function `is_safe_type_widening` in `drift.rs` becomes
`default_is_safe_type_widening`, `pub(crate)` so the default trait
impl can call it without exposing a public alias.

No behavior change today — the default impls preserve every existing
allowlist entry and the emitted SQL string. This is a refactor PR; the receipt
is the regression suite (1129 unit tests + 30 e2e + 20 integration
all pass without modification beyond threading the new parameter).

The follow-up PR adds the BigQuery overrides (`ALTER COLUMN ... SET
DATA TYPE` shape; BQ-specific widening allowlist), wires the runtime
to actually emit the AlterColumnTypes branch, and ships a live smoke
test.
@hugocorreia90 hugocorreia90 merged commit 8ee6016 into main May 1, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/drift-alter-column-types branch May 1, 2026 15:04
hugocorreia90 added a commit that referenced this pull request May 1, 2026
…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted
`is_safe_type_widening` + `alter_column_type_sql` onto the
`SqlDialect` trait with default impls preserving Databricks/Spark
semantics. This PR adds the BigQuery-specific overrides + the
runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default
  ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The
  default `ALTER COLUMN x TYPE y` shape returns
  `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict
  BQ-specific allowlist:
    - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
    - `INT64 → BIGNUMERIC` (lossless)
    - `NUMERIC → BIGNUMERIC` (strict precision widening)
  Excluded by design:
    - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery
      accepts via SET DATA TYPE but Rocky's "safe" contract is strict
      (matches the default Databricks/Spark allowlist that omits
      `INT → FLOAT`).
    - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects
      these with `existing column type X is not assignable to
      STRING` even though STRING is lossless at the value level. The
      default allowlist's "any numeric → STRING" pattern doesn't
      transfer to BQ. Discovered via live verification — initial
      draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes`
  branch between `DropAndRecreate` and the existing `added_columns`
  branch. Emits via `drift::generate_alter_column_sql` (which now
  routes through `dialect.alter_column_type_sql`) and surfaces as
  `action: "alter_column_types"` in the run output. If the same
  drift round also surfaced added columns, both apply before the
  INSERT continues.
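The override pair described above can be sketched as follows. This is an illustration against a stubbed trait, not the merged code: `AdapterResult` is a stand-in, the trait is reduced to the two methods in question, and validation is omitted.

```rust
// Sketch of the BigQuery overrides: a strict widening allowlist and the
// SET DATA TYPE emit shape BQ requires. AdapterResult is stubbed; the
// real trait carries default impls and shared validation.
type AdapterResult<T> = Result<T, String>;

trait SqlDialect {
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool;
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String>;
}

struct BigQueryDialect;

impl SqlDialect for BigQueryDialect {
    // Strict BQ allowlist. FLOAT64 targets are excluded (lossy above 2^53);
    // STRING targets are excluded because BQ's SET DATA TYPE rejects them.
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
        )
    }

    // BQ rejects the ANSI `ALTER COLUMN x TYPE y` form; it requires SET DATA TYPE.
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String> {
        Ok(format!("ALTER TABLE {table} ALTER COLUMN {column} SET DATA TYPE {new_type}"))
    }
}

fn main() {
    let bq = BigQueryDialect;
    assert!(bq.is_safe_type_widening("INT64", "NUMERIC"));
    assert!(!bq.is_safe_type_widening("INT64", "FLOAT64")); // excluded by design
    assert!(!bq.is_safe_type_widening("INT64", "STRING"));  // BQ rejects it anyway
    println!("{}", bq.alter_column_type_sql("t", "score", "NUMERIC").unwrap());
    // prints: ALTER TABLE t ALTER COLUMN score SET DATA TYPE NUMERIC
}
```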

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN region → `add_columns` action
3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate`
4. DROP+CREATE source with score INT64→NUMERIC →
   `alter_column_types`; target's score column is now NUMERIC