
refactor(rocky-core): move is_safe_type_widening + alter_column_type_sql onto SqlDialect trait#332

Merged
hugocorreia90 merged 1 commit into main from fix/drift-alter-column-types
May 1, 2026

Conversation

@hugocorreia90
Contributor

First of two PRs enabling per-dialect drift evolution. This one is purely a trait-surface refactor with default impls preserving every existing behavior; the follow-up adds the BigQuery override + runtime wiring + live smoke.

Why a refactor

Pre-flight against BigQuery surfaced two structural mismatches that ruled out a narrower fix:

  • BQ rejects ALTER TABLE x ALTER COLUMN y TYPE z (the existing engine emit). It requires ALTER COLUMN y SET DATA TYPE z.
  • is_safe_type_widening in drift.rs lists Databricks/Spark-flavored names (INT/BIGINT/SMALLINT/FLOAT/DOUBLE/DECIMAL/VARCHAR). None of BigQuery's canonical names (INT64/NUMERIC/FLOAT64/BIGNUMERIC) appear.

Bolting BQ types onto the global allowlist would lock in a structural compromise: BIGINT → STRING and INT64 → STRING would coexist as duplicate-meaning entries because the two adapters spell the same logical conversion differently. Once that ships, every future adapter inherits the same asymmetry.

What this changes

  • SqlDialect::is_safe_type_widening(&str, &str) -> bool with a default impl that preserves the current Databricks/Spark allowlist verbatim.
  • SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String> with a default impl emitting the existing ANSI form. Validation (identifier check + type allowlist) lives inside the default impl so any future override inherits it for free.
  • drift::detect_drift takes &dyn SqlDialect and routes the widening check through the trait method.
  • drift::generate_alter_column_sql delegates the SQL emit to the trait method.
  • Free function is_safe_type_widening becomes default_is_safe_type_widening (pub(crate)) so the default trait impl can call it without exposing a public alias.
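The trait surface above can be sketched as follows. This is a minimal illustration, not the actual rocky-core code: `AdapterResult` is stubbed as a plain `Result`, the allowlist is abbreviated to a few entries, and the identifier/type validation the PR mentions is elided to a comment.

```rust
// Sketch of the two methods moved onto SqlDialect, with default impls.
// AdapterResult is a stand-in; the real default widening check delegates
// to default_is_safe_type_widening and the full allowlist is larger.
type AdapterResult<T> = Result<T, String>;

trait SqlDialect {
    /// Default preserves the Databricks/Spark-flavored allowlist (abbreviated here).
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT", "BIGINT") | ("FLOAT", "DOUBLE") | ("INT", "STRING") | ("BIGINT", "STRING")
        )
    }

    /// Default emits the existing ANSI form.
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String> {
        // The real default impl validates identifiers + the type allowlist
        // here, so dialect overrides inherit the validation for free.
        Ok(format!("ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}"))
    }
}

// A dialect that takes both defaults keeps yesterday's behavior unchanged.
struct DatabricksDialect;
impl SqlDialect for DatabricksDialect {}

fn main() {
    let d = DatabricksDialect;
    assert!(d.is_safe_type_widening("INT", "BIGINT"));
    assert!(!d.is_safe_type_widening("BIGINT", "INT")); // narrowing is never safe
    println!("{}", d.alter_column_type_sql("t", "c", "BIGINT").unwrap());
    // prints: ALTER TABLE t ALTER COLUMN c TYPE BIGINT
}
```

Because both methods have default bodies, every existing adapter satisfies the new trait surface with an empty `impl` block, which is what makes this a pure refactor.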

What this does NOT change

  • Any existing behavior. The default impls preserve every allowlist entry and emit the SQL string verbatim.
  • BigQuery's is_experimental status — still flagged.
  • Any adapter implementation. The defaults mean Databricks/Snowflake/DuckDB/BigQuery all keep the same drift semantics they had yesterday.

Receipt

This is a refactor; the receipt is the regression suite, not a new live test:

  • cargo test -p rocky-core --lib — 1129 passed
  • cargo test -p rocky-core --test e2e — 30 passed
  • cargo test -p rocky-core --test integration — 20 passed
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean

The live drift smoke test (live/drift/run.sh) still passes with no changes — verified end-to-end against the BQ sandbox.

Next PR

Adds the BigQuery dialect overrides:

  • BigQueryDialect::alter_column_type_sql — emits BQ's required ALTER COLUMN ... SET DATA TYPE form
  • BigQueryDialect::is_safe_type_widening — declares BQ's widening allowlist (INT64 → NUMERIC, INT64 → BIGNUMERIC, NUMERIC → BIGNUMERIC, plus numeric-to-STRING for BQ types)
  • Runtime wiring: run.rs currently handles only DropAndRecreate; adds the AlterColumnTypes branch that calls generate_alter_column_sql.
  • Live smoke: live/drift/run.sh gains a fourth stage exercising a safe widening (e.g. INT64 → NUMERIC) end-to-end.

Open question to settle before that PR: is INT64 → FLOAT64 "safe widening" under Rocky's definition? Lossy for values > 2^53.
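The lossiness behind that question is easy to demonstrate: an f64 has a 53-bit significand, so not every INT64 value round-trips through FLOAT64. A standalone check:

```rust
// Why INT64 → FLOAT64 is questionable as "safe widening": integers
// above 2^53 are not all representable in an f64's 53-bit significand.
fn main() {
    let exact: i64 = (1_i64 << 53) + 1; // 9007199254740993
    let via_f64 = exact as f64 as i64;  // rounds to the nearest representable value
    assert_ne!(exact, via_f64);         // 9007199254740993 != 9007199254740992
    println!("{exact} round-trips to {via_f64}");
}
```

If Rocky's "safe" contract is strict losslessness, this argues for excluding the conversion, consistent with the default allowlist omitting INT → FLOAT.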

Test plan

  • All rocky-core tests pass with the new signature
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean
  • Existing live/drift/run.sh smoke test against the sandbox still passes (3-stage flow unchanged)

…sql onto SqlDialect trait

Pre-flight against the BigQuery sandbox surfaced two structural
mismatches: BQ rejects the existing `ALTER TABLE x ALTER COLUMN y
TYPE z` emit (it requires `SET DATA TYPE`), and `is_safe_type_widening`
in `drift.rs` lists Databricks/Spark-flavored type names — none of
BQ's canonical names (`INT64`, `NUMERIC`, `FLOAT64`, `BIGNUMERIC`)
appear. Bolting BQ types onto the global allowlist would lock in a
structural compromise: e.g. `BIGINT → STRING` and `INT64 → STRING`
coexisting as duplicate-meaning entries because the two adapters
spell the same logical conversion differently.

This PR moves the two surfaces onto the `SqlDialect` trait so each
dialect can declare its own widening semantics:

- `SqlDialect::is_safe_type_widening(&str, &str) -> bool` with a
  default impl that preserves the current Databricks/Spark allowlist
  (INT/BIGINT/FLOAT/DOUBLE/DECIMAL/VARCHAR widenings + numeric-to-STRING).
- `SqlDialect::alter_column_type_sql(&str, &str, &str) -> AdapterResult<String>`
  with a default impl emitting the existing ANSI form. Validation
  (identifier + type allowlist) lives inside the default impl so
  dialect overrides inherit it for free.
- `drift::detect_drift` now takes `&dyn SqlDialect` and routes the
  widening check through the trait method. `drift::generate_alter_column_sql`
  delegates the SQL emit to the trait method.

Free function `is_safe_type_widening` in `drift.rs` becomes
`default_is_safe_type_widening`, `pub(crate)` so the default trait
impl can call it without exposing a public alias.

No behavior change today — the default impls preserve every existing
allowlist entry and the emitted SQL string. This is a refactor PR; the receipt
is the regression suite (1129 unit tests + 30 e2e + 20 integration
all pass without modification beyond threading the new parameter).

The follow-up PR adds the BigQuery overrides (`ALTER COLUMN ... SET
DATA TYPE` shape; BQ-specific widening allowlist), wires the runtime
to actually emit the AlterColumnTypes branch, and ships a live smoke
test.
@hugocorreia90 hugocorreia90 merged commit 8ee6016 into main May 1, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/drift-alter-column-types branch May 1, 2026 15:04
hugocorreia90 added a commit that referenced this pull request May 1, 2026
…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted
`is_safe_type_widening` + `alter_column_type_sql` onto the
`SqlDialect` trait with default impls preserving Databricks/Spark
semantics. This PR adds the BigQuery-specific overrides + the
runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default
  ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The
  default `ALTER COLUMN x TYPE y` shape returns
  `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict
  BQ-specific allowlist:
    - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
    - `INT64 → BIGNUMERIC` (lossless)
    - `NUMERIC → BIGNUMERIC` (strict precision widening)
  Excluded by design:
    - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery
      accepts via SET DATA TYPE but Rocky's "safe" contract is strict
      (matches the default Databricks/Spark allowlist that omits
      `INT → FLOAT`).
    - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects
      these with `existing column type X is not assignable to
      STRING` even though STRING is lossless at the value level. The
      default allowlist's "any numeric → STRING" pattern doesn't
      transfer to BQ. Discovered via live verification — initial
      draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes`
  branch between `DropAndRecreate` and the existing `added_columns`
  branch. Emits via `drift::generate_alter_column_sql` (which now
  routes through `dialect.alter_column_type_sql`) and surfaces as
  `action: "alter_column_types"` in the run output. If the same
  drift round also surfaced added columns, both apply before the
  INSERT continues.
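The override pair described above can be sketched as follows. This is an illustration against a stubbed trait, not the merged code: `AdapterResult` is a stand-in, the trait is reduced to the two methods in question, and validation is omitted.

```rust
// Sketch of the BigQuery overrides: a strict widening allowlist and the
// SET DATA TYPE emit shape BQ requires. AdapterResult is stubbed; the
// real trait carries default impls and shared validation.
type AdapterResult<T> = Result<T, String>;

trait SqlDialect {
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool;
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String>;
}

struct BigQueryDialect;

impl SqlDialect for BigQueryDialect {
    // Strict BQ allowlist. FLOAT64 targets are excluded (lossy above 2^53);
    // STRING targets are excluded because BQ's SET DATA TYPE rejects them.
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
        )
    }

    // BQ rejects the ANSI `ALTER COLUMN x TYPE y` form; it requires SET DATA TYPE.
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> AdapterResult<String> {
        Ok(format!("ALTER TABLE {table} ALTER COLUMN {column} SET DATA TYPE {new_type}"))
    }
}

fn main() {
    let bq = BigQueryDialect;
    assert!(bq.is_safe_type_widening("INT64", "NUMERIC"));
    assert!(!bq.is_safe_type_widening("INT64", "FLOAT64")); // excluded by design
    assert!(!bq.is_safe_type_widening("INT64", "STRING"));  // BQ rejects it anyway
    println!("{}", bq.alter_column_type_sql("t", "score", "NUMERIC").unwrap());
    // prints: ALTER TABLE t ALTER COLUMN score SET DATA TYPE NUMERIC
}
```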

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN region → `add_columns` action
3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate`
4. DROP+CREATE source with score INT64→NUMERIC →
   `alter_column_types`; target's score column is now NUMERIC