
fix(rocky-bigquery): wire alter_column_types drift via BQ-specific overrides#333

Merged
hugocorreia90 merged 1 commit into main from fix/bq-alter-column-types on May 1, 2026
Conversation

@hugocorreia90
Contributor

Closes the second drift-evolution gap from #328. PR #332 lifted is_safe_type_widening + alter_column_type_sql onto the SqlDialect trait with default impls preserving Databricks/Spark semantics. This PR adds the BigQuery overrides + the runtime wiring to actually emit AlterColumnTypes SQL.

Engine changes

  • BigQueryDialect::alter_column_type_sql overrides the default ANSI form with BQ's required ALTER COLUMN x SET DATA TYPE y. The default ALTER COLUMN x TYPE y shape returns Expected keyword DROP or keyword SET from BigQuery.

  • BigQueryDialect::is_safe_type_widening declares a strict BQ-specific allowlist:

    • INT64 → NUMERIC (lossless: INT64 fits in NUMERIC precision 38)
    • INT64 → BIGNUMERIC (lossless)
    • NUMERIC → BIGNUMERIC (strict precision widening)

    Excluded by design:

    • … → FLOAT64: lossy for absolute values > 2^53. BigQuery accepts via SET DATA TYPE but Rocky's "safe" contract is strict (matches the default Databricks/Spark allowlist that omits INT → FLOAT).
    • … → STRING: BigQuery's ALTER COLUMN SET DATA TYPE rejects these with existing column type X is not assignable to STRING even though STRING is lossless at the value level. The default allowlist's "any numeric → STRING" pattern doesn't transfer to BQ. Discovered via live verification — initial draft included these patterns and live runs surfaced the error against stage 3 of the existing smoke test.
  • run.rs::process_table adds the DriftAction::AlterColumnTypes branch between DropAndRecreate and the existing added_columns branch. Emits via drift::generate_alter_column_sql (which routes through dialect.alter_column_type_sql) and surfaces as action: "alter_column_types" in the run output. If the same drift round also surfaced added columns, both apply before the INSERT continues.
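The two dialect overrides described above can be sketched roughly as follows. This is an illustrative sketch only: the real `SqlDialect` trait lives in the Rocky codebase and its exact signatures, parameter types, and default bodies are assumptions here; only the method names and the BQ-specific behavior come from the PR text.

```rust
// Hypothetical trait shape; the real rocky-core trait may differ.
trait SqlDialect {
    // Default (Databricks/Spark-style) ANSI form.
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> String {
        format!("ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}")
    }
    // Default Databricks/Spark allowlist elided in this sketch.
    fn is_safe_type_widening(&self, _from: &str, _to: &str) -> bool {
        false
    }
}

struct BigQueryDialect;

impl SqlDialect for BigQueryDialect {
    // BigQuery requires SET DATA TYPE; the default `TYPE` shape fails with
    // "Expected keyword DROP or keyword SET".
    fn alter_column_type_sql(&self, table: &str, column: &str, new_type: &str) -> String {
        format!("ALTER TABLE {table} ALTER COLUMN {column} SET DATA TYPE {new_type}")
    }

    // Strict BQ allowlist: lossless widenings only. FLOAT64 and STRING
    // targets are deliberately excluded, per the rationale above.
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
        )
    }
}
```

Keeping the allowlist as a flat `matches!` makes the exclusions auditable at a glance: any pair not literally listed is treated as unsafe and falls through to `drop_and_recreate`.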

Smoke test

Extends live/drift/run.sh to a four-stage flow:

  1. Initial 4-column source (with score INT64) → replicate clean
  2. ALTER source ADD COLUMN region → `add_columns` action
  3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate` (unsafe per BQ allowlist)
  4. DROP+CREATE source with score INT64→NUMERIC → `alter_column_types` (safe widening)
==> stage 4 drift: action=alter_column_types OK (["column 'score' widened INT64 → NUMERIC"])
==> target customers.score is now NUMERIC (alter_column_types took effect)

Idempotent across consecutive runs.
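The ordering of drift actions that the engine changes and the smoke stages exercise can be sketched as a minimal enum-plus-match, matching the branch order described above. The enum shape and function name here are hypothetical; the real `DriftAction` and `process_table` in the codebase may differ.

```rust
// Hypothetical, simplified drift actions; payloads are the SQL statements
// produced for each action (via the dialect for AlterColumnTypes).
#[derive(Debug)]
enum DriftAction {
    None,
    DropAndRecreate,
    AlterColumnTypes(Vec<String>),
    AddedColumns(Vec<String>),
}

// Returns the `action` label surfaced in the run output. The new
// AlterColumnTypes branch sits between DropAndRecreate and AddedColumns.
fn action_label(drift: &DriftAction) -> &'static str {
    match drift {
        DriftAction::DropAndRecreate => "drop_and_recreate",
        DriftAction::AlterColumnTypes(_) => "alter_column_types",
        DriftAction::AddedColumns(_) => "add_columns",
        DriftAction::None => "none",
    }
}
```

Checking `DropAndRecreate` first matters: an unsafe type change must win over any coexisting safe widening or added column, since the rebuild subsumes both.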

Test plan

  • `cargo test -p rocky-bigquery --lib` — 68 passed (8 new dialect tests)
  • `cargo clippy -p rocky-bigquery -p rocky-cli -p rocky-core --all-targets -- -D warnings` — clean
  • `cargo fmt -p rocky-bigquery -p rocky-cli -p rocky-core --check` — clean
  • `live/drift/run.sh` against the BQ sandbox — exits 0, all 4 stages pass
  • Two consecutive runs both pass (idempotency)

hugocorreia90 merged commit f7b3049 into main on May 1, 2026
12 checks passed
hugocorreia90 deleted the fix/bq-alter-column-types branch on May 1, 2026 at 15:32