test(07-adapters/05-bigquery-native-queries): add live drift smoke test by hugocorreia90 · Pull Request #328 · rocky-data/rocky

hugocorreia90 · 2026-05-01T13:35:38Z

First end-to-end replication-from-BQ smoke test. Exercises per-table drift detection on the BigQuery adapter — possible now that the DiscoveryAdapter shipped in #327, which unblocked replication-from-BQ pipelines.

What it covers

live/drift/run.sh:

1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run`: initial `full_refresh` since the target doesn't exist; `drift.tables_drifted == 0`.
3. Drops + recreates the source with `id` changed from `INT64` to `STRING` (BigQuery doesn't support ALTER COLUMN TYPE for non-widening conversions, so DROP/CREATE is the only path).
4. Runs `rocky run` again: drift detected, classified as `drop_and_recreate` per `detect_drift`, target dropped + re-replicated with the new schema.
5. Asserts the JSON output includes `drop_and_recreate` in `drift.actions_taken` and the target's `id` column is now `STRING`.

==> drift run summary: tables_checked=1, tables_drifted=1
==> action: drop_and_recreate, reason=column 'id' changed STRING → INT64
==> target customers.id is now STRING (drop_and_recreate took effect)

Idempotent across consecutive runs.
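
The classification exercised by the second `rocky run` can be sketched as below. This is an illustrative Python model, not the `detect_drift` implementation in `rocky-core`: the name `classify_drift`, the dict-based schema representation, and the `is_safe_widening` callback are all assumptions made for the example. It follows the behaviour described in this PR, including the fact that columns present only in the source are ignored.

```python
from typing import Callable, Dict, Optional

Schema = Dict[str, str]  # column name -> type name (hypothetical representation)


def classify_drift(
    source: Schema,
    target: Schema,
    is_safe_widening: Callable[[str, str], bool],
) -> Optional[str]:
    """Return a drift action name, or None when no action is needed.

    Only columns present on both sides are compared; columns that exist
    only in the source are ignored (the gap described in this PR).
    """
    changed = [
        (target[col], source[col])  # (old type, new type)
        for col in target
        if col in source and source[col] != target[col]
    ]
    if not changed:
        return None
    if all(is_safe_widening(old, new) for old, new in changed):
        return "alter_column_types"
    return "drop_and_recreate"


# Hypothetical allowlist, used only to exercise the sketch.
def demo_safe(old: str, new: str) -> bool:
    return (old, new) == ("INT64", "NUMERIC")


before = {"id": "INT64", "name": "STRING", "_updated_at": "TIMESTAMP"}
after = {"id": "STRING", "name": "STRING", "_updated_at": "TIMESTAMP"}

unchanged = classify_drift(before, before, demo_safe)
type_swap = classify_drift(after, before, demo_safe)
```

With the schemas from the smoke test, `unchanged` is `None` and `type_swap` is `"drop_and_recreate"`, because `INT64 → STRING` is not in the demo allowlist.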

Findings surfaced (documented in `live/README.md`)

Finding #8: `detect_drift` (`engine/crates/rocky-core/src/drift.rs`) ignores added columns. The function's docstring claims "they appear naturally via SELECT *", but the incremental strategy then issues `INSERT INTO target SELECT * FROM source` against a target whose schema is fixed, and BigQuery rejects it with `Inserted row has wrong column count`. The smoke test sidesteps this by changing an existing column's type instead. Worth a separate fix-with-receipt PR (graduated drift evolution: emit `ALTER TABLE ADD COLUMN` or restrict the source SELECT to known target columns).
The runtime at `run.rs:4137` only wires the `drop_and_recreate` branch from `DriftAction`; `alter_column_types` (the safe-widening case) is detected but takes no action. Same shape as finding #8; also worth a follow-up.

These are pre-existing gaps in rocky-core, surfaced by the first live drift exercise. Not blocking this PR.
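
The two candidate fixes for the added-columns gap can be sketched as SQL generators. This is a hedged illustration only: the function names, the dict-based schemas, and the table names are hypothetical, and nothing here is taken from `rocky-core`. Option A evolves the target with `ALTER TABLE ADD COLUMN`; option B keeps the target fixed and selects only the columns it already has.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> BigQuery type (hypothetical representation)


def add_column_statements(table: str, source: Schema, target: Schema) -> List[str]:
    """Option A: one ALTER TABLE ADD COLUMN per column missing from the target."""
    return [
        f"ALTER TABLE `{table}` ADD COLUMN `{col}` {typ}"
        for col, typ in source.items()
        if col not in target
    ]


def restricted_insert(
    target_table: str, source_table: str, source: Schema, target: Schema
) -> str:
    """Option B: replace SELECT * with an explicit list of known target columns."""
    cols = ", ".join(f"`{c}`" for c in target if c in source)
    return (
        f"INSERT INTO `{target_table}` ({cols}) "
        f"SELECT {cols} FROM `{source_table}`"
    )


# Hypothetical schemas: the source gained a `region` column.
src = {"id": "INT64", "name": "STRING", "region": "STRING"}
tgt = {"id": "INT64", "name": "STRING"}

alter_sql = add_column_statements("proj.tgt.orders", src, tgt)
insert_sql = restricted_insert("proj.tgt.orders", "proj.src.orders", src, tgt)
```

Option A lets the new column reach the target, while option B avoids the column-count mismatch at the cost of silently not replicating the new column; which trade-off is right is the design call left open here.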

Test plan

live/drift/run.sh against the sandbox: exits 0, drift detected, target reflects new schema
Idempotency: two consecutive runs both pass
Stale dataset cleanup via trap — bq rm -r -f -d runs even on failure
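
The cleanup guarantee in the last item is a shell `trap` in `run.sh`; the same idea can be modelled in Python with `try`/`finally`. This is a sketch under assumptions: the helper name `with_dataset_cleanup` and the injected `run_cmd` callable are hypothetical, and only the `bq rm -r -f -d` invocation comes from the test plan above.

```python
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def with_dataset_cleanup(dataset: str, steps: Callable[[], None], run_cmd: Runner) -> None:
    """Run the test steps, then remove the dataset whether or not they failed."""
    try:
        steps()
    finally:
        # Same command the shell trap issues; runs on success and on failure.
        run_cmd(["bq", "rm", "-r", "-f", "-d", dataset])


# Exercise the sketch with a recording runner instead of a real `bq` call.
calls: List[List[str]] = []


def failing_steps() -> None:
    raise RuntimeError("simulated assertion failure")


try:
    with_dataset_cleanup(
        "hc_phase13_drift_src_orders",
        failing_steps,
        lambda cmd: calls.append(list(cmd)),
    )
except RuntimeError:
    pass  # the failure still propagates; cleanup has already been recorded
```

Against a real sandbox, `run_cmd` could be something like `lambda cmd: subprocess.run(cmd, check=False)`, so that a failed cleanup does not mask the original test failure.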

First end-to-end replication-from-BQ smoke test, exercising per-table drift detection on the BigQuery adapter. Now possible because the DiscoveryAdapter shipped in PR #327.

The driver:

1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run`: replicates source to target via initial `full_refresh` (no drift, target_exists=false).
3. Drops + recreates the source with `id` changed from `INT64` to `STRING` (an unsafe type swap: BigQuery doesn't support ALTER COLUMN TYPE for non-widening conversions).
4. Runs `rocky run` again: drift is detected, classified as `drop_and_recreate` (per `detect_drift` in `rocky-core/src/drift.rs`), target is dropped and re-replicated with the new schema.
5. Asserts `drift.tables_drifted == 1` and the `drop_and_recreate` action is in `drift.actions_taken`, plus the target's `id` column is now STRING.

Adds finding #8 to `live/README.md`: `detect_drift` ignores added columns by design, but the incremental strategy then fails with `Inserted row has wrong column count` when the target's fixed schema can't accept the source's expanded `SELECT *`. Captured as a separate gap because fixing it requires a design call (graduated drift evolution: emit `ALTER TABLE ADD COLUMN`, or restrict source SELECT to known target columns). Same goes for `alter_column_types`: the detection branch exists but the runtime only wires `drop_and_recreate`, so safe widenings fall through.

Idempotent across consecutive runs.
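
The assertions on the run output can be sketched as a small JSON check. The field names `drift.tables_drifted` and `drift.actions_taken` and the action value `drop_and_recreate` come from this PR; the surrounding JSON layout (a list of objects with an `action` key) and the helper name `check_drift_output` are assumptions, and the sample payload is hand-written rather than captured from a real run.

```python
import json
from typing import List

# Hand-written sample; the real `rocky run` output may be shaped differently.
sample_output = """
{
  "drift": {
    "tables_checked": 1,
    "tables_drifted": 1,
    "actions_taken": [
      {"action": "drop_and_recreate"}
    ]
  }
}
"""


def check_drift_output(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the run looks as expected."""
    drift = json.loads(raw)["drift"]
    problems: List[str] = []
    if drift["tables_drifted"] != 1:
        problems.append(f"expected 1 drifted table, got {drift['tables_drifted']}")
    actions = [entry["action"] for entry in drift["actions_taken"]]
    if "drop_and_recreate" not in actions:
        problems.append(f"drop_and_recreate missing from actions: {actions}")
    return problems


problems = check_drift_output(sample_output)
```

Returning a list of problems rather than raising on the first mismatch keeps every failed expectation visible in a single run.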

…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted `is_safe_type_widening` + `alter_column_type_sql` onto the `SqlDialect` trait with default impls preserving Databricks/Spark semantics. This PR adds the BigQuery-specific overrides + the runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The default `ALTER COLUMN x TYPE y` shape returns `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict BQ-specific allowlist:
  - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
  - `INT64 → BIGNUMERIC` (lossless)
  - `NUMERIC → BIGNUMERIC` (strict precision widening)

  Excluded by design:
  - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery accepts via SET DATA TYPE but Rocky's "safe" contract is strict (matches the default Databricks/Spark allowlist that omits `INT → FLOAT`).
  - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects these with `existing column type X is not assignable to STRING` even though STRING is lossless at the value level. The default allowlist's "any numeric → STRING" pattern doesn't transfer to BQ. Discovered via live verification: the initial draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes` branch between `DropAndRecreate` and the existing `added_columns` branch. Emits via `drift::generate_alter_column_sql` (which now routes through `dialect.alter_column_type_sql`) and surfaces as `action: "alter_column_types"` in the run output. If the same drift round also surfaced added columns, both apply before the INSERT continues.

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN region → `add_columns` action
3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate`
4. DROP+CREATE source with score INT64→NUMERIC → `alter_column_types`; target's score column is now NUMERIC
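
The BigQuery overrides described in that commit message can be modelled in a few lines. This is a Python sketch of the stated behaviour, not the Rust `BigQueryDialect` code: the function signatures, the upper-casing of type names, and the table name in the example are assumptions. The allowlist entries and the `SET DATA TYPE` form are the ones listed in the message.

```python
# Allowlist as stated in the commit message: (old type, new type) pairs.
SAFE_WIDENINGS = frozenset(
    {
        ("INT64", "NUMERIC"),
        ("INT64", "BIGNUMERIC"),
        ("NUMERIC", "BIGNUMERIC"),
    }
)


def is_safe_type_widening(old_type: str, new_type: str) -> bool:
    """True only for the strict allowlist; FLOAT64 and STRING targets are excluded."""
    return (old_type.upper(), new_type.upper()) in SAFE_WIDENINGS


def alter_column_type_sql(table: str, column: str, new_type: str) -> str:
    """Build the BigQuery form, which uses SET DATA TYPE rather than TYPE."""
    return f"ALTER TABLE `{table}` ALTER COLUMN `{column}` SET DATA TYPE {new_type}"


# Stage 4 of the smoke test: score INT64 -> NUMERIC (table name is hypothetical).
if is_safe_type_widening("INT64", "NUMERIC"):
    stage4_sql = alter_column_type_sql("proj.tgt.orders", "score", "NUMERIC")
```

Keeping the allowlist as explicit pairs makes the exclusions visible: `INT64 → FLOAT64` and `INT64 → STRING` simply are not in the set, so they fall through to `drop_and_recreate`.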

hugocorreia90 merged commit d3aa332 into main May 1, 2026
7 checks passed

hugocorreia90 deleted the feat/bq-live-smoke-drift branch May 1, 2026 13:40

hugocorreia90 mentioned this pull request May 1, 2026

fix(rocky-bigquery): wire alter_column_types drift via BQ-specific overrides #333

Merged

5 tasks

This was referenced May 1, 2026

docs(rocky-bigquery): conformance audit + recommendation to drop is_experimental #334

Merged

feat(rocky-bigquery): drop is_experimental — adapter promoted #335

Merged
