test(07-adapters/05-bigquery-native-queries): add live drift smoke test #328

Merged: hugocorreia90 merged 1 commit into main from feat/bq-live-smoke-drift on May 1, 2026

Conversation

@hugocorreia90
Contributor

First end-to-end replication-from-BQ smoke test. Exercises per-table drift detection on the BigQuery adapter — possible now that the DiscoveryAdapter shipped in #327, which unblocked replication-from-BQ pipelines.

What it covers

live/drift/run.sh:

  1. Seeds a 3-column source table (id INT64, name STRING, _updated_at TIMESTAMP) in a hc_phase13_drift_src_orders dataset.
  2. Runs rocky run — initial full_refresh since target doesn't exist; drift.tables_drifted == 0.
  3. Drops + recreates the source with id changed from INT64 to STRING (BigQuery doesn't support ALTER COLUMN TYPE for non-widening conversions, so DROP/CREATE is the only path).
  4. Runs rocky run again — drift detected, classified as drop_and_recreate per detect_drift, target dropped + re-replicated with the new schema.
  5. Asserts the JSON output includes drop_and_recreate in drift.actions_taken and the target's id column is now STRING.
==> drift run summary: tables_checked=1, tables_drifted=1
==> action: drop_and_recreate, reason=column 'id' changed STRING → INT64
==> target customers.id is now STRING (drop_and_recreate took effect)

Idempotent across consecutive runs.

Findings surfaced (documented in live/README.md)

  • Finding #8: detect_drift (engine/crates/rocky-core/src/drift.rs) ignores added columns. The function's docstring claims "they appear naturally via SELECT *", but the incremental strategy then issues INSERT INTO target SELECT * FROM source against a target whose schema is fixed, and BigQuery rejects the statement with Inserted row has wrong column count. The smoke test sidesteps this by changing an existing column's type instead. Worth a separate fix-with-receipt PR (graduated drift evolution: emit ALTER TABLE ADD COLUMN or restrict source SELECT to known target columns).
  • The runtime at run.rs:4137 only wires the drop_and_recreate branch from DriftAction; alter_column_types (the safe-widening case) is detected but takes no action. Same shape as finding #8, and also worth a follow-up.

These are pre-existing gaps in rocky-core, surfaced by the first live drift exercise. Not blocking this PR.
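
The two remedies named in finding #8 can be sketched as plain SQL builders. This is a minimal illustration, not Rocky's code: the function names `insert_known_columns_sql` and `add_columns_sql`, their signatures, and the backtick quoting are all assumptions made for the example.

```rust
/// Option A (hypothetical helper): restrict the source SELECT to the columns
/// the target already has, so the column counts always match.
pub fn insert_known_columns_sql(target: &str, source: &str, target_columns: &[&str]) -> String {
    let cols = target_columns
        .iter()
        .map(|c| format!("`{c}`"))
        .collect::<Vec<_>>()
        .join(", ");
    format!("INSERT INTO `{target}` ({cols}) SELECT {cols} FROM `{source}`")
}

/// Option B (hypothetical helper): evolve the target first by emitting one
/// ALTER TABLE ... ADD COLUMN statement per newly discovered source column.
/// Each entry in `added` is a (column name, BigQuery type) pair.
pub fn add_columns_sql(target: &str, added: &[(&str, &str)]) -> Vec<String> {
    added
        .iter()
        .map(|(name, ty)| format!("ALTER TABLE `{target}` ADD COLUMN `{name}` {ty}"))
        .collect()
}
```

Option A keeps the target schema frozen and silently ignores new source columns; option B propagates them. Which one is right is the design call the finding leaves open.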

Test plan

  • live/drift/run.sh against the sandbox: exits 0, drift detected, target reflects new schema
  • Idempotency: two consecutive runs both pass
  • Stale dataset cleanup via trap (bq rm -r -f -d) runs even on failure

First end-to-end replication-from-BQ smoke test, exercising per-table
drift detection on the BigQuery adapter. Now possible because the
DiscoveryAdapter shipped in PR #327.

The driver:

1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at
   TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run` — replicates source to target via initial
   `full_refresh` (no drift, target_exists=false).
3. Drops + recreates the source with `id` changed from `INT64` to
   `STRING` (an unsafe type swap — BigQuery doesn't support
   ALTER COLUMN TYPE for non-widening conversions).
4. Runs `rocky run` again — drift is detected, classified as
   `drop_and_recreate` (per `detect_drift` in `rocky-core/src/drift.rs`),
   target is dropped and re-replicated with the new schema.
5. Asserts `drift.tables_drifted == 1` and the
   `drop_and_recreate` action is in `drift.actions_taken`, plus
   the target's `id` column is now STRING.

Adds finding #8 to `live/README.md`: `detect_drift` ignores added
columns by design, but the incremental strategy then fails with
`Inserted row has wrong column count` when the target's fixed schema
can't accept the source's expanded `SELECT *`. Captured as a separate
gap because fixing it requires a design call (graduated drift
evolution: emit `ALTER TABLE ADD COLUMN`, or restrict source SELECT
to known target columns). Same goes for `alter_column_types` — the
detection branch exists but the runtime only wires
`drop_and_recreate`, so safe widenings fall through.

Idempotent across consecutive runs.
@hugocorreia90 hugocorreia90 merged commit d3aa332 into main May 1, 2026
7 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/bq-live-smoke-drift branch May 1, 2026 13:40
hugocorreia90 added a commit that referenced this pull request May 1, 2026
…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted
`is_safe_type_widening` + `alter_column_type_sql` onto the
`SqlDialect` trait with default impls preserving Databricks/Spark
semantics. This PR adds the BigQuery-specific overrides + the
runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default
  ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The
  default `ALTER COLUMN x TYPE y` shape returns
  `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict
  BQ-specific allowlist:
    - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
    - `INT64 → BIGNUMERIC` (lossless)
    - `NUMERIC → BIGNUMERIC` (strict precision widening)
  Excluded by design:
    - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery
      accepts via SET DATA TYPE but Rocky's "safe" contract is strict
      (matches the default Databricks/Spark allowlist that omits
      `INT → FLOAT`).
    - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects
      these with `existing column type X is not assignable to
      STRING` even though STRING is lossless at the value level. The
      default allowlist's "any numeric → STRING" pattern doesn't
      transfer to BQ. Discovered via live verification — initial
      draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes`
  branch between `DropAndRecreate` and the existing `added_columns`
  branch. Emits via `drift::generate_alter_column_sql` (which now
  routes through `dialect.alter_column_type_sql`) and surfaces as
  `action: "alter_column_types"` in the run output. If the same
  drift round also surfaced added columns, both apply before the
  INSERT continues.
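
The two BigQuery overrides described above can be sketched as free functions. The real `SqlDialect` trait signatures are not shown in this commit message, so the names `bq_is_safe_type_widening` and `bq_alter_column_type_sql`, their parameters, and the identifier quoting are assumptions for illustration only.

```rust
/// Hypothetical stand-in for the BigQuery allowlist: only the three
/// widenings listed above count as safe. Anything targeting FLOAT64 or
/// STRING is rejected, matching the "excluded by design" cases.
pub fn bq_is_safe_type_widening(from: &str, to: &str) -> bool {
    let from = from.trim().to_ascii_uppercase();
    let to = to.trim().to_ascii_uppercase();
    matches!(
        (from.as_str(), to.as_str()),
        ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
    )
}

/// Hypothetical stand-in for the BigQuery SQL override: BigQuery requires
/// `SET DATA TYPE` rather than the bare `TYPE` form.
pub fn bq_alter_column_type_sql(table: &str, column: &str, new_type: &str) -> String {
    format!("ALTER TABLE `{table}` ALTER COLUMN `{column}` SET DATA TYPE {new_type}")
}
```

Keeping the allowlist as an explicit `matches!` over pairs makes every permitted conversion visible in one place, which suits a contract that is meant to be strict.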

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN region → `add_columns` action
3. DROP+CREATE source with id INT64→STRING → `drop_and_recreate`
4. DROP+CREATE source with score INT64→NUMERIC →
   `alter_column_types`; target's score column is now NUMERIC
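
How the four stages map to actions can be sketched as a small classifier over (column, type) pairs. This is not Rocky's `detect_drift`: the `SketchAction` enum, the `classify_drift` function, and the precedence rule (any unsafe type change forces a drop and recreate) are assumptions chosen to mirror the behaviour described above. Columns removed from the source are deliberately not handled.

```rust
/// Hypothetical action type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum SketchAction {
    AddColumns(Vec<String>),
    AlterColumnTypes(Vec<String>),
    DropAndRecreate,
}

/// Compare target and source schemas, given as (column name, type) pairs.
/// `is_safe_widening(from, to)` is the dialect's allowlist check.
pub fn classify_drift(
    target: &[(&str, &str)],
    source: &[(&str, &str)],
    is_safe_widening: impl Fn(&str, &str) -> bool,
) -> Vec<SketchAction> {
    let mut added = Vec::new();
    let mut widened = Vec::new();
    for &(name, src_ty) in source {
        match target.iter().find(|&&(t_name, _)| t_name == name) {
            // Column exists only in the source: candidate for ADD COLUMN.
            None => added.push(name.to_string()),
            // Same type on both sides: no drift for this column.
            Some(&(_, tgt_ty)) if tgt_ty == src_ty => {}
            // Type changed, but the dialect allows an in-place widening.
            Some(&(_, tgt_ty)) if is_safe_widening(tgt_ty, src_ty) => {
                widened.push(name.to_string())
            }
            // Any other type change is unsafe and overrides everything else.
            Some(_) => return vec![SketchAction::DropAndRecreate],
        }
    }
    let mut actions = Vec::new();
    if !widened.is_empty() {
        actions.push(SketchAction::AlterColumnTypes(widened));
    }
    if !added.is_empty() {
        actions.push(SketchAction::AddColumns(added));
    }
    actions
}
```

With an allowlist that accepts only `INT64 → NUMERIC`, a source that changes `score` from `INT64` to `NUMERIC` and adds `region` yields both an alter and an add, while changing `id` from `INT64` to `STRING` yields a single drop and recreate.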