test(07-adapters/05-bigquery-native-queries): add live drift smoke test #328
Merged
hugocorreia90 merged 1 commit into main on May 1, 2026
Conversation
First end-to-end replication-from-BQ smoke test, exercising per-table drift detection on the BigQuery adapter. Now possible because the DiscoveryAdapter shipped in PR #327. The driver:

1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run` — replicates source to target via an initial `full_refresh` (no drift, `target_exists=false`).
3. Drops + recreates the source with `id` changed from `INT64` to `STRING` (an unsafe type swap — BigQuery doesn't support `ALTER COLUMN TYPE` for non-widening conversions).
4. Runs `rocky run` again — drift is detected, classified as `drop_and_recreate` (per `detect_drift` in `rocky-core/src/drift.rs`), and the target is dropped and re-replicated with the new schema.
5. Asserts `drift.tables_drifted == 1`, that the `drop_and_recreate` action is in `drift.actions_taken`, and that the target's `id` column is now `STRING`.

Adds finding #8 to `live/README.md`: `detect_drift` ignores added columns by design, but the incremental strategy then fails with `Inserted row has wrong column count` when the target's fixed schema can't accept the source's expanded `SELECT *`. Captured as a separate gap because fixing it requires a design call (graduated drift evolution: emit `ALTER TABLE ADD COLUMN`, or restrict the source SELECT to known target columns). Same goes for `alter_column_types` — the detection branch exists, but the runtime only wires `drop_and_recreate`, so safe widenings fall through.

Idempotent across consecutive runs.
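The per-table classification the driver exercises can be sketched as follows. This is a minimal, hypothetical simplification: the real `detect_drift` lives in `rocky-core/src/drift.rs`, and the types and signature here are illustrative, not rocky-core's actual API.

```rust
/// Illustrative subset of the drift actions discussed in this PR.
#[derive(Debug, PartialEq)]
pub enum DriftAction {
    None,
    /// Unsafe type change (e.g. INT64 -> STRING): drop the target and
    /// re-replicate, since BigQuery won't ALTER non-widening conversions.
    DropAndRecreate,
}

pub struct Column {
    pub name: String,
    pub ty: String,
}

/// Compare source schema against target schema, per target column.
pub fn detect_drift(source: &[Column], target: &[Column]) -> DriftAction {
    for t in target {
        if let Some(s) = source.iter().find(|s| s.name == t.name) {
            if s.ty != t.ty {
                return DriftAction::DropAndRecreate;
            }
        }
    }
    // Columns added on the source side are ignored here by design; that is
    // the gap captured as finding #8 in live/README.md.
    DriftAction::None
}

fn main() {
    let col = |n: &str, t: &str| Column { name: n.into(), ty: t.into() };
    let target = vec![col("id", "INT64"), col("name", "STRING")];
    let swapped = vec![col("id", "STRING"), col("name", "STRING")];
    let expanded = vec![col("id", "INT64"), col("name", "STRING"), col("region", "STRING")];

    assert_eq!(detect_drift(&target, &target), DriftAction::None);
    assert_eq!(detect_drift(&swapped, &target), DriftAction::DropAndRecreate);
    // Added source column: no drift reported, matching finding #8.
    assert_eq!(detect_drift(&expanded, &target), DriftAction::None);
}
```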
hugocorreia90 added a commit that referenced this pull request on May 1, 2026
…errides (#333)

Closes the second drift-evolution gap from #328. PR #332 lifted `is_safe_type_widening` + `alter_column_type_sql` onto the `SqlDialect` trait with default impls preserving Databricks/Spark semantics. This PR adds the BigQuery-specific overrides + the runtime wiring to actually emit `AlterColumnTypes` SQL.

Engine changes:

- `BigQueryDialect::alter_column_type_sql` overrides the default ANSI form with BQ's required `ALTER COLUMN x SET DATA TYPE y`. The default `ALTER COLUMN x TYPE y` shape returns `Expected keyword DROP or keyword SET` from BigQuery.
- `BigQueryDialect::is_safe_type_widening` declares a strict BQ-specific allowlist:
  - `INT64 → NUMERIC` (lossless: INT64 fits in NUMERIC precision 38)
  - `INT64 → BIGNUMERIC` (lossless)
  - `NUMERIC → BIGNUMERIC` (strict precision widening)

  Excluded by design:
  - `… → FLOAT64`: lossy for absolute values > 2^53. BigQuery accepts it via `SET DATA TYPE`, but Rocky's "safe" contract is strict (matching the default Databricks/Spark allowlist, which omits `INT → FLOAT`).
  - `… → STRING`: BigQuery's `ALTER COLUMN SET DATA TYPE` rejects these with `existing column type X is not assignable to STRING`, even though STRING is lossless at the value level. The default allowlist's "any numeric → STRING" pattern doesn't transfer to BQ. Discovered via live verification — the initial draft included these patterns and live runs surfaced the error.
- `run.rs::process_table` adds the `DriftAction::AlterColumnTypes` branch between `DropAndRecreate` and the existing `added_columns` branch. Emits via `drift::generate_alter_column_sql` (which now routes through `dialect.alter_column_type_sql`) and surfaces as `action: "alter_column_types"` in the run output. If the same drift round also surfaced added columns, both apply before the INSERT continues.

Smoke test: extends `live/drift/run.sh` to a four-stage flow:

1. Initial 4-column source → replicate clean
2. ALTER source ADD COLUMN `region` → `add_columns` action
3. DROP+CREATE source with `id INT64→STRING` → `drop_and_recreate`
4. DROP+CREATE source with `score INT64→NUMERIC` → `alter_column_types`; target's `score` column is now NUMERIC
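The dialect overrides described above could look roughly like this. Trait and method names follow the PR text, but the bodies are simplified sketches, not the actual rocky-core implementation.

```rust
/// Simplified sketch of the SqlDialect surface touched by this PR.
pub trait SqlDialect {
    /// Default (Databricks/Spark-flavoured) ANSI form.
    fn alter_column_type_sql(&self, table: &str, col: &str, ty: &str) -> String {
        format!("ALTER TABLE {table} ALTER COLUMN {col} TYPE {ty}")
    }
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool;
}

pub struct BigQueryDialect;

impl SqlDialect for BigQueryDialect {
    // BigQuery rejects the ANSI `ALTER COLUMN x TYPE y` shape with
    // "Expected keyword DROP or keyword SET"; it requires SET DATA TYPE.
    fn alter_column_type_sql(&self, table: &str, col: &str, ty: &str) -> String {
        format!("ALTER TABLE {table} ALTER COLUMN {col} SET DATA TYPE {ty}")
    }

    // Strict allowlist: lossless numeric widenings only. FLOAT64 targets
    // are lossy above 2^53, and STRING targets are rejected by BigQuery's
    // SET DATA TYPE, so both stay out by design.
    fn is_safe_type_widening(&self, from: &str, to: &str) -> bool {
        matches!(
            (from, to),
            ("INT64", "NUMERIC") | ("INT64", "BIGNUMERIC") | ("NUMERIC", "BIGNUMERIC")
        )
    }
}

fn main() {
    let d = BigQueryDialect;
    assert!(d.is_safe_type_widening("INT64", "NUMERIC"));
    assert!(d.is_safe_type_widening("NUMERIC", "BIGNUMERIC"));
    assert!(!d.is_safe_type_widening("INT64", "FLOAT64"));
    assert!(!d.is_safe_type_widening("NUMERIC", "STRING"));
    assert_eq!(
        d.alter_column_type_sql("orders", "score", "NUMERIC"),
        "ALTER TABLE orders ALTER COLUMN score SET DATA TYPE NUMERIC"
    );
}
```

Keeping the allowlist on the dialect rather than in `drift.rs` means each adapter can narrow or widen "safe" to what its engine actually accepts, which is exactly what the STRING exclusion required here.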
First end-to-end replication-from-BQ smoke test. Exercises per-table drift detection on the BigQuery adapter — possible now that the DiscoveryAdapter shipped in #327, which unblocked replication-from-BQ pipelines.
What it covers
`live/drift/run.sh`:

1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run` — initial `full_refresh` since the target doesn't exist; `drift.tables_drifted == 0`.
3. Drops and recreates the source with `id` changed from `INT64` to `STRING` (BigQuery doesn't support `ALTER COLUMN TYPE` for non-widening conversions, so DROP/CREATE is the only path).
4. Runs `rocky run` again — drift detected, classified as `drop_and_recreate` per `detect_drift`; target dropped + re-replicated with the new schema.
5. Asserts `drop_and_recreate` in `drift.actions_taken` and that the target's `id` column is now `STRING`.

Idempotent across consecutive runs.
Findings surfaced (documented in `live/README.md`)

1. `detect_drift` (`engine/crates/rocky-core/src/drift.rs`) ignores added columns. The function's docstring claims "they appear naturally via `SELECT *`", but the incremental strategy then issues `INSERT INTO target SELECT * FROM source` against a target whose schema is fixed, and BigQuery rejects it with `Inserted row has wrong column count`. The smoke test sidesteps this by changing an existing column's type instead. Worth a separate fix-with-receipt PR (graduated drift evolution: emit `ALTER TABLE ADD COLUMN`, or restrict the source SELECT to known target columns).
2. `run.rs:4137` only wires the `drop_and_recreate` branch from `DriftAction`; `alter_column_types` (the safe-widening case) is detected but takes no action. Same shape as finding #8 — also worth a follow-up.

These are pre-existing gaps in
rocky-core, surfaced by the first live drift exercise. Not blocking this PR.

Test plan

- `live/drift/run.sh` against the sandbox: exits 0, drift detected, target reflects the new schema
- `trap` — `bq rm -r -f -d` runs even on failure
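Of the two fix directions floated for the added-columns finding, the "restrict the source SELECT to known target columns" option can be sketched as below. The helper name and signature are hypothetical, for illustration only: the point is that projecting only the target's fixed schema keeps a source-side `ADD COLUMN` from breaking the insert with `Inserted row has wrong column count`.

```rust
/// Hypothetical helper: build the incremental INSERT from the target's
/// known columns instead of `SELECT *`, so extra source columns are ignored.
pub fn incremental_insert_sql(target: &str, source: &str, target_cols: &[&str]) -> String {
    let cols = target_cols.join(", ");
    format!("INSERT INTO {target} ({cols}) SELECT {cols} FROM {source}")
}

fn main() {
    // Source may have grown a `region` column; the statement stays valid
    // because only the target's columns are projected.
    let sql = incremental_insert_sql(
        "tgt.orders",
        "src.orders",
        &["id", "name", "_updated_at"],
    );
    assert_eq!(
        sql,
        "INSERT INTO tgt.orders (id, name, _updated_at) \
         SELECT id, name, _updated_at FROM src.orders"
    );
}
```

The trade-off versus emitting `ALTER TABLE ADD COLUMN` is that new source columns are silently dropped rather than propagated, which is why the finding frames this as a design call rather than a straightforward fix.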