feat(rocky-bigquery): add BigQueryDiscoveryAdapter + wire end-to-end #327
Merged
hugocorreia90 merged 1 commit into main on May 1, 2026
Conversation
The BigQuery adapter has only ever implemented `WarehouseAdapter`, even though the lib.rs docstring claimed otherwise. The capability table correctly marked `bigquery` as `DATA_ONLY`, so any pipeline declaring `[pipeline.X.source.discovery] adapter = "bigquery"` failed config validation, and replication-from-BQ pipelines bailed at run time with "no discovery adapter configured for this pipeline". This blocked drift verification, the conformance suite, and any straightforward BQ-source replication.

Adds `BigQueryDiscoveryAdapter`, which lists matching datasets via the region-qualified `INFORMATION_SCHEMA.SCHEMATA` view (the unqualified form is region-scoped to wherever the query runs and silently misses cross-region datasets) and per-dataset tables via `INFORMATION_SCHEMA.TABLES`. It uses `STARTS_WITH` for the prefix match because dataset names commonly contain the literal `_` character, which SQL `LIKE` would interpret as a wildcard. The adapter shares the same `BigQueryAdapter` instance as the warehouse path, so auth, retry budgets, and HTTP client config are identical. The region is read from `BigQueryAdapter::location()`; the qualifier is validated as `[a-z0-9-]+` to keep the value out of the SQL-injection surface (the resulting query shape is sketched below).

The capability table flips `bigquery` from `DATA_ONLY` to `BOTH`. The CLI registry instantiates the discovery adapter alongside the warehouse one whenever a `bigquery` adapter is configured, mirroring how DuckDB already does both roles.

Verified end-to-end:
- Unit tests for the region-qualifier escape paths.
- An `#[ignore]` integration test creates two datasets with one table each, runs `discover`, and asserts both surface with their tables.
- The `live/discover/run.sh` smoke driver runs the full `rocky discover --output json` path through the CLI registry.

Both pass against the sandbox; idempotent across consecutive runs.
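For orientation, a minimal sketch of the query shapes described above; the function names and the choice to bind the prefix as a query parameter are assumptions for illustration, not the adapter's actual API (the real code is in `engine/crates/rocky-bigquery/src/discovery.rs`):

```rust
// Illustrative sketch only; the real implementation in
// engine/crates/rocky-bigquery/src/discovery.rs may differ in detail.

/// Dataset listing against the region-qualified SCHEMATA view. The
/// `region-<location>` qualifier is what makes datasets outside the query's
/// own region visible; the unqualified view would silently miss them.
fn schemata_query(project_id: &str, region: &str) -> String {
    // @dataset_prefix is assumed to be bound as a query parameter.
    // STARTS_WITH avoids LIKE's treatment of `_` as a wildcard.
    format!(
        "SELECT schema_name \
         FROM `{project_id}.region-{region}`.INFORMATION_SCHEMA.SCHEMATA \
         WHERE STARTS_WITH(schema_name, @dataset_prefix)"
    )
}

/// Per-dataset table listing via the dataset-scoped TABLES view.
fn tables_query(project_id: &str, dataset: &str) -> String {
    format!("SELECT table_name FROM `{project_id}.{dataset}`.INFORMATION_SCHEMA.TABLES")
}
```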
hugocorreia90 added a commit that referenced this pull request on May 1, 2026
…st (#328)

First end-to-end replication-from-BQ smoke test, exercising per-table drift detection on the BigQuery adapter. Now possible because the DiscoveryAdapter shipped in PR #327.

The driver:
1. Seeds a 3-column source table (`id INT64, name STRING, _updated_at TIMESTAMP`) in a `hc_phase13_drift_src_orders` dataset.
2. Runs `rocky run` — replicates source to target via initial `full_refresh` (no drift, target_exists=false).
3. Drops + recreates the source with `id` changed from `INT64` to `STRING` (an unsafe type swap — BigQuery doesn't support ALTER COLUMN TYPE for non-widening conversions).
4. Runs `rocky run` again — drift is detected, classified as `drop_and_recreate` (per `detect_drift` in `rocky-core/src/drift.rs`), and the target is dropped and re-replicated with the new schema.
5. Asserts `drift.tables_drifted == 1`, that the `drop_and_recreate` action is in `drift.actions_taken`, and that the target's `id` column is now STRING.

Adds finding #8 to `live/README.md`: `detect_drift` ignores added columns by design, but the incremental strategy then fails with `Inserted row has wrong column count` when the target's fixed schema can't accept the source's expanded `SELECT *`. Captured as a separate gap because fixing it requires a design call (graduated drift evolution: emit `ALTER TABLE ADD COLUMN`, or restrict the source SELECT to known target columns). The same goes for `alter_column_types` — the detection branch exists, but the runtime only wires `drop_and_recreate`, so safe widenings fall through.

Idempotent across consecutive runs.
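For context, a rough sketch of the classification being exercised in steps 3 and 4. The enum and function names are invented for illustration and do not mirror `rocky-core`'s real types; the actual logic lives in `rocky-core/src/drift.rs`:

```rust
// Illustrative only; not rocky-core's actual API.
#[derive(Debug, PartialEq)]
enum DriftAction {
    None,
    AlterColumnTypes, // detected, but the runtime currently only wires drop-and-recreate
    DropAndRecreate,
}

/// Classify a single column-type change. Widenings that BigQuery's
/// ALTER COLUMN ... SET DATA TYPE supports could in principle be altered in
/// place; everything else forces a rebuild, because BigQuery has no
/// non-widening ALTER COLUMN TYPE.
fn classify_type_change(old_type: &str, new_type: &str) -> DriftAction {
    match (old_type, new_type) {
        (a, b) if a == b => DriftAction::None,
        ("INT64", "NUMERIC") | ("INT64", "FLOAT64") | ("NUMERIC", "FLOAT64") => {
            DriftAction::AlterColumnTypes
        }
        // e.g. the INT64 -> STRING swap in this smoke test.
        _ => DriftAction::DropAndRecreate,
    }
}

fn main() {
    assert_eq!(
        classify_type_change("INT64", "STRING"),
        DriftAction::DropAndRecreate
    );
}
```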
This was referenced May 1, 2026
The BigQuery adapter has only ever implemented `WarehouseAdapter`, even though the lib.rs docstring claimed otherwise. As a result:
- The capability table marked `bigquery` as `DATA_ONLY`.
- Any pipeline declaring `[pipeline.X.source.discovery] adapter = "bigquery"` failed config validation.
- Replication-from-BQ pipelines bailed at run time with "no discovery adapter configured for this pipeline".

This blocked drift verification, the conformance suite, and the natural source-to-target replication path on BQ.
What this adds
`BigQueryDiscoveryAdapter` (`engine/crates/rocky-bigquery/src/discovery.rs`):
- Lists matching datasets via the region-qualified `INFORMATION_SCHEMA.SCHEMATA` view. The unqualified form is region-scoped to wherever the query runs — cross-region projects silently get an empty result, which I confirmed live against the EU sandbox before writing the adapter.
- Uses `STARTS_WITH` for the prefix match (dataset names commonly contain a literal `_` that SQL `LIKE` would treat as a wildcard).
- Shares the same `BigQueryAdapter` instance as the warehouse path so auth, retry budgets, and HTTP client config are identical.
- Reads the region from `BigQueryAdapter::location()`; the qualifier is validated as `[a-z0-9-]+` to keep the value out of the SQL-injection surface (sketched below).
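As a sketch of what that `[a-z0-9-]+` check amounts to (the helper name and error type are hypothetical, not the adapter's real signature):

```rust
// Hypothetical helper; the real validation sits inside the adapter.
fn validate_region_qualifier(location: &str) -> Result<&str, String> {
    if location.is_empty() {
        return Err("BigQuery location must not be empty".into());
    }
    // Only lowercase letters, digits and '-', i.e. [a-z0-9-]+, so the value
    // can be spliced into the `region-<location>` qualifier without widening
    // the SQL-injection surface.
    if location
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
    {
        Ok(location)
    } else {
        Err(format!("invalid BigQuery location: {location:?}"))
    }
}
```

Rejecting anything outside that alphabet also covers the empty-location and injection cases the unit tests assert.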
Wire-up
- Capability table flips `bigquery` → `BOTH`.
- The CLI registry instantiates the discovery adapter alongside the warehouse one whenever a `bigquery` adapter is configured (mirrors how DuckDB already does both roles).
- Exposes `BigQueryAdapter::project_id()` / `location()` so the discovery adapter doesn't need them threaded separately.
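Roughly what the registry wiring amounts to; the types below are invented for illustration and are not rocky-cli's real structs:

```rust
use std::sync::Arc;

// Invented types for illustration; the real registry in rocky-cli differs.
struct BigQueryWarehouse;                                   // existing warehouse adapter
struct BigQueryDiscovery { shared: Arc<BigQueryWarehouse> } // new discovery role

struct Registered {
    warehouse: Arc<BigQueryWarehouse>,
    discovery: BigQueryDiscovery,
}

fn register_bigquery(adapter: BigQueryWarehouse) -> Registered {
    // One shared instance: discovery reuses the warehouse adapter's auth,
    // retry budget, and HTTP client instead of building its own.
    let warehouse = Arc::new(adapter);
    let discovery = BigQueryDiscovery { shared: Arc::clone(&warehouse) };
    Registered { warehouse, discovery }
}
```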
Verification
- Unit tests for the region-qualifier escape paths (`us-east1`, injection rejection, empty-location rejection).
- `#[ignore]` integration test: creates two datasets with one table each, runs `discover`, asserts both surface with their tables.
- `live/discover/run.sh` smoke driver: runs the full `rocky discover --output json` path through the CLI registry.

Both pass against the sandbox; idempotent across consecutive runs.
Test plan
- `cargo test -p rocky-bigquery --lib` — 61 passed
- `cargo test -p rocky-bigquery --test integration -- --ignored` — 2 passed
- `cargo test -p rocky-core --lib adapter_capability` — 4 passed
- `cargo clippy -p rocky-bigquery -p rocky-cli -p rocky-core --all-targets -- -D warnings` — clean
- `cargo fmt -p rocky-bigquery -p rocky-cli -p rocky-core --check` — clean
- `live/discover/run.sh` against the sandbox — exits 0, both datasets surface, dataset cleanup runs