fix(rocky-core): detect added columns as drift + ALTER target before INSERT #331
Merged

hugocorreia90 merged 1 commit into main on May 1, 2026
Conversation
`detect_drift` previously ignored columns present in the source but absent from the target. The docstring claimed "they appear naturally via `SELECT *`", but the runtime's incremental INSERT path then issues `INSERT INTO target SELECT * FROM source` against a target whose schema is fixed, and BigQuery, Snowflake, and Databricks all reject the INSERT with `Inserted row has wrong column count`. The natural "source schema evolves; replicate again" workflow was structurally broken.
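The detection itself reduces to a single pass over the source column list, comparing against the target. A minimal sketch, using simplified stand-ins for the crate's real `ColumnInfo` and `DriftResult` types (which carry more fields):

```rust
// Simplified stand-ins; the real rocky-core types have additional fields.
#[derive(Debug, Clone, PartialEq)]
struct ColumnInfo {
    name: String,
    data_type: String,
}

#[derive(Debug, Default)]
struct DriftResult {
    added_columns: Vec<ColumnInfo>,
    changed_columns: Vec<(ColumnInfo, ColumnInfo)>, // (target, source)
}

fn detect_drift(source: &[ColumnInfo], target: &[ColumnInfo]) -> DriftResult {
    let mut result = DriftResult::default();
    // Single pass over source columns: each is either new or type-checked.
    for src in source {
        match target.iter().find(|t| t.name == src.name) {
            None => result.added_columns.push(src.clone()),
            Some(tgt) if tgt.data_type != src.data_type => {
                result.changed_columns.push((tgt.clone(), src.clone()))
            }
            _ => {}
        }
    }
    result
}

fn main() {
    let target = vec![
        ColumnInfo { name: "id".into(), data_type: "INT64".into() },
        ColumnInfo { name: "name".into(), data_type: "STRING".into() },
    ];
    let mut source = target.clone();
    source.push(ColumnInfo { name: "region".into(), data_type: "STRING".into() });

    let drift = detect_drift(&source, &target);
    println!("{}", drift.added_columns.len()); // prints 1 (the added "region" column)
}
```

Columns present in the target but missing from the source are deliberately out of scope here; the PR only wires the "source gained a column" direction.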
## What this adds

- `DriftResult.added_columns: Vec<ColumnInfo>`, populated in the same single pass over source columns. Reuses the existing `ColumnInfo` shape (name + data_type), so there is no codegen ripple.
- `drift::generate_add_column_sql`, mirroring `generate_alter_column_sql`. Standard `ALTER TABLE … ADD COLUMN` works across all four adapters; no dialect override needed.
- `run.rs`: when the drift result reports added columns (and `DropAndRecreate` isn't already firing for type drift), execute the ALTER statements before the regular INSERT path. Surfaces as `action: "add_columns"` in `drift.actions_taken` so orchestrators can observe schema evolution alongside data movement.
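The new helper is named in the PR, but its exact signature isn't shown; a plausible sketch of the shape it describes (one generic `ADD COLUMN` statement per added column, no dialect branching), with a stand-in `ColumnInfo`:

```rust
// Stand-in for the crate's ColumnInfo; signature here is assumed, not the real API.
struct ColumnInfo {
    name: String,
    data_type: String,
}

/// Emit one ALTER statement per added column. Per the PR, the plain
/// ADD COLUMN form is accepted by all four supported adapters, so a
/// single generic template suffices (no per-dialect override).
fn generate_add_column_sql(table: &str, added: &[ColumnInfo]) -> Vec<String> {
    added
        .iter()
        .map(|c| format!("ALTER TABLE {} ADD COLUMN {} {}", table, c.name, c.data_type))
        .collect()
}

fn main() {
    let added = vec![ColumnInfo { name: "region".into(), data_type: "STRING".into() }];
    for stmt in generate_add_column_sql("analytics.users", &added) {
        println!("{stmt}"); // ALTER TABLE analytics.users ADD COLUMN region STRING
    }
}
```

Keeping one statement per column (rather than a single multi-column ALTER) keeps the generated SQL portable, since multi-column `ADD COLUMN` syntax varies across warehouses.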
## Verification

Extended `live/drift/run.sh` into a three-stage flow:

1. Initial replication of a 3-column source (no drift).
2. `ALTER source ADD COLUMN region`; rerun. Asserts the `add_columns` action fires and the target gains the column without a full refresh (historical rows stay; new rows include the source value).
3. DROP + CREATE the source with the `id` type changed `INT64` → `STRING`; rerun. Asserts the `drop_and_recreate` action fires and the target's `id` column is now `STRING`.

Idempotent across consecutive runs.
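The ordering these stages exercise (ALTERs land before the incremental INSERT, and a full recreate suppresses the add path) can be sketched as control flow. Names and the dispatch are illustrative stand-ins, not the real `run.rs` API:

```rust
// Illustrative stand-in for the runtime dispatch in run.rs; the real code
// executes SQL against a live adapter instead of collecting statements.
#[derive(Debug)]
enum DriftAction {
    AddColumns(Vec<String>), // ALTER TABLE ... ADD COLUMN statements
    DropAndRecreate,
}

fn handle_drift(actions: Vec<DriftAction>, actions_taken: &mut Vec<String>) -> Vec<String> {
    let mut statements = Vec::new();
    // DropAndRecreate wins outright: a rebuilt target already carries the
    // full source schema, so the ADD COLUMN path must not also fire.
    if actions.iter().any(|a| matches!(a, DriftAction::DropAndRecreate)) {
        actions_taken.push("drop_and_recreate".into());
        return statements; // recreate handled elsewhere, then INSERT runs
    }
    for action in actions {
        if let DriftAction::AddColumns(stmts) = action {
            statements.extend(stmts); // run these ALTERs first...
            actions_taken.push("add_columns".into());
        }
    }
    statements // ...and only then continue with the regular INSERT path
}

fn main() {
    let mut taken = Vec::new();
    let alters = handle_drift(
        vec![DriftAction::AddColumns(vec![
            "ALTER TABLE t ADD COLUMN region STRING".into(),
        ])],
        &mut taken,
    );
    println!("{taken:?} {}", alters.len()); // ["add_columns"] 1
}
```

Rerunning with no new columns produces an empty `AddColumns` set upstream, which is what makes consecutive runs idempotent.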
## Sibling gap (not fixed here)

`DriftAction::AlterColumnTypes` is detected in `drift.rs` for safe widenings (e.g. `INT` → `BIGINT`), but the runtime at `run.rs:4105` still only wires `DropAndRecreate`. Safe widenings silently fall through to the next INSERT. Documented as finding #9 in `live/README.md`.

## Test plan
- `cargo test -p rocky-core --lib drift`: 35 passed (includes 4 new tests)
- `cargo clippy -p rocky-core -p rocky-cli --all-targets -- -D warnings`: clean
- `cargo fmt -p rocky-core -p rocky-cli --check`: clean
- `live/drift/run.sh` against the BQ sandbox: exits 0, all three stages pass