feat(pg): Faster migrations #7819
This adds diff methods for `Table`. It will help simplify migrations: we can transform the destination database schema into a `Table` struct and then understand what was changed or added in the same way across all destination plugins. PostgreSQL PR example - cloudquery/cloudquery#7819
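A minimal, self-contained sketch of the kind of diff the description refers to: comparing a desired `Table` against what the database reports. The struct layout and the `addedColumns` helper are illustrative assumptions, not the actual plugin-sdk API.

```go
package main

import "fmt"

// Illustrative stand-ins for the SDK types; the real plugin-sdk structs differ.
type Column struct {
	Name string
	Type string
}

type Table struct {
	Name    string
	Columns []Column
}

// addedColumns returns the columns present in want but missing from have,
// i.e. the kind of answer a Table diff method can give a destination plugin.
func addedColumns(want, have Table) []Column {
	existing := make(map[string]bool, len(have.Columns))
	for _, c := range have.Columns {
		existing[c.Name] = true
	}
	var added []Column
	for _, c := range want.Columns {
		if !existing[c.Name] {
			added = append(added, c)
		}
	}
	return added
}

func main() {
	want := Table{Name: "items", Columns: []Column{{"_cq_id", "uuid"}, {"name", "text"}, {"tags", "jsonb"}}}
	have := Table{Name: "items", Columns: []Column{{"_cq_id", "uuid"}, {"name", "text"}}}
	fmt.Println(addedColumns(want, have)) // [{tags jsonb}]
}
```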
candiduslynx
left a comment
I sense that there are issues with the append mode now (_cq_id was used as PK there).
| sql := "alter table " + pgx.Identifier{table.Name}.Sanitize() + " alter column " + pgx.Identifier{col}.Sanitize() + " set not null" | ||
| if _, err := c.conn.Exec(ctx, sql); err != nil { | ||
| return fmt.Errorf("failed to set not null on column %s on table %s: %w", col, table.Name, err) | ||
| if c.enabledPks() && !table.IsPrimaryKeyEqual(pgTable) { |
I think we may want to move this check to be above the GetAddedColumns check, so that no changes are made to the table if migrate-forced is false and a forced migration is necessary. Otherwise the table can be left in a half-migrated state. (Either that or move the added columns first, so that the user only needs to manually do the forced operations.)
Why would it be left half-migrated? Also, I can't move it up, because the primary keys could be on columns that were just added.
ok yeah you are right. I now moved that check up to fail early.
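A compact sketch of the fail-early ordering agreed on here, assuming a force/migrate-forced style option as mentioned later in the thread; the identifiers are illustrative, not the plugin's actual names.

```go
package client

import "errors"

// migrateSketch checks for destructive changes (like a primary-key change)
// before touching the table, so a rejected forced migration never leaves it
// half-migrated. Columns carrying the new primary key can still be added
// afterwards, since the check only reads the diff.
func migrateSketch(pkChanged, forceAllowed bool, addedColumns []string, addColumn func(string) error) error {
	if pkChanged && !forceAllowed {
		// Illustrative error; the real option name may differ.
		return errors.New("primary key changed: enable forced migration to drop and recreate the table")
	}
	for _, col := range addedColumns {
		if err := addColumn(col); err != nil {
			return err
		}
	}
	// With all new columns present, PK constraints can now be (re)created.
	return nil
}
```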
```go
switch t {
case "boolean":
	return schema.TypeBool
case "bigint", "integer":
```
Interesting, I guess there is some information lost in the conversion here because our types don't map 1:1 to postgres 🤔 So if you have a source table and write it back to postgres, the schemas won't match exactly (integers will become bigints). In this case it's fine, but hopefully we don't run into irreconcilable cases in other databases...
Good catch. I guess we need to transform the incoming tables to pgtables so we can compare them?
Yeah, I think in this case it's fine until we add a BigInt type to our type system. But if the information loss goes the other way around, then we will need to first transform the Table -> DbTable and only then compare it against the table we got from the database.
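A small sketch of the asymmetry being discussed, with made-up helper names: reading collapses several Postgres types onto one SDK-level type, so the comparison has to be done at whichever level is not lossy.

```go
package client

// Both pg "integer" and "bigint" collapse onto one SDK-level type on read
// (illustrative mapping, not the exact plugin code)...
func pgToSchemaType(t string) string {
	switch t {
	case "boolean":
		return "TypeBool"
	case "bigint", "integer":
		return "TypeInt"
	default:
		return "TypeString"
	}
}

// ...while writing always picks "bigint", so a round trip turns "integer"
// into "bigint".
func schemaToPgType(t string) string {
	switch t {
	case "TypeBool":
		return "boolean"
	case "TypeInt":
		return "bigint"
	default:
		return "text"
	}
}

// Comparing at the SDK-type level (pgToSchemaType(dbType) == wantedType)
// tolerates this collapse; if the loss ever went the other way, we'd
// transform Table -> DbTable and compare at the pg-type level instead.
```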
hermanschaaf
left a comment
It seems like the query to select all tables is not accounting for tables with composite PKs in the way we expect
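For reference, one standard way (not necessarily the query in this PR) to fetch primary-key columns so that composite keys arrive as multiple ordered rows per table:

```go
package client

// selectPKColumns lists every primary-key column with its position, so a
// composite PK shows up as several rows that the client groups by table.
// Illustrative query against information_schema; the PR may use pg_catalog.
const selectPKColumns = `
SELECT tc.table_name, kcu.column_name, kcu.ordinal_position
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON tc.constraint_name = kcu.constraint_name
 AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_schema = 'public'
ORDER BY tc.table_name, kcu.ordinal_position
`
```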
Force-pushed from d769e8a to 722d997
Each test in the destination testing suite now uses a unique table. This should solve the bug I hit in this PR - cloudquery/cloudquery#7819
Co-authored-by: Erez Rokah <[email protected]>
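One possible way to derive a unique table name per test; the naming scheme is an assumption, not the commit's actual implementation.

```go
package client

import (
	"fmt"
	"strings"
	"time"
)

// uniqueTableName gives every destination test its own table so repeated or
// concurrent runs don't trip over each other's leftover schemas.
func uniqueTableName(testName string) string {
	name := strings.ToLower(strings.ReplaceAll(testName, "/", "_"))
	return fmt.Sprintf("cq_test_%s_%d", name, time.Now().UnixNano())
}
```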
Migration detection APIs, take 2.

- This creates **one** API that returns all the changes instead of multiple growing APIs for different sets of changes.
- Improves migration tests for different use cases.

This will also come up in a Postgres PR and a documentation PR: basically, each destination plugin would have the following table:

Migrations Logic

| Change | Behaviour |
|----------|----------------------------------|
| Add Column | Auto Migrate/Drop Table |
| Add Column with Not Null | Auto Migrate/Drop Table |
| Remove Column | Auto Migrate/Drop Table |
| Remove Column with Not Null | Auto Migrate/Drop Table |
| Change Column | Auto Migrate/Drop Table |

and we can easily expand it in the code, tests and documentation. Example of usage in cloudquery/cloudquery#7819
Co-authored-by: candiduslynx <[email protected]>
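A sketch of how a destination plugin could consume that single "all changes" API and decide between auto-migrate and drop/recreate. The type names and the particular split below are illustrative assumptions, since the behaviour table above is a template that each plugin fills in for itself.

```go
package client

// ChangeKind and the split below are illustrative; each destination fills in
// its own "Migrations Logic" table, so which kinds need a forced migration
// differs per plugin.
type ChangeKind int

const (
	AddColumn ChangeKind = iota
	AddColumnNotNull
	RemoveColumn
	RemoveColumnNotNull
	ChangeColumnType
)

// needsForce reports whether a change maps to "Drop Table" (forced migration)
// rather than "Auto Migrate" in the behaviour table.
func needsForce(k ChangeKind) bool {
	switch k {
	case AddColumn, RemoveColumn:
		return false // example: plain column additions/removals applied in place
	default:
		return true // example: not-null and type changes require a rebuild
	}
}

// anyForced lets the plugin fail early (see the force-option discussion above)
// when at least one change in the full diff requires dropping the table.
func anyForced(kinds []ChangeKind) bool {
	for _, k := range kinds {
		if needsForce(k) {
			return true
		}
	}
	return false
}
```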
erezrokah
left a comment
I don't see anything blocking 🚀 I'm still testing, but if you'd like to merge I think that's fine and we can follow up with fixes if we have them
erezrokah
left a comment
Sorry, found a bug
Co-authored-by: Erez Rokah <[email protected]>
```go
	return schema.TypeByteArray
case "text[]":
	return schema.TypeStringArray
case "timestamp without time zone":
```
This will only become a problem with CDC, but at some point we'll also need to support (or at least handle without panicking?) all the other postgres types, like timestamptz.
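A tiny sketch of handling the timezone-aware variant without panicking; purely illustrative, the plugin's eventual mapping may differ.

```go
package client

// pgTimestampType maps both timestamp flavours onto one SDK-level timestamp
// type and reports unknown types instead of panicking (illustrative only).
func pgTimestampType(t string) (string, bool) {
	switch t {
	case "timestamp without time zone", "timestamp with time zone":
		return "TypeTimestamp", true
	default:
		return "", false // unknown pg type: let the caller skip or error out
	}
}
```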
🤖 I have created a release *beep* *boop*

---

## [2.1.0](plugins-destination-postgresql-v2.0.10...plugins-destination-postgresql-v2.1.0) (2023-02-16)

### Features

* **pg:** Faster migrations ([#7819](#7819)) ([8f51733](8f51733))

### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.36.0 ([#7809](#7809)) ([c85a9cb](c85a9cb))
* Postgresql timestamps ([#7840](#7840)) ([e2c8b61](e2c8b61))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
#### Summary

Same as #7819 for SQLite.
#### Summary

Same as #7819 for MSSQL. This is a breaking change due to the type change from `datetimeoffset` to `datetime2`, as we don't want to save the time zone offset; see https://github.com/cloudquery/cloudquery/blob/ee4b2c01867b2140537ee4656d943d2e0cf2898b/plugins/destination/postgresql/client/types.go#L74. I suggest following the code in an IDE as it's hard to understand the changes from the diff.

BEGIN_COMMIT_OVERRIDE
feat(deps): Update MSSQL plugin-sdk to v1.38.2
BREAKING-CHANGE: Update MSSQL plugin-sdk to v1.38.2. You'll need to drop the database and start fresh due to a change in the schema for all timestamp columns from `datetimeoffset` to `datetime2`.
END_COMMIT_OVERRIDE
Blocked by cloudquery/plugin-sdk#668
This streamlines migrations, introduces a force option, and in general should speed up migrations substantially, as we now do one query instead of a query per table.
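A sketch of the "one query for the whole schema" idea: fetch every column in a single round trip and group the rows by table in memory. The query and types are illustrative, not the exact plugin code.

```go
package client

// selectAllColumns fetches the columns of every table in the schema at once,
// replacing a per-table round trip (illustrative query).
const selectAllColumns = `
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position
`

type columnRow struct {
	Table, Column, DataType, Nullable string
}

type columnInfo struct {
	Name, Type string
	NotNull    bool
}

// groupByTable turns the flat result set into a per-table column list that
// the diff logic can compare against the desired schema.
func groupByTable(rows []columnRow) map[string][]columnInfo {
	out := make(map[string][]columnInfo)
	for _, r := range rows {
		out[r.Table] = append(out[r.Table], columnInfo{
			Name:    r.Column,
			Type:    r.DataType,
			NotNull: r.Nullable == "NO",
		})
	}
	return out
}
```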
Blocked by another take in the SDK - cloudquery/plugin-sdk#688