feat(mysql): Update to SDK v2 and Arrow #10169

erezrokah · 2023-04-18T13:44:00Z

Summary

Fixes #9930.
Still need to test a bit and I probably missed a bunch of stuff. Keeping as draft for now

BEGIN_COMMIT_OVERRIDE

feat: Update to plugin SDK v2

feat: Update to use Apache Arrow type system
BREAKING-CHANGE: This release introduces an internal change to our type system to use Apache Arrow. This should not have any visible breaking changes, however due to the size of the change we are introducing it under a major version bump to communicate that it might have some bugs that we weren't able to catch during our internal tests. If you encounter an issue during the upgrade, please submit a bug report.

feat: Timestamp fields type changed from datetime to datetime(6) to avoid losing fractional data
BREAKING-CHANGE: Timestamp fields type changed from datetime to datetime(6) to avoid losing fractional data

feat: Mac Address and Inet column type changed from nvarchar(255) to text as we cannot assume nvarchar(255) columns represent valid Mac Address or Inet
BREAKING-CHANGE: Mac Address and Inet column type changed from nvarchar(255) to text as we cannot assume nvarchar(255) columns represent valid Mac Address or Inet

feat: Float fields type changed from float to double for increased precision
BREAKING-CHANGE: Float fields type changed from float to double for increased precision

END_COMMIT_OVERRIDE

plugins/destination/mysql/client/migrate.go

plugins/destination/mysql/client/types.go

yevgenypats

I think the list/nested handling is incorrect here. Take a look here - https://github.com/cloudquery/cloudquery/blob/main/plugins/destination/postgresql/client/transformer.go#L122

erezrokah · 2023-04-19T10:27:35Z

I think the list/nested handling is incorrect here. Take a look here - https://github.com/cloudquery/cloudquery/blob/main/plugins/destination/postgresql/client/transformer.go#L122

OK, so I took a look and tried implementing the same approach to find out MySQL does not support array types.
In that case we could use the string value or add some special serialization to handle them (e.g. string with some delimiter we later on split by). SQLite does not handle list type either.

Does that make sense? I might be missing something with how we should handle arrays in Arrow, so happy to sync on it

yevgenypats

Wasn't aware mysql doesn't have nested types. Updated my review accordingly.

yevgenypats · 2023-04-19T11:06:40Z

plugins/destination/mysql/client/read.go

+			} else {
+				recordBuilder.Field(i).(*array.TimestampBuilder).Append(arrow.Timestamp((*asTime).UnixMicro()))
+			}
+		case *types.UUIDType:


This is not needed I believe as the default case should take care of that.

I think using the default only works if we use the string representation for UUID (text and not binary(16))

yevgenypats · 2023-04-19T11:07:30Z

plugins/destination/mysql/client/types.go

+	case *arrow.BinaryType, *arrow.LargeBinaryType:
+		return "blob"
+	case *types.UUIDType:
+		return "binary(16)"


Let's use string I think.

There are a few options for UUID:

binary(16) is the most efficient

VARCHAR(36) or VARCHAR(255) which will show nicely in queries

text which is less efficient

The thing is that when we query the DB during migration we can't 100% say that VARCHAR(36) or VARCHAR(255) is UUID unless we try to parse the data.

Assuming binary(16) is UUID is safer (but also not 100%). So the choice for binary(16) was kind of a tradeoff to avoid using text.

We have a similar issue with bool which is an alias for tinyint(1), so we assume tinyint(1) is boolean

yevgenypats · 2023-04-19T11:08:55Z

plugins/destination/mysql/client/types.go

+	case *arrow.BooleanType:
+		return "bool"
+	case *arrow.Int8Type, *arrow.Uint8Type, *arrow.Int16Type, *arrow.Uint16Type, *arrow.Int32Type, *arrow.Uint32Type, *arrow.Int64Type, *arrow.Uint64Type:
+		return "bigint"


why all the same? It has tinyint,smallint,mediumint,int,bigint this would be very useful for db->db replication when the type is important.

Good point, mostly since this is what we had before in the old type system. I'll update it

Split those and the floats in 7eef56a also opened cloudquery/plugin-sdk#805

yevgenypats · 2023-04-19T11:10:07Z

plugins/destination/mysql/client/types.go

+		return "json"
+	case *arrow.StructType:
+		return "json"
+	case *types.InetType, *types.MacType:


I think for this one we can also use the default case so no special handling will be needed in reverse transform as well.

Done in f9a1c30

yevgenypats · 2023-04-19T11:10:59Z

plugins/destination/mysql/client/types.go

+		return "json"
+	case *types.InetType, *types.MacType:
+		return "nvarchar(255)"
+	default:


I would add another case when the type is a List type then we can also use json and then use marshaljson in the transformer for those

Good point JSON for list types is a great solution

Done in f9a1c30 (seems like ValueStr already does JSON marshalling)

yevgenypats

Looks good. had few non blocking questions.

yevgenypats · 2023-04-19T15:40:07Z

plugins/destination/mysql/client/types.go

-		"json":          schema.TypeJSON,
+	sqlTypeToSchemaType := map[string]arrow.DataType{
+		"bool":              arrow.FixedWidthTypes.Boolean,
+		"tinyint(1)":        arrow.FixedWidthTypes.Boolean,


why this one is needed?

This is more a defensive measure I think in case someone is doing MySQL -> MySQL sync and defined bool as tinyint(1) (similar to https://github.com/cloudquery/cloudquery/pull/9143/files). I think we can drop it for now

OK so the correct reason was that we return bool a bit below so when we normalized we need to support bool too.
ef4fdd1 should fix it.

See normalization code here https://github.com/cloudquery/cloudquery/pull/10169/files#diff-7a94fa4d0f7e4c8c80b1ad959ace684a43b44408b3d1d9cee3ea9ef2d87dd9a2R29

yevgenypats · 2023-04-19T15:41:38Z

plugins/destination/mysql/client/types.go

+		"smallint":          arrow.PrimitiveTypes.Int16,
+		"int":               arrow.PrimitiveTypes.Int32,
+		"bigint":            arrow.PrimitiveTypes.Int64,
+		"bigint(20)":        arrow.PrimitiveTypes.Int64,


Why this one is needed?

Fixed in ef4fdd1

erezrokah · 2023-04-19T16:17:59Z

I updated the description with all the breaking changes. I think the int changes should not be visible as we still map schema.TypeInt to arrow.PrimitiveTypes.Int64 so the mapping will be the same.

cq-bot added the mysql label Apr 18, 2023

erezrokah commented Apr 18, 2023

View reviewed changes

plugins/destination/mysql/client/migrate.go Outdated Show resolved Hide resolved

erezrokah commented Apr 18, 2023

View reviewed changes

plugins/destination/mysql/client/types.go Show resolved Hide resolved

erezrokah force-pushed the feat/migration_mysql_sdk_v2 branch 2 times, most recently from def106b to e6342a1 Compare April 19, 2023 08:59

erezrokah marked this pull request as ready for review April 19, 2023 09:02

erezrokah requested review from a team and hermanschaaf and removed request for a team April 19, 2023 09:02

yevgenypats suggested changes Apr 19, 2023

View reviewed changes

erezrokah mentioned this pull request Apr 19, 2023

feat(snowflake)!: Update to SDK V2 (Arrow Support) #10161

Merged

yevgenypats suggested changes Apr 19, 2023

View reviewed changes

erezrokah added 6 commits April 19, 2023 16:03

feat: Update to SDK v2 and Arrow

a003387

refactor: Don't panic, return errors

3b85120

refactor: Cleanup

2afce59

fix: Update to latest SDK, fix normalization

b378ffd

fix: Add missing release

19d8907

fix: Split int, float types

7eef56a

erezrokah force-pushed the feat/migration_mysql_sdk_v2 branch from 0d951c9 to 7eef56a Compare April 19, 2023 13:03

erezrokah added 3 commits April 19, 2023 16:42

fix: Drop custom type for Inet, Mac Address

f9a1c30

fix: Handle lists as JSON types

e6a71d5

refactor: Cleanup

c8b6d1b

yevgenypats changed the title ~~feat: Update to SDK v2 and Arrow~~ feat(mysql): Update to SDK v2 and Arrow Apr 19, 2023

yevgenypats approved these changes Apr 19, 2023

View reviewed changes

fix: Remove types

ef4fdd1

erezrokah added automerge Automatically merge once required checks pass and removed automerge Automatically merge once required checks pass labels Apr 19, 2023

erezrokah added the automerge Automatically merge once required checks pass label Apr 19, 2023

Merge branch 'main' into feat/migration_mysql_sdk_v2

48bb0bd

vercel bot deployed to Preview April 19, 2023 16:22 View deployment

kodiakhq bot merged commit c5d3508 into cloudquery:main Apr 19, 2023

erezrokah deleted the feat/migration_mysql_sdk_v2 branch April 19, 2023 17:05

feat(mysql): Update to SDK v2 and Arrow #10169

feat(mysql): Update to SDK v2 and Arrow #10169

Uh oh!

Conversation

erezrokah commented Apr 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Uh oh!

Uh oh!

yevgenypats left a comment

Choose a reason for hiding this comment

Uh oh!

erezrokah commented Apr 19, 2023

Uh oh!

yevgenypats left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yevgenypats left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Apr 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah commented Apr 19, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

erezrokah commented Apr 18, 2023 •

edited

Loading

erezrokah Apr 19, 2023 •

edited

Loading

erezrokah Apr 19, 2023 •

edited

Loading

erezrokah Apr 19, 2023 •

edited

Loading

erezrokah Apr 19, 2023 •

edited

Loading

erezrokah Apr 19, 2023 •

edited

Loading