fix!: Split max index length between PKs #13569

erezrokah · 2023-09-01T09:23:24Z

Summary

Fixes #13486. The max length of an index is either 3072 bytes or 767 bytes.
Since we use utf8mb4 which can use up to 4 bytes per char, we max out at either 768 or 191 chars (the number in bytes divided by 4).
The current code allocates 191 chars per PK which is OK if a table has 4 PKs since 191*4= 764, but fails if a table has more than that. See list of tables here

The solution is to get the max number of bytes based on the row format, and evenly distribute the max length between PKs.
This is still not the best solution as we assign the same number of bytes to a UUID and a text column (for example CQId is UUID), but I don't think it's very common to have one UUID PK and another text PK and enough other PKs so the prefix is too short.

erezrokah · 2023-09-01T09:24:20Z

plugins/destination/mysql/client/schema.go

 COLUMN_TYPE,
 IS_NULLABLE,
-constraint_type,
-sub_part


I couldn't understand why we need sub_part. Please comment if you know why

#11982 (comment)

Saw that comment and still don't understand the reason. It is because we want the migration to fail?

I presume so.
I think @bbernays is the one who can answer this question precisely.

Yes, without it migrations wouldn't catch (and then update) the subpart for existing DBs. Not sure if this was the best tradeoff (as I agree that it inhibits the DB->DB) use case...

But by shortening the subpart, we are risking losing data in cases when the beginning of PKs are the same...

Those won't really help source DB -> MySQL users

It will help all but mysql-mysql right? Which in that case we would not want to alter the length of PKs at all

We could force users to use pk_mode: cq-id-only and deterministic_cq_id, but I'm not sure that's the best default.
For example in Azure we hardly ever has more than 1 PK (it's always the ID), same with GCP and Name and AWS and arn, and so on.

Most tables do have a single field that acts as a PK, and in that case we would use the maximum possible length for the key with this PR. Without this fix we always use 191 which is not the maximum available value.

And it not just about mysql-mysql, it's also Postgres->MySQL and OracleDB->MySql, and maybe Firestore->MySQL.

Also, with this fix we won't be shortening the prefix for tables, only making it longer. If the tables exists we would skip it (we don't detect the prefix change) and if it doesn't exist we create it.
It can't exist with a longer prefix, since that's exactly the thing that was preventing it from getting created.

Technically we could detect the prefix change, but I believe that would be quite error prone

erezrokah · 2023-09-01T09:24:57Z

plugins/destination/mysql/client/schema.go

 			primaryKey = strings.Contains(*constraintType, "PRIMARY KEY")
 		}
-		// subpart only non nil for pks on blob/text columns
-		if subpart != nil && *subpart != maxPrefixLength {


Not sure why we disable the PK for columns that has a subpart but it doesn't match our own number

I think this could create issues in a source DB -> dest MySQL sync, for example MySQL -> MySQL

bbernays · 2023-09-01T14:56:57Z

plugins/destination/mysql/client/schema.go

 	if len(primaryKeysIndices) > 0 {
 		builder.WriteString(",\n  ")
 		builder.WriteString(" PRIMARY KEY (")
+		lengthPerPk := c.maxIndexLength / len(primaryKeysIndices)


A few things:

This makes the assumption that all keys in the composite PK have the same length... So in cases where the PK is something like req_region, req_account_id and arn the first 2 columns will be vastly over provisioned as we know an account_id is always 12 characters.

Don't we need to factor in the type of each column in the PK? If one column is an int, doesn't that take up a lot less space than an text/blob that requires the subpart?

Yes, but we don't have any way of knowing that in advance unless the source adds a column length in bytes to each column. Please note that currently we are still provisioning the same length for each key which is 191, so this PR doesn't make it worse, only better.

Yes it's not 100% yet, I commented on it in the PR description, see fix!: Split max index length between PKs #13569 (comment). Please note this probably only impacts tables with many PKs, so this optimization might not be needed yet.

erezrokah · 2023-09-01T15:58:42Z

Going to merge and release this next week

🤖 I have created a release *beep* *boop* --- ## [4.0.0](plugins-destination-mysql-v3.0.5...plugins-destination-mysql-v4.0.0) (2023-09-04) ### ⚠ BREAKING CHANGES * Split max index length between PKs ([#13569](#13569)) ### Bug Fixes * **deps:** Update github.com/99designs/go-keychain digest to 9cf53c8 ([#13561](#13561)) ([a170256](a170256)) * **deps:** Update github.com/cloudquery/arrow/go/v14 digest to cd3d411 ([#13598](#13598)) ([f22bfa6](f22bfa6)) * Split max index length between PKs ([#13569](#13569)) ([b0ad9d4](b0ad9d4)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

erezrokah requested review from a team, bbernays and mnorbury and removed request for a team and mnorbury September 1, 2023 09:23

cq-bot added the mysql label Sep 1, 2023

erezrokah commented Sep 1, 2023

View reviewed changes

erezrokah force-pushed the fix/better_pk_prefix branch from 02ad33a to 2080191 Compare September 1, 2023 09:32

fix: Split max index length between PKs

4e63367

erezrokah force-pushed the fix/better_pk_prefix branch from 2080191 to 4e63367 Compare September 1, 2023 09:34

bbernays reviewed Sep 1, 2023

View reviewed changes

bbernays approved these changes Sep 1, 2023

View reviewed changes

erezrokah added the automerge Automatically merge once required checks pass label Sep 4, 2023

Merge branch 'main' into fix/better_pk_prefix

ca4d5e2

vercel bot deployed to Preview September 4, 2023 07:10 View deployment

kodiakhq bot merged commit b0ad9d4 into cloudquery:main Sep 4, 2023

cq-bot mentioned this pull request Sep 4, 2023

chore(main): Release plugins-destination-mysql v4.0.0 #13579

Merged

erezrokah mentioned this pull request Oct 16, 2023

feat: Allow destinations to declare and act on schema limitations #14587

Closed

2 tasks

fix!: Split max index length between PKs #13569

fix!: Split max index length between PKs #13569

Uh oh!

Conversation

erezrokah commented Sep 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah Sep 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erezrokah commented Sep 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

erezrokah commented Sep 1, 2023 •

edited

Loading

erezrokah Sep 1, 2023 •

edited

Loading