fix!: Split max index length between PKs #13569
Conversation
```sql
COLUMN_TYPE,
IS_NULLABLE,
constraint_type,
sub_part
```
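For context, these columns come from MySQL's `information_schema` (`SUB_PART` is the index prefix length from `information_schema.STATISTICS`). A rough Go sketch of how they might be read follows; the query shape, joins, and names are assumptions for illustration, not the plugin's actual SQL:

```go
package migrate

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // driver registration; sql.Open happens in the caller
)

// columnInfo mirrors the selected fields; the struct and query below are
// illustrative assumptions, not the plugin's actual code.
type columnInfo struct {
	name           string
	columnType     string
	isNullable     string
	constraintType sql.NullString // NULL when the column isn't part of a constraint
	subPart        sql.NullInt64  // index prefix length; only set when the index covers a prefix of the column
}

func readColumns(db *sql.DB, schema, table string) ([]columnInfo, error) {
	const q = `
SELECT c.COLUMN_NAME, c.COLUMN_TYPE, c.IS_NULLABLE, tc.CONSTRAINT_TYPE, s.SUB_PART
FROM information_schema.COLUMNS c
LEFT JOIN information_schema.KEY_COLUMN_USAGE kcu
       ON kcu.TABLE_SCHEMA = c.TABLE_SCHEMA AND kcu.TABLE_NAME = c.TABLE_NAME AND kcu.COLUMN_NAME = c.COLUMN_NAME
LEFT JOIN information_schema.TABLE_CONSTRAINTS tc
       ON tc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA AND tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME AND tc.TABLE_NAME = kcu.TABLE_NAME
LEFT JOIN information_schema.STATISTICS s
       ON s.TABLE_SCHEMA = c.TABLE_SCHEMA AND s.TABLE_NAME = c.TABLE_NAME AND s.COLUMN_NAME = c.COLUMN_NAME
WHERE c.TABLE_SCHEMA = ? AND c.TABLE_NAME = ?`

	rows, err := db.Query(q, schema, table)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []columnInfo
	for rows.Next() {
		var ci columnInfo
		if err := rows.Scan(&ci.name, &ci.columnType, &ci.isNullable, &ci.constraintType, &ci.subPart); err != nil {
			return nil, err
		}
		out = append(out, ci)
	}
	return out, rows.Err()
}
```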
I couldn't understand why we need `sub_part`. Please comment if you know why.
Saw that comment and still don't understand the reason. Is it because we want the migration to fail?
I presume so.
I think @bbernays is the one who can answer this question precisely.
Yes, without it migrations wouldn't catch (and then update) the subpart for existing DBs. Not sure if this was the best tradeoff (as I agree that it inhibits the DB->DB use case)...
But by shortening the subpart, we are risking losing data in cases where the beginnings of the PKs are the same...
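A hypothetical illustration of that risk: with a primary key defined on a character prefix, two rows whose full keys differ only after the prefix look identical to the index. The 191-char value is the pre-PR default discussed in this thread; everything else here is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Two distinct keys that share their first 191 characters. The full keys
	// differ, but the indexed prefixes don't, so the second row would
	// conflict with the first on insert/upsert.
	const prefixLen = 191
	a := strings.Repeat("x", prefixLen) + "-first"
	b := strings.Repeat("x", prefixLen) + "-second"

	fmt.Println(a == b)                         // false: the full keys differ
	fmt.Println(a[:prefixLen] == b[:prefixLen]) // true: the indexed prefixes collide
}
```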
Those won't really help source DB -> MySQL users
It will help all but MySQL->MySQL, right? In that case we would not want to alter the length of PKs at all.
We could force users to use `pk_mode: cq-id-only` and `deterministic_cq_id`, but I'm not sure that's the best default.
For example, in Azure we hardly ever have more than 1 PK (it's always the ID), same with GCP (name), AWS (arn), and so on.
Most tables do have a single field that acts as a PK, and in that case we would use the maximum possible length for the key with this PR. Without this fix we always use 191, which is not the maximum available value.
And it's not just about MySQL->MySQL, it's also Postgres->MySQL and OracleDB->MySQL, and maybe Firestore->MySQL.
Also, with this fix we won't be shortening the prefix for tables, only making it longer. If the table exists we skip it (we don't detect the prefix change), and if it doesn't exist we create it.
It can't exist with a longer prefix, since that's exactly what was preventing it from getting created.
Technically we could detect the prefix change, but I believe that would be quite error prone
```go
		primaryKey = strings.Contains(*constraintType, "PRIMARY KEY")
	}
	// subpart only non nil for pks on blob/text columns
	if subpart != nil && *subpart != maxPrefixLength {
```
Not sure why we disable the PK for columns that have a subpart that doesn't match our own number.
I think this could create issues in a source DB -> dest MySQL sync, for example MySQL -> MySQL
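For reference, a rough sketch of the check being discussed. The names and the surrounding logic are assumptions; the intent, per the discussion above, is that an existing prefix differing from what we would create is treated as not-our-PK so the migration can notice and update it.

```go
package main

import "fmt"

// maxPrefixLength stands in for the prefix the plugin would create for this
// column; the value and names here are assumptions for illustration.
const maxPrefixLength = 191

// usableAsPrimaryKey reports whether an existing index part can be kept as-is.
// subPart is only non-nil for prefixed (blob/text) index parts; a prefix that
// differs from maxPrefixLength is treated as a mismatch.
func usableAsPrimaryKey(constraintType string, subPart *int64) bool {
	if constraintType != "PRIMARY KEY" {
		return false
	}
	if subPart != nil && *subPart != maxPrefixLength {
		return false
	}
	return true
}

func main() {
	existing := int64(100)
	fmt.Println(usableAsPrimaryKey("PRIMARY KEY", &existing)) // false: prefix differs from ours
	fmt.Println(usableAsPrimaryKey("PRIMARY KEY", nil))       // true: no prefix (not a text/blob column)
}
```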
Force-pushed from 02ad33a to 2080191
Force-pushed from 2080191 to 4e63367
```go
	if len(primaryKeysIndices) > 0 {
		builder.WriteString(",\n ")
		builder.WriteString(" PRIMARY KEY (")
		lengthPerPk := c.maxIndexLength / len(primaryKeysIndices)
```
A few things:
- This makes the assumption that all keys in the composite PK have the same length... So in cases where the PK is something like `req_region`, `req_account_id` and `arn`, the first 2 columns will be vastly over-provisioned, as we know an account_id is always 12 characters.
- Don't we need to factor in the type of each column in the PK? If one column is an int, doesn't that take up a lot less space than a text/blob that requires the subpart?
- Yes, but we don't have any way of knowing that in advance unless the source adds a column length in bytes to each column. Please note that currently we are still provisioning the same length for each key, which is `191`, so this PR doesn't make it worse, only better.
- Yes, it's not 100% yet, I commented on it in the PR description, see fix!: Split max index length between PKs #13569 (comment). Please note this probably only impacts tables with many PKs, so this optimization might not be needed yet.
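To make the trade-off concrete, here's a sketch of the even split under discussion. The names, the character budget, and the generated SQL shape are assumptions, not the plugin's code.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrimaryKeyClause gives every PK column the same prefix budget,
// maxIndexChars / len(pks), regardless of the column's type or real length,
// which is why fixed-length columns like an account ID end up over-provisioned.
func buildPrimaryKeyClause(pks []string, needsPrefix map[string]bool, maxIndexChars int) string {
	lengthPerPK := maxIndexChars / len(pks)
	parts := make([]string, 0, len(pks))
	for _, name := range pks {
		if needsPrefix[name] {
			// text/blob columns need an explicit index prefix
			parts = append(parts, fmt.Sprintf("`%s`(%d)", name, lengthPerPK))
		} else {
			parts = append(parts, fmt.Sprintf("`%s`", name))
		}
	}
	return "PRIMARY KEY (" + strings.Join(parts, ", ") + ")"
}

func main() {
	pks := []string{"req_region", "req_account_id", "arn"}
	needsPrefix := map[string]bool{"req_region": true, "req_account_id": true, "arn": true}
	// With a 768-char budget, three text PKs get 256 chars each, far more than
	// a 12-char account ID needs.
	fmt.Println(buildPrimaryKeyClause(pks, needsPrefix, 768))
	// Output: PRIMARY KEY (`req_region`(256), `req_account_id`(256), `arn`(256))
}
```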
Going to merge and release this next week
🤖 I have created a release *beep* *boop*

---

## [4.0.0](plugins-destination-mysql-v3.0.5...plugins-destination-mysql-v4.0.0) (2023-09-04)

### ⚠ BREAKING CHANGES

* Split max index length between PKs ([#13569](#13569))

### Bug Fixes

* **deps:** Update github.com/99designs/go-keychain digest to 9cf53c8 ([#13561](#13561)) ([a170256](a170256))
* **deps:** Update github.com/cloudquery/arrow/go/v14 digest to cd3d411 ([#13598](#13598)) ([f22bfa6](f22bfa6))
* Split max index length between PKs ([#13569](#13569)) ([b0ad9d4](b0ad9d4))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
Fixes #13486. The max length of an index is either `3072` bytes or `767` bytes. Since we use `utf8mb4`, which can use up to 4 bytes per char, we max out at either `768` or `191` chars (the number in bytes divided by 4).

The current code allocates `191` chars per PK, which is OK if a table has 4 PKs since `191*4 = 764`, but fails if a table has more than that. See list of tables here.

The solution is to get the max number of bytes based on the row format, and evenly distribute the max length between the PKs.

This is still not the best solution, as we assign the same number of bytes to a UUID and a `text` column (for example CQId is `UUID`), but I don't think it's very common to have one UUID PK, another `text` PK, and enough other PKs that the prefix ends up too short.
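A rough sketch of that calculation, using the 3072/767-byte limits described above; the function and constant names are assumptions, not the plugin's identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

const bytesPerChar = 4 // utf8mb4 can use up to 4 bytes per character

// maxIndexBytes picks the index-key limit from the row format:
// 3072 bytes for DYNAMIC/COMPRESSED rows, 767 bytes otherwise.
func maxIndexBytes(rowFormat string) int {
	switch strings.ToUpper(rowFormat) {
	case "DYNAMIC", "COMPRESSED":
		return 3072
	default: // COMPACT, REDUNDANT
		return 767
	}
}

// prefixCharsPerPK evenly distributes the character budget across the PKs.
func prefixCharsPerPK(rowFormat string, numPKs int) int {
	return maxIndexBytes(rowFormat) / bytesPerChar / numPKs
}

func main() {
	fmt.Println(prefixCharsPerPK("DYNAMIC", 1)) // 768: a single text PK gets the full budget instead of a fixed 191
	fmt.Println(prefixCharsPerPK("DYNAMIC", 5)) // 153: a fixed 191 per PK (5*191 = 955 chars) would exceed the 768-char budget
}
```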