Implicit key type conversion for join#18672
Implicit key type conversion for join#18672vdimir wants to merge 8 commits intoClickHouse:masterfrom
Conversation
d6b2e0d to
85664fd
Compare
|
Current test is pretty simple, so I'm going to add more tests. |
85664fd to
edbd7e2
Compare
edbd7e2 to
b705f5b
Compare
| left_sample_block, table_join->keyNamesLeft(), right_sample_block, key_names_right); | ||
|
|
||
| /// Casted columns will be stored in separate columns with unique generated names | ||
| String uuid_string = UUIDHelpers::generateV4().toUnderType().toHexString(); |
There was a problem hiding this comment.
UUIDs in column names can be confusing. Maybe we can use names like cast(UInt8, left_table.key_column)? If you're using the cast function, you should get these names in a natural way.
|
By the way, why do we need any casts at all? The function It would be helpful to have a big comment somewhere describing the general logic of the process. E.g. the above point, the fact that we try to match the behavior of |
| query_analyzer.appendJoinLeftKeys(chain, only_types || !first_stage); | ||
| Block left_block_sample = source_header; | ||
|
|
||
| query_analyzer.appendJoinLeftKeys(chain, only_types || !first_stage, left_block_sample); |
There was a problem hiding this comment.
Did you consider adding the necessary casts to the right table here, using the ExpressionTransform that happens with it before joins? And then using this casted column as the right-side key. Probably you'll need much less special logic in other places then...
There was a problem hiding this comment.
I thought about this but I not sure that it doesn't break types in result table. We need to restore types in joined table. It's clearer to me how to perform conversion inside join implementation without affecting pipeline, but I realize that transform may be better...
b055ca4 to
3abca44
Compare
3abca44 to
923ee66
Compare
47c20c0 to
455c91d
Compare
455c91d to
73f1d24
Compare
| const auto & dst = result_sample_block.getByPosition(result_pos); | ||
|
|
||
| #ifndef NDEBUG | ||
| if (!JoinCommon::typesEqualUpToNullability(src.type, dst.type)) |
There was a problem hiding this comment.
assert is also OK for debug-only checks.
| } | ||
|
|
||
| /// Join keys not casted in some cases: | ||
| /// - for tables with `engine = Join` (in this case left_block hasn't rows) |
There was a problem hiding this comment.
You can use a Join table as a righthand side of join, shouldn't we have normal left blocks in this case?
But I think our variant of type conversion doesn't apply to Join tables anyway -- the hash table is already built, but we convert the righthand values before building the hash table.
| return materialized; | ||
| } | ||
|
|
||
| /// Join keys not casted in some cases: |
There was a problem hiding this comment.
| /// Join keys not casted in some cases: | |
| /// We allow joins not only when the key type is the same on both sides, but | |
| /// also when the right side can be implicitly CAST'ed to the left type. To | |
| /// perform the join with implicit cast, we cast the right-side column to | |
| /// `Nullable(left type)`, and use this column as a join key. We substitute | |
| /// NULLs for non-convertible values. We then perform the join normally | |
| /// using this modified key column, which gets us the expected result for inner | |
| /// and outer joins as well. We cast the right side because it is expected | |
| /// to have less rows. | |
| /// Join keys are not casted in some cases: |
We need to have this high-level description of the algorithm somewhere. Not sure if I wrote it correctly, or where is the best place to put it.
| bool cast_allowed = JoinCommon::isCastJoinKeysAllowed(left_sample_block, right_sample_block, *table_join); | ||
| if (cast_allowed) | ||
| { | ||
| /// Left key columns will be converted to types from right if it isn't match |
There was a problem hiding this comment.
Are we casting left columns for merge join, but right columns for hash join? Maybe we can always cast right, for consistency?
It would be interesting to add a test to see how the runtime switching from hash join to merge join works with casting.
|
I am somewhat concerned by the amount of changes. I imagined it could be done by adding a new virtual column Does this make sense? |
Got it, thanks. |
|
Internal documentation ticket DOCSUP-9868 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Close #18567