Fix Array join: Function writeSlice expects same column types#91784
Fix Array join: Function writeSlice expects same column types#91784Ergus wants to merge 5 commits intoClickHouse:masterfrom
Conversation
|
Workflow [PR], commit [2ee20cf] Summary: ❌
|
| // insertRangeFrom casts numeric types if they are different, so it can insert them. | ||
| || (sink.elements.lowCardinality() | ||
| && source_elements.lowCardinality() | ||
| && sink.elements.isNumeric() |
There was a problem hiding this comment.
I don't think it works for different data types, such as Float and Int.
insertRangeFrom only works for identical columns.
There was a problem hiding this comment.
I agree that the condition needs to be improved (maybe restricted to only intXX->intYY numeric types or tuned with some more elaborated check). I added more detailed comments in the original issue thread about the case by case condition relaxation.
In low cardinality columns insertRangeFrom relies on insertPositionsRange.
Low cardinality use a dictionary that automatically resizes when cardinality increases and insertion already checks sizes of types and indices and perform some casting and conversions:
But the initial condition source_elements.structureEquals(sink.elements):
- Will fail for two "compatible" lox cardinality numeric columns (i.e: if one of them has cardinality <256 and the other >256).
- Inserting
char -> intmay be possible with no problem, but the condition doesn't allow that. - The opposite
int -> charshould also work if all the input values are < CHAR_MAX. Otherwise an error may be triggered BUT in the inner casting code in order to trigger the most precise information about the problematic value.
| const NullMap * size_null_map = size_nullable ? &size_nullable->getNullMapData() : nullptr; | ||
| const IColumn * size_nested_column = size_nullable ? &size_nullable->getNestedColumn() : &size_column; | ||
|
|
||
| checkAreLowCardinalityInsertable(array_source, sink); |
There was a problem hiding this comment.
The "generic" implementation should already check that the types are identical, and the calling code should convert to the identical types before going here.
There was a problem hiding this comment.
But this is the generic function: A template that relies on all the writeSlice specializations. That's why the initial fix code was implemented in the specialized writeSlice function.
* src/Functions/GatherUtils/Algorithms.h (writeSlice) : Extend the condition to allow use insertRangeFrom when lowCardinality numeric
types. This is a preliminar fix that works, but maybe this extension is not general enough.
* tests/queries/0_stateless/02447_join_numeric_low_cardinality.{sql,reference}: New tests with previously failing queries.
Fixes: ClickHouse#57243
NOTE: This is intended to be better for performance, BUT decouples the problem source from the check to prevent it. So, I keep the assertion in case the issue triggers in the future again.
New approach for #80376 as recommended by @alexey-milovidov
Fixes #57243 #93343
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix an exception with ARRAY JOINs:
Function writeSlice expects same column types for GenericArraySlice and GenericArraySinkwhich happens when lowCardinality numeric types are used (issue #57243).