Cast Utf8View to Utf8 to support || from StringViewArray #11796
alamb merged 3 commits into apache:main
| (LargeUtf8, from_type) | (from_type, LargeUtf8) => { | ||
| string_concat_internal_coercion(from_type, &LargeUtf8) | ||
| match (lhs_type, rhs_type) { | ||
| // If Utf8View is in any side, we coerce to Utf8. |
I think it would be better to coerce to Utf8View, as that coercion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around).
Is that possible?
Agree, similar to this policy:
datafusion/datafusion/expr/src/type_coercion/binary.rs
Lines 935 to 947 in b50887f
> I think it would be better to coerce to Utf8View as that coercion will often be faster (it is faster to cast Utf8 -> Utf8View than the other way around)
> Is that possible?
How about we do that in a separate PR? Previously, we coerced to Utf8View, so concat was failing. As a temporary workaround to resolve the issue, I've coerced to Utf8 instead.
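For readers following the thread, here is a minimal sketch of the workaround rule described above. It is a standalone model, not the actual DataFusion code: `StrType` and `concat_coercion` are hypothetical names standing in for Arrow's `DataType` and the coercion function touched by this PR, and the arm ordering is an assumption.

```rust
// Simplified stand-in for Arrow's DataType (hypothetical; only the string types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Hypothetical model of the `||` coercion rule discussed in this thread.
/// Returns the common type both operands are cast to before concatenation.
fn concat_coercion(lhs: StrType, rhs: StrType) -> StrType {
    use StrType::*;
    match (lhs, rhs) {
        // Workaround from this PR: if Utf8View is on either side, coerce to Utf8.
        (Utf8View, _) | (_, Utf8View) => Utf8,
        // Otherwise, LargeUtf8 on either side wins (assumed from the diff context).
        (LargeUtf8, _) | (_, LargeUtf8) => LargeUtf8,
        // Both sides are already Utf8.
        (Utf8, Utf8) => Utf8,
    }
}
```

In this model, `concat_coercion(StrType::Utf8View, StrType::Utf8)` yields `StrType::Utf8`, which corresponds to the `CAST(temp.column2 AS Utf8)` seen in the plans in this PR. The reviewers' suggested follow-up would amount to returning `Utf8View` from the first arm once concat supports it.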
| select column2||' is fast' from temp; | ||
| ---- | ||
| rust is fast | ||
| datafusion is fast |
| explain select column2 || 'is' || column3 from temp; | ||
| ---- | ||
| logical_plan | ||
| 01)Projection: CAST(temp.column2 AS Utf8) || Utf8("is") || CAST(temp.column3 AS Utf8) |
This would be a better plan (likely faster) if the casting was to Utf8View rather than Utf8
| explain select column2||' is fast' from temp; | ||
| ---- | ||
| logical_plan | ||
| 01)Projection: CAST(temp.column2 AS Utf8) || Utf8(" is fast") |
likewise here it would be better to use Utf8View
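To illustrate why casting toward Utf8View is expected to be cheaper, here is a toy model of the two layouts. It is a simplification under stated assumptions: `ToyUtf8`, `ToyUtf8View`, `to_view`, and `to_utf8` are hypothetical names, and the real Arrow view layout also inlines short strings and uses fixed-size views, which this sketch ignores.

```rust
/// Toy Utf8-style array: one contiguous byte buffer plus offsets (hypothetical).
struct ToyUtf8 {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyUtf8 {
    fn from_strs(strs: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0];
        for s in strs {
            data.extend_from_slice(s.as_bytes());
            offsets.push(data.len());
        }
        ToyUtf8 { data, offsets }
    }
}

/// Toy Utf8View-style array: each view is (offset, len) into a borrowed buffer.
struct ToyUtf8View<'a> {
    buffer: &'a [u8],
    views: Vec<(usize, usize)>,
}

/// Utf8 -> view: only the views are built; the string bytes are borrowed, not copied.
fn to_view(a: &ToyUtf8) -> ToyUtf8View<'_> {
    let views = a.offsets.windows(2).map(|w| (w[0], w[1] - w[0])).collect();
    ToyUtf8View { buffer: &a.data, views }
}

/// View -> Utf8: every string's bytes are copied into a new contiguous buffer.
fn to_utf8(v: &ToyUtf8View<'_>) -> ToyUtf8 {
    let mut data = Vec::new();
    let mut offsets = vec![0];
    for &(off, len) in &v.views {
        data.extend_from_slice(&v.buffer[off..off + len]);
        offsets.push(data.len());
    }
    ToyUtf8 { data, offsets }
}
```

In this model, `to_view` does work proportional to the number of strings, while `to_utf8` does work proportional to the total number of string bytes. That asymmetry is the intuition behind preferring `CAST(... AS Utf8View)` in the plans above.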
Thanks again @dharanad and @XiangpengHao -- sorry for the delay
Which issue does this PR close?
Partially make #11766 work.
Rationale for this change
What changes are included in this PR?
Cast Utf8View to Utf8 to make concat work
Are these changes tested?
Existing test cases
Are there any user-facing changes?