Implement Utf8View for lpad scalar function#11941
Conversation
alamb
left a comment
There was a problem hiding this comment.
Very impressive @Omega359 👏 I especially like the testing improvements of this PR
I mentioned some places this function could be made faster, but that can totally be done as a follow on (maybe waiting for someone for whom the performance of lpad is important). I can file a follow on ticket
| s.push_str(string); | ||
| Ok(Some(s)) | ||
| } | ||
| let mut s: String = " ".repeat(length - graphemes.len()); |
There was a problem hiding this comment.
This PR just follows the existing pattern so I don't think any changes are needed, but I think using a GenericStringBuilder would be much faster here (as it wouldn't allocate a string for each output array element)
There was a problem hiding this comment.
I did a test locally with using GenericStringBuilder vs what is in this PR and the differences are within +/- 10% for 1024 and 2048 element arrays for each of the 3 string types. Worth noting though is that timing between runs on my laptop is not really stable at all (+/- 10% or so between runs of the exact same code).
Interestingly the utf8view seems to be consistently slightly slower than just plain utf8 at least in this function.
lpad function/utf8 type/2048
time: [715.78 µs 718.43 µs 721.89 µs]
lpad function/largeutf8 type/2048
time: [745.44 µs 759.62 µs 779.84 µs]
lpad function/stringview type/2048
time: [734.25 µs 751.13 µs 773.67 µs]
| let length = if length < 0 { 0 } else { length as usize }; | ||
| if length == 0 { | ||
| Ok(Some("".to_string())) | ||
| if length < graphemes.len() { |
There was a problem hiding this comment.
likewise here I think GenericStringBuilder would likely be faster
|
I was looking at @Lordworms implementation in #11942 and I think it would make sense to align the two implementations. In some aspects I like his approach much more than the one I took in this PR. |
Do you think we should merge this PR and work on improvements in a follow on PR? |
I have no objections to that. |
|
Thanks @Omega359 -- let's merge this to get the initial implementation and tests in and then we can work on improvements as a follow on PR. Let me know if you would like me to file a ticket for
|
I've got the work done locally in a branch along with the GenericStringBuilder work and the benchmark to go with it. I'll create a ticket tomorrow for it thanks @alamb |
Which issue does this PR close?
Closes #11857
Rationale for this change
Support utf8view for lpad udf.
What changes are included in this PR?
Code, tests.
Are these changes tested?
Yes.
Are there any user-facing changes?
No