Improve the performance of ltrim/rtrim/btrim#10006
Conversation
|
Nice! It would be a nice addition if the benchmark was expanded to cover btrim and rtrim as well |
|
|
||
| if characters_array.len() == 1 { | ||
| if characters_array.is_null(0) { | ||
| return Ok(new_null_array(args[0].data_type(), args[0].len())); |
There was a problem hiding this comment.
This looks like new behavior for null handling? Do we have existing unit tests for this case or can we add a new test as part of this PR?
There was a problem hiding this comment.
This is not a new behavior. The reason for this logic characters_array.is_null(0) is because initially, a test did not pass, and the error was as follows:
...
External error: query result mismatch:
[SQL] SELECT btrim(' xyxtrimyyx ', NULL)
[Diff] (-expected|+actual)
- NULL
+ xyxtrimyyx
at test_files/expr.slt:373
...
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`There was a problem hiding this comment.
To provide additional context, this logic is consistent with the _ => None here:
let result = string_array
.iter()
.zip(characters_array.iter())
.map(|(string, characters)| match (string, characters) {
(Some(string), Some(characters)) => Some(func(string, characters)),
_ => None, // If characters is null, append None.
})
.collect::<GenericStringArray<T>>();
it is not necessary, though it would be nice as @Omega359 said. We can also do it as a follow on PR. Thanks again @JasonLi-cn |
|
Thanks @Omega359 and @andygrove for the reviews! |
Which issue does this PR close?
Closes #10007
Rationale for this change
If the
trimfunction includes a second argument, I believe it is predominantly a Scalar rather than an Array. Expanding the second argument into an Array would lead to performance degradation, and more critically, the codearg.clone().into_array(expansion_len)would be invoked for every computation.Benchmark
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?