Support decimal for min and max aggregate#1407
Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @liukun4515 -- I think this PR is great as it adds tests for the new feature; I had some code / style suggestions, but I also think it would be fine to address them in another PR (or in the future)
Please just let us know what you want to do
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn aggregate_decimal_max() -> Result<()> { |
| } | ||
|
|
||
| // TODO implement this in arrow-rs with simd | ||
| // https://github.com/apache/arrow-rs/issues/1010 |
There was a problem hiding this comment.
I have not reviewed the code in the datafusion aggregate functions for a while, so I am not familiar with how much they do / don't use the arrow compute kernels, but I think the more we can leverage / reuse those kernels (and their SIMD specializations, if they exist), the better
| } | ||
| ScalarValue::Decimal128(Some(result), *$PRECISION, *$SCALE) | ||
| } else { | ||
| let mut result = 0_i128; |
There was a problem hiding this comment.
It might be more idomatic to use let mut rust: Option<i128> = 0; (aka use an Option rather than explicit flag)
Then instead of code like
if !has_value && array.is_valid(i) {
has_value = true;
result = array.value(i);
}
if array.is_valid(i) {
result = result.$OP(array.value(i));
}You could write code like the following (which I think saves at least one check of is_valid()):
if array.is_valid(i) {
let value = array.value(i);
result = result.$OP(result.unwrap_or(value)
}This is just a style suggestion, it is not needed I don't think
There was a problem hiding this comment.
I plan to resolve it in the follow-up pull request.
You can merge it if it looks good to you.
@alamb
There was a problem hiding this comment.
I recheck your suggestion and find some bugs.
For example, if all value in the array is less than zero, it may be [-1,-3,-3] and we want to get the max value of them.
initially, we set the result as Some(0) and follow your suggestion code, we will get the 0 as the max value for the result.
I think we should just change to that
let mut result = 0_i128; ->>> let mut result : i128 = 0;
In addition the unwrap_or is
pub fn unwrap_or(self, default: T) -> T {
match self {
Some(x) => x,
None => default,
}
}
| ($VALUES:expr, $OP:ident) => {{ | ||
| match $VALUES.data_type() { | ||
| DataType::Decimal(precision, scale) => { | ||
| typed_min_max_batch_decimal128!($VALUES, precision, scale, $OP) |
There was a problem hiding this comment.
I think if you added DecimalArray support to arrow::compute::kernels::min and arrow::compute::kernels::max you might be able to use typed_min_max_batch! here. Work for the future perhaps
There was a problem hiding this comment.
Yes, when I finish the task of aggregating with decimal data type, I will begin to do basic operations in arrow-rs for decimal data type.
| typed_min_max_decimal!(lhsv, rhsv, lhsp, lhss, Decimal128, $OP) | ||
| } else { | ||
| return Err(DataFusionError::Internal(format!( | ||
| "MIN/MAX is not expected to receive scalars of incompatible types {:?}", |
|
@alamb it's can be merged. |
|
Thanks again @liukun4515 ❤️ |
min and max aggregate
Which issue does this PR close?
part of #122
TODO
In the feature
support SIMD in the arrow-rs apache/arrow-rs#1010
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?