ARROW-9388: [C++] Division kernels#7748
Conversation
7e9aede to
39f1d7d
Compare
wesm
left a comment
There was a problem hiding this comment.
Thanks for starting this. I'm going to pull this down and make some changes per my comments
|
This can't be merged yet. Divide by zero in the unchecked case causes SIGFPE process crash. We should probably return null when dividing by zero, this is what Impala does I'm curious what other database-type systems do |
|
cc @pitrou |
|
This will wait for 2.0. |
|
But, yes, I'll take a look later. |
|
We have this implemented with SIMD (which was tricky) on the Rust side. We currently return a |
@wesm Thanks a lot for your effort. Your changes look much more reasonable than mine. |
I made some initial investigations, and it shows that some systems return null upon divide by zero (e.g. Impala and Hive (https://stackoverflow.com/questions/41165708/hive-dividing-numbers-from-the-same-column)), while others returns an error (like MySQL (https://www.got-it.ai/solutions/sqlquerychat/sql-help/data-manipulation/mysql-divide-by-zero-error-querychat/) and SQL Server (https://stackoverflow.com/questions/23230041/how-to-protect-sql-statement-from-divide-by-zero-error/23230084)) Anyway, we should handle it, instead of letting the process crash. |
7103e27 to
418e8ac
Compare
nealrichardson
left a comment
There was a problem hiding this comment.
Thanks for doing this. Could you please be sure to add this function to https://github.com/apache/arrow/blob/master/docs/source/cpp/compute.rst?
Done. Thanks for your kind reminder. |
3c2c8d7 to
dbd6c7e
Compare
There was a problem hiding this comment.
| return left / right; | |
| if (ARROW_PREDICT_FALSE(right == 0)) { | |
| ctx->SetStatus(Status::Invalid("divide by zero")); | |
| return 0; | |
| } | |
| return left / right; |
There was a problem hiding this comment.
Thanks for your suggestion.
According to the IEEE 754 standard, dividing by zero should be well-defined for floating point numbers. Please see, for example, https://dl.acm.org/doi/10.1145/103162.103163, page 26.
There was a problem hiding this comment.
This is true, but division by zero can result in SIGFPE if feenableexcept() or a hardware exception if _control87() are used. For example:
g++ -lm -xc++ - <<!
#include <fenv.h>
int main(int argc, char* argv[]) {
feenableexcept(FE_ALL_EXCEPT);
return static_cast<int>(1.0 / 0.0);
}
!
./a.out # abortThese are rare, so maybe we don't need to support them (or at least restrict this check to the checked division kernel), but they do represent a potential crash when dividing by 0.0. @pitrou ?
There was a problem hiding this comment.
I'd say if the user really wants to enable FP exceptions process-wise, we shouldn't try to counteract them.
There was a problem hiding this comment.
Alright, I withdraw this suggestion
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Please keep this table in alphabetical order.
There was a problem hiding this comment.
Done. Thanks for your kind reminder.
cpp/src/arrow/compute/api_scalar.h
Outdated
There was a problem hiding this comment.
Can you move the zero divisor mention to the body of the docstring (together with "If either argument is null...")?
There was a problem hiding this comment.
Done. Thank you for the good suggestion.
There was a problem hiding this comment.
Please don't remove these checks, instead write the tests so that they don't require approximate equality checks.
There was a problem hiding this comment.
I have restored the checks.
I still think it makes sense to provide approx equality checks for scalar?
There was a problem hiding this comment.
It's not useful to test so many values. We know the CPU handles division properly. Instead, focus on test cases that may trigger specific behaviour:
- empty array
- nulls
- zeros
- for floating-point: infinities, perhaps not-a-number
There was a problem hiding this comment.
Thank you for the good suggestion.
I have removed some test cases, and made sure the remaining cases cover the above items, with one exception: the inifinity comparisons should depend on another PR: #7826
There was a problem hiding this comment.
#7826 was merged now. Also, the tested values are still plethoric IMHO, which makes the tests less readable.
There was a problem hiding this comment.
Thanks a lot for your effort. I have added test case for infinity.
In addition, I have removed/simplified more test cases, please check if it looks good to you.
There was a problem hiding this comment.
Should also check what happens when the overflow checks are disabled?
There was a problem hiding this comment.
When overflow checks are disabled, we get SIGFP, and the tests crash.
There was a problem hiding this comment.
Hmm, we should definitly not crash on overflow. In this case, I think overflow should be detected and a dummy value be output (0 perhaps?).
There was a problem hiding this comment.
Sounds reasonable. I have updated the PR accordingly.
cc9e3e2 to
0fed34c
Compare
15bd314 to
2b7204d
Compare
|
@pitrou if you're satisfied, could you please merge? |
2b7204d to
88ce81d
Compare
|
I think unchecked floating point division by zero should simply return an infinite, but we can leave that for a separate JIRA. |
@pitrou Thanks for your feedback, and sorry I did not take care of this problem. I will fix it up in the follow up issue. |
As discussed in #7748 (comment), we need to compare scalars approximately in some scenarios. Also: * Fix comparison of same-pointer NaN values * Fix scalar comparison result when both inputs are null (ARROW-8956) * Fix scalar kernel result type when result is null scalar Closes #8438 from liyafan82/fly_1012_scl Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
See https://issues.apache.org/jira/browse/ARROW-9388