chore: Introduce ANSI support for remainder operation #1971
mbutrovich merged 5 commits into apache:main
Conversation
```rust
fn evaluate_statistics(&self, children: &[&Distribution]) -> Result<Distribution> {
    let (left, right) = (children[0], children[1]);

    if let (Gaussian(left), Gaussian(right)) = (left, right) {
```
Borrowed these changes from the BinaryExpr expression, here.
```rust
(Ok(DataType::Decimal128(p1, s1)), Ok(DataType::Decimal128(p2, s2)))
    if max(s1, s2) as u8 + max(p1 - s1 as u8, p2 - s2 as u8) > DECIMAL128_MAX_PRECISION =>
{
    let left = Arc::new(Cast::new(
```
Borrowed this code from
. We were doing this casting before creating the modulo operation.
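The precision guard in the diff above can be illustrated with a small standalone sketch. `needs_cast` is a hypothetical helper name, and the constant mirrors Arrow's `DECIMAL128_MAX_PRECISION`:

```rust
// Sketch of the guard shown above: a Decimal128 remainder whose combined
// scale and integral digits exceed Decimal128's 38-digit limit requires
// casting the operands before building the modulo expression.
const DECIMAL128_MAX_PRECISION: u8 = 38; // mirrors Arrow's constant

// Hypothetical helper: true when Decimal128(p1, s1) % Decimal128(p2, s2)
// would overflow Decimal128 precision.
fn needs_cast(p1: u8, s1: i8, p2: u8, s2: i8) -> bool {
    let max_scale = s1.max(s2) as u8;
    let max_integral = (p1 - s1 as u8).max(p2 - s2 as u8);
    max_scale + max_integral > DECIMAL128_MAX_PRECISION
}

fn main() {
    // 10 + max(28, 36) = 46 > 38: operands must be cast first.
    println!("{}", needs_cast(38, 10, 38, 2));
    // 4 + max(8, 4) = 12 <= 38: no cast needed.
    println!("{}", needs_cast(10, 2, 8, 4));
}
```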
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1971      +/-   ##
============================================
+ Coverage     56.12%   58.77%    +2.64%
- Complexity      976     1154     +178
============================================
  Files           119      134      +15
  Lines         11743    13054    +1311
  Branches       2251     2417     +166
============================================
+ Hits           6591     7672    +1081
- Misses         4012     4155     +143
- Partials       1140     1227     +87
```

View full report in Codecov by Sentry.
Note that the vanilla Spark message is a bit misleading. Filed a minor ticket for Spark: https://issues.apache.org/jira/browse/SPARK-52659
@andygrove: Could you please review and check whether this is how you envisioned these changes?
Thanks @rishvin. I took a first pass through, and this looks great. Thanks for adding comprehensive tests in both Scala and Rust. I do wonder if we should implement the expression as a UDF.
Thanks @andygrove, yes SGTM. Let me rewrite these changes as a UDF.
@andygrove I have made the changes, could you please recheck?
@parthchandra @mbutrovich would you also like to review?
Which issue does this PR close?
Closes #532
Rationale for this change
Spark in ANSI mode throws a division-by-zero exception for the remainder operation; however, the remainder implementation in Comet is not ANSI compliant and returns NULL rather than throwing. This PR introduces a remainder implementation that is compliant with Spark behavior in both ANSI and non-ANSI mode.
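A minimal sketch of the target semantics, using hypothetical names rather than Comet's actual API: in ANSI mode a zero divisor raises a Spark-style error, while in non-ANSI mode it yields NULL.

```rust
// Hypothetical sketch of the Spark remainder semantics this PR targets.
fn remainder(lhs: i64, rhs: i64, fail_on_error: bool) -> Result<Option<i64>, String> {
    if rhs == 0 {
        if fail_on_error {
            // ANSI mode: surface a Spark-style divide-by-zero error.
            Err("[DIVIDE_BY_ZERO] Division by zero.".to_string())
        } else {
            // Non-ANSI mode: return NULL instead of failing.
            Ok(None)
        }
    } else {
        Ok(Some(lhs % rhs))
    }
}

fn main() {
    println!("{:?}", remainder(7, 3, false)); // Ok(Some(1))
    println!("{:?}", remainder(7, 0, false)); // Ok(None)
    println!("{:?}", remainder(7, 0, true)); // Err(...)
}
```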
Ref doc.
As per the doc,
try_mod: identical to the remainder operator `%`, except that it returns a NULL result instead of throwing an exception on division by zero.
What changes are included in this PR?
- Introduces `ModuloExpr`. This expression supports both ANSI and non-ANSI mode (via the `fail_on_error` flag). In ANSI mode, it translates Arrow's division-by-zero error into a Spark-compliant error message.
- Removes `nullIfWhenPrimitive` when serializing the remainder expression, and implements equivalent functionality in `ModuloExpr`. This method sets null values for RHS record batches that contain zero values. This is required for non-ANSI mode; otherwise Arrow's remainder implementation would throw a divide-by-zero exception.
How are these changes tested?
Added Rust and Scala tests to verify the behavior.
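The non-ANSI null-masking behavior (setting null for zero-valued RHS entries before the Arrow kernel runs) can be sketched as follows; the helper name is hypothetical:

```rust
// Hypothetical sketch: before invoking Arrow's remainder kernel in
// non-ANSI mode, zero divisors in the RHS batch are replaced with NULL
// so the kernel never sees a divide-by-zero.
fn null_out_zero_divisors(rhs: &[Option<i64>]) -> Vec<Option<i64>> {
    rhs.iter()
        .map(|v| match v {
            Some(0) => None, // zero divisor becomes NULL
            other => *other, // non-zero values and existing NULLs pass through
        })
        .collect()
}

fn main() {
    let masked = null_out_zero_divisors(&[Some(3), Some(0), None]);
    println!("{:?}", masked); // [Some(3), None, None]
}
```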
Results from spark-shell:
Without Comet (ANSI/non-ANSI)
(screenshot)
With Comet (ANSI/non-ANSI)
(screenshot)