feat: Expand murmur3 hash support to complex types#3077
feat: Expand murmur3 hash support to complex types#3077mbutrovich merged 8 commits intoapache:mainfrom
murmur3 hash support to complex types#3077Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3077 +/- ##
============================================
+ Coverage 56.12% 59.50% +3.37%
- Complexity 976 1370 +394
============================================
Files 119 167 +48
Lines 11743 15560 +3817
Branches 2251 2586 +335
============================================
+ Hits 6591 9259 +2668
- Misses 4012 5002 +990
- Partials 1140 1299 +159 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Hash each element in sequence, chaining the hash values | ||
| for elem_idx in 0..len { | ||
| let elem_array = values.slice(start + elem_idx, 1); | ||
| let mut single_hash = [*hash]; |
There was a problem hiding this comment.
why is single_hash an array?
There was a problem hiding this comment.
single_hash is an array because the recursive hash method interface expects a slice of hashes and this allows us to reuse that rather than add another version of the code
| * These tests verify that Comet's native implementation of murmur3 hash produces identical | ||
| * results to Spark's implementation for all supported data types. | ||
| */ | ||
| class CometHashExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper { |
There was a problem hiding this comment.
Should we add this expression to one of the fuzz suites?
There was a problem hiding this comment.
Sure. Are you thinking more about the fuzz data generation aspect, or testing across different scans/shuffles, or both?
There was a problem hiding this comment.
Mostly just interested in the fuzz data generation, seems like a good schema to throw at this.
There was a problem hiding this comment.
yup, that already uncovered a bug 💪
|
mbutrovich
left a comment
There was a problem hiding this comment.
Thanks for adding the benchmark! I have a followup PR to improve performance, I think. Let's go with this for now though. Thanks again @andygrove!
Which issue does this PR close?
Closes #.
Rationale for this change
This is needed for the following features:
What changes are included in this PR?
How are these changes tested?