Use f64::total_cmp instead of OrderedFloat by comphead · Pull Request #4133 · apache/datafusion

comphead · 2022-11-07T23:06:17Z

Which issue does this PR close?

Closes #4051.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

comphead · 2022-11-07T23:07:46Z

@tustvold I have replaced almost all uses of OrderedFloat with f64. I'm still thinking about how to use your hasher to remove OrderedFloat from Hash, since your trait implements HashValue while ScalarValue requires std::hash::Hash.

tustvold · 2022-11-08T01:22:23Z

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...
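
A minimal sketch of the newtype approach suggested above, assuming a hypothetical wrapper named `TotalF64` (not the type used in the PR) and hashing the raw bit pattern rather than calling into hash_utils, whose API is not shown in this thread:

```rust
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};

/// Hypothetical newtype (illustration only) that gives f64 a total order.
#[derive(Debug, Clone, Copy)]
pub struct TotalF64(pub f64);

impl PartialEq for TotalF64 {
    fn eq(&self, other: &Self) -> bool {
        // total_cmp reports Equal only for identical bit patterns,
        // so NaN equals the same NaN and +0.0 differs from -0.0.
        self.0.total_cmp(&other.0).is_eq()
    }
}

impl Eq for TotalF64 {}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

impl Hash for TotalF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hashing the bits keeps Hash consistent with the Eq impl above.
        self.0.to_bits().hash(state);
    }
}
```

With this shape, `TotalF64` can be used as a key in a `HashSet` or `HashMap`, because equal values always hash identically.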

comphead · 2022-11-09T01:13:38Z

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

Hi @tustvold, I have implemented the hasher through std::hash, but the impl is the same as in hash_utils. The HashValue trait is not the same as Hash, afaik. Let me know if the hasher should be done another way.

tustvold · 2022-11-09T05:17:10Z

datafusion/common/src/scalar.rs

-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(v1), Float32(v2)) => v1.partial_cmp(v2),


Suggested change

(Float32(v1), Float32(v2)) => v1.partial_cmp(v2),

(Float32(v1), Float32(v2)) => v1.total_cmp(v2),

v1 is an Option<f32>, which supports partial_cmp but not total_cmp. Let me know if you are OK with me unwrapping it to floats, the same way as is done for Decimals.

Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason 😆

partial_cmp on Option will call through to partial_cmp on f32, which is not the same as total_cmp

Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.
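
To illustrate the difference discussed in this thread, here is a small sketch with two hypothetical helper functions (the names are made up for this example). It relies only on `Option::partial_cmp` and `f32::total_cmp` from the standard library:

```rust
use std::cmp::Ordering;

/// `Option<f32>::partial_cmp` delegates to `f32::partial_cmp` when both sides
/// are `Some`, so any comparison involving NaN yields `None`.
pub fn via_partial_cmp(v1: Option<f32>, v2: Option<f32>) -> Option<Ordering> {
    v1.partial_cmp(&v2)
}

/// `f32::total_cmp` always produces an ordering, including for NaN,
/// but it is only defined on the float itself, not on `Option<f32>`.
pub fn via_total_cmp(f1: f32, f2: f32) -> Ordering {
    f1.total_cmp(&f2)
}
```

This is why the Option has to be matched first: total_cmp is only reachable once both values are known to be `Some`.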

tustvold · 2022-11-09T05:18:45Z

datafusion/common/src/scalar.rs

-                let v = v.map(OrderedFloat);
-                v.hash(state)
-            }
+            Float32(v) => v.map(Fl).hash(state),


I think this can just call HashValue on v?

v is an Option<f32>. Option supports Hash, but only when the inner type does, so we have to wrap f32 in some wrapper that supports Hash (Fl in this case). I didn't find a way to implement Hash directly on f32/f64.

Fair, I think there is a way to clean this up, but we can do that in a follow on PR

tustvold · 2022-11-09T05:19:31Z

datafusion/common/src/scalar.rs

-                let v1 = v1.map(OrderedFloat);
-                let v2 = v2.map(OrderedFloat);
-                v1.eq(&v2)
+                // Handle NaN == NaN as true manually like in OrderedFloat


To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes

What if

match (v1, v2) {
    (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
    _ => v1.eq(v2),
}
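
The consistency concern can be checked with a short sketch. The helper names `bits_hash` and `opt_eq` are hypothetical, and hashing by bit pattern is an assumption standing in for the PR's hasher:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a float by its raw bit pattern.
pub fn bits_hash(v: f64) -> u64 {
    let mut hasher = DefaultHasher::new();
    v.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// Equality in the shape proposed above: total_cmp when both values are
/// present, Option's own equality otherwise (so None == None).
pub fn opt_eq(v1: &Option<f64>, v2: &Option<f64>) -> bool {
    match (v1, v2) {
        (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
        _ => v1.eq(v2),
    }
}
```

Under the default `==`, `0.0 == -0.0` holds even though the two bit patterns (and therefore the bit-based hashes) differ. With `opt_eq`, the two zeros are unequal, so equality and hashing agree.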

tustvold · 2022-11-09T05:20:43Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-                .map(f64::from)
-                .map(|v| OrderedFloat::from(v as f64))
-                .collect();
+            let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();

let values: Vec<_> = (1..=1_000).map(f64::from).collect();
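
A quick sketch of why the extra `.map(|v| v as f64)` can go: `f64::from` already yields an `f64`, so the cast is a no-op. The function name `values` and the explicit `u32` parameter are assumptions made for this example:

```rust
/// Build the test input without the redundant `as f64` cast.
pub fn values(n: u32) -> Vec<f64> {
    // f64::from(u32) is lossless and already returns f64.
    (1..=n).map(f64::from).collect()
}
```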

tustvold · 2022-11-09T05:20:53Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-        for _ in 0..400_000 {
-            values.push(OrderedFloat::from(1_000_000_f64));
-        }
+        let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();

let mut values: Vec<_> = (1..=600_000).map(f64::from).collect();

tustvold · 2022-11-09T05:21:01Z

datafusion/physical-expr/src/aggregate/tdigest.rs

-            .map(f64::from)
-            .map(|v| OrderedFloat::from(v as f64))
-            .collect();
+        let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();


Suggested change

let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();

let values: Vec<_> = (1..=1_000_000).map(f64::from).collect();

tustvold · 2022-11-10T01:11:42Z

datafusion/common/src/scalar.rs

-                let v2 = v2.map(OrderedFloat);
-                v1.partial_cmp(&v2)
-            }
+            (Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),


I think this will now return None when comparing nulls, which isn't consistent with the other types

Right! Fixed.
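
The null-handling issue can be sketched on plain `Option<f32>` values. This is a simplification of the ScalarValue match, with hypothetical function names; the fix shown is one plausible shape, not necessarily what the follow-up commit contains:

```rust
use std::cmp::Ordering;

/// Only the Some/Some case is handled, so any comparison that involves
/// a null falls through and returns None.
pub fn cmp_some_only(v1: &Option<f32>, v2: &Option<f32>) -> Option<Ordering> {
    match (v1, v2) {
        (Some(f1), Some(f2)) => Some(f1.total_cmp(f2)),
        _ => None,
    }
}

/// With a fallback arm, nulls are ordered by Option's own comparison
/// (None == None, None < Some), matching how non-float types behave.
pub fn cmp_with_nulls(v1: &Option<f32>, v2: &Option<f32>) -> Option<Ordering> {
    match (v1, v2) {
        (Some(f1), Some(f2)) => Some(f1.total_cmp(f2)),
        // At least one side is None here, so no float comparison is involved.
        _ => v1.partial_cmp(v2),
    }
}
```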

tustvold

Thank you 👍

ursabot · 2022-11-10T19:43:03Z

Benchmark runs are scheduled for baseline = 509c82c and contender = 5883e43. 5883e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

comphead added 2 commits November 7, 2022 14:34

Replace OrderedFloat with f64

a8ae8bd

clippy

387e904

github-actions bot added the physical-expr Changes to the physical-expr crates label Nov 7, 2022

Adding hasher

d98e66f

tustvold reviewed Nov 9, 2022

datafusion/physical-expr/src/aggregate/tdigest.rs (outdated)

tustvold reviewed Nov 9, 2022

datafusion/common/src/scalar.rs (outdated)

tustvold reviewed Nov 9, 2022

datafusion/common/src/scalar.rs (outdated)

fixed comments

4362c52

tustvold mentioned this pull request Nov 9, 2022

Add compare to ArrowNativeTypeOp apache/arrow-rs#3070

Merged

comphead added 2 commits November 9, 2022 15:31

fixed comments

9060099

fmt

3fef43b

comphead marked this pull request as ready for review November 9, 2022 23:47

tustvold reviewed Nov 10, 2022

comphead added 2 commits November 9, 2022 21:08

comments fixed

0bab5f4

removed ordered_flost from toml

d9872a0

github-actions bot added the core Core DataFusion crate label Nov 10, 2022

changed cargo.lock

cc9cd52

comphead requested a review from tustvold November 10, 2022 18:23

tustvold approved these changes Nov 10, 2022

tustvold merged commit 5883e43 into apache:master Nov 10, 2022
