Skip to content

Use f64::total_cmp instead of OrderedFloat#4133

Merged
tustvold merged 9 commits intoapache:masterfrom
comphead:f64_total_cmp
Nov 10, 2022
Merged

Use f64::total_cmp instead of OrderedFloat#4133
tustvold merged 9 commits intoapache:masterfrom
comphead:f64_total_cmp

Conversation

@comphead
Copy link
Contributor

@comphead comphead commented Nov 7, 2022

Which issue does this PR close?

Closes #4051 .

Rationale for this change

See #4051

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Nov 7, 2022
@comphead
Copy link
Contributor Author

comphead commented Nov 7, 2022

@tustvold I have replaced almost all entries OrderedFloat to f64. Still thinking how to use you hasher to remove OrderedFloat from Hash.
As your trait implement HashValue and ScalarValue requires std::cmp::Hash

@tustvold
Copy link
Contributor

tustvold commented Nov 8, 2022

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

@comphead
Copy link
Contributor Author

comphead commented Nov 9, 2022

@comphead I would recommend creating a newtype wrapper around a float that implements Hash using hash_utils, eq using total_cmp, etc...

Hi @tustvold I have implemented hasher through std::hash but the impl is the same as in hash_utils. HashValue trait is not the same as Hash, afaik. Let me know if the hasher should be done in other way

let v2 = v2.map(OrderedFloat);
v1.partial_cmp(&v2)
}
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Float32(v1), Float32(v2)) => v1.partial_cmp(v2),
(Float32(v1), Float32(v2)) => v1.total_cmp(v2),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1 is an Option<f32>, it supports partial_cmp, not total_cmp. let me know if you ok if I unwrap it to floats, the same way as done for Decimals.

Copy link
Contributor

@tustvold tustvold Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will need to match the option, I keep forgetting that ScalarValue has typed nulls for some reason 😆

partial_cmp on Option, will call through to partial_cmp on f32, which is not the same as total_cmp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Yeah, I checked that Float64(NULL) == Float64(NULL) now.

let v = v.map(OrderedFloat);
v.hash(state)
}
Float32(v) => v.map(Fl).hash(state),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can just call HashValue on v?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v is Option<f32> is supports Hash, but we have to wrap f32 into some wrapper supporting hash. Fl in this case, I didn't find how to implement Hash directly on f32/f64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I think there is a way to clean this up, but we can do that in a follow on PR

let v1 = v1.map(OrderedFloat);
let v2 = v2.map(OrderedFloat);
v1.eq(&v2)
// Handle NaN == NaN as true manually like in OrderedFloat
Copy link
Contributor

@tustvold tustvold Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the hash implementation, this should also use total_cmp. Otherwise two "equal" values according to PartialEq, e.g. +0 and -0, will have different hashes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if

                match (v1, v2) {
                    (Some(f1), Some(f2)) => f1.total_cmp(f2).is_eq(),
                    _ => v1.eq(v2),
                }

.map(f64::from)
.map(|v| OrderedFloat::from(v as f64))
.collect();
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values: Vec<_> = (1..=1_000).map(f64::from).map(|v| v as f64).collect();
let values: Vec<_> = (1..=1_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for _ in 0..400_000 {
values.push(OrderedFloat::from(1_000_000_f64));
}
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut values: Vec<_> = (1..=600_000).map(f64::from).map(|v| v as f64).collect();
let mut values: Vec<_> = (1..=600_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.map(f64::from)
.map(|v| OrderedFloat::from(v as f64))
.collect();
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values: Vec<_> = (1..=1_000_000).map(f64::from).map(|v| v as f64).collect();
let values: Vec<_> = (1..=1_000_000).map(f64::from).collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@comphead comphead marked this pull request as ready for review November 9, 2022 23:47
let v2 = v2.map(OrderedFloat);
v1.partial_cmp(&v2)
}
(Float32(Some(f1)), Float32(Some(f2))) => Some(f1.total_cmp(f2)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will now return None when comparing nulls, which isn't consistent with the other types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Fixed.

@github-actions github-actions bot added the core Core DataFusion crate label Nov 10, 2022
@comphead comphead requested a review from tustvold November 10, 2022 18:23
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

@tustvold tustvold merged commit 5883e43 into apache:master Nov 10, 2022
@ursabot
Copy link

ursabot commented Nov 10, 2022

Benchmark runs are scheduled for baseline = 509c82c and contender = 5883e43. 5883e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use f64::total_cmp instead of OrderedFloat

3 participants