
fix: decimal conversion looses value on lower precision #6836

Merged
tustvold merged 3 commits into apache:main from himadripal:fix_decimal_conversion_bug on Dec 12, 2024

Conversation

@himadripal
Contributor

@himadripal himadripal commented Dec 4, 2024

Which issue does this PR close?

Closes #6833

Rationale for this change

Casting Decimal128 to Decimal128 with a smaller precision produces incorrect results in some cases.

What changes are included in this PR?

It adds decimal validation after the conversion, checking that the converted result fits into the specified precision and scale.
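A minimal sketch of the kind of check described here (stdlib-only; `max_for_precision` and `validate_decimal_precision` are illustrative stand-ins, not the actual arrow-rs functions):

```rust
// Illustrative stand-in for the validation added after a decimal cast;
// these helpers are sketches, not the actual arrow-rs code.

/// Largest unscaled value representable with `precision` decimal digits.
fn max_for_precision(precision: u32) -> i128 {
    10i128.pow(precision) - 1
}

/// Error if `value` has more decimal digits than `precision` allows.
fn validate_decimal_precision(value: i128, precision: u32) -> Result<i128, String> {
    let max = max_for_precision(precision);
    if value > max || value < -max {
        Err(format!(
            "{value} is too large to store in a Decimal128 of precision {precision}"
        ))
    } else {
        Ok(value)
    }
}

fn main() {
    // 12345 has five digits: fits in precision 5, overflows precision 4.
    assert!(validate_decimal_precision(12_345, 5).is_ok());
    assert!(validate_decimal_precision(12_345, 4).is_err());
    assert!(validate_decimal_precision(-12_345, 4).is_err());
}
```

Without such a check, an overflowing downcast silently keeps only the low digits, which is the incorrect result the linked issue describes.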

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 4, 2024
@himadripal
Contributor Author

@andygrove @viirya @alamb please take a look.

@himadripal himadripal changed the title from "decimal conversion looses value on lower precision, throws error now …" to "fix: decimal conversion looses value on lower precision" on Dec 4, 2024
Member

@andygrove andygrove left a comment

LGTM. The logic now matches the logic in cast_floating_point_to_decimal128. Thanks @himadripal

@andygrove
Member

andygrove commented Dec 4, 2024

There is a regression in test_cast_decimal128_to_decimal256_negative. The new validation check is correctly throwing an error, but it looks like we also need to add this validation when creating decimal arrays since the current test is creating invalid arrays before the cast:

let array = vec![Some(i128::MAX), Some(i128::MIN)];
let input_decimal_array = create_decimal_array(array, 10, 3).unwrap();

I would expect this to fail, so we probably need to add the same validation there.

On second thoughts, it seems it was an intentional design decision not to validate this on array creation. Instead, an array.validate_decimal_precision method can optionally be called on the array to validate it after creation, so we should probably just update the test as needed (@alamb @tustvold perhaps you could correct me if I am wrong about this).
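The "create unchecked, validate on demand" design described above can be sketched with a stand-in type (this `DecimalArray` is hypothetical and stdlib-only; the real arrow-rs array and its `validate_decimal_precision` method differ in detail):

```rust
// Stand-in illustrating "create unchecked, validate on demand": creation
// accepts out-of-range values, and validation is a separate, optional call.

struct DecimalArray {
    values: Vec<i128>,
    precision: u32,
}

impl DecimalArray {
    /// Creation performs no precision check, mirroring the behaviour
    /// discussed above where invalid arrays can be constructed.
    fn new(values: Vec<i128>, precision: u32) -> Self {
        Self { values, precision }
    }

    /// Optional post-hoc validation of every value against the precision.
    fn validate_decimal_precision(&self) -> Result<(), String> {
        let max = 10i128.pow(self.precision) - 1;
        for v in &self.values {
            if *v > max || *v < -max {
                return Err(format!("{v} does not fit precision {}", self.precision));
            }
        }
        Ok(())
    }
}

fn main() {
    // Mirrors the quoted test input: i128::MAX/MIN far exceed precision 10,
    // yet construction succeeds; only explicit validation reports the error.
    let arr = DecimalArray::new(vec![i128::MAX, i128::MIN], 10);
    assert!(arr.validate_decimal_precision().is_err());
}
```

Deferring validation keeps array construction cheap and leaves the cost decision to the caller, which is why the test input above was accepted in the first place.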

Contributor

@tustvold tustvold left a comment

Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?

It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

@andygrove
Member

> Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?
>
> It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

I'll create a separate PR (probably tomorrow) to add some criterion benchmarks

@himadripal
Contributor Author

himadripal commented Dec 5, 2024

> There is a regression in test_cast_decimal128_to_decimal256_negative. The new validation check is correctly throwing an error, but it looks like we also need to add this validation when creating decimal arrays since the current test is creating invalid arrays before the cast:
>
>     let array = vec![Some(i128::MAX), Some(i128::MIN)];
>     let input_decimal_array = create_decimal_array(array, 10, 3).unwrap();
>
> I would expect this to fail, so we probably need to add the same validation there.
>
> On second thoughts, it seems it was an intentional design decision not to validate this on array creation. Instead, an array.validate_decimal_precision method can optionally be called on the array to validate it after creation, so we should probably just update the test as needed (@alamb @tustvold perhaps you could correct me if I am wrong about this).

Changed the test to pass.

…eded.

revert whitespace changes

formatting check
@himadripal himadripal force-pushed the fix_decimal_conversion_bug branch from 68b0f68 to 6c50fe3 on December 5, 2024
@himadripal
Contributor Author

himadripal commented Dec 6, 2024

Can anyone please let the build run? The workflows are waiting for approval.

@andygrove
Member

> Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?
>
> It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

I created a simple benchmark for decimal casting in #6850.

Unsurprisingly, validating that the results are correct is slower than not validating the results.

before

cast_decimal            time:   [45.281 ns 45.549 ns 45.871 ns]

after (this PR)

cast_decimal            time:   [247.97 ns 248.78 ns 249.78 ns]
                        change: [+435.06% +439.47% +443.15%] (p = 0.00 < 0.05)

We currently have the config option of safe on or off:

pub struct CastOptions<'a> {
    /// how to handle cast failures, either return NULL (safe=true) or return ERR (safe=false)
    pub safe: bool,
    // ...
}

So, yes, it is a performance regression, but the previous behavior was incorrect. This PR now makes this work as advertised.

@tustvold Is there a use case we need to support for faster casts without validating results per the CastOptions?
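A simplified model of how that safe flag changes overflow handling in a lossy decimal cast (illustrative only, not the actual arrow-cast kernel; the struct and function here are stand-ins):

```rust
// Simplified model of CastOptions::safe for a decimal downcast: overflow
// either becomes NULL (safe=true) or an error (safe=false).

struct CastOptions {
    /// how to handle cast failures: return NULL (true) or an error (false)
    safe: bool,
}

/// `None` models a NULL output slot.
fn cast_value(
    value: i128,
    target_precision: u32,
    opts: &CastOptions,
) -> Result<Option<i128>, String> {
    let max = 10i128.pow(target_precision) - 1;
    if value > max || value < -max {
        if opts.safe {
            Ok(None) // safe mode: the overflowing value becomes NULL
        } else {
            Err(format!("value {value} overflows precision {target_precision}"))
        }
    } else {
        Ok(Some(value))
    }
}

fn main() {
    let overflow = 123_456; // six digits, cast to target precision 5
    assert_eq!(cast_value(overflow, 5, &CastOptions { safe: true }), Ok(None));
    assert!(cast_value(overflow, 5, &CastOptions { safe: false }).is_err());
    assert_eq!(cast_value(42, 5, &CastOptions { safe: true }), Ok(Some(42)));
}
```

Either way the overflow is detected; the flag only decides whether it surfaces as a NULL or as an error, which is why the check itself cannot be skipped.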

@tustvold
Contributor

tustvold commented Dec 7, 2024

> @tustvold Is there a use case we need to support for faster casts without validating results per the CastOptions?

No, the cast should be checked; apologies if that wasn't clear. My concern was that the PR as originally formulated blindly performed the checked conversion regardless of the input type, even when the cast was increasing the precision. Given that the whole purpose of tracking precision is to avoid overflow checks, that seemed a little off.

I'll try to find some time to take another look at this PR; from a quick scan, this looks to have been addressed.

Contributor

@tustvold tustvold left a comment

This is still overly pessimistic; for example, if I increase both the precision and the scale by the same amount, I shouldn't need to perform any checks.

However, the kernel is already fallible (try_unary/unary_opt vs unary) and so this kernel already won't be vectorizing properly, and so it isn't like we're regressing a highly optimised kernel here.

I've therefore instead opted to file #6877 and if people care they can action it.
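The infallible cases can be characterized from the precisions and scales alone. A sketch of the condition (my formulation of the observation above, not code from the follow-up issue): a cast from (p1, s1) to (p2, s2) with s2 >= s1 multiplies values by 10^(s2 - s1), so it cannot overflow as long as the target keeps at least as many integer digits, i.e. p2 - s2 >= p1 - s1.

```rust
// Sketch: decide whether a decimal cast (p1, s1) -> (p2, s2) can ever
// overflow. With s2 >= s1 the value is scaled up by 10^(s2 - s1), so the
// cast is infallible iff the target retains at least as many integer
// digits: p2 - s2 >= p1 - s1. Downscaling (s2 < s1) rounds, so we
// conservatively require a check there.

fn cast_is_infallible(p1: u32, s1: u32, p2: u32, s2: u32) -> bool {
    s2 >= s1 && p2.saturating_sub(s2) >= p1.saturating_sub(s1)
}

fn main() {
    // increasing precision and scale by the same amount: always safe
    assert!(cast_is_infallible(10, 3, 12, 5));
    // pure precision increase: always safe
    assert!(cast_is_infallible(10, 3, 20, 3));
    // lowering precision at the same scale: may overflow, must validate
    assert!(!cast_is_infallible(10, 3, 8, 3));
}
```

A cast kernel could branch on such a predicate once per array and fall back to the unchecked path when it returns true, avoiding the per-value overhead measured in the benchmarks above.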

@tustvold tustvold merged commit eb7ab83 into apache:main Dec 12, 2024
andygrove pushed a commit to andygrove/arrow-rs that referenced this pull request Jan 3, 2025
* decimal conversion looses value on lower precision, throws error now on overflow.

* fix review comments and fix formatting.

* for simple case of equal scale and bigger precision, no conversion needed.

revert whitespace changes

formatting check

---------

Co-authored-by: himadripal <[email protected]>
alamb pushed a commit that referenced this pull request Jan 5, 2025
* decimal conversion looses value on lower precision, throws error now on overflow.

* fix review comments and fix formatting.

* for simple case of equal scale and bigger precision, no conversion needed.

revert whitespace changes

formatting check

---------

Co-authored-by: Himadri Pal <[email protected]>
Co-authored-by: himadripal <[email protected]>

Labels

arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

Casting Decimal128 to Decimal128 with smaller precision produces incorrect results in some cases

5 participants