
fix: decimal conversion looses value on lower precision #6836

Merged
tustvold merged 3 commits into apache:main from himadripal:fix_decimal_conversion_bug on Dec 12, 2024

Conversation

@himadripal
Contributor

@himadripal himadripal commented Dec 4, 2024

Which issue does this PR close?

Closes #6833

Rationale for this change

Casting Decimal128 to Decimal128 with a smaller precision produces incorrect results in some cases.

What changes are included in this PR?

It adds decimal validation after the conversion, checking that the converted result fits into the specified precision and scale.
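A minimal sketch of the kind of check described here (stdlib-only; `max_for_precision` and `validate_decimal_precision` are illustrative stand-ins, not the actual arrow-rs functions):

```rust
// Illustrative stand-in for the validation added after a decimal cast;
// these helpers are sketches, not the actual arrow-rs code.

/// Largest unscaled value representable with `precision` decimal digits.
fn max_for_precision(precision: u32) -> i128 {
    10i128.pow(precision) - 1
}

/// Error if `value` has more decimal digits than `precision` allows.
fn validate_decimal_precision(value: i128, precision: u32) -> Result<i128, String> {
    let max = max_for_precision(precision);
    if value > max || value < -max {
        Err(format!(
            "{value} is too large to store in a Decimal128 of precision {precision}"
        ))
    } else {
        Ok(value)
    }
}

fn main() {
    // 12345 has five digits: fits in precision 5, overflows precision 4.
    assert!(validate_decimal_precision(12_345, 5).is_ok());
    assert!(validate_decimal_precision(12_345, 4).is_err());
    assert!(validate_decimal_precision(-12_345, 4).is_err());
}
```

Without such a check, an overflowing downcast silently keeps only the low digits, which is the incorrect result the linked issue describes.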

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 4, 2024
@himadripal
Contributor Author

@andygrove @viirya @alamb please take a look.

@himadripal himadripal changed the title from "decimal conversion looses value on lower precision, throws error now …" to "fix: decimal conversion looses value on lower precision" on Dec 4, 2024
Member

@andygrove andygrove left a comment

LGTM. The logic now matches the logic in cast_floating_point_to_decimal128. Thanks @himadripal

@andygrove
Member

andygrove commented Dec 4, 2024

There is a regression in test_cast_decimal128_to_decimal256_negative. The new validation check is correctly throwing an error, but it looks like we also need to add this validation when creating decimal arrays since the current test is creating invalid arrays before the cast:

let array = vec![Some(i128::MAX), Some(i128::MIN)];
let input_decimal_array = create_decimal_array(array, 10, 3).unwrap();

I would expect this to fail, so we probably need to add the same validation there.

On second thoughts, it seems it was an intentional design decision not to validate this on array creation. Instead, an array.validate_decimal_precision method can optionally be called on the array to validate it after creation, so we should probably just update the test as needed (@alamb @tustvold perhaps you could correct me if I am wrong about this).
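The "create unchecked, validate on demand" design described above can be sketched with a stand-in type (this `DecimalArray` is hypothetical and stdlib-only; the real arrow-rs array and its `validate_decimal_precision` method differ in detail):

```rust
// Stand-in illustrating "create unchecked, validate on demand": creation
// accepts out-of-range values, and validation is a separate, optional call.

struct DecimalArray {
    values: Vec<i128>,
    precision: u32,
}

impl DecimalArray {
    /// Creation performs no precision check, mirroring the behaviour
    /// discussed above where invalid arrays can be constructed.
    fn new(values: Vec<i128>, precision: u32) -> Self {
        Self { values, precision }
    }

    /// Optional post-hoc validation of every value against the precision.
    fn validate_decimal_precision(&self) -> Result<(), String> {
        let max = 10i128.pow(self.precision) - 1;
        for v in &self.values {
            if *v > max || *v < -max {
                return Err(format!("{v} does not fit precision {}", self.precision));
            }
        }
        Ok(())
    }
}

fn main() {
    // Mirrors the quoted test input: i128::MAX/MIN far exceed precision 10,
    // yet construction succeeds; only explicit validation reports the error.
    let arr = DecimalArray::new(vec![i128::MAX, i128::MIN], 10);
    assert!(arr.validate_decimal_precision().is_err());
}
```

Deferring validation keeps array construction cheap and leaves the cost decision to the caller, which is why the test input above was accepted in the first place.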

Contributor

@tustvold tustvold left a comment

Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?

It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

@andygrove
Member

> Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?
>
> It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

I'll create a separate PR (probably tomorrow) to add some criterion benchmarks

@himadripal
Contributor Author

himadripal commented Dec 5, 2024

> There is a regression in test_cast_decimal128_to_decimal256_negative. The new validation check is correctly throwing an error, but it looks like we also need to add this validation when creating decimal arrays since the current test is creating invalid arrays before the cast:
>
>     let array = vec![Some(i128::MAX), Some(i128::MIN)];
>     let input_decimal_array = create_decimal_array(array, 10, 3).unwrap();
>
> I would expect this to fail, so we probably need to add the same validation there.
>
> On second thoughts, it seems it was an intentional design decision not to validate this on array creation. Instead, an array.validate_decimal_precision method can optionally be called on the array to validate it after creation, so we should probably just update the test as needed (@alamb @tustvold perhaps you could correct me if I am wrong about this).

Changed the test to pass.

…eded.

revert whitespace changes

formatting check
@himadripal himadripal force-pushed the fix_decimal_conversion_bug branch from 68b0f68 to 6c50fe3 on December 5, 2024
@himadripal
Contributor Author

himadripal commented Dec 6, 2024

Can anyone please let the build run? The workflows are waiting for approval.

@andygrove
Member

> Have we run any benchmarks (I'm not sure if any actually exist) to confirm this doesn't significantly regress performance?
>
> It seems unfortunate to always be performing overflow checks, when in many cases it should be possible to prove that precision overflow can't occur and need not be checked for.

I created a simple benchmark for decimal casting in #6850.

Unsurprisingly, validating that the results are correct is slower than not validating the results.

before

cast_decimal            time:   [45.281 ns 45.549 ns 45.871 ns]

after (this PR)

cast_decimal            time:   [247.97 ns 248.78 ns 249.78 ns]
                        change: [+435.06% +439.47% +443.15%] (p = 0.00 < 0.05)

We currently have the config option of safe on or off:

pub struct CastOptions<'a> {
    /// how to handle cast failures, either return NULL (safe=true) or return ERR (safe=false)
    pub safe: bool,
    // ...
}

So, yes, it is a performance regression, but the previous behavior was incorrect. This PR now makes this work as advertised.

@tustvold Is there a use case we need to support for faster casts without validating results per the CastOptions?
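A simplified model of how that safe flag changes overflow handling in a lossy decimal cast (illustrative only, not the actual arrow-cast kernel; the struct and function here are stand-ins):

```rust
// Simplified model of CastOptions::safe for a decimal downcast: overflow
// either becomes NULL (safe=true) or an error (safe=false).

struct CastOptions {
    /// how to handle cast failures: return NULL (true) or an error (false)
    safe: bool,
}

/// `None` models a NULL output slot.
fn cast_value(
    value: i128,
    target_precision: u32,
    opts: &CastOptions,
) -> Result<Option<i128>, String> {
    let max = 10i128.pow(target_precision) - 1;
    if value > max || value < -max {
        if opts.safe {
            Ok(None) // safe mode: the overflowing value becomes NULL
        } else {
            Err(format!("value {value} overflows precision {target_precision}"))
        }
    } else {
        Ok(Some(value))
    }
}

fn main() {
    let overflow = 123_456; // six digits, cast to target precision 5
    assert_eq!(cast_value(overflow, 5, &CastOptions { safe: true }), Ok(None));
    assert!(cast_value(overflow, 5, &CastOptions { safe: false }).is_err());
    assert_eq!(cast_value(42, 5, &CastOptions { safe: true }), Ok(Some(42)));
}
```

Either way the overflow is detected; the flag only decides whether it surfaces as a NULL or as an error, which is why the check itself cannot be skipped.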

@tustvold
Contributor

tustvold commented Dec 7, 2024

> @tustvold Is there a use case we need to support for faster casts without validating results per the CastOptions?

No, the cast should be checked; apologies if that wasn't clear. My concern was that the PR as originally formulated blindly performed the checked conversion regardless of the input type, even when the cast was increasing the precision. Given that the whole purpose of tracking precision is to avoid overflow checks, that seemed a little off.

I'll try to find some time to take another look at this PR; from a quick scan, this looks to have been addressed.

Contributor

@tustvold tustvold left a comment

This is still overly pessimistic; for example, if I increase both the precision and the scale by the same amount, I shouldn't need to perform any checks.

However, the kernel is already fallible (try_unary/unary_opt vs unary) and so this kernel already won't be vectorizing properly, and so it isn't like we're regressing a highly optimised kernel here.

I've therefore instead opted to file #6877 and if people care they can action it.
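The infallible cases can be characterized from the precisions and scales alone. A sketch of the condition (my formulation of the observation above, not code from the follow-up issue): a cast from (p1, s1) to (p2, s2) with s2 >= s1 multiplies values by 10^(s2 - s1), so it cannot overflow as long as the target keeps at least as many integer digits, i.e. p2 - s2 >= p1 - s1.

```rust
// Sketch: decide whether a decimal cast (p1, s1) -> (p2, s2) can ever
// overflow. With s2 >= s1 the value is scaled up by 10^(s2 - s1), so the
// cast is infallible iff the target retains at least as many integer
// digits: p2 - s2 >= p1 - s1. Downscaling (s2 < s1) rounds, so we
// conservatively require a check there.

fn cast_is_infallible(p1: u32, s1: u32, p2: u32, s2: u32) -> bool {
    s2 >= s1 && p2.saturating_sub(s2) >= p1.saturating_sub(s1)
}

fn main() {
    // increasing precision and scale by the same amount: always safe
    assert!(cast_is_infallible(10, 3, 12, 5));
    // pure precision increase: always safe
    assert!(cast_is_infallible(10, 3, 20, 3));
    // lowering precision at the same scale: may overflow, must validate
    assert!(!cast_is_infallible(10, 3, 8, 3));
}
```

A cast kernel could branch on such a predicate once per array and fall back to the unchecked path when it returns true, avoiding the per-value overhead measured in the benchmarks above.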

@tustvold tustvold merged commit eb7ab83 into apache:main Dec 12, 2024
andygrove pushed a commit to andygrove/arrow-rs that referenced this pull request Jan 3, 2025
* decimal conversion looses value on lower precision, throws error now on overflow.

* fix review comments and fix formatting.

* for simple case of equal scale and bigger precision, no conversion needed.

revert whitespace changes

formatting check

---------

Co-authored-by: himadripal <[email protected]>
alamb pushed a commit that referenced this pull request Jan 5, 2025
* decimal conversion looses value on lower precision, throws error now on overflow.

* fix review comments and fix formatting.

* for simple case of equal scale and bigger precision, no conversion needed.

revert whitespace changes

formatting check

---------

Co-authored-by: Himadri Pal <[email protected]>
Co-authored-by: himadripal <[email protected]>

Labels

arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

Casting Decimal128 to Decimal128 with smaller precision produces incorrect results in some cases

5 participants