Rewrite `Decimal` and `DecimalArray` using `const_generic` by HaoYang670 · Pull Request #2383 · apache/arrow-rs

HaoYang670 · 2022-08-09T05:29:33Z

Signed-off-by: remzi [email protected]

Which issue does this PR close?

This is a subtask of #2384.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

From a developer's view: Yes.
From a user's view: No.

HaoYang670 · 2022-08-09T06:24:14Z

Seems like there are some unrelated errors in cargo doc.
Filed #2385 to track this.

viirya · 2022-08-09T06:37:56Z

arrow/src/array/array_decimal.rs

            data.buffers().len(),
            1,
-            "Decimal128Array data should contain 1 buffer only (values)"
+            "DecimalArray data should contain 1 buffer only (values)"


We may show Decimal128Array or Decimal256Array based on byte width.
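A minimal sketch of how the message could pick the array name from the byte width. `decimal_array_name` and `check_single_buffer` are hypothetical helpers written for illustration; they are not part of arrow-rs.

```rust
/// Hypothetical helper: map a byte width to the user-facing array name.
pub const fn decimal_array_name<const BYTE_WIDTH: usize>() -> &'static str {
    match BYTE_WIDTH {
        16 => "Decimal128Array",
        32 => "Decimal256Array",
        // Fallback for any other width; the real types only use 16 and 32.
        _ => "DecimalArray",
    }
}

/// Hypothetical check mirroring the assertion in the diff above,
/// returning the message as an error instead of panicking.
pub fn check_single_buffer<const BYTE_WIDTH: usize>(
    buffer_count: usize,
) -> Result<(), String> {
    if buffer_count == 1 {
        Ok(())
    } else {
        Err(format!(
            "{} data should contain 1 buffer only (values)",
            decimal_array_name::<BYTE_WIDTH>()
        ))
    }
}
```

With this shape, a 32-byte array that carries two buffers would report `Decimal256Array` rather than the generic name.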

viirya · 2022-08-09T06:38:50Z

arrow/src/array/array_decimal.rs

 mod private_decimal {
    pub trait DecimalArrayPrivate {
        fn raw_value_data_ptr(&self) -> *const u8;
    }
 }


Do we still need this trait?

viirya · 2022-08-09T06:40:46Z

arrow/src/array/array_decimal.rs

-    const DEFAULT_TYPE: DataType;
-    const MAX_PRECISION: usize;
-    const MAX_SCALE: usize;
+pub struct BasicDecimalArray<const BYTE_WIDTH: usize> {


We still have no idea how to constrain byte width, right?

We could also seal the byte_width in this way:
https://users.rust-lang.org/t/how-to-seal-the-const-generic/77947/2
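A minimal sketch of one common way to seal a const generic parameter, not necessarily the exact approach in the linked thread. All names here (`Width`, `ValidByteWidth`, `SealedDecimal`) are hypothetical and are not arrow-rs types.

```rust
mod private {
    /// Sealed marker trait: code outside this module cannot implement it.
    pub trait ValidByteWidth {}
}

/// Carrier type that lifts a const parameter into the trait system.
pub struct Width<const N: usize>;

// Only 16 and 32 bytes (128-bit and 256-bit decimals) are accepted.
impl private::ValidByteWidth for Width<16> {}
impl private::ValidByteWidth for Width<32> {}

/// Hypothetical decimal value whose width is restricted by the bound below.
pub struct SealedDecimal<const BYTE_WIDTH: usize>
where
    Width<BYTE_WIDTH>: private::ValidByteWidth,
{
    value: [u8; BYTE_WIDTH],
}

impl<const BYTE_WIDTH: usize> SealedDecimal<BYTE_WIDTH>
where
    Width<BYTE_WIDTH>: private::ValidByteWidth,
{
    pub fn new(value: [u8; BYTE_WIDTH]) -> Self {
        Self { value }
    }

    /// Number of bytes backing this value.
    pub fn byte_width(&self) -> usize {
        self.value.len()
    }
}
```

Under this sketch, `SealedDecimal::<8>` is rejected by the type checker because `Width<8>` does not implement the sealed trait, and downstream crates cannot add that implementation themselves.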

viirya · 2022-08-09T06:42:07Z

arrow/src/array/array_decimal.rs

+    pub const DEFAULT_TYPE: DataType = BasicDecimal::<BYTE_WIDTH>::DEFAULT_TYPE;
+    pub const MAX_PRECISION: usize = BasicDecimal::<BYTE_WIDTH>::MAX_PRECISION;
+    pub const MAX_SCALE: usize = BasicDecimal::<BYTE_WIDTH>::MAX_SCALE;
+    pub const TYPE_CONSTRUCTOR: fn(usize, usize) -> DataType =


Can TYPE_CONSTRUCTOR be non-public?

HaoYang670 · 2022-08-09T06:43:46Z

arrow/src/util/decimal.rs

+            DataType::Decimal256,
+            DataType::Decimal256(DECIMAL256_MAX_PRECISION, DECIMAL_DEFAULT_SCALE),
+        ),
+        _ => panic!("invalid byte width"),


We could constrain the byte width here. When compiling the constant items, the compiler reports an error if the byte width is not 16 or 32.
@viirya

Looks okay.

Maybe mention the valid byte widths in the message.
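A minimal sketch of the idea discussed here: a `panic!` inside an associated constant turns an unsupported width into a compile-time error once that constant is used. `Decimal` is a hypothetical stand-in, not the `BasicDecimal` definition from this PR, and the message wording is only a suggestion.

```rust
/// Hypothetical stand-in for a const-generic decimal type.
pub struct Decimal<const BYTE_WIDTH: usize>;

impl<const BYTE_WIDTH: usize> Decimal<BYTE_WIDTH> {
    /// Evaluated at compile time for each width that is actually used.
    /// 38 and 76 are the maximum precisions of 128-bit and 256-bit decimals.
    pub const MAX_PRECISION: usize = match BYTE_WIDTH {
        16 => 38,
        32 => 76,
        // Reached only when some code names this constant for another width.
        _ => panic!("invalid byte width, expected 16 or 32"),
    };
}
```

Reading `Decimal::<16>::MAX_PRECISION` or `Decimal::<32>::MAX_PRECISION` compiles normally, while reading `Decimal::<8>::MAX_PRECISION` would stop the build with the panic message. Note that the check fires only for widths whose constant is actually evaluated, which is weaker than a sealed trait bound.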

tustvold

Really cool, was going to do something similar so awesome you've already done it. Will review in more detail later today

viirya · 2022-08-09T06:48:13Z

arrow/src/util/decimal.rs

+
+impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> {
+    #[allow(clippy::type_complexity)]
+    const _MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE: (


As this is not pub, I think MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE might be okay. A _ prefix seems redundant.

viirya

I have a few comments as above. Otherwise it looks good to me. Thanks for the refactoring.

HaoYang670 · 2022-08-09T08:07:39Z

I assume that this PR will not introduce a performance regression. But we need more tests, because weird things often happen.

liukun4515 · 2022-08-09T08:13:45Z

I assume that this PR will not introduce a performance regression. But we need more tests, because weird things often happen.

Thanks for your concern about performance; that is why I submitted these two PRs on decimal optimization: #2360 and #2357.

In our service, there are many columns with decimal data type.

tustvold

Looks good, will run benchmarks to double check

tustvold · 2022-08-09T08:45:28Z

arrow/src/array/array_decimal.rs

+impl Decimal128Array {
+    /// Creates a [Decimal128Array] with default precision and scale,
+    /// based on an iterator of `i128` values without nulls
+    pub fn from_iter_values<I: IntoIterator<Item = i128>>(iter: I) -> Self {


For the record this method is unsound, but was unsound before. There is a broader issue here

tustvold · 2022-08-09T10:35:46Z

So we don't actually have any decimal benchmarks that I can find... Created #2388

tustvold · 2022-08-09T10:38:42Z

I'm going to get this in as I think it is a valuable cleanup, and we can continue to iterate on performance in subsequent PRs

ursabot · 2022-08-09T10:42:17Z

Benchmark runs are scheduled for baseline = 56f7904 and contender = 77c814c. 77c814c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

HaoYang670 · 2022-08-09T10:46:04Z

Thank you for your review @viirya @tustvold @liukun4515

alamb · 2022-08-09T11:44:37Z

It wasn't 100% clear to me -- but @liukun4515 this PR may have improved decimal validation performance. Perhaps you can run your benchmarks again now to see if things have improved.

viirya · 2022-08-09T16:25:41Z

I guess one point from @tustvold is that using fixed-length slices when constructing decimals helps the compiler elide bounds checks; the performance gain might come from that.
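A minimal sketch of that point, assuming little-endian 16-byte values. `decode_from_slice` and `decode_fixed` are hypothetical helpers, not functions from arrow-rs.

```rust
use std::convert::TryInto;

/// Hypothetical helper: decode an `i128` from a slice of unknown length.
/// The single length check happens in `try_into`; nothing after it indexes
/// into a dynamically sized slice.
pub fn decode_from_slice(bytes: &[u8]) -> Option<i128> {
    let fixed: &[u8; 16] = bytes.try_into().ok()?;
    Some(decode_fixed(fixed))
}

/// With the length encoded in the type, there is no runtime length to check.
pub fn decode_fixed(bytes: &[u8; 16]) -> i128 {
    i128::from_le_bytes(*bytes)
}
```

Whether this changes the generated code in a given hot loop depends on the optimizer, so it is worth confirming with a benchmark rather than assuming a gain.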

viirya · 2022-08-09T16:27:03Z

Opened a minor PR #2389 to clean up some comments not addressed yet.

HaoYang670 · 2022-08-10T01:12:37Z

I guess one point from @tustvold is that using fixed-length slices when constructing decimals helps the compiler elide bounds checks; the performance gain might come from that.

Yes, I found that we discussed bounds checking in #2360.

const generic decimal

2826d3f

Signed-off-by: remzi <[email protected]>

github-actions bot added the arrow (Changes to the arrow crate) and parquet (Changes to the parquet crate) labels Aug 9, 2022

HaoYang670 changed the title ~~Rewrite Decimal and DecimalArray using const_generic~~ Rewrite Decimal and DecimalArray using const_generic Aug 9, 2022

HaoYang670 mentioned this pull request Aug 9, 2022

Rewrite Decimal Array using const_generic #2384

Closed

3 tasks

fix docs and lint

5980b9a

Signed-off-by: remzi <[email protected]>

viirya reviewed Aug 9, 2022

HaoYang670 commented Aug 9, 2022

tustvold reviewed Aug 9, 2022

viirya reviewed Aug 9, 2022

viirya reviewed Aug 9, 2022

viirya approved these changes Aug 9, 2022

tustvold approved these changes Aug 9, 2022

tustvold reviewed Aug 9, 2022

add bound

cb7d701

Signed-off-by: remzi <[email protected]>

tustvold merged commit 77c814c into apache:master Aug 9, 2022

HaoYang670 deleted the const_generic_decimal branch August 9, 2022 10:46

alamb mentioned this pull request Aug 9, 2022

optimize decimal: reduce validation when construct the decimal array or cast to the decimal array #2313

Closed

4 tasks

HaoYang670 mentioned this pull request Aug 17, 2022

Seal the decimal type. #2439

Closed
