support cast decimal to signed numeric by liukun4515 · Pull Request #1073 · apache/arrow-rs

liukun4515 · 2021-12-21T04:18:51Z

Which issue does this PR close?

part of #1043

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

liukun4515 · 2021-12-21T13:40:23Z

@alamb PTAL

codecov-commenter · 2021-12-21T13:51:09Z

Codecov Report

Merging #1073 (070bb40) into master (f3e452c) will increase coverage by 0.03%.
The diff coverage is 98.43%.

@@            Coverage Diff             @@
##           master    #1073      +/-   ##
==========================================
+ Coverage   82.27%   82.31%   +0.03%     
==========================================
  Files         168      168              
  Lines       49281    49376      +95     
==========================================
+ Hits        40547    40643      +96     
+ Misses       8734     8733       -1

Impacted Files	Coverage Δ
arrow/src/compute/kernels/cast.rs	`95.13% <98.43%> (+0.14%)`	⬆️
parquet/src/encodings/rle.rs	`92.70% <0.00%> (ø)`
arrow/src/array/transform/mod.rs	`84.86% <0.00%> (+0.13%)`	⬆️
parquet_derive/src/parquet_field.rs	`66.43% <0.00%> (+0.45%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f3e452c...070bb40.

alamb

Thanks @liukun4515 -- looks good

I also recommend we add some Decimal cases to get_all_types in https://github.com/liukun4515/arrow-rs/blob/support_decimal_to_signed_numeric/arrow/src/compute/kernels/cast.rs#L4350

Perhaps as a follow-on PR.

alamb · 2021-12-21T14:58:11Z

arrow/src/compute/kernels/cast.rs

+                // check the overflow
+                // For example: Decimal(128,10,0) as i8
+                // 128 is out of range i8
+                if v <= max_bound && v >= min_bound {
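The bounds check above can be illustrated with a minimal, stdlib-only sketch. It assumes a decimal value is stored as an unscaled i128 plus a scale. `decimal_to_i8` is a hypothetical helper, not a function in arrow-rs, and it returns `None` on overflow rather than committing to whatever the kernel does in that case.

```rust
/// Hypothetical helper (not part of arrow-rs): converts one decimal value,
/// given as an unscaled i128 and a scale, to i8. Returns None when the
/// integer part is outside i8's range.
fn decimal_to_i8(raw: i128, scale: u32) -> Option<i8> {
    // Integer division by 10^scale drops the fractional digits,
    // truncating toward zero (e.g. 1.25 -> 1, -1.25 -> -1).
    let div: i128 = 10_i128.pow(scale);
    let v = raw / div;
    // Widen i8's limits to i128 so the comparison itself cannot overflow.
    let min_bound = i8::MIN as i128;
    let max_bound = i8::MAX as i128;
    if v <= max_bound && v >= min_bound {
        Some(v as i8)
    } else {
        None
    }
}
```

For example, `decimal_to_i8(128, 0)` is `None` because 128 exceeds `i8::MAX` (127), while `decimal_to_i8(125, 2)` (the value 1.25) is `Some(1)`. In this sketch, `i8::try_from(v).ok()` would be an equivalent way to express the range check.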


alamb · 2021-12-21T15:05:29Z

arrow/src/compute/kernels/cast.rs

+        let div: i128 = 10_i128.pow(*$SCALE as u32);
+        let min_bound = ($NATIVE_TYPE::MIN) as i128;
+        let max_bound = ($NATIVE_TYPE::MAX) as i128;
+        for i in 0..array.len() {


FWIW, as a future performance optimization, we might be able to avoid using a builder and create the arrays directly.

So something like:

let new_array: Int8Array = array
    .iter()
    .map(|v| {
        v.map(|v| {
            let v = v / div;
            if v <= max_bound && v >= min_bound {
                ...
            }
        })
    })
    .collect()?;

Or something
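A self-contained sketch of the builder-free shape suggested above. It uses plain `&[Option<i128>]` in place of a `DecimalArray` and `Vec<Option<i8>>` in place of `Int8Array` (assumptions; the arrow crate is not used), and `CastError` / `cast_decimals_to_i8` are hypothetical names. Returning an error on overflow is an assumed policy. `Option::transpose` turns each `Option<Result<_>>` into `Result<Option<_>>`, so `collect` into a `Result` stops at the first out-of-range value.

```rust
/// Hypothetical stand-in for the kernel's error type.
#[derive(Debug, PartialEq)]
pub struct CastError(pub String);

/// Sketch: cast unscaled decimal values to i8 with one iterator chain
/// instead of appending to a builder element by element.
pub fn cast_decimals_to_i8(
    values: &[Option<i128>],
    scale: u32,
) -> Result<Vec<Option<i8>>, CastError> {
    let div: i128 = 10_i128.pow(scale);
    let min_bound = i8::MIN as i128;
    let max_bound = i8::MAX as i128;
    values
        .iter()
        .map(|&v| {
            // Nulls pass straight through; non-null values are scaled and checked.
            v.map(|raw| {
                let v = raw / div;
                if v <= max_bound && v >= min_bound {
                    Ok(v as i8)
                } else {
                    Err(CastError(format!("value of {} is out of range for i8", v)))
                }
            })
            // Option<Result<i8, _>> -> Result<Option<i8>, _>
            .transpose()
        })
        // The first Err short-circuits; otherwise every element is collected.
        .collect()
}
```

With this shape, `cast_decimals_to_i8(&[Some(12_345), None], 2)` yields `Ok(vec![Some(123), None])`, and any out-of-range value turns the whole result into `Err`.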

Added a follow-up issue to track this enhancement: #1083

alamb · 2021-12-21T15:09:15Z

arrow/src/compute/kernels/cast.rs

+    }
+
+    // TODO remove this function if the decimal array has the creator function
+    fn create_decimal_array(


Tracked by #1009

I remember this

alamb · 2021-12-21T15:11:16Z

arrow/src/compute/kernels/cast.rs

-    use num::traits::Pow;
+
+    macro_rules! generate_cast_test_case {
+        ($INPUT_ARRAY: expr, $INPUT_ARRAY_TYPE: expr, $OUTPUT_TYPE_ARRAY: ident, $OUTPUT_TYPE: expr, $OUTPUT_VALUES: expr) => {


I think you could avoid having to pass in $INPUT_ARRAY_TYPE by using the data_type() function, like:

let input_array_type = $INPUT_ARRAY.data_type();
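To illustrate the suggestion in isolation, here is a toy sketch of a test macro that reads the input type from the array instead of taking it as an argument. `ToyArray`, its `data_type()` method returning a string label, and this three-argument `generate_cast_test_case!` are all hypothetical; the real macro works on Arrow arrays and `DataType`.

```rust
/// Toy stand-in for an Arrow array (hypothetical): raw values plus a type label.
struct ToyArray {
    values: Vec<Option<i128>>,
    type_label: &'static str,
}

impl ToyArray {
    /// Plays the role of `Array::data_type()`: the array reports its own type.
    fn data_type(&self) -> &'static str {
        self.type_label
    }
}

/// Hypothetical test macro: the input type is derived from the input array
/// instead of being passed as a separate `$INPUT_ARRAY_TYPE` argument.
/// Evaluates to the derived type label so callers can inspect it.
macro_rules! generate_cast_test_case {
    ($INPUT_ARRAY:expr, $CAST_FN:expr, $OUTPUT_VALUES:expr) => {{
        let input_array = $INPUT_ARRAY;
        let input_array_type = input_array.data_type();
        let output: Vec<_> = ($CAST_FN)(&input_array);
        assert_eq!(output, $OUTPUT_VALUES, "cast from {} failed", input_array_type);
        input_array_type
    }};
}
```

The design gain is one less argument that can drift out of sync with the array it describes.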

alamb · 2021-12-21T15:14:27Z

arrow/src/compute/kernels/cast.rs

+            vec![Some(1_i64), Some(2_i64), Some(3_i64), None, Some(5_i64)]
+        );
+
+        // overflow test: out of range of max i8


alamb · 2021-12-21T15:14:51Z

arrow/src/compute/kernels/cast.rs

+            Float32Array,
+            &DataType::Float32,
+            vec![
+                Some(1.25_f32),


alamb · 2021-12-21T15:17:50Z

arrow/src/compute/kernels/cast.rs

+        let div: i128 = 10_i128.pow(*$SCALE as u32);
+        let min_bound = ($NATIVE_TYPE::MIN) as i128;
+        let max_bound = ($NATIVE_TYPE::MAX) as i128;
+        for i in 0..array.len() {


In general, I think using Builders is not as fast as constructing arrays via FromIterator, as the bounds are checked on each element.

In this case, if we could structure the code as follows:

let output_array: $OUTPUT_ARRAY_TYPE = array
    .iter()
    .map(|v| {
        v.map(|v| {
            // scale the value v
        })
    })
    .collect();

It may be significantly faster.

Perhaps a good follow-on PR.

Thanks. I will do this in a follow-up PR. I may also need to add a micro benchmark to measure the performance improvement.

alamb · 2021-12-21T15:18:23Z

It appears this PR has a clippy error as well now.

liukun4515 · 2021-12-22T06:44:20Z

arrow/src/compute/kernels/cast.rs

            | Dictionary(_, _),
            Null,
        ) => true,
+        (Decimal(_, _), _) => false,


For every target type other than the signed numeric types listed above, casting from Decimal returns false.
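A toy sketch of why arm order matters here: Rust tries match arms top to bottom, so the specific Decimal-to-numeric arm must precede the `(Decimal(_, _), _) => false` catch-all. `ToyType` and `can_cast` are hypothetical stand-ins for arrow's `DataType` and cast-support check, and the allowed targets (Int8 to Int64, Float32, Float64) are an assumption based on the signed integer and float test cases in this PR.

```rust
/// Toy subset of data types (hypothetical; not arrow's `DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Decimal(usize, usize),
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    Float32,
    Float64,
    Utf8,
}

/// Sketch of the arm ordering: the specific Decimal arm comes first,
/// then the Decimal catch-all that rejects every other target.
fn can_cast(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    match (from, to) {
        (Decimal(_, _), Int8 | Int16 | Int32 | Int64 | Float32 | Float64) => true,
        (Decimal(_, _), _) => false,
        // Placeholder for every other rule: only identity casts in this toy.
        (a, b) => a == b,
    }
}
```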

liukun4515 · 2021-12-22T07:28:17Z

arrow/src/compute/kernels/cast.rs

+            None,
+            Some(525),
+            Some(112345678),
+            Some(112345679),


This case tests float32 precision loss: the value has more significant digits than float32 can represent exactly.
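The precision loss this case targets can be reproduced with a minimal sketch. `decimal_to_f32` is hypothetical and assumes the conversion divides in f64 and then narrows to f32; the test's actual scale is not shown in the excerpt. Because f32 has a 24-bit significand, integers between 2^26 and 2^27 are representable only in steps of 8, so 112345678 and 112345679 both round to 112345680.

```rust
/// Hypothetical helper: decimal (unscaled i128 + scale) to f32, computed in
/// f64 and then narrowed to f32 with round-to-nearest.
fn decimal_to_f32(raw: i128, scale: u32) -> f32 {
    (raw as f64 / 10_f64.powi(scale as i32)) as f32
}
```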

liukun4515 · 2021-12-22T07:29:21Z

arrow/src/compute/kernels/cast.rs

+            Some(325),
+            None,
+            Some(525),
+            Some(112345678901234568),


This case tests float64 precision loss: the value has more significant digits than float64 can represent exactly.
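The float64 case works the same way. f64 has a 53-bit significand, so integers between 2^56 and 2^57 are spaced 16 apart. 112345678901234568 lies exactly halfway between 112345678901234560 and 112345678901234576, and Rust's int-to-float `as` cast rounds to nearest with ties to even, which picks 112345678901234560. `decimal_to_f64` is a hypothetical sketch, not the kernel's code.

```rust
/// Hypothetical helper: decimal (unscaled i128 + scale) to f64.
/// The `raw as f64` step already rounds to the nearest f64 (ties to even)
/// before the scale is applied.
fn decimal_to_f64(raw: i128, scale: u32) -> f64 {
    raw as f64 / 10_f64.powi(scale as i32)
}
```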

alamb

Thanks again @liukun4515

* add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type support decimal to unsigned numeric
* address the comments and fix the clippy

Co-authored-by: Kun Liu

liukun4515 · 2021-12-23T02:26:36Z

Thanks for your review, @houqp @alamb.

liukun4515 mentioned this pull request Dec 21, 2021

support cast for decimal data type #1043

Closed

8 tasks

github-actions bot added the arrow Changes to the arrow crate label Dec 21, 2021

add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type support decimal to unsigned numeric

070bb40

liukun4515 force-pushed the support_decimal_to_signed_numeric branch from d17180f to 070bb40 on December 21, 2021 13:38

liukun4515 marked this pull request as ready for review December 21, 2021 13:39

alamb approved these changes Dec 21, 2021

liukun4515 mentioned this pull request Dec 22, 2021

Add into iter for decimal array #1083

Closed

address the comments and fix the clippy

d400bdc

liukun4515 requested a review from alamb December 22, 2021 04:20

liukun4515 commented Dec 22, 2021

houqp approved these changes Dec 22, 2021

houqp added the enhancement Any new improvement worthy of a entry in the changelog label Dec 22, 2021

alamb approved these changes Dec 22, 2021

alamb merged commit 8f41a07 into apache:master Dec 22, 2021

alamb added the cherry-picked label Dec 22, 2021

alamb mentioned this pull request Dec 22, 2021

Cherry pick support cast decimal to signed numeric to active_release #1089

Merged
