Improve DecimalArray API ergonomics: add iter(), FromIterator, with_precision_and_scale#1223
Conversation
alamb
left a comment
There was a problem hiding this comment.
cc @liukun4515 I coded this one up to try and improve the code for working with DecimalArrays (and I wanted to write some code this weekend rather than just reviewing code!)
| /// builder.append_null().unwrap(); | ||
| /// builder.append_value(-8_887_000_000).unwrap(); | ||
| /// let decimal_array: DecimalArray = builder.finish(); | ||
| /// // Create a DecimalArray with the default precision and scale |
There was a problem hiding this comment.
this is an example of how the new API workrs -- I think it is easier to understand and use
| } | ||
|
|
||
| #[test] | ||
| fn test_decimal_iter() { |
There was a problem hiding this comment.
Also included a basic iter() and into_iter() functions -- while there is room for performance improvement I think the API is solid
Codecov Report
@@ Coverage Diff @@
## master #1223 +/- ##
==========================================
+ Coverage 83.02% 83.04% +0.01%
==========================================
Files 180 180
Lines 52289 52418 +129
==========================================
+ Hits 43413 43529 +116
- Misses 8876 8889 +13
Continue to review full report at Codecov.
|
arrow/src/array/array_binary.rs
Outdated
| /// The default precision and scale used when not specified | ||
| pub fn default_type() -> DataType { | ||
| // Keep maximum precision | ||
| DataType::Decimal(38, 10) |
There was a problem hiding this comment.
we can replace 38 to DECIMAL_MAX_PRECISION, and 10 to OTHER_DEFAULT_VALUE.
| } | ||
|
|
||
| /// The maximum precision for [DataType::Decimal] values | ||
| pub const DECIMAL_MAX_PRECISION: usize = 38; |
There was a problem hiding this comment.
In the datafusion, we use
https://github.com/apache/arrow-datafusion/blob/e92225d6f4660363eec3b1e767a188fccebb7ed9/datafusion/src/scalar.rs#L39 .
Maybe we can replace the const in the datafusion.
If we have the plan to support the decimal256, we may need to use some labels to identify diff decimal type.
There was a problem hiding this comment.
I agree that we should remove the duplication in DataFusion -- when this PR is merged, I can file a ticket in DataFusion to upgrade to the new DecimalAPI and remove the duplication
|
Do you want to refactor some codes like arrow-rs/arrow/src/compute/kernels/cast.rs Line 2075 in 66e029e If you have the plan to refactor this code, I will review this later. LGTM, expect the #1223 (comment) |
I was planning to do that in a follow on PR to keep this PR small and focused on the API changes. So I will plan to make the changes you suggest in this PR, and then prepare a new one that refactors the rest of the code. Thank you for the review. |
|
Update: I have not forgotten about this PR but I have been busy with other tasks this week. If I don't get to it later this week I'll work on it over the weekend |
|
🤔 while working to port some of the code over, it turns out that DecimalBuilder also does value validation on each value (so it is certain that the decimal values are within correct range. I need to think about this 🤔 |
|
back to draft while I think about this a bit |
|
This PR is now ready for (re) review @liukun4515 cc @sweb as you contributed the initial You can see examples of how the API is used in #1247 and #1249 |
| builder: FixedSizeListBuilder<UInt8Builder>, | ||
| } | ||
|
|
||
| pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [ |
| ))); | ||
| } | ||
| Ok(result) | ||
| validate_decimal_precision(result, precision) |
There was a problem hiding this comment.
I consolidated the bounds checking (so that I could also use it in with_precision_and_scale)
|
FYI @liukun4515 this is ready for review. I know you away for a few days so no worries. I plan to wait for a review prior to merging this PR; If it gets reviewed by tomorrow's cutoff for arrow 9.0.0 I'll include it there, otherwise this can wait for 2 weeks and perhaps be included in arrow 10.0.0 |
I will review it later today. |
| builder: FixedSizeListBuilder<UInt8Builder>, | ||
| } | ||
|
|
||
| pub const MAX_DECIMAL_FOR_EACH_PRECISION: [i128; 38] = [ |
There was a problem hiding this comment.
This change is an API breaks.
There was a problem hiding this comment.
Added 'api-change' label
| let casted_array = cast(&array, &DataType::Decimal(3, 1)); | ||
| assert!(casted_array.is_err()); | ||
| assert_eq!("Invalid argument error: The value of 1000 i128 is not compatible with Decimal(3,1)", casted_array.unwrap_err().to_string()); | ||
| assert_eq!("Invalid argument error: 1000 is too large to store in a Decimal of precision 3. Max is 999", casted_array.unwrap_err().to_string()); |
| assert_eq!(actual, expected); | ||
| } | ||
|
|
||
| #[test] |
liukun4515
left a comment
There was a problem hiding this comment.
LGTM
Sorry for the delayed review.
|
No problem -- thanks for the review @liukun4515 |
|
Thanks again for the help @liukun4515 |
DecimalArray API ergonomics: add iter(), create from iter(), change precision / scaleDecimalArray API ergonomics: add iter(), FromIterator, change precision / scale
DecimalArray API ergonomics: add iter(), FromIterator, change precision / scaleDecimalArray API ergonomics: add iter(), FromIterator, with_precision_and_scale
Which issue does this PR close?
Closes #1009
Closes #1083
Rationale for this change
It is somewhat awkward to create
DecimalArrayas well as iterate through their values.This leads to (repeated) code like
create_decimal_arrayhttps://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/apache/arrow-rs%24+create_decimal_array&patternType=literal(there is similar repetition in datafusion)
There are also more bounds checks than necessary (given
DecimalArray::value()checks on each accessWhat changes are included in this PR?
DecimalArray::fromandDecimalArray::from_iter_values(mirroringPrimitiveArray)DecimalArray::into_iter()andDecimalArray::iter()for iterating through valuesDecimalArray::with_precision_and_scale()for changing the relevant precision and scale, and validate precisionAre there any user-facing changes?
DecimalArraysdatatypeFollow on PRs:
DecimalArraycreation API in parquet crate #1247