ARROW-9747: [Java][C++] Initial Support for 256-bit Decimals#8475
ARROW-9747: [Java][C++] Initial Support for 256-bit Decimals#8475emkornfield wants to merge 31 commits intoapache:masterfrom
Conversation
|
IMO, Decimal256 is better, as it avoids confusing with |
java/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java
Outdated
Show resolved
Hide resolved
java/vector/src/main/codegen/templates/AbstractPromotableFieldWriter.java
Outdated
Show resolved
Hide resolved
|
Maybe some CI failures can be fixed by referencing #8455 |
There was a problem hiding this comment.
It should be "No ${uncappedName} vector present ..." ?
There was a problem hiding this comment.
yes, I think so. Nice catch.
There was a problem hiding this comment.
This should be cast to long
There was a problem hiding this comment.
Maybe we need a cast here
Otherwise, it first multiplies two (32-bit) integers, and then promotes it to a long.
If the result of the multiplication overflows, it just promotes the overflown value to a long, which is useless.
838c785 to
c9fa7c2
Compare
emkornfield
left a comment
There was a problem hiding this comment.
@liyafan82 I think I addressed your feedback except for the rename of the new class. Unfortunately I had to force push due to a rebase. I'll tag you when I rename.
There was a problem hiding this comment.
Are we sure the remaining bytes will be zeroed?
There was a problem hiding this comment.
cpp/src/arrow/array/array_test.cc
Outdated
There was a problem hiding this comment.
Do we really want to test every precision between 1 and 76? (note the same comment applies to Decimal128Test above).
I'm concerned about the readability of test output here.
There was a problem hiding this comment.
Something like ::testing::Values(1, 2, 5, 10, 75, 76) would sound sufficient (untested).
There was a problem hiding this comment.
The testing every value between 1 and 38 for decimal 128 appears to be the previous behavior I think these tests are fairly light weight but I'll update for Decimal256
cpp/src/arrow/array/validate.cc
Outdated
There was a problem hiding this comment.
Hmm, we could have a BaseDecimalArray class like we already have BaseListArray and BaseBinaryArray.
There was a problem hiding this comment.
Yes, I tried to make this work, and at the moment making this seems like it would make this change bigger then I would feel comfortable with. There are a lot of type_traits that have confusing hierarchies (is_primitive and is_binary_like both would include Decimal and SFINAE doesn't work out well, so it would be an intrusive change).
There was a problem hiding this comment.
FWIW, https://github.com/apache/arrow/pull/8417/files is probably what some of it would look like but I haven't reviewed it fully.
cpp/src/arrow/c/bridge_test.cc
Outdated
There was a problem hiding this comment.
Can you also add import and/or roundtrip tests?
There was a problem hiding this comment.
done, added additional schema tests and a round trip test.
cpp/src/arrow/type.cc
Outdated
There was a problem hiding this comment.
Shouldn't this be changed to DECIMAL128?
(in general, do a search for DECIMAL in all the C++ code, this may catch some overloooked instances)
There was a problem hiding this comment.
this one should. We have DECIMAL for backwards compatibility, I think the remaining places that it is used are places we will need to update to support Decimal256. By leaving them as DECIMAL we can find them easily with by commenting out the alias. Does this sound reasonable?
There was a problem hiding this comment.
Ok. Maybe we can deprecate the alias at some point?
There was a problem hiding this comment.
yes, that is the intent. I'll be opening up a bunch of JIRA work to track down usages and remove. Partial list so far:
- CSV
- Ruby/Gobj bindings
- Implementation for Parquet
- Finish Python implementation (rescaling is needed)
- Gandiva
- Computation kernels (in particular casts)
Likely some others ...
cpp/src/arrow/type_traits.h
Outdated
cpp/src/arrow/util/basic_decimal.cc
Outdated
There was a problem hiding this comment.
for some reason this is how "make format" wants it to be
There was a problem hiding this comment.
Why only this line? Ideally we would to the same operations as in BinaryMathOp.
There was a problem hiding this comment.
As of right now we've only added support for operator*, I think as we add more operators this benchmark can be expanded to reach parity with the other.
cpp/src/arrow/util/decimal_test.cc
Outdated
There was a problem hiding this comment.
Can we include arrow/util/int128_internal.h instead?
There was a problem hiding this comment.
Ah, I see that we also use int256_t...
438ca49 to
71735a7
Compare
|
Thanks for the reviews @liyafan82 and @pitrou. I rebased an i'll merge when green an open up some follow-up JIRAs. |
|
Mac CI failures seem unrelated. going to merge. |
|
This might have "broken" the spark integration builds: https://github.com/ursa-labs/crossbow/runs/1304128112 (now I am not familiar enough with spark to know what kind of "broken" it is, but in any case the integration build is failing) |
@jorisvandenbossche Thanks for reporting the problem. |
This provides sufficient coverage to support round trip between C++ and Java. There are still some gaps in python. Based on review, I will open JIRAs to track missing functionality (i.e. parquet support in C++). Marking as draft until i can triage CI failures but early feedback is welcome. Open questions I have: [C++] * Should we retain logic in decimal() factory function to adjust type on scale/precision or take an explicit argument or keep it as an alias for decimal128? [Java] * Naming: Would Decimal256 be better then BigDecimal? Closes apache#8475 from emkornfield/decimal256 Lead-authored-by: Mingyu Zhong <[email protected]> Co-authored-by: Micah Kornfield <[email protected]> Co-authored-by: Micah Kornfield <[email protected]> Co-authored-by: emkornfield <[email protected]> Co-authored-by: Ezra <[email protected]> Signed-off-by: Micah Kornfield <[email protected]>
This provides sufficient coverage to support round trip between C++ and Java. There are still some gaps in python. Based on review, I will open JIRAs to track missing functionality (i.e. parquet support in C++). Marking as draft until i can triage CI failures but early feedback is welcome.
Open questions I have:
[C++]
[Java]