ARROW-11673 - [C++] Casting dictionary type to use different index type#10721
ARROW-11673 - [C++] Casting dictionary type to use different index type#10721nirandaperera wants to merge 14 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Sorry - now that I look at the JIRA again, it seems having unsafe casts isn't useful for this case? What do you think? If it produces invalid output (presumably, negative and out of bounds indices?), then it seems kind of pointless.
There was a problem hiding this comment.
I was also thinking about this. OTOH I was thinking whether the ValidataOutput should check the validity of dictionary array in DictionaryType. 🤔
There was a problem hiding this comment.
It looks like it does - are you saying it shouldn't?
There was a problem hiding this comment.
I dont think ValidateOutput checks the validity of ArrayData::dictionary field. That's why ValidateOutput passes for unsafe valies AFAIU. :thin
There was a problem hiding this comment.
It looks like it does:
arrow/cpp/src/arrow/array/validate.cc
Lines 565 to 572 in dbeed52
There was a problem hiding this comment.
I think there's no way a primitive array to be validated once the casting is complete (unsafely in this instance).
There was a problem hiding this comment.
We're casting from dictionary to dictionary here though.
Anyways, the point stands: this is an unsafe cast that generates an invalid array (and will mostly always do so), so is this case worth supporting?
There was a problem hiding this comment.
I think I agree with you. It's not worth testing that. I will remove it. :-)
|
@lidavidm is this a known test failure? |
Looks like this enables something that didn't work before? |
Ah! indeed! thanks... |
|
I think this is good now, thanks! (Just one nit about an unused variable) |
|
|
There was a problem hiding this comment.
Add tests casting from int to/from float and signed to/from unsigned integers.
There was a problem hiding this comment.
I ran int to/from float and unsigned (with Safe and Unsafe options) and it works correctly. LGTM.
There was a problem hiding this comment.
What was the error here? Would be nice to file a JIRA for it
There was a problem hiding this comment.
The error is from ArrayFromJSON(dictionary(..., floatXX()), ...) is:
NotImplemented: JSON conversion to dictionary<values=float, indices=int8, ordered=0> not implemented
Nevertheless, you can Cast successfully to float values with
auto arr = ArrayFromJSON(dictionary(int8(), int32()), "[1, 2, 3, 1, null, 3]");
ASSERT_OK_AND_ASSIGN(auto casted, Cast(arr, dictionary(int8(), float32()), CastOptions::Safe()));
There is also a DictArrayFromJSON but this requires explicit index values:
auto arr = DictArrayFromJSON(dictionary(int8(), float32()), "[0, 1, 2, 3, 4, 5]", "[1, 2, 3, 1, null, 3]");
There was a problem hiding this comment.
@nirandaperera I submitted this PR that enables using ArrayFromJSON for dictionaries with floating-point values. If PR passes and gets merged, you can change test accordingly and remove error comment.
There was a problem hiding this comment.
| // check float types (TODO: ARROW-13381 ArrayFromJSON doesnt work for float value dictionary types) | |
| // check float types | |
| // TODO(ARROW-13381): ArrayFromJSON doesnt work for float value dictionary types |
|
I made the suggested changes and I think this is ready now |
bcc0a02 to
26ffdb3
Compare
kszucs
left a comment
There was a problem hiding this comment.
Thanks Niranda! Merging on green.
This PR adds casting from one dictionary type to anther dictionary type for both scalars and arrays :
ex: