ARROW-7858: [C++][Python] Support casting from ExtensionArray#6633
ARROW-7858: [C++][Python] Support casting from ExtensionArray#6633kszucs wants to merge 9 commits intoapache:masterfrom
Conversation
|
The test failure is related to #6632 |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Casting to its own storage type (so eg not a different integer bit), still returns an ExtensionArray. With the example from the test:
In [3]: arr.cast(pa.int64())
Out[3]:
<pyarrow.lib.ExtensionArray object at 0x7f4b45ca5fa8>
[
1,
2,
3,
4
]
In [4]: arr.cast(pa.int32())
Out[4]:
<pyarrow.lib.Int32Array object at 0x7f4b45ca5348>
[
1,
2,
3,
4
]
I am not fully sure that this is the expected behaviour.
|
Nice catch, it was caused by the identity shortcut. Added a separate kernel which handles the identity case properly. |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Looks good to me now! (but not very familiar with the cast kernel code)
|
@bkietz please take a look again |
Co-Authored-By: Benjamin Kietzman <[email protected]>
wesm
left a comment
There was a problem hiding this comment.
The PR title misled me when I reviews this, it should be "Support casting the storage of an ExtensionArray" or similar
| // construct an ArrayData object with the underlying storage type | ||
| auto new_input = input.array()->Copy(); | ||
| new_input->type = storage_type_; | ||
| return InvokeWithAllocation(ctx, storage_caster_.get(), new_input, out); |
There was a problem hiding this comment.
Does it allocate if the out_type and storage_type are the same?
There was a problem hiding this comment.
Added tests on both C++ and Python side, seems like no allocation happens.
| } | ||
| return true; | ||
| return (other.extension_name() == this->extension_name()); | ||
| } |
There was a problem hiding this comment.
I wonder if this should be the default implementation of ExtensionEquals
There was a problem hiding this comment.
The name and the storage type should be equal, created a follow-up JIRA https://issues.apache.org/jira/browse/ARROW-8143
|
There was more C++ code in the implementation than I expected, not sure what could be done to make implementing something that "seems easy" (on paper) easier |
This add support for casting from ExtensionType. We should probably add support for converting to an ExtensionType, but we can handle it in a follow-up.