ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key#12162
ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key#12162dhruv9vats wants to merge 24 commits intoapache:masterfrom
Conversation
|
|
There was a problem hiding this comment.
As per your comment below, map_scalar.is_valid needs to be checked before casting, but in that case, how could a NullScalar of an appropriate type be made (which requires item_type)?
There was a problem hiding this comment.
Ah sorry, the cast from Scalar to MapScalar is fine - it's the cast from MapScalar.value to StructArray that is failing, because value is nullptr when is_valid == false.
There was a problem hiding this comment.
I would suggest looking at this in a debugger if you haven't already, just to see what this case looks like.
There was a problem hiding this comment.
Doing something like:
auto descrs = batch.GetDescriptors();
std::shared_ptr<DataType> type = descrs.front().type;
std::shared_ptr<DataType> item_type = checked_cast<const MapType&>(*type).item_type();Like in ResolveMapArrayLookupType okay?
There was a problem hiding this comment.
checked_cast<const MapType&>(*batch[0].type()) is enough, a Datum has its own type.
| std::shared_ptr<DataType> item_type = checked_cast<const MapType&>(*type).item_type(); | ||
| std::shared_ptr<DataType> key_type = checked_cast<const MapType&>(*type).key_type(); | ||
|
|
||
| if (!options.query_key || !options.query_key->type || |
There was a problem hiding this comment.
This does the query_key null checking part, right? @lidavidm
The Errors test also tests for this.
There was a problem hiding this comment.
No, this tests if the shared_ptr is null, but a valid Scalar may contain a null value (Scalar.is_valid) - that needs to be checked as well
There was a problem hiding this comment.
(Also, we should break out the error separately for !query_key || !query_key->is_valid so that we give a relevant error message. This will crash if we give a nullptr query_key since the error message tries to read query_key->type.)
docs/source/python/api/compute.rst
Outdated
| IndexOptions | ||
| JoinOptions | ||
| MakeStructOptions | ||
| MapArrayLookupOptions |
There was a problem hiding this comment.
Just a reminder to revisit and implement this.
To do so, you'll need to:
- Add the options class definition to the Cython pxd:
arrow/python/pyarrow/includes/libarrow.pxd
Lines 2147 to 2150 in 39367db
- Add a Python options class wrapper to the Cython code:
arrow/python/pyarrow/_compute.pyx
Lines 1317 to 1333 in 39367db
- Add an import so it's in the public API:
arrow/python/pyarrow/compute.py
Line 18 in 39367db
- Add the options class to the serialization test: (just add it to the list there, no need to add additional assertions to the test)
arrow/python/pyarrow/tests/test_compute.py
Line 138 in 39367db
- Add a brief test to make sure everything works (no need to add detailed tests, we just want to make sure the bindings were set up):
arrow/python/pyarrow/tests/test_compute.py
Lines 2081 to 2092 in 39367db
docs/source/cpp/compute.rst
Outdated
| +---------------------+------------+-------------------------------------+------------------+---------------------------------+--------+ | ||
| | list_parent_indices | Unary | List-like | Int64 | | \(3) | | ||
| +---------------------+------------+-------------------------------------+------------------+---------------------------------+--------+ | ||
| | map_array_lookup | Unary | Map | Computed | :struct:`MapArrayLookupOptions` | \(4) | |
There was a problem hiding this comment.
I'm really sorry to bikeshed here, but now that I look at it…would "map_lookup" be a better kernel name? I think we only use "array" in a kernel name when the kernel only works on arrays, but this works on arrays and scalars.
There was a problem hiding this comment.
Not a problem, better to be rigorous here than have an inconsistent codebase. We just will have to be on the lookout for anything that I've missed renaming.
|
Also just note a few lints from CI: |
lidavidm
left a comment
There was a problem hiding this comment.
Sorry, my last two comments, promise :)
|
Thanks @dhruv9vats! |
|
Benchmark runs are scheduled for baseline = 5ab4112 and contender = 76decf6. 76decf6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Implement a Scalar Kernel to lookup a value for a given key in a
MapArray, whose type is an alias forList(Struct(<key>, <item>))Todo:
Add more tests and combinations (Do suggest more)
nullmapsnullitemsnullnon-empty itemsTypes tested:
Template Kernel
Things to consider:
nullkeys. (Null keys are not allowed)