ARROW-1699: [C++] forward, backward fill kernel functions#11853
ARROW-1699: [C++] forward, backward fill kernel functions#11853AlvinJ15 wants to merge 1 commit intoapache:masterfrom
Conversation
|
|
ea62a7a to
638aa61
Compare
29597b4 to
6638d56
Compare
0758449 to
ba4edd8
Compare
|
@lidavidm all comments were solved |
|
Small naming suggestion: maybe we could use "fill_null_forward" instead of "fill_forward_null" to align it more with the existing "fill_null" |
a15ebe7 to
77530fd
Compare
|
For the failures with chunked arrays, the kernel needs these options set: diff --git a/cpp/src/arrow/compute/kernels/vector_replace.cc b/cpp/src/arrow/compute/kernels/vector_replace.cc
index d85965cfd..8ee4bbfe1 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace.cc
@@ -799,6 +799,8 @@ void RegisterVectorFunction(FunctionRegistry* registry,
kernel.mem_allocation = MemAllocation::type::PREALLOCATE;
kernel.signature = Functor<FixedType>::GetSignature(get_id.id);
kernel.exec = std::move(exec);
+ kernel.can_execute_chunkwise = false;
+ kernel.output_chunked = false;
DCHECK_OK(func->AddKernel(std::move(kernel)));
};
auto add_primitive_kernel = [&](detail::GetTypeId get_id) {
diff --git a/cpp/src/arrow/compute/kernels/vector_replace_test.cc b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
index 742facf19..48a0b40f5 100644
--- a/cpp/src/arrow/compute/kernels/vector_replace_test.cc
+++ b/cpp/src/arrow/compute/kernels/vector_replace_test.cc
@@ -110,7 +110,7 @@ class TestReplaceKernel : public ::testing::Test {
const std::shared_ptr<ChunkedArray> array,
const std::shared_ptr<ChunkedArray>& expected) {
ASSERT_OK_AND_ASSIGN(auto actual, func(Datum(*array), nullptr));
- AssertChunkedEqual(*expected, *actual.chunked_array());
+ AssertChunkedEquivalent(*expected, *actual.chunked_array());
}The test still fails, but I'll leave that for further debugging. Basically, with a chunked array input, by default, the compute infrastructure will split up the inputs and feed the kernel one chunk at a time, instead of the entire chunked array. In that case, it expects the kernel to output Array, not ChunkedArray, as it will assemble the final ChunkedArray itself. Also, in that case, it expects the kernel to keep track of its own state (in KernelState in KernelContext). Setting these options tells the compute infrastructure not to do this splitting and not to expect an Array output. |
|
Also note that I would expect this: @@ -1255,7 +1255,7 @@ TYPED_TEST(TestFillNullNumeric, FillForwardChunkedArray) {
this->AssertFillNullChunkedArray(
FillForwardNull,
this->chunked_array({"[1,2,3,null,null,4]", "[5, null, null]", "[null, 7, null]"}),
- this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[null, 7, 7]"}));
+ this->chunked_array({"[1,2,3,3,3,4]", "[5, 5, 5]", "[5, 7, 7]"}));
}
} // namespace compute
} // namespace arrowi.e. the ChunkedArray should behave as one large array, not an array of independent arrays. |
77530fd to
3c0f708
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Note this needs to be rebased now.
There was a problem hiding this comment.
For Date Types, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
Just a reminder here.
For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.
There was a problem hiding this comment.
same as the next comment, no support for rand.Numeric( ) for DateType, even if I use rand.ArrayOf no support for ConstantArrayGenerator::Numeric(DateTypes).
3c0f708 to
21a2885
Compare
|
@lidavidm I resolved all comments. The AppVeyor gave me errors on pipeline, how Can I solve that error? |
|
It's not related to this PR. We can ignore it. |
cb4a1a3 to
f77d30a
Compare
lidavidm
left a comment
There was a problem hiding this comment.
Thanks. We just need to make sure the forward/backward implementations are consistent now.
There was a problem hiding this comment.
For Date Types which are part of the NumericBasedTypes, there's not exist a Random implementation, and it return an nullptr array_random
There was a problem hiding this comment.
Do you mind using ArrayOf instead then? It should handle all types:
arrow/cpp/src/arrow/testing/random.h
Line 380 in c6143a2
There was a problem hiding this comment.
It work for generate a random_array, but currently there is not support for DateTypes for generate a ConstantArray which is needed for this test.
auto constant_array = ConstantArrayGenerator::Numeric<TypeParam>(len_null, x_ptr[len_random - 1]) Concatenate({array_random, constant_array, array_random}))
There was a problem hiding this comment.
Then please document why the check is needed.
There was a problem hiding this comment.
Or actually, frankly: just add those cases to ConstantArrayGenerator::Numeric, e.g.
case Date32Type:
return Int32(size, ...)->View(date32());arrow/cpp/src/arrow/testing/generator.h
Lines 129 to 157 in 91e3ac5
There was a problem hiding this comment.
@lidavidm the types were added and the condition was removed
f77d30a to
58aee3d
Compare
There was a problem hiding this comment.
@AlvinJ15: what is the reason for this check? If there is one, please let me know - I'm just wondering why it's here.
58aee3d to
37bd845
Compare
323ee82 to
8942626
Compare
8942626 to
1e8bf58
Compare
lidavidm
left a comment
There was a problem hiding this comment.
This needs to be formatted again.
1e8bf58 to
1b0aa99
Compare
1b0aa99 to
f921e53
Compare
There was a problem hiding this comment.
Can these checks be eliminated?
That said, I'm not sure we need random tests for slicing. I think the hardcoded ones should be sufficient.
There was a problem hiding this comment.
Ok I deleted this random tests for slicing.
f921e53 to
d9fb98b
Compare
d9fb98b to
3998a7f
Compare
|
Ah shoot, I forgot to ask you to update the docs. I'll file a minor PR. |
|
Benchmark runs are scheduled for baseline = 67a29fd and contender = ec38aeb. ec38aeb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Add the docs for the functions from ARROW-1699/PR #11853 since they were omitted in the PR. Closes #12082 from lidavidm/arrow-1699 Authored-by: David Li <[email protected]> Signed-off-by: David Li <[email protected]>
ARROW-1699: [C++] forward, backward fill kernel functions