GH-40207: [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA#40206
GH-40207: [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA#40206pitrou merged 8 commits intoapache:mainfrom
Conversation
|
|
|
+1. Thanks! |
|
@js8544 thank you for the review. I pushed a revert for the last commit because it isn't necessary for this PR and might conflict with a PR someone else is working on. If not one objects, I will probably merge this right before I'm ready to send more Take-related PRs. |
There was a problem hiding this comment.
Why not add at least "ChunkedChunked" variations for each of these types as well? (FSB, FSL, String). I'm assuming the performance characteristics might be different?
There was a problem hiding this comment.
("ChunkedFlat" is less interesting arguably)
This comment was marked as outdated.
This comment was marked as outdated.
|
@felipecrv Would you like to rebase if not already done? |
Soon. I'm currently working on top of this branch locally 😅 |
…eCAC ...which would concatenate values on every loop iteration.
- Use args.size correctly (no division by sizeof(int64_t) - More chunks: 100 instead of just 10 - Keep the number of indices in the chunked version equal to the number of items (just like the non-chunked benchmarks) - Variations without chunking of the indices
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 2c13a19. There were 2 benchmark results with an error:
There were 13 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them. |


Rationale for this change
takeconcatenates chunks when it's applied to a chunkedvaluesarray, but when theindicesarrays is alsochunkedit concatenatesvaluesmore than once -- oneConcatenatecall withvalues.chunks()for every chunk inindices. This PR doesn't remove the concatenation, but ensures it's done only once instead ofindices.size()times.What changes are included in this PR?
TakeXXnames (->TakeXXY) to makes code easier to understandTakeCCC— copied from ARROW-9773: [C++] Implement Take kernel for ChunkedArray #13857Are these changes tested?
By existing tests.
Are there any user-facing changes?
A faster compute kernel.