ARROW-14183: [C++] Improve select_k_unstable performance#12582
ARROW-14183: [C++] Improve select_k_unstable performance#12582AlvinJ15 wants to merge 1 commit intoapache:mainfrom
Conversation
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for baseline = 63e1acc and contender = 39002af84e35b725ba48f71f160bcffbbe2cefda. Results will be available as each benchmark for each run completes. |
|
Looks this PR is still WIP? |
39002af to
e12db21
Compare
|
Yes @cyb70289 I have different runs on my local. master branch with master branch with |
|
Should you close the older PR for the same issue? |
aocsa
left a comment
There was a problem hiding this comment.
This PR looks an interesting change in favor of better performance. I put some general comments, however just for clarity I would suggest to show better the benchmark results as well the improvement in numbers (%) and put some comments in which cases you are getting better/worse results. Moreover if these cases are localized I would suggest also to add new unit tests with comments around this.
| checked_cast<const ArrayType*>(chunks_[loc.chunk_index]), loc.index_in_chunk); | ||
| } | ||
|
|
||
| template <typename ArrayType> |
There was a problem hiding this comment.
I would suggest to add some documentation to these internal but exposes functions (Resolve, ResolveChunkLocation).
| DCHECK_EQ(sorted[i].overall_end(), indices_begin + end_offset); | ||
| DCHECK_EQ(sorted[i].non_null_count() + sorted[i].null_count(), batch.num_rows()); | ||
| begin_offset = end_offset; | ||
| // XXX this is an upper bound on the true null count |
|
@AlvinJ15 , are you still working on this PR? |
|
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |










Improve select_k_unstable performance, for tables using radix sort for sort individual batches, and a merge sort with K first elements.