-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Improve performance of merge_percentiles
#7172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dask/array/percentile.py
Outdated
| # comparable to, but typically slower than, `merge_sorted`. | ||
| # | ||
| # >>> A = np.concatenate(map(np.array, map(zip, vals, counts))) | ||
| # >>> A.sort(0, kind='mergesort') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! You can probably update or remove the comment block above. I hope it was helpful. Your solution below looks good. (I wish I could go back and look at the variations I tried and my benchmarks, but I think they are lost to time)
Sanity check: you have cytoolz installed in the benchmark environment, right? Based on your timings, I think it must be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! Looking at the comment above, should it be:
# Sort by calculated percentile values
rather than
# Sort by calculated percentile values, then number of observations.
?
And yes, I do have cytoolz installed.
… improve-percentile-perf
|
Apologies for the multitude of commits here - got myself in a bit of a git mess. Should be good now. |
merge_percentilesmerge_percentiles
|
No worries we do squash merge here 🙂 |
|
It looks like the tests are passing other than the known mac one. There is a fix on master for that, so if you merge or rebase, the tests should all pass. |
|
Looks like Ashwin merged in |
|
Taking silence here to mean no |
|
Thanks Ashwin! 😄 If anything else comes up, please let us know and we can follow up in a new issue/PR as appropriate 🙂 |
As discussed in #7162 (comment), this PR adds an improvement to merge_percentiles - by using numpy operations rather than
merge_sorted. This makes it significantly faster, especially for CuPy arrays:merge_percentilestimings for NumPy arrays, before and after this PR:merge_percentilestimings for CuPy arrays, before and after this PR:Additional info:
Details