ENH: Perf improvements to np.sort, np.argsort, np.partition and np.argpartition #24924
|
@r-devulap the test failures here do look real, at least in the form of compiler warnings. EDIT: If you should want to backport the DOWNFALL fix, that probably would need a more targeted diff. OTOH, I am not sure it's worth the trouble since it "only" restores performance. |
|
@seberg Indeed, still working on fixing them. And I don't think we need to worry about backporting the DOWNFALL fix. |
|
Update: We have changed API to use |
So that would technically be broken, or need a guard, until we do gh-24888? Although I guess that the use of that code implies we are not on a niche platform. |
I don't think that's necessary.
Let me know if my assumptions are wrong. |
|
hmm, the macOS x86_64 failure seems hard to figure out. It's using x86_64-apple-darwin13.4.0-clang++ (clang 15.0.7 "clang version 15.0.7") but I don't see this build fail with clang++-15.0.7 locally. Fails with: |
|
yay, finally. Decided to explicitly instantiate to |
|
Friendly ping :) |
|
Changes in NumPy here are pretty minor in this PR. Perhaps we can close this and merge it into a single PR #25045. Please reopen if you disagree. |
Updating x86-simd-sort to the latest commit. Includes 2 major updates:
- Speeds up np.sort by up to 2x for 32-bit and up to 1.5x for 64-bit data. Ref: "Various performance improvements" (x86-simd-sort#83), which adds optimal sorting networks and minor improvements to vectorized partitioning. This also speeds up np.partition by up to 1.3x.
- np.argsort and np.argpartition relied heavily on the gather instruction, which unfortunately is terrible for performance because of the mitigation for a new vulnerability, DOWNFALL (see https://www.phoronix.com/review/downfall). We reverted to using scalar emulation of gather (see "Use scalar emulation of gather instruction for arg methods", x86-simd-sort#65), which improves performance by about 1.6x for both np.argsort and np.argpartition.
Benchmarks for random data:
Detailed benchmarks can be seen here: https://gist.github.com/r-devulap/21dc3afdbab47c7aa08087c1445954b7
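For anyone who wants to check the effect locally, here is a minimal timing sketch for the four affected functions on random data. It is not the benchmark suite used for the numbers in this PR: the array size, dtypes, seed, and repeat counts are all assumptions chosen for illustration, and the speedups only show up when comparing a NumPy build with this change against one without it, on a CPU where the SIMD paths are dispatched.

```python
import timeit

import numpy as np

# Assumed, illustrative parameters (not taken from the PR's benchmarks).
SIZE = 100_000
NUMBER = 5   # calls per timing run
REPEAT = 3   # timing runs; the best one is reported

rng = np.random.default_rng(0)
results = {}

for dtype in (np.float32, np.float64):
    arr = rng.random(SIZE).astype(dtype)
    kth = arr.size // 2

    # All four functions return new arrays, so `arr` stays unsorted
    # and every call sees the same random input.
    cases = {
        "sort": lambda a=arr: np.sort(a),
        "argsort": lambda a=arr: np.argsort(a),
        "partition": lambda a=arr, k=kth: np.partition(a, k),
        "argpartition": lambda a=arr, k=kth: np.argpartition(a, k),
    }

    for name, fn in cases.items():
        best = min(timeit.repeat(fn, number=NUMBER, repeat=REPEAT)) / NUMBER
        results[(name, np.dtype(dtype).name)] = best
        print(f"np.{name:<13} {np.dtype(dtype).name:<8} {best * 1e3:8.3f} ms")
```

Running the same script under two NumPy builds and dividing the per-case times gives a rough speedup estimate; for anything more careful, the asv benchmarks in the NumPy repository are the better tool.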