ENH: np.unique: support hash based unique for complex dtype #29537
Conversation
Very cool! When you finish this up for review it'd be nice to see some before/after comparisons on some benchmarks.

Now it's ready for review.

I added the triage review label, so hopefully someone at the next triage meeting on Wednesday will volunteer to look closer at this. Unfortunately I can't take that on this month - too many things going on!
Pull Request Overview
This PR extends NumPy's np.unique function to support hash-based unique value extraction for float and complex dtypes, providing significant performance improvements over the existing sort-based approach. The implementation leverages hash tables to efficiently identify unique values without requiring sorting.
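As a rough illustration of the idea, here is a minimal sketch (not the code in this PR; canonical, ComplexHash, ComplexEqual, and unique_hash are hypothetical names) of a hash-based unique pass over complex values: each value is canonicalized so that all NaN payloads and both signed zeros share one bit pattern, hashed bytewise with FNV-1a, and inserted into an unordered set, with first occurrences appended to the output.

// Minimal sketch, not the PR's implementation: hash-based unique for complex<double>.
#include <cmath>
#include <complex>
#include <cstdint>
#include <cstring>
#include <limits>
#include <unordered_set>
#include <vector>

// Map values that must be treated as identical (every NaN payload, and
// -0.0 vs +0.0) to a single bit pattern before hashing.
static double canonical(double x) {
    if (std::isnan(x)) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return x + 0.0;  // folds -0.0 into +0.0; other values are unchanged
}

struct ComplexHash {
    std::size_t operator()(const std::complex<double> &z) const {
        double parts[2] = {canonical(z.real()), canonical(z.imag())};
        unsigned char bytes[sizeof parts];
        std::memcpy(bytes, parts, sizeof parts);
        std::uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
        for (unsigned char b : bytes) {
            h = (h ^ b) * 1099511628211ull;         // FNV-1a prime
        }
        return static_cast<std::size_t>(h);
    }
};

struct ComplexEqual {
    bool operator()(const std::complex<double> &a,
                    const std::complex<double> &b) const {
        auto eq = [](double x, double y) {
            return x == y || (std::isnan(x) && std::isnan(y));
        };
        return eq(a.real(), b.real()) && eq(a.imag(), b.imag());
    }
};

// Returns the unique values of data[0..n) in first-occurrence order (unsorted).
static std::vector<std::complex<double>>
unique_hash(const std::complex<double> *data, std::size_t n) {
    std::unordered_set<std::complex<double>, ComplexHash, ComplexEqual> seen;
    std::vector<std::complex<double>> out;
    for (std::size_t i = 0; i < n; ++i) {
        if (seen.insert(data[i]).second) {  // .second is true only for new values
            out.push_back(data[i]);
        }
    }
    return out;
}

Because the output follows insertion order rather than sorted order, callers that need sorted output must sort afterwards, which is the behavior change documented in doc/release/upcoming_changes/29537.change.rst.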
Key changes include:
- Implementation of hash-based unique extraction for float and complex numeric types
- Addition of specialized hash and equality functions that handle NaN values appropriately
- Updates to test cases to accommodate unsorted output from hash-based approach
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| numpy/_core/src/multiarray/unique.cpp | Core implementation adding hash-based unique support for float/complex types with NaN handling |
| numpy/lib/tests/test_arraysetops.py | Updated test assertions to handle unsorted output from hash-based unique operations |
| numpy/_core/meson.build | Added npymath include directory for math function access |
| doc/release/upcoming_changes/29537.performance.rst | Documentation of performance improvements |
| doc/release/upcoming_changes/29537.change.rst | Documentation of behavior change regarding sorted output |
I'm not sure what the state of the ... It's probably worth benchmarking several sizes of inputs, not just really huge arrays.
@charris is curious how you're handling NaN. What happens if there's more than one distinct NaN value in the array?
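As an illustration of that question (building on the hypothetical unique_hash sketch above, not on this PR's tests): with a canonicalizing hash and equality, NaNs with different payloads or a set sign bit all land in the same bucket, so only a single representative comes back.

// Illustration only: distinct NaN bit patterns collapse to one output element
// under the canonicalization used in the earlier sketch.
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>
#include <vector>

int main() {
    std::vector<std::complex<double>> data = {
        {std::nan("1"), 0.0},                               // quiet NaN, payload 1
        {std::nan("2"), 0.0},                               // quiet NaN, payload 2
        {-std::numeric_limits<double>::quiet_NaN(), 0.0},   // NaN with the sign bit set
        {1.0, 2.0},
        {1.0, 2.0},
    };
    // unique_hash is the hypothetical helper from the sketch above.
    auto u = unique_hash(data.data(), data.size());
    assert(u.size() == 2);  // one NaN representative plus 1+2j
    return 0;
}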
I see you're debugging in CI here. That's a pretty expensive (in terms of energy usage) way to debug. Maybe you'll find the docs I recently wrote about using native debuggers with NumPy useful? https://numpy.org/devdocs/dev/development_advanced_debugging.html#c-debuggers
Thank you for the suggestion, but the bug did not reproduce in my local environment, even in a debug build.
Force-pushed from 995c5c8 to 2d1a6b9.
@ngoldbaum See numpy/_core/src/npymath/npy_math_private.h, lines 254 to 261 in 989e9a6.
Awesome! I'll take a close look at this soon and will make sure we ship this in 2.4.0. In the meantime, would you mind re-running the benchmarks to make sure there are no big regressions with the new implementation? Or maybe it's faster? Also, please check that the numbers in the release note are still accurate.
I already ran the benchmarks and confirmed there are no big regressions.
ngoldbaum left a comment
ping @seberg and @inakleinbottle in case either of you would like to give this a once-over and hit the merge button if you don't spot any issues
#if defined(HAVE_LDOUBLE_IBM_DOUBLE_DOUBLE_BE) || defined(HAVE_LDOUBLE_IBM_DOUBLE_DOUBLE_LE)
    constexpr size_t SIZEOF_BUFFER = NPY_SIZEOF_CLONGDOUBLE;
    const unsigned char* buffer = reinterpret_cast<const unsigned char*>(value);
#else
The approach below looks correct to me. My only comment is that you might be able to add an #elif branch for platforms where longdouble and double are identical, avoiding the possibly slightly more expensive but standards-compliant version below for 80-bit longdouble on Linux. I don't know offhand if we already have preprocessor macros defined for platforms where longdouble is the same as double. If not, then this is fine as-is and no need for you to figure that out to merge this PR.
In practice, long double representations that contain junk bits only appear in the following cases:
• HAVE_LDOUBLE_INTEL_EXTENDED_12_BYTES_LE
• HAVE_LDOUBLE_INTEL_EXTENDED_16_BYTES_LE
• HAVE_LDOUBLE_MOTOROLA_EXTENDED_12_BYTES_BE
Therefore, the high-cost special handling is actually required only for these cases.
For all other platforms,
const unsigned char* buffer = reinterpret_cast<const unsigned char*>(value);
should be sufficient.
Actually, macros such as HAVE_LDOUBLE_IEEE_DOUBLE_BE indicate that long double has the same representation as double in big-endian format.
numpy/_core/src/npymath/npy_math_private.h, lines 178 to 290 in 989e9a6:
#if defined(HAVE_LDOUBLE_INTEL_EXTENDED_12_BYTES_LE)
/*
 * Intel extended 80 bits precision. Bit representation is
 * | junk | s |eeeeeeeeeeeeeee|mmmmmmmm................mmmmmmm|
 * | 16 bits| 1 bit | 15 bits | 64 bits |
 * | a[2] | a[1] | a[0] |
 *
 * 16 stronger bits of a[2] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;
union IEEEl2bitsrep {
    npy_longdouble e;
    IEEEl2bitsrep_part a[3];
};
#define LDBL_MANL_INDEX 0
#define LDBL_MANL_MASK 0xFFFFFFFF
#define LDBL_MANL_SHIFT 0
#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK 0xFFFFFFFF
#define LDBL_MANH_SHIFT 0
#define LDBL_EXP_INDEX 2
#define LDBL_EXP_MASK 0x7FFF
#define LDBL_EXP_SHIFT 0
#define LDBL_SIGN_INDEX 2
#define LDBL_SIGN_MASK 0x8000
#define LDBL_SIGN_SHIFT 15
#define LDBL_NBIT 0x80000000
typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
#elif defined(HAVE_LDOUBLE_INTEL_EXTENDED_16_BYTES_LE)
/*
 * Intel extended 80 bits precision, 16 bytes alignment.. Bit representation is
 * | junk | s |eeeeeeeeeeeeeee|mmmmmmmm................mmmmmmm|
 * | 16 bits| 1 bit | 15 bits | 64 bits |
 * | a[2] | a[1] | a[0] |
 *
 * a[3] and 16 stronger bits of a[2] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;
union IEEEl2bitsrep {
    npy_longdouble e;
    IEEEl2bitsrep_part a[4];
};
#define LDBL_MANL_INDEX 0
#define LDBL_MANL_MASK 0xFFFFFFFF
#define LDBL_MANL_SHIFT 0
#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK 0xFFFFFFFF
#define LDBL_MANH_SHIFT 0
#define LDBL_EXP_INDEX 2
#define LDBL_EXP_MASK 0x7FFF
#define LDBL_EXP_SHIFT 0
#define LDBL_SIGN_INDEX 2
#define LDBL_SIGN_MASK 0x8000
#define LDBL_SIGN_SHIFT 15
#define LDBL_NBIT 0x800000000
typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
#elif defined(HAVE_LDOUBLE_MOTOROLA_EXTENDED_12_BYTES_BE)
/*
 * Motorola extended 80 bits precision. Bit representation is
 * | s |eeeeeeeeeeeeeee| junk |mmmmmmmm................mmmmmmm|
 * | 1 bit | 15 bits | 16 bits| 64 bits |
 * | a[0] | a[1] | a[2] |
 *
 * 16 low bits of a[0] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;
union IEEEl2bitsrep {
    npy_longdouble e;
    IEEEl2bitsrep_part a[3];
};
#define LDBL_MANL_INDEX 2
#define LDBL_MANL_MASK 0xFFFFFFFF
#define LDBL_MANL_SHIFT 0
#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK 0xFFFFFFFF
#define LDBL_MANH_SHIFT 0
#define LDBL_EXP_INDEX 0
#define LDBL_EXP_MASK 0x7FFF0000
#define LDBL_EXP_SHIFT 16
#define LDBL_SIGN_INDEX 0
#define LDBL_SIGN_MASK 0x80000000
#define LDBL_SIGN_SHIFT 31
#define LDBL_NBIT 0x80000000
typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
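To make the junk-bit point concrete, here is a minimal hedged sketch (a hypothetical helper, not code from this PR) assuming the Intel extended little-endian layouts above: sizeof(long double) is 12 or 16, but only the low 10 bytes carry the sign, exponent, and mantissa, so a bytewise hash must copy only those bytes and never read the padding.

// Sketch for the HAVE_LDOUBLE_INTEL_EXTENDED_* layouts only: the significant
// 10 bytes sit at the start of the object, the rest is uninitialized padding.
#include <cstring>

static void canonical_x87_bytes(const long double *value, unsigned char out[10]) {
    // Copy only the sign/exponent/mantissa bytes; the padding bytes are never
    // read, so garbage in the padding cannot influence the hash.
    std::memcpy(out, value, 10);
}

On platforms without junk bits, as noted in the comment above, reinterpreting the value's raw bytes is already sufficient.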
Alright, I'd like to have this merged with a little time to spare before 2.4.0 comes out. I doubt anyone else will look closely before the release, and I've already held this up long enough. Thanks for your diligence here @math-hiyoko.
Squashed commits: * save: editting * enh: implement unique_complex * fix: spin build * enh: impl unique_numeric for integer, float and complex * fix: case when complex is nan * fix: case when order is not guaranteed * fix: lint * fix: type of data * fix: error * enh: change template type for unordered_set * enh: set function * enh: refactoring * enh: refactoring * enh: refactor test * enh: apply inline * doc: performance and change * enh: test check -0.0 and -np.nan * enh: use fnv-hash for complex * fix: add const * fix: use memcpy * fix: use static_cast * fix: remove support for float * fix: change variable name * fix: inline * fix: template type * fix: padding buffer * fix: padding buffer * fix: use buffer * fix: add constexpr * fix: copy of long double * fix: valie_real -> value_imag * fix: reinterpret value pointer * fix: use NPY_BITSOF_* * fix: use fnv magic prime * enh: refactoring * doc: fix docs * doc: fix docs * enh: add comment * fix: use sizeof to hash complex * fix: use reinterpret_cast * fix: impl hash_complex_large * fix: use constexpr * fix: define hash_complex_clongdouble * fix: use NPY_SIZEOF_LONGDOUBLE * fix: use IEEEl2bitsrep * fix: remove designated initializers * fix: comment * fix: small fix * fix: change case for handling bit expression * fix: refactoring
Description
This PR introduces hash-based uniqueness extraction support for complex types in NumPy's np.unique function.
Benchmark Results
The benchmark results demonstrate a significant performance improvement from the new implementation.
Closes #28363