
Conversation


@math-hiyoko math-hiyoko commented Aug 11, 2025

Description

This PR introduces hash-based uniqueness extraction support for complex types in NumPy's np.unique function.
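
For context, a minimal usage illustration of the API this targets (the array values here are arbitrary; note that per the release note in this PR, output order for these dtypes is no longer guaranteed to be sorted):

```python
import numpy as np

# np.unique on a complex128 array; with this PR the unique values are
# found via a hash table rather than by sorting.
arr = np.array([1 + 1j, 2 + 0j, 1 + 1j, 2 + 0j, 3 - 1j], dtype=np.complex128)
uniques = np.unique(arr)
# Three distinct values remain: 1+1j, 2+0j, 3-1j (order may vary).
```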

Benchmark Results

The benchmark demonstrates significant performance improvement from the new implementation.

$ spin bench -t Unique.time_unique_value -c

| Change | Before [989e9a6] | After [4ff52ca] &lt;feature/#28363&gt; | Ratio | Benchmark (Parameter) |
|--------|------------------|----------------------------------|-------|-----------------------|
| - | 5.92±0.08μs | 4.75±0.06μs | 0.8 | bench_lib.Unique.time_unique_values(200, 10.0, 20.0, &lt;class 'numpy.complex128'&gt;) |
| - | 12.2±0.1ms | 8.63±0.1ms | 0.7 | bench_lib.Unique.time_unique_values(200000, 10.0, 20.0, &lt;class 'numpy.complex128'&gt;) |
| - | 5.31±0.3μs | 2.95±0.04μs | 0.56 | bench_lib.Unique.time_unique_values(200, 10.0, 0.2, &lt;class 'numpy.complex128'&gt;) |
| - | 5.03±0.02ms | 2.71±0.03ms | 0.54 | bench_lib.Unique.time_unique_values(200000, 90.0, 20.0, &lt;class 'numpy.complex128'&gt;) |
| - | 5.57±0.04μs | 2.62±0.05μs | 0.47 | bench_lib.Unique.time_unique_values(200, 90.0, 20.0, &lt;class 'numpy.complex128'&gt;) |
| - | 5.42±0.07μs | 2.04±0.03μs | 0.38 | bench_lib.Unique.time_unique_values(200, 90.0, 0.2, &lt;class 'numpy.complex128'&gt;) |
| - | 9.91±0.1ms | 2.33±0.02ms | 0.23 | bench_lib.Unique.time_unique_values(200000, 10.0, 0.2, &lt;class 'numpy.complex128'&gt;) |
| - | 5.02±0.08ms | 882±4μs | 0.18 | bench_lib.Unique.time_unique_values(200000, 90.0, 0.2, &lt;class 'numpy.complex128'&gt;) |

Closes #28363

@math-hiyoko math-hiyoko marked this pull request as draft August 11, 2025 04:25
@math-hiyoko math-hiyoko changed the title from Feature/#28363 to "ENH: np.unique: support hash based unique for float and complex dtype" Aug 11, 2025
@ngoldbaum (Member)

Very cool! When you finish this up for review it'd be nice to see some before/after comparisons on some benchmarks.

@math-hiyoko math-hiyoko marked this pull request as ready for review August 15, 2025 13:43
@math-hiyoko (Contributor, Author)

It's now ready for review.

@ngoldbaum ngoldbaum added the `01 - Enhancement` and `triage review` (Issue/PR to be discussed at the next triage meeting) labels Aug 18, 2025
@ngoldbaum
Copy link
Member

I added the triage review label, so hopefully someone at the next triage meeting on Wednesday will volunteer to look closer at this. Unfortunately I can't take that on this month - too many things going on!

@jorenham jorenham requested a review from Copilot August 20, 2025 18:05
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR extends NumPy's np.unique function to support hash-based unique value extraction for float and complex dtypes, providing significant performance improvements over the existing sort-based approach. The implementation leverages hash tables to efficiently identify unique values without requiring sorting.

Key changes include:

  • Implementation of hash-based unique extraction for float and complex numeric types
  • Addition of specialized hash and equality functions that handle NaN values appropriately
  • Updates to test cases to accommodate unsorted output from hash-based approach
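
The core idea is easiest to see in a rough Python analogue (the PR itself implements this in C++ with hash tables; the set-based sketch here is illustrative, not the actual implementation):

```python
def hash_unique(values):
    """First-seen-order unique extraction via a hash set.

    Expected O(n) membership checks replace the O(n log n) sort
    of the sort-based approach; output order is insertion order,
    not sorted order.
    """
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```

Because the hash set imposes no ordering, the output is unsorted, which is why the test assertions had to be updated.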

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| numpy/_core/src/multiarray/unique.cpp | Core implementation adding hash-based unique support for float/complex types with NaN handling |
| numpy/lib/tests/test_arraysetops.py | Updated test assertions to handle unsorted output from hash-based unique operations |
| numpy/_core/meson.build | Added npymath include directory for math function access |
| doc/release/upcoming_changes/29537.performance.rst | Documentation of performance improvements |
| doc/release/upcoming_changes/29537.change.rst | Documentation of behavior change regarding sorted output |


@ngoldbaum (Member)

I'm not sure what the current state of the unique benchmarks is, but I think it would help to add new benchmarks for this case to our benchmark suite.

It's probably worth benchmarking several sizes of inputs, not just really huge arrays.
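
A quick ad-hoc sketch of such a size sweep (this is not the asv benchmark suite the comment refers to, just an illustrative timeit-based version; the sizes and seed are arbitrary):

```python
import timeit

import numpy as np

def sweep(sizes=(200, 20_000, 200_000), repeats=5):
    """Time np.unique on random complex128 arrays of several sizes.

    Returns a dict mapping array size to the best wall-clock time
    (in seconds) over `repeats` runs of 3 calls each.
    """
    rng = np.random.default_rng(0)
    results = {}
    for n in sizes:
        arr = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex128)
        results[n] = min(timeit.repeat(lambda: np.unique(arr), number=3, repeat=repeats))
    return results
```

Calling `sweep()` returns the per-size timings; running it before and after the change gives a rough before/after comparison across small and large inputs.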

@ngoldbaum (Member)

@charris is curious how you're handling NaN. What happens if there's more than one distinct NaN value in the array?
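
For illustration, "distinct NaN values" means NaNs whose bit patterns differ (e.g. in the payload bits); `np.unique` treats all NaNs as equal by default (`equal_nan=True`, since NumPy 1.24), so the hash path has to preserve that collapsing behavior. The payload constants below are arbitrary:

```python
import struct

import numpy as np

# Two NaNs with distinct bit patterns (different payload bits).
nan_a = struct.unpack('<d', struct.pack('<Q', 0x7FF8000000000001))[0]
nan_b = struct.unpack('<d', struct.pack('<Q', 0x7FF8000000000002))[0]
assert np.isnan(nan_a) and np.isnan(nan_b)

arr = np.array([nan_a, 1.0, nan_b, nan_a])
u = np.unique(arr)  # default equal_nan=True: all NaNs collapse to one
```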

@ngoldbaum (Member)

I see you're debugging in CI here. That's a pretty expensive (in terms of energy usage) way to debug. Maybe you'll find the docs I recently wrote about using native debuggers with NumPy useful? https://numpy.org/devdocs/dev/development_advanced_debugging.html#c-debuggers

@math-hiyoko (Contributor, Author)

Thank you for the suggestion, but the bug did not reproduce in my local environment, even in a debug build.

@math-hiyoko (Contributor, Author)

@ngoldbaum
As described in npy_math_private.h, some floating-point formats include junk bits inside their binary representation, which means that simply hashing all sizeof(long double) bytes of the representation cannot work reliably.
To address this, I used the macros defined in that file to extract and pack the mantissa, exponent, and sign components separately into a buffer, and then computed the hash from that buffer.
This approach successfully avoids the undefined-bit issue.

```c
/*
 * Motorola extended 80 bits precision. Bit representation is
 *   |   s   |eeeeeeeeeeeeeee|  junk  |mmmmmmmm................mmmmmmm|
 *   | 1 bit |    15 bits    | 16 bits|            64 bits            |
 *   |              a[0]              |      a[1]      |     a[2]     |
 *
 * 16 low bits of a[0] are junk
 */
```
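
The junk-bits issue is specific to extended-precision long double at the C level, but the general pitfall is easy to demonstrate even at double precision: values that compare equal can have different byte representations, so a byte-level hash must canonicalize before hashing (as Python's own `hash()` does):

```python
import struct

a, b = 0.0, -0.0
assert a == b                                        # equal as values
assert struct.pack('<d', a) != struct.pack('<d', b)  # different bit patterns

# Hashing the raw bytes would put equal values in different hash buckets;
# canonicalizing first (as Python's hash() does) keeps them together.
assert hash(a) == hash(b)
```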

@ngoldbaum (Member)

ngoldbaum commented Oct 17, 2025

Awesome! I'll take a close look at this soon and will make sure we ship this in 2.4.0.

In the meantime, would you mind re-running the benchmarks to make sure there are no big regressions with the new implementation? Or maybe it's faster? Also to make sure the numbers in the release note are still accurate.

@math-hiyoko (Contributor, Author)

math-hiyoko commented Oct 17, 2025

I already ran the benchmarks and confirmed there are no big regressions.
The result table in the PR comment #29537 (comment) has been updated for the latest commit.


@ngoldbaum ngoldbaum left a comment


ping @seberg and @inakleinbottle in case either of you would like to give this a once-over and hit the merge button if you don't spot any issues

```cpp
#if defined(HAVE_LDOUBLE_IBM_DOUBLE_DOUBLE_BE) || defined(HAVE_LDOUBLE_IBM_DOUBLE_DOUBLE_LE)
    constexpr size_t SIZEOF_BUFFER = NPY_SIZEOF_CLONGDOUBLE;
    const unsigned char* buffer = reinterpret_cast<const unsigned char*>(value);
#else
```


The approach below looks correct to me. My only comment is that you might be able to add an `#elif` branch for platforms where long double and double are identical, avoiding the possibly slightly more expensive but standards-compliant version below for 80-bit long double on Linux. I don't know offhand if we already have preprocessor macros defined for platforms where long double is the same as double. If not, then this is fine as-is and no need for you to figure that out to merge this PR.

@math-hiyoko (Contributor, Author)


In practice, long double representations that contain junk bits only appear in the following cases:
• HAVE_LDOUBLE_INTEL_EXTENDED_12_BYTES_LE
• HAVE_LDOUBLE_INTEL_EXTENDED_16_BYTES_LE
• HAVE_LDOUBLE_MOTOROLA_EXTENDED_12_BYTES_BE

Therefore, the high-cost special handling is actually required only for these cases.
For all other platforms, `const unsigned char* buffer = reinterpret_cast<const unsigned char*>(value);` should be sufficient.

Actually, macros such as HAVE_LDOUBLE_IEEE_DOUBLE_BE indicate that long double has the same representation as double in big-endian format.

```c
#if defined(HAVE_LDOUBLE_INTEL_EXTENDED_12_BYTES_LE)
/*
 * Intel extended 80 bits precision. Bit representation is
 *   |  junk  |   s   |eeeeeeeeeeeeeee|mmmmmmmm................mmmmmmm|
 *   | 16 bits| 1 bit |    15 bits    |            64 bits            |
 *   |  a[2]  |           a[1]        |             a[0]              |
 *
 * 16 stronger bits of a[2] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;

union IEEEl2bitsrep {
    npy_longdouble     e;
    IEEEl2bitsrep_part a[3];
};

#define LDBL_MANL_INDEX 0
#define LDBL_MANL_MASK  0xFFFFFFFF
#define LDBL_MANL_SHIFT 0

#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK  0xFFFFFFFF
#define LDBL_MANH_SHIFT 0

#define LDBL_EXP_INDEX  2
#define LDBL_EXP_MASK   0x7FFF
#define LDBL_EXP_SHIFT  0

#define LDBL_SIGN_INDEX 2
#define LDBL_SIGN_MASK  0x8000
#define LDBL_SIGN_SHIFT 15

#define LDBL_NBIT       0x80000000

typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
#elif defined(HAVE_LDOUBLE_INTEL_EXTENDED_16_BYTES_LE)
/*
 * Intel extended 80 bits precision, 16 bytes alignment. Bit representation is
 *   |  junk  |   s   |eeeeeeeeeeeeeee|mmmmmmmm................mmmmmmm|
 *   | 16 bits| 1 bit |    15 bits    |            64 bits            |
 *   |  a[2]  |           a[1]        |             a[0]              |
 *
 * a[3] and 16 stronger bits of a[2] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;

union IEEEl2bitsrep {
    npy_longdouble     e;
    IEEEl2bitsrep_part a[4];
};

#define LDBL_MANL_INDEX 0
#define LDBL_MANL_MASK  0xFFFFFFFF
#define LDBL_MANL_SHIFT 0

#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK  0xFFFFFFFF
#define LDBL_MANH_SHIFT 0

#define LDBL_EXP_INDEX  2
#define LDBL_EXP_MASK   0x7FFF
#define LDBL_EXP_SHIFT  0

#define LDBL_SIGN_INDEX 2
#define LDBL_SIGN_MASK  0x8000
#define LDBL_SIGN_SHIFT 15

#define LDBL_NBIT       0x80000000

typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
#elif defined(HAVE_LDOUBLE_MOTOROLA_EXTENDED_12_BYTES_BE)
/*
 * Motorola extended 80 bits precision. Bit representation is
 *   |   s   |eeeeeeeeeeeeeee|  junk  |mmmmmmmm................mmmmmmm|
 *   | 1 bit |    15 bits    | 16 bits|            64 bits            |
 *   |              a[0]              |      a[1]      |     a[2]     |
 *
 * 16 low bits of a[0] are junk
 */
typedef npy_uint32 IEEEl2bitsrep_part;

union IEEEl2bitsrep {
    npy_longdouble     e;
    IEEEl2bitsrep_part a[3];
};

#define LDBL_MANL_INDEX 2
#define LDBL_MANL_MASK  0xFFFFFFFF
#define LDBL_MANL_SHIFT 0

#define LDBL_MANH_INDEX 1
#define LDBL_MANH_MASK  0xFFFFFFFF
#define LDBL_MANH_SHIFT 0

#define LDBL_EXP_INDEX  0
#define LDBL_EXP_MASK   0x7FFF0000
#define LDBL_EXP_SHIFT  16

#define LDBL_SIGN_INDEX 0
#define LDBL_SIGN_MASK  0x80000000
#define LDBL_SIGN_SHIFT 31

#define LDBL_NBIT       0x80000000

typedef npy_uint32 ldouble_man_t;
typedef npy_uint32 ldouble_exp_t;
typedef npy_uint32 ldouble_sign_t;
```
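
The macros above slice a long double into sign, exponent, and mantissa fields while skipping the junk bits. The analogous decomposition for an ordinary 64-bit IEEE 754 double (1 sign bit, 11 exponent bits, 52 mantissa bits, no junk bits) can be sketched like this, purely to illustrate the field extraction the C code performs:

```python
import struct

def decompose(x: float):
    """Split an IEEE 754 binary64 value into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign = bits >> 63                    # top bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits
    mantissa = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa
```

Packing these fields into a buffer and hashing the buffer, rather than hashing the raw bytes, is what sidesteps the undefined-bit problem for extended-precision formats.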

@ngoldbaum ngoldbaum added this to the 2.4.0 release milestone Oct 20, 2025
@ngoldbaum (Member)

Alright, I’d like to have this merged with a little time to spare before 2.4.0 comes out. I doubt anyone else will look closely before the release, and I've already held this up long enough.

Thanks for your diligence here @math-hiyoko.

@ngoldbaum ngoldbaum merged commit fa84b04 into numpy:main Oct 30, 2025
76 checks passed
@math-hiyoko math-hiyoko deleted the feature/#28363 branch October 30, 2025 02:32
cakedev0 pushed a commit to cakedev0/numpy that referenced this pull request Dec 5, 2025

* save: editting

* enh: implement unique_complex

* fix: spin build

* enh: impl unique_numeric for integer, float and complex

* fix: case when complex is nan

* fix: case when order is not guaranteed

* fix: lint

* fix: type of data

* fix: error

* enh: change template type for unordered_set

* enh: set function

* enh: refactoring

* enh: refactoring

* enh: refactor test

* enh: apply inline

* doc: performance and change

* enh: test check -0.0 and -np.nan

* enh: use fnv-hash for complex

* fix: add const

* fix: use memcpy

* fix: use static_cast

* fix: remove support for float

* fix: change variable name

* fix: inline

* fix: template type

* fix: padding buffer

* fix: padding buffer

* fix: use buffer

* fix: add constexpr

* fix: copy of long double

* fix: valie_real -> value_imag

* fix: reinterpret value pointer

* fix: use NPY_BITSOF_*

* fix: use fnv magic prime

* enh: refactoring

* doc: fix docs

* doc: fix docs

* enh: add comment

* fix: use sizeof to hash complex

* fix: use reinterpret_cast

* fix: impl hash_complex_large

* fix: use constexpr

* fix: define hash_complex_clongdouble

* fix: use NPY_SIZEOF_LONGDOUBLE

* fix: use IEEEl2bitsrep

* fix: remove designated initializers

* fix: comment

* fix: small fix

* fix: change case for handling bit expression

* fix: refactoring
IndifferentArea pushed a commit to IndifferentArea/numpy that referenced this pull request Dec 7, 2025

Labels

`01 - Enhancement`, `07 - Deprecation`, `triaged` (Issue/PR that was discussed in a triage meeting)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

np.unique: support float dtypes

4 participants