Conversation

@eendebakpt (Contributor):
We can improve performance of np.linalg.det for small arrays by up to 40% with 3 changes:

  • Move _assert_stacked_2d check into _assert_stacked_square
  • Improve performance of _commonType by using a cache
  • Avoid `r.astype(...)` making a copy of the data for scalar arguments (it internally converts to an array and back)

In this PR we perform the first step.

@eendebakpt eendebakpt marked this pull request as draft April 4, 2025 15:18
@eendebakpt eendebakpt marked this pull request as ready for review April 4, 2025 16:01
@tylerjereddy (Contributor) left a comment:
So far asv isn't showing much perf improvement on this branch on x86_64 Linux (i9-13900K):

```shell
asv continuous -E virtualenv -e -b "time_det.*" main linalg_refactor
```

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

It may be because you're only making the first of the series of proposed changes.

Regardless of the performance changes, I suppose this is a reduction in lines of code, so maybe "ok" on its own anyway.

```diff
 if issubclass(type_, inexact):
-    if isComplexType(type_):
-        is_complex = True
+    is_complex = is_complex or isComplexType(type_)
```
Contributor:
Is this really worth doing? It takes my brain longer to process and shouldn't matter much performance-wise?

Contributor:
I'm also not sure this routine is worth it, but if one goes for it, I'd start with something like

```python
types = set(a.dtype.type for a in arrays)
```

which will generally reduce the number of types to check, and then do

```python
is_complex = any(isComplexType(type_) for type_ in types)
```

and something along similar lines for `result_type` (but with a check whether one can really not simply use the built-in `np.result_type` - not obvious to me).

But I'd do it in a separate PR.
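The set-based suggestion above can be sketched as a standalone helper. This is hypothetical code, not the PR's implementation: `is_complex_type` here is a simplified stand-in for numpy.linalg's internal `isComplexType`, and `common_info` is an invented name:

```python
import numpy as np


def is_complex_type(t):
    # Simplified stand-in for numpy.linalg's internal isComplexType.
    return issubclass(t, np.complexfloating)


def common_info(*arrays):
    # Deduplicate dtypes first, as suggested above, then scan the
    # (typically very small) set instead of all input arrays.
    types = set(a.dtype.type for a in arrays)
    is_complex = any(is_complex_type(t) for t in types)
    result_type = np.result_type(*types)
    return is_complex, result_type
```

As noted in the reply below, for the common case of one or two input arrays the `set` construction itself may cost more than it saves, so this would need benchmarking.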

Contributor (Author):
Interesting idea! It might not work out, because in the common use case len(arrays) is just 1 or 2 and the set overhead is too large. I will remove this change here and test it out in a new PR.

@mhvk (Contributor) left a comment:
I like the cleanup, though I'm not surprised it doesn't have that much of an effect on speed. A suggestion in-line for an extra (if very minor) performance boost.

Also, I'd suggest doing just the removal of _assert_stacked_square here.


```python
w = gufunc(a, signature=signature)
return w.astype(_realType(result_t), copy=False)

def _convertarray(a):
```
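As an aside on the snippet above: `astype(..., copy=False)` returns the input array itself when the dtype already matches, which is what makes it cheap on the no-conversion path. A quick illustration (standalone, not part of the PR):

```python
import numpy as np

w = np.ones(4, dtype=np.float64)

# dtype already matches: no copy is made, the same object comes back
same = w.astype(np.float64, copy=False)

# dtype differs: a converted copy is still made despite copy=False
changed = w.astype(np.float32, copy=False)

assert same is w
assert changed is not w
```

This is the behavior step 3 of the plan relies on when avoiding copies for scalar results.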
Contributor:
Nice catch that this is not actually used!


```python
def _assert_stacked_square(*arrays):
    for a in arrays:
        if a.ndim < 2:
```
Contributor:
I like the combination. If one really wants to get the most out of it, it could be

```python
try:
    m, n = a.shape[-2:]
except ValueError:
    raise LinAlgError(f"{a.ndim}-dimensional...") from None
if m != n:
    ...
```

This uses the fact that these days try/except has no cost if no exception is raised.
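The suggestion above can be fleshed out into a self-contained sketch of the combined check. This is an illustration, not the exact code merged in the PR; the error messages are assumptions based on numpy's usual wording:

```python
import numpy as np
from numpy.linalg import LinAlgError


def assert_stacked_square(*arrays):
    # Combined dimensionality + squareness check. Unpacking a.shape[-2:]
    # raises ValueError for 0-d and 1-d inputs, and the try/except costs
    # nothing on the common, non-raising path.
    for a in arrays:
        try:
            m, n = a.shape[-2:]
        except ValueError:
            raise LinAlgError(
                f"{a.ndim}-dimensional array given. Array must be "
                "at least two-dimensional") from None
        if m != n:
            raise LinAlgError(
                "Last 2 dimensions of the array must be square")
```

For a valid input like `np.eye(3)` this falls straight through; a 1-d input fails the tuple unpacking and is converted to a `LinAlgError`.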

Contributor:
Sorry, pushed this after submitting the review - above is the most relevant comment! (and not very relevant at that!)

Contributor (Author):
With your suggestion, _assert_stacked_square is about 10% faster; I updated the PR accordingly.

@mhvk (Contributor) left a comment:
Looks good to me, thanks! Let's get it in.

@mhvk mhvk merged commit 422ca44 into numpy:main Apr 9, 2025
69 of 71 checks passed
@eendebakpt (Contributor, Author):
Thanks for reviewing! Next PR is #28686

MaanasArora pushed a commit to MaanasArora/numpy that referenced this pull request Apr 11, 2025
…8649)

* ENH: Improve np.linalg.det performance

* Update numpy/linalg/_linalg.py

* revert change to complex detection

* use suggestion

* whitespace

* add more small array benchmarks

* trigger build
