PERF: Improve performance of special attribute lookups #21423
seberg merged 4 commits into numpy:main
Conversation
Cache PyUnicode object with value __array_ufunc__ for attribute lookup
8dc6a24 to ee8c683
Oh, I had not realized that this branch kicks in for our scalars. I think this makes sense in any case, although I am curious about cleaning the code up a bit more. Would you be up for that? The main point is that I am pretty sure there are two things to note here:
There are two ways to go about it, either introduce a … I don't want to overstretch you, though. So if you like, I could also try to do the larger refactor and you can review it instead?
@seberg I can take a shot at refactoring. There is an issue, though, with both … (screenshot of profiling results elided). About 37% of the time goes into generating the exception for the failed attribute lookup. There are some options to eliminate the exception generation (the sketch below illustrates the cost):
Since I am new to numpy I cannot see the implications of these options (and perhaps there are more), so I welcome your thoughts on this.
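For illustration only (this is not the original profile, and the `Plain` class below is a hypothetical stand-in), a minimal sketch of why exception generation is expensive: a failed lookup that materializes and catches an AttributeError costs much more per call than one that short-circuits with a default.

```python
import timeit

class Plain:
    pass  # hypothetical stand-in for an operand without __array_ufunc__

def lookup_with_exception(obj=Plain()):
    # The failing lookup creates a full AttributeError on every call.
    try:
        return obj.__array_ufunc__
    except AttributeError:
        return None

def lookup_with_default(obj=Plain()):
    # getattr with a default can suppress the error on CPython's
    # generic lookup path without materializing an exception object.
    return getattr(obj, "__array_ufunc__", None)

print("raise/catch:", timeit.timeit(lookup_with_exception, number=10**6))
print("default:    ", timeit.timeit(lookup_with_default, number=10**6))
```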
@seberg The attribute lookup is slow due to the fact that numpy looks for the attribute on the type and not on the instance. The following benchmark shows what happens (see the sketch after this comment): … Result: … The difference in execution times is because in the first two tests we follow this path: … and in the last two we follow …
Not sure why the attribute lookup on types should be slower; I will look into this.
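A hedged reconstruction of the kind of benchmark described (the original snippet was not preserved; the `Duck` class is illustrative): the same failing lookup is timed on the instance and on its type.

```python
import timeit

class Duck:
    pass  # illustrative object without an __array_ufunc__ attribute

d = Duck()

# Missing-attribute lookup on the instance: CPython's generic getattr
# path can suppress the AttributeError without creating it.
t_instance = timeit.timeit(
    lambda: getattr(d, "__array_ufunc__", None), number=10**6)

# The same lookup on the type goes through type's own tp_getattro,
# so a real AttributeError is raised and caught on every call.
t_type = timeit.timeit(
    lambda: getattr(type(d), "__array_ufunc__", None), number=10**6)

print(f"instance: {t_instance:.3f}s  type: {t_type:.3f}s")
```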
About the earlier option: …
Looking up on the type is correct; I am surprised that looking up on the instance is so much faster! Squinting at the Python code, it might be to do with metaclass handling, but these don't even have a metaclass (well, besides …). But, no matter all of those other things, storing the interned string somewhere and working with that exclusively still seems like a good start; that is independent of the other things you found?
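As a rough Python-level analogue of that idea (the PR itself caches an interned PyUnicode object at the C level; the names below are illustrative), the attribute-name string is built once and reused for every lookup:

```python
import sys

# Build and intern the attribute name once, at setup time, instead of
# constructing a fresh string object for every lookup.
ARRAY_UFUNC_NAME = sys.intern("__array_ufunc__")

def get_array_ufunc(obj):
    # Reuse the cached, interned name; interning makes the identity
    # fast path in the dict lookups inside getattr more likely to hit.
    return getattr(type(obj), ARRAY_UFUNC_NAME, None)
```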
The third option works: … It results in a 5% speedup (and perhaps more when added on the C side). I will first complete refactoring the unicode string conversion and then continue on the attribute lookup.
Yeah, but I had not double-checked that it will not have any effect on binary operators in weird subclassing situations. Not that this is likely to matter in practice :/. Defining … The alternative would be to use … Squinting at it, if it is important not to slow down unknown/arbitrary objects in some of these places, we could even add a bloom filter (a bloom filter is like a mini …). Let's focus on the other things first! Interesting CPython issue, so it is the error formatting mainly/only? I guess CPython would have to add a fast-path for …
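A minimal sketch of the bloom-filter idea (purely illustrative; nothing like this exists in the PR, and all helper names are hypothetical): fold a type's attribute names into a small bitmask, and only pay for the real lookup when the mask says the name might be present.

```python
def _name_bit(name):
    # Map a name onto one of 64 bits; collisions are harmless, they
    # just mean we occasionally take the slow path unnecessarily.
    return 1 << (hash(name) & 63)

def _build_attr_mask(cls):
    # Fold every attribute name in the MRO into one 64-bit mask.
    mask = 0
    for klass in cls.__mro__:
        for name in vars(klass):
            mask |= _name_bit(name)
    return mask

_mask_cache = {}

def maybe_has_attr(cls, name):
    # False means "definitely absent" (skip the lookup entirely);
    # True means "might be present" (fall back to a real getattr).
    mask = _mask_cache.get(cls)
    if mask is None:
        mask = _mask_cache[cls] = _build_attr_mask(cls)
    return bool(mask & _name_bit(name))

class Plain:
    pass

if maybe_has_attr(Plain, "__array_ufunc__"):
    hook = getattr(Plain, "__array_ufunc__", None)   # slow path
else:
    hook = None                                      # fast path
```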
Ah, so you know what I was looking at. The binary operator logic in question is the logic following here: numpy/numpy/core/src/common/binop_override.h, line 131 (at d40f630).
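For context on why subclassing situations matter there: the documented NumPy behavior is that an operand whose __array_ufunc__ is set to None makes ndarray's binary operators return NotImplemented, so Python falls back to the operand's reflected method. A small illustration (the `OptsOut` class is hypothetical):

```python
import numpy as np

class OptsOut:
    # Setting __array_ufunc__ to None tells numpy's binary operators
    # to return NotImplemented and defer to our reflected method.
    __array_ufunc__ = None

    def __radd__(self, other):
        return "OptsOut handled it"

print(np.arange(3) + OptsOut())   # -> OptsOut handled it
```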
…ance Refactor code so the unicode names are interned
Frankly, we should not use a header anymore there; it should be a C file now, but that is better for another PR/day.
Going to put this in once CI is happy, thanks @eendebakpt! We should have a few benchmarks that notice this just fine. At least one of the array-coercion benchmarks does … Further, the benchmarks for …
@eendebakpt, since you managed to run the benchmarks: if you like, it would be cool to follow up with a small ufunc benchmark to test ufunc call overhead on NumPy scalars (and maybe also 0-D or just very small arrays). But I am happy with these changes, so will put them in, thanks.
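A sketch of the kind of follow-up benchmark suggested (an illustrative asv-style class, not part of the PR):

```python
import numpy as np

class UFuncScalarOverhead:
    # Times the per-call overhead of a trivial ufunc on scalars,
    # 0-D arrays, and very small arrays (asv-style benchmark class).
    def setup(self):
        self.scalar = np.float64(1.0)
        self.zero_d = np.array(1.0)
        self.small = np.arange(4, dtype=np.float64)

    def time_add_scalar(self):
        np.add(self.scalar, self.scalar)

    def time_add_zero_d(self):
        np.add(self.zero_d, self.zero_d)

    def time_add_small(self):
        np.add(self.small, self.small)
```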

This PR reduces the overhead of ufunc_generic_fastcall, which is used by many numpy methods. For many input arguments a check is performed on the existence of the __array_ufunc__ attribute. The check involves a call to tp_getattro from PyTypeObject, which requires a PyUnicode object. This PR avoids constructing that PyUnicode object for each invocation of the attribute lookup.

Benchmark
Results
Results of numpy benchmarks
Benchmark: …
The results show some tests with improved performance and some with decreased performance. It looks like some instability on my system.
Most changes are in bench_ufunc_strides.BinaryInt; on a re-run the values changed, and the increases or decreases do not look systematic.

Full results