WIP::ENH:SIMD Improve the performance of comparison operators #16960
seiko2plus wants to merge 9 commits into numpy:main
numpy/core/src/umath/loops.h.src
Branch force-pushed from 112b5ac to 2e1682c.
Branch force-pushed from 2e1682c to 95c485a.
Branch force-pushed from 9628906 to 8d4ae79.
Branch force-pushed from c09eecc to cedc863.
numpy/distutils/pyas_template.py
I suspect this would make sense in a standalone PR, along with tests and documentation.
Yes, indeed. It's still experimental right now. Sure, I will move it into a separate PR later, along with documentation and unit tests.
It would be good to look around and see if there are existing template solutions that can be reused. Tempita, I think, is used in some places in NumPy, and Jinja may be an option too.
I tried almost everything before concluding that the most flexible template engine is one that doesn't bring in new syntax or philosophies, and that is what "pyas" ("Python as a template language") does: it simply treats Python the way PHP is used for templating, with f-strings as the templates. It also provides a simple translation mechanism.
The reason I dropped the repeat template is that the generated source had almost reached 9 MB before the rest of the work was even finished. It also can't be used to generate C macros.
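To make the "Python as a template language" idea concrete, here is a minimal sketch that assumes nothing about the real `pyas_template.py` API: ordinary Python loops do the repetition and f-strings are the templates. All names (`OPS`, `TYPES`, `render_loop`, `render_macro`, `generate`) are hypothetical, and the emitted C loops are plain scalar placeholders rather than SIMD kernels. It also emits a C macro, since the thread notes that the repeat template cannot.

```python
# Hypothetical sketch of "Python as a template language": plain Python loops
# drive the repetition and f-strings are the templates. None of these names
# come from the actual pyas_template.py.

# Comparison operators and NumPy C types to expand over (illustrative subset).
OPS = {"equal": "==", "not_equal": "!=", "less": "<", "less_equal": "<="}
TYPES = {"u8": "npy_uint8", "s32": "npy_int32", "f32": "float"}


def render_loop(op_name: str, op: str, sfx: str, ctype: str) -> str:
    """Render a scalar C comparison loop for one operator/type pair."""
    # Doubled braces ({{ }}) produce literal C braces inside the f-string.
    return f"""
static void
cmp_{op_name}_{sfx}(const {ctype} *a, const {ctype} *b, npy_bool *out, npy_intp len)
{{
    for (npy_intp i = 0; i < len; ++i) {{
        out[i] = a[i] {op} b[i];
    }}
}}
"""


def render_macro(op_name: str, op: str) -> str:
    """Render a one-line C macro for the operator."""
    return f"#define NPY_SCALAR_{op_name.upper()}(A, B) ((A) {op} (B))\n"


def generate() -> str:
    """Emit all macros first, then one loop per operator/type combination."""
    parts = [render_macro(name, op) for name, op in OPS.items()]
    # Ordinary nested iteration replaces dedicated repeat-block syntax.
    for name, op in OPS.items():
        for sfx, ctype in TYPES.items():
            parts.append(render_loop(name, op, sfx, ctype))
    return "".join(parts)


if __name__ == "__main__":
    print(generate())
```

Because the generator is just Python, repetition, conditionals, and helper functions need no extra template syntax; the trade-off is that template logic and emitted C live side by side in one file.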
Branch force-pushed from 7602864 to 2c4415b.
I wonder if we could refactor the dispatch mechanism to be much more limited, with only two loops written in C: a baseline loop and an advanced loop. We would then use these loops only through the current ufunc mechanism that reassigns the C function loops at import, rather than the macro-based runtime mechanism via
If the generated code is so large, maybe we need to rethink what we are trying to do here.
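The two-loop idea can be sketched in Python for brevity. This is only an analogy under stated assumptions: real NumPy loops are C functions registered in ufunc loop tables, `cpu_supports()` is a hypothetical stand-in for CPU feature detection (simulated here with a made-up `FAKE_CPU_FEATURES` environment variable), and `less_advanced` merely mimics a 4-lane SIMD loop with a scalar tail. The point is that the choice between the two loops is made once, at import time.

```python
# Hypothetical analogy of "two loops, chosen once at import time".
# In NumPy this would happen in C against ufunc loop tables; here plain
# Python functions stand in for the baseline and advanced C loops.
import os
from typing import Callable, List, Sequence

Loop = Callable[[Sequence[float], Sequence[float]], List[bool]]


def less_baseline(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Portable loop that every supported CPU can run."""
    return [x < y for x, y in zip(a, b)]


def less_advanced(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Stand-in for a wider-SIMD loop: handles 4 'lanes' per step."""
    out: List[bool] = []
    n = len(a)
    i = 0
    while i + 4 <= n:
        out.extend(a[i + k] < b[i + k] for k in range(4))
        i += 4
    # Scalar tail for the leftover elements.
    out.extend(a[j] < b[j] for j in range(i, n))
    return out


def cpu_supports(feature: str) -> bool:
    """Hypothetical feature check, simulated with an environment variable."""
    return feature in os.environ.get("FAKE_CPU_FEATURES", "").split(",")


# Resolved exactly once, when the module is imported; callers never
# re-check CPU features on each call.
LESS_LOOP: Loop = less_advanced if cpu_supports("AVX2") else less_baseline
```

Resolving the loop once keeps the hot path free of dispatch logic, and limiting the choice to two loops bounds the amount of generated code, which is the concern about source size raised in this thread.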
xref mattip#46 (comment)
The issue with conv_template (the template repeater) is that we have to rely on the C preprocessor for everything, even for internal looping!

    $ python numpy/numpy/distutils/conv_template.py einsum_sumprod.c.src
    $ du einsum_sumprod.c
    4576 einsum_sumprod.c

This PR covers many more kernels than einsum, so the best thing we can do is tackle the root cause from the start.
Branch force-pushed from 2c4415b to 727a2e7.
Closed in favor of #21483. It doesn't contain all the improvements this PR has, but we can add them later while moving to C++.
Don't merge! Work in progress.
A summary of the changes, the performance gains, and a TODO list will be written later.
NOTE: Feel free to leave comments or reviews while I'm working on it.