SIMD: Add partial/non-contig load and store intrinsics for 32/64-bit #17340
charris merged 1 commit into numpy:master
|
This seems to build on PR gh-16782, correct? |
|
All tests pass. I will move the unit tests for the new intrinsics to #16782 so we can merge this PR. |
…-bit This patch improves the implementation of memory load/store for VSX
|
@mattip, these intrinsics are already used by #17587 and #16247 and have proven efficient: with AVX2 and AVX512F they perform almost the same as the raw SIMD they replace, and they provide massive improvements for non-contiguous memory access. I hope we can merge this pull request as soon as possible. |
|
@seiko2plus I notice that you are still making commits here. Do you feel that there is more to do? |
|
I was hoping to merge #16782 first, thinking that then we might be able to add some (maybe marked |
|
@charris, no, the last change I made to this PR was 17 days ago; the other messages appear because #16247 and #17587 are built on top of this PR (as a reference commit).
I totally agree with you: without test cases it would be chaos.
there's no need for |
|
Thanks Sayed. |
This patch implements NPYV intrinsics for partial and non-contiguous memory access,
which paves the way to replace the raw SIMD kernels in
simd.inc.src with the universal intrinsics.
Required by #16247.
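To illustrate what "partial" and "non-contiguous" access mean here, the following is a minimal scalar sketch in plain C. It is not the code from this patch: the names `sketch_vec`, `sketch_load_till`, `sketch_store_till`, `sketch_loadn`, and `sketch_storen`, as well as the fixed four-lane width, are hypothetical and chosen only for illustration. A real implementation would map each operation to platform SIMD instructions rather than loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 128-bit vector holding four 32-bit lanes. */
enum { SKETCH_LANES = 4 };

/* Hypothetical stand-in for a SIMD register. */
typedef struct { int32_t lane[SKETCH_LANES]; } sketch_vec;

/* Partial load: read only the first `nlane` elements from `ptr`
 * and set every remaining lane to `fill`. Memory past
 * ptr[nlane - 1] is never touched, which makes this safe for
 * the tail of an array. */
static sketch_vec sketch_load_till(const int32_t *ptr, size_t nlane, int32_t fill)
{
    assert(nlane > 0);
    sketch_vec v;
    for (size_t i = 0; i < SKETCH_LANES; ++i) {
        v.lane[i] = (i < nlane) ? ptr[i] : fill;
    }
    return v;
}

/* Partial store: write only the first `nlane` lanes to `ptr`. */
static void sketch_store_till(int32_t *ptr, size_t nlane, sketch_vec v)
{
    assert(nlane > 0);
    for (size_t i = 0; i < SKETCH_LANES && i < nlane; ++i) {
        ptr[i] = v.lane[i];
    }
}

/* Non-contiguous load: gather lanes that are `stride` elements apart. */
static sketch_vec sketch_loadn(const int32_t *ptr, ptrdiff_t stride)
{
    sketch_vec v;
    for (ptrdiff_t i = 0; i < SKETCH_LANES; ++i) {
        v.lane[i] = ptr[i * stride];
    }
    return v;
}

/* Non-contiguous store: scatter lanes `stride` elements apart. */
static void sketch_storen(int32_t *ptr, ptrdiff_t stride, sketch_vec v)
{
    for (ptrdiff_t i = 0; i < SKETCH_LANES; ++i) {
        ptr[i * stride] = v.lane[i];
    }
}
```

With helpers of this shape, a kernel can handle the last few elements of an array, or a strided array view, with the same vector code path instead of falling back to a separate scalar loop.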