ENH, SIMD: Add new NPYV intrinsics pack(0)#17789
Conversation
- add bitfield conversion for boolean vectors - add reverse elements of each 64-bit lane - add testing cases
8f57fc7 to
8ec494e
Compare
806968f to
dcbfe24
Compare
|
Looks good to me, I will refactor #17102 and give a new benchmark test by using new intrinsics here. |
| { | ||
| return _mm256_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); | ||
| } | ||
|
|
There was a problem hiding this comment.
why not add reverse f64 api here?
There was a problem hiding this comment.
since this intrinsic only reverse vector elements on each 64-bit width, same as neon vrev64q_*
| #define npyv_cvt_b64_f64 _mm256_castpd_si256 | ||
|
|
||
| // convert boolean vector to integer bitfield | ||
| NPY_FINLINE npy_uint64 npyv_tobits_b8(npyv_b8 a) |
There was a problem hiding this comment.
The name is a litter confused, how about signmask or movemask?
There was a problem hiding this comment.
this intrinsic isn't replaceable for movemask, since it accepts only boolean vector where the value must be
(1 << lane_size_inbits) - 1// 0xff**** or 0 and thats why we should get a better perfomance on aarch64.
| } | ||
| #define npyv_rev64_s16 npyv_rev64_u16 | ||
|
|
||
| NPY_FINLINE npyv_u8 npyv_rev64_u8(npyv_u8 a) |
There was a problem hiding this comment.
npyv_rev64_u8 should be in front of npyv_rev64_u16
There was a problem hiding this comment.
no, we need to reuse npyv_rev64_u16() within npyv_rev64_u8() for emulating when SSSE3 isn't available
|
I think we should wait till after the 1.20 release is branched off master to put this in. |
|
@mattip, no problem |
|
@mattip, Is there any possibility to merge it soon? |
mattip
left a comment
There was a problem hiding this comment.
looks OK from what I understand. Just a minor nit about formatting comments.
| *************************************************************************/ | ||
| //######################################################################### | ||
| //## Defining NPYV intrinsics as module functions | ||
| //######################################################################### |
There was a problem hiding this comment.
Why this change? This is C code and should use C comments
There was a problem hiding this comment.
It's still C99 comment, I had to add a special mark to separate the parent sections of the main template loops to reduce the dispersion. this source file contains two main loops one for defining c functions of module methods and another one to attach it into a static array of PyMethodDef.
There was a problem hiding this comment.
I don't understand. Is there something special about the parsing here that is different from the other *.c.src files?
There was a problem hiding this comment.
no there's no something special about it, except that we're dealing with a huge number of functions that need to be organized in a proper format.
//#########################################################################
//## A comment separating the main sections
//#########################################################################
/*********************************************************************************************************
** A comment separating the subsections of SIMD operations, such as memory, reorder ..etc
********************************************************************************************************/
// a comment for each group of intriniscsThere was a problem hiding this comment.
got it, thanks for the explanation.
| *************************************************************************/ | ||
| //######################################################################### | ||
| //## Defining a separate module for each target | ||
| //######################################################################### |
|
Thanks @seiko2plus |
Add new NPYV intrinsics pack(0)
TODO: