sse2neon

Strategy on non-SSE intrinsics

Open · jserv opened this issue 5 years ago · 8 comments

sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, while AVX intrinsics are excluded.

@danlark1 pointed out:

Technically speaking, _mm_fmadd_ps is not part of any SSE extension; it was introduced with the FMA extension, which arrived even after AVX.

We need to settle on a strategy for non-SSE intrinsics to ease platform transition efforts.
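
To make the FMA point concrete: `_mm_fmadd_ps(a, b, c)` computes `a*b + c` in each of four float lanes with a single rounding, which maps naturally onto NEON's `vfmaq_f32`. The sketch below is illustrative only and is not sse2neon's implementation; `vec4` and `fmadd4` are hypothetical stand-ins for `__m128` and the intrinsic, and the scalar branch exists only so the sketch also compiles on non-Arm hosts.

```c
#include <assert.h>

#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
#include <arm_neon.h>
#endif

/* Hypothetical 4-lane float vector standing in for __m128. */
typedef struct { float f[4]; } vec4;

/* Sketch of _mm_fmadd_ps(a, b, c) semantics: a*b + c in every lane. */
static vec4 fmadd4(vec4 a, vec4 b, vec4 c)
{
    vec4 r;
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
    /* vfmaq_f32(acc, x, y) returns acc + x*y with a single rounding,
       so the accumulator c is passed first. */
    float32x4_t v = vfmaq_f32(vld1q_f32(c.f), vld1q_f32(a.f), vld1q_f32(b.f));
    vst1q_f32(r.f, v);
#else
    /* Portable fallback (assumption, not fused): the float product is exact
       in double, but converting back to float adds a second rounding, so
       rare results can differ from a true FMA. */
    for (int i = 0; i < 4; i++)
        r.f[i] = (float)((double)a.f[i] * b.f[i] + c.f[i]);
#endif
    return r;
}
```

Note the argument order: `vfmaq_f32` takes the accumulator first, so `c` moves to the front. For example, lanes `{1, 2, 3, 4}`, `{5, 6, 7, 8}` and `{1, 1, 1, 1}` yield `{6, 13, 22, 33}`.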

jserv, Jul 21 '20

Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README says they already incorporate your sse2neon.h header directly into their code base, and they also already provide header files supporting AVX, FMA, and many other intrinsics.

jasonliu--, May 24 '21

The only downside I see with the SIMDe project is that it cannot act as a drop-in replacement, because all intrinsics are prefixed with simde_. I know users who are much more willing to swap headers across all their code and dependencies than to update every call site. If we can collaborate with SIMDe on that, that would be great.

danlark1, May 24 '21

SIMDe has already merged most of the SSE2NEON work; see simde #499 for details. Here, we focus on Arm/AArch64-specific tweaks, which can eventually be merged into SIMDe.

jserv, May 24 '21

The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_

Doesn't that completely defeat the purpose of a translator/mapping?

jasonliu--, May 24 '21

From the README:

If you define SIMDE_ENABLE_NATIVE_ALIASES before including SIMDe you can use the same names as the native functions.
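
A minimal sketch of the general technique behind such an opt-in switch, assuming a library that implements everything under a prefix and, only when the user asks, maps the native Intel names onto the prefixed ones. Every name here (`mylib_m128`, `mylib_mm_add_ps`, `MYLIB_ENABLE_NATIVE_ALIASES`) is hypothetical; this is not SIMDe's actual code.

```c
#include <assert.h>

/* ---- user code: opt in before including the (hypothetical) header ---- */
#define MYLIB_ENABLE_NATIVE_ALIASES

/* ---- hypothetical header contents ---- */
typedef struct { float f[4]; } mylib_m128;

/* Prefixed implementation; never collides with a real <xmmintrin.h>. */
static mylib_m128 mylib_mm_add_ps(mylib_m128 a, mylib_m128 b)
{
    mylib_m128 r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] + b.f[i];
    return r;
}

#if defined(MYLIB_ENABLE_NATIVE_ALIASES)
/* Map the native Intel names onto the prefixed implementation so that
   existing call sites compile unchanged. */
typedef mylib_m128 __m128;
#define _mm_add_ps(a, b) mylib_mm_add_ps((a), (b))
#endif
```

With the alias switched on, code written against `__m128` and `_mm_add_ps` builds without touching call sites; without it, only the prefixed names exist.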

aqrit, Aug 17 '21

Finally, we have achieved 100% coverage of the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, which puts SSE2NEON ahead of SIMDe in terms of SSE intrinsic coverage. It is time to pursue translating AVX. The avx2neon.h header from Intel Embree is an excellent starting point for translating more AVX intrinsics.

The avx2intrin-emu.h, avxintrin-emu.h, and avxintrin-neon.h headers from jsource are also worth checking.
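
Neither header is reproduced here, but a common way to emulate AVX on a 128-bit SIMD unit such as NEON is to hold each 256-bit register as two 128-bit halves and apply the 128-bit operation to each half. The sketch below assumes that approach; `m256_emu`, `m256_add_ps`, `m256_loadu`, `m256_storeu`, and the `half_*` helpers are hypothetical names, not the API of avx2neon.h or jsource, and the non-Arm branch only lets it compile elsewhere.

```c
#include <assert.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
/* One native 128-bit NEON register holding four floats. */
typedef float32x4_t half128;
static half128 half_load(const float *p) { return vld1q_f32(p); }
static void half_store(float *p, half128 v) { vst1q_f32(p, v); }
static half128 half_add(half128 a, half128 b) { return vaddq_f32(a, b); }
#else
/* Portable stand-in so the sketch compiles on non-Arm hosts. */
typedef struct { float f[4]; } half128;
static half128 half_load(const float *p)
{
    half128 h;
    for (int i = 0; i < 4; i++) h.f[i] = p[i];
    return h;
}
static void half_store(float *p, half128 v)
{
    for (int i = 0; i < 4; i++) p[i] = v.f[i];
}
static half128 half_add(half128 a, half128 b)
{
    for (int i = 0; i < 4; i++) a.f[i] += b.f[i];
    return a;
}
#endif

/* Hypothetical stand-in for __m256: eight floats as two 128-bit halves. */
typedef struct { half128 lo, hi; } m256_emu;

/* Analogue of an unaligned 256-bit load: elements 0-3 go to lo, 4-7 to hi. */
static m256_emu m256_loadu(const float *p)
{
    m256_emu v = { half_load(p), half_load(p + 4) };
    return v;
}

/* Analogue of an unaligned 256-bit store. */
static void m256_storeu(float *p, m256_emu v)
{
    half_store(p, v.lo);
    half_store(p + 4, v.hi);
}

/* Analogue of _mm256_add_ps: a lane-wise op needs no cross-half traffic,
   so it becomes one 128-bit add per half. */
static m256_emu m256_add_ps(m256_emu a, m256_emu b)
{
    m256_emu r = { half_add(a.lo, b.lo), half_add(a.hi, b.hi) };
    return r;
}
```

Lane-wise operations split cleanly like this; AVX operations that move data across the two 128-bit halves (such as full 8-lane permutes) do not, which is where most of the translation effort tends to go.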

jserv, Dec 26 '22