Strategy on non-SSE intrinsics
sse2neon aims to support the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions; AVX intrinsics are excluded.
@danlark1 pointed out:
Technically speaking, _mm_fmadd_ps is not an SSE extension; it was introduced with the FMA extension, which arrived even after AVX.
We do need a strategy for non-SSE intrinsics to ease platform transition efforts.
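To make the _mm_fmadd_ps case concrete, here is a minimal sketch of how a fused multiply-add could be mapped onto NEON. The names vec4f, vec4f_load, vec4f_store, sketch_fmadd_ps, and sketch_fmadd4 are hypothetical and not part of sse2neon; vfmaq_f32 is the real AArch64 intrinsic. The non-Arm branch is only a scalar stand-in so the sketch compiles elsewhere, and it is not guaranteed to be fused.

```c
#include <assert.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Hypothetical stand-in for __m128: four packed single-precision lanes. */
typedef float32x4_t vec4f;

static inline vec4f vec4f_load(const float *p) { return vld1q_f32(p); }
static inline void vec4f_store(float *p, vec4f v) { vst1q_f32(p, v); }

/* a * b + c per lane. vfmaq_f32(acc, x, y) computes acc + x * y,
 * so the addend c goes first. */
static inline vec4f sketch_fmadd_ps(vec4f a, vec4f b, vec4f c)
{
    return vfmaq_f32(c, a, b);
}

#else

/* Scalar stand-in for non-Arm hosts (assumption: used only so the
 * sketch builds everywhere; the multiply-add here may not be fused). */
typedef struct { float lane[4]; } vec4f;

static inline vec4f vec4f_load(const float *p)
{
    vec4f v;
    for (int i = 0; i < 4; i++) v.lane[i] = p[i];
    return v;
}

static inline void vec4f_store(float *p, vec4f v)
{
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}

static inline vec4f sketch_fmadd_ps(vec4f a, vec4f b, vec4f c)
{
    vec4f r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}

#endif

/* Convenience wrapper: out[i] = a[i] * b[i] + c[i] for four floats. */
void sketch_fmadd4(const float *a, const float *b, const float *c, float *out)
{
    vec4f r = sketch_fmadd_ps(vec4f_load(a), vec4f_load(b), vec4f_load(c));
    vec4f_store(out, r);
}
```

A caller would pass three arrays of four floats and an output array to sketch_fmadd4. Whether such a mapping belongs in sse2neon itself or in a separate header is exactly the strategy question raised here.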
Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.
The only downside of the SIMDe project I see is the inability to do drop-in replacements, as all intrinsics are prefixed with simde_. I know users who are much more willing to update headers for all their code and dependencies than to update every call site. If we can collaborate with SIMDe on that, that would be great.
Greetings, I'm the person who packaged sse2neon for the MacPorts package management system. Wouldn't it make more sense to collaborate with the SIMDe project? Their README claims that they are already incorporating your sse2neon.h header directly into their code base, and they already also have header files that provide support for AVX, FMA, and lots of other intrinsics.
SIMDe has already merged most of the SSE2NEON work; see simde #499 for details. Here, we focus on Arm/AArch64-specific tweaks, which can eventually be merged into SIMDe.
The only downside of SIMDe project I see is the inability to do drop-in replacements as all intrinsics are prefixed with simde_
Doesn't that completely defeat the purpose of a translator/mapping?
From the README:
If you define SIMDE_ENABLE_NATIVE_ALIASES before including SIMDe you can use the same names as the native functions.
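The general idea behind such an opt-in alias switch can be sketched in a few lines. This is only an illustration of the mechanism, not SIMDe's actual implementation: the demo_ prefix, demo_m128, DEMO_ENABLE_NATIVE_ALIASES, and add_first_lane are all hypothetical names, and the vector type is a plain scalar struct.

```c
#include <assert.h>

/* The user opts in BEFORE the library definitions are seen
 * (hypothetical macro, modelled on the README sentence above). */
#define DEMO_ENABLE_NATIVE_ALIASES

/* --- hypothetical portable library: everything carries a prefix --- */
typedef struct { float lane[4]; } demo_m128;

static inline demo_m128 demo_mm_set1_ps(float x)
{
    demo_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = x;
    return v;
}

static inline demo_m128 demo_mm_add_ps(demo_m128 a, demo_m128 b)
{
    demo_m128 r;
    for (int i = 0; i < 4; i++) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

static inline float demo_mm_cvtss_f32(demo_m128 a)
{
    return a.lane[0]; /* lowest lane, like the native intrinsic */
}

/* --- optional aliases: native names forward to the prefixed ones --- */
#if defined(DEMO_ENABLE_NATIVE_ALIASES)
#define __m128 demo_m128
#define _mm_set1_ps(x) demo_mm_set1_ps(x)
#define _mm_add_ps(a, b) demo_mm_add_ps((a), (b))
#define _mm_cvtss_f32(a) demo_mm_cvtss_f32(a)
#endif

/* Existing call sites written against the native names compile unchanged. */
float add_first_lane(float x, float y)
{
    __m128 a = _mm_set1_ps(x);
    __m128 b = _mm_set1_ps(y);
    return _mm_cvtss_f32(_mm_add_ps(a, b));
}
```

With the aliases enabled, only the include line has to change in user code, which addresses the drop-in concern. Without them, the prefixed names keep the portable versions from colliding with the native headers.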
Finally, for the SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AES extensions, we have achieved 100% coverage, which puts SSE2NEON ahead of SIMDe with respect to SSE intrinsics. It is time to pursue translating AVX. The avx2neon.h header from Intel Embree is an excellent starting point for translating more AVX intrinsics.
avx2intrin-emu.h, avxintrin-emu.h, and avxintrin-neon.h from jsource can be checked as well.