Implementations of SIMD instruction sets for systems which don't natively support them.
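To make the idea concrete, here is a minimal sketch of a portable fallback. It is not SIMDe's actual code. Each lane of a vector instruction is emulated in plain C. The sketch models SSE2 _mm_add_epi32 and SSE4.1 _mm_extract_epi32 on a hypothetical four-lane struct. The names portable_m128i, portable_add_epi32, and portable_extract_epi32 are invented for this example. SIMDe itself uses its own types, such as simde__m128i, and prefers native instructions when the target has them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical four-lane vector type; not SIMDe's simde__m128i. */
typedef struct {
  int32_t lane[4];
} portable_m128i;

/* Models SSE2 _mm_add_epi32: lane-wise addition that wraps modulo 2^32.
   The sum is computed on uint32_t because signed overflow is undefined
   behavior in C; memcpy reinterprets the bits without aliasing issues. */
static portable_m128i portable_add_epi32(portable_m128i a, portable_m128i b) {
  portable_m128i r;
  for (int i = 0; i < 4; i++) {
    uint32_t ua, ub, sum;
    memcpy(&ua, &a.lane[i], sizeof ua);
    memcpy(&ub, &b.lane[i], sizeof ub);
    sum = ua + ub; /* well defined, wraps modulo 2^32 */
    memcpy(&r.lane[i], &sum, sizeof sum);
  }
  return r;
}

/* Models SSE4.1 _mm_extract_epi32: the lane index must be 0..3. */
static int32_t portable_extract_epi32(portable_m128i a, int imm) {
  assert(imm >= 0 && imm < 4);
  return a.lane[imm];
}
```

Adding {INT32_MAX, 1, 2, 3} and {1, 1, 1, 1} this way gives INT32_MIN in lane 0, the same wrap-around the native instruction produces.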
For the entire project: 656 files changed, 202635 insertions(+), 1724 deletions(-)
For just the simde folder: 295 files changed, 47053 insertions(+), 896 deletions(-)
There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).
Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (ER, PF, 4MAPS, and 4VNNIW) from their intrinsic list. SIMDe will retain those few implementations we already had, but this changes how our completeness statistics are calculated.
SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions; up from 56.46% in the previous release!
Finally complete families
_Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
__fp16 if available aba26f6 @mr-c
vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
simde_vmin_s16 6858b92 @M-HT
simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
INT_MIN is undefined behavior in C/C++ c200c16 @mr-c
simde_mm512_add_epi32 (#1126) 6cde31c @AymenQ
wasm_f64x2_pmin 96d6e53 @mr-c
f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
f{32x4,64x2}_relaxed_{min,max} 9d1a34e @mr-c
*_stream_{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-c
_mm_movelh_ps for Arm64 514564e @mr-c
_mm_movemask_ps: remove unused code fba97e4 @mr-c
_mm_testnzc_si128 edd4678 @mr-c
_mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c
simde_mm256_sign_epi{8,16,32} (#1123) c376610 @Proudsalsa
simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
float64x2_t is not available in A32V7 ae4c4ab @mr-c
SIMDE_ARCH_X86_SSE4_2 define 5e4b308 @cbielow
SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
SIMDE_BUG_GCC_98521 was fixed in GCC 10.3 edde42e @mr-c
SIMDE_BUG_GCC_94482 was fixed in GCC 8.5, 9.4, 10+ 43d86a3 @mr-c
_mm512{,_mask}_abs_pd (#1118) 5405bbd @thomas-schlichter
vec_bperm bug was fixed in clang-14 6feb28a @mr-c
SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c
simde-detect-clang.h: add clang 17 detection 923f8ac 684baa1 50d98c1 @Coeur
_Float16 on ClangCL if not supported 8a6b8c5 @mr-c
SIMDE_X86_SVML_NATIVE for ClangCL c877fe5 @mr-c
-Wno-switch-default fdbd6b2 @mr-c
{u,s}addh: apply arm64 windows workaround only on msvc<1938 (#1121) 14311d6 @Changqing-JING
Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.6...v0.8.0
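The INT_MIN entry in the 0.8.0 list touches a general pitfall for scalar fallbacks. Negating the most negative signed integer overflows, and C and C++ leave that undefined. SIMD absolute-value instructions behave differently: SSSE3 _mm_abs_epi32 and NEON vabsq_s32 simply return INT32_MIN for an INT32_MIN lane. The sketch below reproduces that result without undefined behavior. The helper name wrapping_abs_i32x4 is hypothetical, and this is not necessarily how commit c200c16 changed SIMDe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lane-wise absolute value with wrap-around semantics, like _mm_abs_epi32.
   Writing -in[i] would be undefined for INT32_MIN, so the negation is done
   on the unsigned bit pattern, where wrap-around is well defined. */
static void wrapping_abs_i32x4(int32_t out[4], const int32_t in[4]) {
  assert(out != NULL && in != NULL);
  for (int i = 0; i < 4; i++) {
    uint32_t u;
    memcpy(&u, &in[i], sizeof u);
    if (in[i] < 0) {
      u = 0u - u; /* 0x80000000 (INT32_MIN) maps to itself */
    }
    memcpy(&out[i], &u, sizeof u);
  }
}
```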
See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.7.6
_mm512{,_mask}_abs_pd by @thomas-schlichter in https://github.com/simd-everywhere/simde/pull/1118
Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.8.0-rc1...v0.8.0-rc2
See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes
Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.6...v0.8.0-rc1
See, I knew we should release more often!
neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations 3a18dff @mr-c
neon/cvtn: basic implementation of a few functions fefc785 @mr-c
neon/mla_lane: initial implementation using mla+dup 554ab18 @ngzhian
neon/shl,rshl: fix avx include to unbreak amalgamated headers 3748a9f @mr-c
neon/shll_n: make vshll_n_u32 test operational 356db0c @mr-c
neon/qabs: restore SSE2 impl for vqabsq_s8 f614843 @mr-c
mmx: Loongson impl promotions over SIMDE_SHUFFLE_VECTOR_ 51bf6f2 @mr-c
x86/sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
sse{,2,3,4.1},avx: more WASM shuffle implementations 097dd12 @mr-c
sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
sse: allow native _mm_loadh_pi on MSVC x64 314452b @mr-c
avx512: typo fix for typedef of __mmask64 e8390a3 4a9f01a @mr-c
avx512/madd: fix native alias arguments for _mm512_madd_epi16 bcf4adb @mr-c
simde-arch: #include Hedley for setting F16C for MSVC 2022+ with AVX2 f9cf467 @mr-c
tests: simde_assert_equal_{v,}f funcs were silently failing 395efd9 @mr-c
tests: Quiet another Clang < v5 warning that resurfaced d9d2b45 @mr-c
tests: audit use of HEDLEY_DIAGNOSTIC_PUSH and _POP 284c88a @mr-c
test: ignore -Wc99-extensions e264ff5 @mr-c
neon/aba: vaba_s32 test was not being run f86346a @mr-c
sve/and: the svand_n_s8_m test is incomplete, mark it as such b962f07 @mr-c
tests: combine declarations in test functions 76c7d37 @mr-c
docker: add wasm64 target 29db539 @mr-c
remove Drone.io fd10911 @mr-c
gh-actions: confirm that all header files are installed 8d5e05a @mr-c
gh-actions: put wasm64 under CI 6702820 @mr-c
netlify: disable for now caa0929 @mr-c
meson install: arm/neon/ld1 & x86/avx512.h 27836b1 @mr-c
Update clang version detection for 14..16 and add link 4957a9e @jan-wassenberg
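The neon/mla_lane entry describes building the lane variants from two simpler operations. First the selected lane is broadcast (dup), and then an ordinary multiply-accumulate is applied. The scalar sketch below shows that composition for the two-lane float case by modeling vdup_lane_f32, vmla_f32, and vmla_lane_f32. The f32x2 struct and the function names are hypothetical, and the struct is not SIMDe's simde_float32x2_t.

```c
#include <assert.h>

/* Hypothetical two-lane float vector; not SIMDe's simde_float32x2_t. */
typedef struct {
  float lane[2];
} f32x2;

/* Models vdup_lane_f32: copy lane `lane` of v into every lane. */
static f32x2 dup_lane_f32x2(f32x2 v, int lane) {
  assert(lane >= 0 && lane < 2);
  f32x2 r = {{v.lane[lane], v.lane[lane]}};
  return r;
}

/* Models vmla_f32: a + b * c, lane by lane. */
static f32x2 mla_f32x2(f32x2 a, f32x2 b, f32x2 c) {
  f32x2 r;
  for (int i = 0; i < 2; i++) {
    r.lane[i] = a.lane[i] + b.lane[i] * c.lane[i];
  }
  return r;
}

/* Models vmla_lane_f32 as mla + dup: a + b * v[lane]. */
static f32x2 mla_lane_f32x2(f32x2 a, f32x2 b, f32x2 v, int lane) {
  return mla_f32x2(a, b, dup_lane_f32x2(v, lane));
}
```

NEON vmla_f32 performs a separate multiply and add, not a fused operation. Whether a C compiler contracts a + b * c into an FMA depends on flags such as -ffp-contract, so a scalar model like this can differ in the last bit for inexact inputs.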
/ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512
There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).
SIMDe currently implements 3745 out of 6670 (56.15%) NEON functions. If you don't count 16-bit floats and poly types, it's 3745 / 4969 (75.37%).
Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.
_vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8} 8b4e375 dfffdde @mr-c
__builtin_convertvector implementations d06ea5b @nemequ
_mm_cvttsd_si64 48edfa9 @nemequ
__builtin_shufflevector implementation de8fe89 @ngzhian
_mm_alignr_{,e}pi8 implementations 6d28f04 @nemequ
__builtin_shufflevector optimizations 4003afa @ngzhian
__builtin_shufflevector if available ea3f75e @ngzhian
_mm_fnmadd_* implementations of vmls*_f* 70e0c20 @nemequ
rhaddq_xxx f730009 @aqrit
_mm_prefetch implementation 26d515f @nemequ
_mm_cvtps_epi32 b5fbe39 @jpcima
_mm_loadu_si{16,32} on GCC 4b7394f @nemequ
vcvtnq_s32_f32 on GCC e11258e @jpcima
_mm_insert_ps: incorrect handling of the control 94e7569 @MirJawadMairaj
_mm_cmpgt_epi64 7117c48 @aqrit
_mm_test{nz,}c_si128 e7c70a2 @mr-c
_mm256_{load,store}u_m128{,i,d} on LCC a3a39e2 @nemequ
_mm{,256,512}_cmp_{ps,pd,ss,sd} 491d3fa @nemequ
_mm256_insertf128_{pd,ps,si256} bab30bb @LaurentThomas
_mm256_madd_epi16 2c2dd73 @nemequ
_mm_abs_epi64 5c2f423 @aqrit
_mm512_mask_cmpeq_epi8_mask 88d2faf @nemequ
_mm256_mask_compress_pd d1223d4 @simba611
_mm256_mask(z)_compress(storeu)_p* a7386b5 @simba611
_mm512_cvtepu32_ps, _mm{_mask,_maskz}_cvtepi64_pd 292e1e2 @nemequ
_mm{_mask,_maskz}_cvttpd_epi64 d2f518a e842f29 @nemequ
_fmsub_ functions for AVX512VL b7df811 @simba611
_mm512_set_epi8 only when available aa5746f @nemequ
__mm functions to __simde_mm 96c963f @psaab
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support eaeac09 @nemequ
_mm_fmadd_ps 20922ff @nemequ
__builtin_shufflevector is misbehaving 23a2441 @mr-c
Currently non-functional. Jobs queue, but are eventually killed before they start running. Assistance fixing that is welcome!
Currently non-functional; partially replaced by the s390x QEMU GitHub Actions build. See https://github.com/simd-everywhere/simde/issues/903 for the status of POWER9 (ppc64le).
Currently failing for old GCC-5
__alignof__ 38e3840 @nemequ
&& 0s in preprocessor macros. b6f21a9 @nemequ
static const in simde-math.h. NFC 6bd6562 @sbc100
__builtin_roundeven{f,} even if defined 51b9941 @mr-c
__builtin_signbit: add cast to double for old Clang versions 160c161 @mr-c
Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.2...v0.7.4
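The rhaddq_xxx entry in the list above concerns rounding halving adds, which compute (a + b + 1) >> 1 per lane. A common way to do this without a wider intermediate is the identity (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1). It holds because a + b == (a ^ b) + 2 * (a & b). For unsigned 8-bit lanes the result matches NEON vrhaddq_u8 and SSE2 _mm_avg_epu8. The scalar sketch below uses a hypothetical function name and cross-checks the identity against the widened formula. It is not necessarily the approach taken in commit f730009.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding halving add of two unsigned 8-bit values: (a + b + 1) >> 1.
   (a | b) - ((a ^ b) >> 1) gives the same result without needing more
   than 8 bits per lane, since (a | b) >= (a ^ b) rules out underflow. */
static uint8_t rhadd_u8(uint8_t a, uint8_t b) {
  uint8_t r = (uint8_t) ((a | b) - ((a ^ b) >> 1));
  /* Cross-check against the widened formula (int is wide enough here). */
  assert(r == (uint8_t) ((a + b + 1) >> 1));
  return r;
}
```

In scalar C the operands are promoted to int anyway. The identity matters for SIMD lanes, where widening to 16 bits would cost extra unpack and pack instructions.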
Post v0.7.0 fixes; more portable implementations of neon intrinsics
_mm_{load,store}u_si{16,32,64} intrinsics are now implemented along with the SSE _MM_HINT_* defines.
Please see the 0.7-rc-1 and 0.7.0-rc2 release notes for more details.
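The _mm_loadu_si{16,32,64} intrinsics load a 16-, 32-, or 64-bit integer into the lowest lane and zero the remaining lanes. The memory does not need to be aligned. The store variants write the lowest lane back to memory. Portable code usually does this with memcpy, which is safe for any alignment and avoids strict-aliasing problems. The sketch below covers the 32-bit pair. The vector struct and function names are hypothetical and invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector viewed as four 32-bit lanes. */
typedef struct {
  int32_t lane[4];
} vec128_i32;

/* Models _mm_loadu_si32: copy 4 bytes from a possibly unaligned address
   into lane 0 and zero lanes 1..3. */
static vec128_i32 loadu_si32_model(const void *mem) {
  vec128_i32 r = {{0, 0, 0, 0}};
  assert(mem != NULL);
  memcpy(&r.lane[0], mem, sizeof r.lane[0]);
  return r;
}

/* Models _mm_storeu_si32: copy lane 0 to a possibly unaligned address. */
static void storeu_si32_model(void *mem, vec128_i32 a) {
  assert(mem != NULL);
  memcpy(mem, &a.lane[0], sizeof a.lane[0]);
}
```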
neon/orn: add AVX-512VL (ternarylogic) implementations d667aa8 @nemequ
neon/ld3, neon/ld4: disable -Wmaybe-uninitialized on GCC eaaa71f @nemequ
sse: cast _MM_HINT_* values to enum _mm_hint on GCC 3f7e6f7 @nemequ
avx512/permutex2var: add remaining intrinsics and translations 5d8d9d2
math: add modf 580e401 @nemequ
Cleanups of SIMDE_BUG_* definitions e090746 @mr-c
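The avx512/permutex2var entry covers intrinsics that pick each output lane from the concatenation of two source vectors. In the 128-bit _mm_permutex2var_epi32(a, idx, b), bits 1:0 of each idx lane choose the element. Bit 2 chooses between a and b. The scalar model below follows that description rather than SIMDe's code, and its type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical four-lane vector type. */
typedef struct {
  int32_t lane[4];
} vec4_i32;

/* Bounds-checked lane read used by the model below. */
static int32_t vec4_lane(vec4_i32 v, uint32_t i) {
  assert(i < 4u);
  return v.lane[i];
}

/* Models _mm_permutex2var_epi32(a, idx, b): for each output lane, bits 1:0
   of the index select an element and bit 2 selects source b instead of a.
   Higher index bits are ignored. */
static vec4_i32 permutex2var_epi32_model(vec4_i32 a, vec4_i32 idx, vec4_i32 b) {
  vec4_i32 r;
  for (int i = 0; i < 4; i++) {
    uint32_t sel = (uint32_t) idx.lane[i];
    uint32_t off = sel & 3u; /* element within the chosen source */
    r.lane[i] = (sel & 4u) ? vec4_lane(b, off) : vec4_lane(a, off);
  }
  return r;
}
```

The wider variants follow the same pattern with more index bits. For example, the 512-bit epi32 form uses bits 3:0 for the element and bit 4 for the source.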