SIMDe Versions

Implementations of SIMD instruction sets for systems which don't natively support them.

v0.8.0

1 month ago

SIMDe 0.8.0

Summary

  • The complete set of implementations for all NEON intrinsics has been finished, up from 56.46% in the previous release! (@yyctw @wewe5215)
  • SIMDe PRs are tested using Fedora Rawhide (@junaruga)

For the entire project: 656 files changed, 202635 insertions(+), 1724 deletions(-)
For just the simde folder: 295 files changed, 47053 insertions(+), 896 deletions(-)

X86

There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).

Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (ER, PF, 4MAPS, and 4VNNIW) from their intrinsic list. SIMDe will retain those few implementations we already had, but this changes how our completeness statistics are calculated.

Newly added function families

  • AES: 5 of 6 (83.33%)

Newly added AVX512 function families

Additions to existing families

  • AVX512BW: 7 additional, 337 total of 790 (42.66%)
  • AVX512DQ: 5 additional, 112 total of 376 (29.79%)
  • AVX512F: 48 additional, 1087 total of 2812 (38.66%)
  • AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)
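The percentages in these lists appear to be plain implemented-over-total ratios rounded to two decimals. A minimal sketch of that arithmetic follows; the helper names `completeness_percent` and `format_family` are made up for illustration and are not part of SIMDe.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: share of a family that is implemented, in percent.
   Returns 0 for an empty family to avoid dividing by zero. */
static double completeness_percent(unsigned implemented, unsigned total) {
  return total == 0 ? 0.0 : 100.0 * (double) implemented / (double) total;
}

/* Hypothetical helper: format one entry in the style of the list above,
   e.g. "AVX512DQ: 112 total of 376 (29.79%)". */
static int format_family(char *buf, size_t len, const char *name,
                         unsigned implemented, unsigned total) {
  return snprintf(buf, len, "%s: %u total of %u (%.2f%%)", name, implemented,
                  total, completeness_percent(implemented, total));
}
```

Calling `format_family(buf, sizeof buf, "AVX512DQ", 112, 376)` reproduces the AVX512DQ entry above.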

Neon

SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions; up from 56.46% in the previous release!

Newly added families

  • abal
  • abal_high
  • abd
  • abdh
  • abdl_high
  • addhn_high
  • aes
  • bfdot
  • bfdot_lane
  • cadd_rot
  • cale
  • calt
  • cmla_lane
  • cmla_rot_lane
  • copy_lane
  • cvt_high
  • cvt_n
  • cvta
  • cvtn
  • cvtp
  • cvtx
  • cvtx_high
  • div
  • dupb_lane
  • duph_lane
  • eor3
  • fmlal
  • fms
  • fms_lane
  • fms_n
  • ld2_dup
  • ld2_lane
  • ld3_dup
  • ld3_lane
  • ld4_dup
  • maxnmv
  • minnmv
  • mla_lane
  • mla_high_lane
  • mls_lane
  • mlsl_high_lane
  • mmla
  • mull_high_lane
  • mull_high_n
  • mulx
  • mulx_lane
  • pmaxnm
  • pminnm
  • qdmlal
  • qdmlal_high
  • qdmlal_high_lane
  • qdmlal_high_n
  • qdmlal_lane
  • qdmlal_n
  • qdmlsl
  • qdmlsl_high
  • qdmlsl_high_lane
  • qdmlsl_high_n
  • qdmlsl_lane
  • qdmlsl_n
  • qdmlslh
  • qdmlslh_lane
  • qdmulhh
  • qdmulhh_lane
  • qdmull_high
  • qdmull_high_lane
  • qdmull_high_n
  • qdmull_lane
  • qdmull_n
  • qdmullh_lane
  • qmovun_high
  • qrdmlah
  • qrdmlah_lane
  • qrdmlahh
  • qrdmlahh_lane
  • qrdmlsh
  • qrdmlsh_lane
  • qrdmlshh
  • qrdmlshh_lane
  • qrdmulhh_lane
  • qrshl
  • qrshlh
  • qrshrn_high_n
  • qrshrnh_n
  • qrshrun_high_n
  • qrshrunh_n
  • qshl_n
  • qshlh_n
  • qshluh_n
  • qshrn_high_n
  • qshrnh_n
  • qshrun_high_n
  • qshrunh_n
  • raddhn
  • raddhn_high
  • rax
  • recp
  • rnd32x
  • rnd32z
  • rnd64x
  • rnd64z
  • rnda
  • rndx
  • rshrn_high_n
  • rsubhn
  • rsubhn_high
  • set_lane
  • sha1
  • sha1h
  • sha256
  • sha512
  • shll_high_n
  • shrn_high_n
  • sli_n
  • sm3
  • sm4
  • sqrt
  • st1_x2
  • st1_x3
  • st1_x4
  • st1q_x2
  • st1q_x3
  • st1q_x4
  • subhn_high
  • sudot_lane
  • usdot
  • usdot_lane

Finally complete families

  • cvtn
  • mla_lane

Details

  • simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
  • simde_float16: prefer __fp16 if available aba26f6 @mr-c

Implementation of Arm intrinsics

NEON

  • cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
  • cvtn: vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
  • min: Remove non-working MMX specialization from simde_vmin_s16 6858b92 @M-HT
  • shll: Extend constant range in simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
  • various: Implement some f16XN types and f16 related intrinsics. (#1071) aae2245 @yyctw
  • qtbl/qtbx polyfills for A32V7 a2fef9e @easyaspi314
  • arm: use SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
  • arm neon: Complex operations from Armv8.3-a (#1077) d08d67c @wewe5215
  • more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) 5e7c4d4 @yyctw
  • st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
  • part 1 of implement all intrinsics supported by architecture A64 (#1090) 2eedece @yyctw
  • Add AES instructions. 23adcd2 805ccd2 @yyctw
  • Modified simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
  • implement all intrinsics supported by architecture A64-remaining part (#1093) 018ba24 @yyctw
  • add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
  • implement all bf16-related intrinsics (#1110) c59db7c @yyctw
  • arm/neon abs: negating INT_MIN is undefined behavior in C/C++ c200c16 @mr-c
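The last item refers to a well-known C pitfall: `-x` overflows when `x` is `INT_MIN`, which is undefined behavior. One way to get the wrapping result without it is to negate in unsigned arithmetic. The sketch below is an illustration only, not SIMDe's actual code; `wrapping_abs_s32` is a hypothetical name modelling a single 32-bit lane.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane model of a wrapping absolute value.
   Unsigned subtraction is well defined, so INT32_MIN is handled safely. */
static int32_t wrapping_abs_s32(int32_t x) {
  uint32_t ux = (uint32_t) x;
  uint32_t magnitude = (x < 0) ? (0u - ux) : ux;
  int32_t result;
  /* Copy the bits back instead of casting, so no out-of-range conversion occurs. */
  memcpy(&result, &magnitude, sizeof result);
  return result;
}
```

With this approach `wrapping_abs_s32(INT32_MIN)` yields `INT32_MIN` again, matching a non-saturating absolute value.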

SVE Intrinsics

  • Improve performance of simde_mm512_add_epi32 (#1126) 6cde31c @AymenQ

WASM intrinsics

  • simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
  • simd128: add missing unsigned functions ea5e283 @mr-c
  • simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
  • detect support for Relaxed SIMD mode 2e66dd4 @mr-c
  • simd128/relaxed: begin MIPS implementations db8ad84 @mr-c
  • relaxed: add f{32x4,64x2}_relaxed_{min,max} 9d1a34e @mr-c
  • relaxed: updated names; reordered FMA operations 8cc8874 @mr-c

x86 intrinsics

  • sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-c
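As a rough illustration of the idea behind this change, the sketch below uses Clang's `__builtin_nontemporal_store` and `__builtin_nontemporal_load` when the compiler reports them and falls back to ordinary memory accesses otherwise. The macro `SKETCH_HAVE_NONTEMPORAL` and both function names are hypothetical; SIMDe's own detection macros are not shown in these notes.

```c
#include <stdint.h>

/* Detect the builtins without breaking compilers that lack __has_builtin. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store) && \
      __has_builtin(__builtin_nontemporal_load)
#    define SKETCH_HAVE_NONTEMPORAL 1
#  endif
#endif

/* Hypothetical helper: store with a non-temporal hint when available. */
static void stream_store_i32(int32_t *dst, int32_t value) {
#if defined(SKETCH_HAVE_NONTEMPORAL)
  __builtin_nontemporal_store(value, dst);
#else
  *dst = value; /* plain store: same result, no cache hint */
#endif
}

/* Hypothetical helper: load with a non-temporal hint when available. */
static int32_t stream_load_i32(const int32_t *src) {
#if defined(SKETCH_HAVE_NONTEMPORAL)
  return __builtin_nontemporal_load(src);
#else
  return *src; /* plain load: same result, no cache hint */
#endif
}
```

The hint only affects caching behavior; the value stored or loaded is the same on either path.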

SSE*

  • sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
  • sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
  • sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
  • sse2 mm_pause: more archs, add a basic test 692a2e8 @mr-
  • sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
  • sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
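For reference when reading the last two fixes, here is a scalar sketch of what `_mm_testz_si128` and `_mm_testnzc_si128` compute, following Intel's documented semantics. The struct `u128_sketch` and the function names are hypothetical and exist only for this illustration; this is not SIMDe's implementation.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
  uint64_t lo;
  uint64_t hi;
} u128_sketch;

/* Scalar model of _mm_testz_si128: 1 when (a AND b) has no bit set. */
static int testz_128(u128_sketch a, u128_sketch b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Scalar model of _mm_testnzc_si128: 1 when both (a AND b) and
   ((NOT a) AND b) have at least one bit set. */
static int testnzc_128(u128_sketch a, u128_sketch b) {
  int zf = ((a.lo & b.lo) | (a.hi & b.hi)) == 0;   /* zero flag  */
  int cf = ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0; /* carry flag */
  return !zf && !cf;
}
```

Both halves must be inspected before the result is known to be 1 for `testz_128`, which is why an early exit has to be placed with care.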

AVX

  • run test from #926 ce9708c @mr-c
  • simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c

AVX2

  • correction of simde_mm256_sign_epi{8,16,32} (#1123) c376610 @Proudsalsa
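For context, the sign family negates, zeroes, or passes through each lane of the first operand depending on the sign of the matching lane in the second. A one-lane scalar sketch of `_mm256_sign_epi32` as Intel documents it follows; `sign_lane_i32` is a hypothetical name, and this is not the code from the fix itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-lane model of _mm256_sign_epi32:
   b < 0 -> negated a, b == 0 -> 0, b > 0 -> a unchanged. */
static int32_t sign_lane_i32(int32_t a, int32_t b) {
  if (b == 0) return 0;
  if (b > 0) return a;
  /* Negate in unsigned arithmetic so that a == INT32_MIN wraps
     instead of triggering undefined behavior. */
  uint32_t negated = 0u - (uint32_t) a;
  int32_t result;
  memcpy(&result, &negated, sizeof result);
  return result;
}
```

The 8-bit and 16-bit variants follow the same rule with narrower lanes.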

AVX512

  • fpclass: naive implementation 353bf5f @mr-c
  • loadu: fix native detection 305f434 @mr-c
  • set: add simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
  • gather: add MSVC native fallbacks 7b7e3f6 @mr-c
  • AVX512FP16 / m512h initial support e97691c @mr-c
  • fix many native aliases 75014b9 @mr-c

CLMUL

  • fix natives, some require VPCLMULQDQ f819c52 @mr-c

SVML

  • enable SIMDE_X86_SVML_NATIVE for MSVC 2019+ 593af95 @mr-c

AES

  • aes: initial implementation of most aes instructions (#1072) 8632391 @Vineg

MIPS MSA intrinsics

  • msa neon impl: float64x2_t is not avail in A32V7 ae4c4ab @mr-c

Arch support

x86(-64)

  • fix SIMDE_ARCH_X86_SSE4_2 define 5e4b308 @cbielow

arm64

  • x86 aes: add neon implementation using the crypto extension fb3554f @mr-

Altivec

  • neon/st1: disable last remaining AltiVec implementation 0521245 @mr-c

Power

  • sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
  • wasm simd128: more powerpc fixes 7cb5691 @mr-c

Compiler Specific

GCC

  • GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
  • GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 edde42e @mr-c
  • GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ 43d86a3 @mr-c
  • Add workaround for GCC bug 111609 fdafd8e @M-HT
  • arm neon ld2: silence warnings at -O3 on gcc risc-v 8f56628 @mr-c
  • avx512 abs: refine GCC compiler checks for _mm512{,_mask}_abs_pd (#1118) 5405bbd @thomas-schlichter
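When a fix lands on several release branches, as with "9.5, 10.4, 11.4, 12+" above, a single minimum-version comparison is not enough; each branch needs its own threshold. The sketch below shows one way to express that check. The function `gcc_has_fix_95399` is hypothetical; in real code the arguments would typically come from the predefined `__GNUC__` and `__GNUC_MINOR__` macros, and SIMDe's own version macros are not shown in these notes.

```c
#include <stdbool.h>

/* Hypothetical predicate: does this GCC version contain a fix that was
   released in 9.5, 10.4, 11.4 and every release from 12 onward? */
static bool gcc_has_fix_95399(int major, int minor) {
  if (major >= 12) return true;
  switch (major) {
    case 11: return minor >= 4;
    case 10: return minor >= 4;
    case 9:  return minor >= 5;
    default: return false; /* older branches never received the fix */
  }
}
```

A workaround would then stay enabled only while this predicate is false.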

Clang

  • clang powerpc: vec_bperm bug was fixed in clang-14 6feb28a @mr-c
  • clmul: aarch64 clang has difficulties with poly64x1_t 1e1bd76 @mr-c
  • aarch64: optimization bug 45541 was fixed in clang-15 7ca5712 @mr-c
  • A32V7: Don't trust clang for load multiple on A32V7 927f141 @easyaspi314
  • wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c
  • simde-detect-clang.h: add clang 17 detection 923f8ac 684baa1 50d98c1 @Coeur

ClangCL

  • fp16: don't use _Float16 on ClangCL if not supported 8a6b8c5 @mr-c
  • svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCl c877fe5 @mr-

Emscripten

  • emcc tot: set -Wno-switch-default fdbd6b2 @mr-c

MSVC

  • avx512 types: avoid using native AVX512 types on MSVC unless required 029d749 @mr-c
  • arm neon: {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121) 14311d6 @Changqing-JING

Testing with Docker/Podman & CI

  • Update recipe for qemu git mode 54b8c8f @mr-c
  • riscv64 gcc: typo fix for endian little 7423339 @mr-c
  • add new cross sets; Ubuntu Focal and Bionic support b0b9710 @mr-c
  • native tests: also AVX512, MSA; fix WASM SIMD128 path bdd075b @mr-c
  • test-flags: support the x86 microarchitecture levels 518b777 @mr-c
  • ignore common build paths b3689ea @mr-c

Appveyor

  • preserve test log 9815161 @mr-c
  • save meson log on error 5207d83 @mr-

Circle CI

  • circleci: clang, set -Wno-unsafe-buffer-usage 24c93c2 @mr-c

GitHub Actions

  • upgrade qemu ; fixes remaining ppc64el fails! e91944b @mr-c
  • tidy matrix ordering for easier to read job names b52ac36 @mr-c
  • add clang-qemu: aarch64, riscv64, ppc64el, s390x 8a6dbab @mr-c
  • test armv7 with gcc-12 via qemu 8cd8de1 @mr-c
  • add armel to gcc and clang qemu matrices 4ca849b @mr-c
  • add armv7 to clang-qemu matrix a144aca @mr-c
  • use GCC 12 for adv x64 native testing + AVX512FP f156b41 @mr-c
  • expand mac-os/xcode testing matrix 8055410 @mr-c
  • fix macos-13+brew failure c6149de @mr-c
  • test with clang-16 e25ced8 @mr-c
  • add gcc-13 43ac8fc @mr-c
  • simplify x86 ISA matrix 6b7c1b3 @mr-c
  • run on commits to the primary branch to prime the cache 6055bfb @mr-c
  • build(deps): bump actions/checkout from 3 to 4 149d0af @dependabot[bot]
  • build(deps): bump github/codeql-action from 2 to 3 (#1138) 5026e66 @dependabot[bot]
  • build(deps): bump actions/setup-python from 4 to 5 (#1137) 2768da8 @dependabot[bot]
  • build(deps): bump actions/setup-dotnet from 3 to 4 (#1135) ed382cb @dependabot[bot]
  • build(deps): bump ad-m/github-push-action from 0.6.0 to 0.8.0 (#1134) 193be1b @dependabot[bot]
  • add new repo for clang-16 7ebd267 @mr-c
  • add clang-17 (#1127) d31de99 @mr-c
  • test mips64el using qemu on gcc12/clang16 934d86d @mr-c
  • disable {clang,gcc}-qemu mips64el; needs newer Ubuntu version 471a342 @mr-c
  • test WASM Relaxed SIMD da0604f @mr-c

Packit CI

  • Start testing SIMDe PRs using Fedora Rawhide d64b103 6ae0763 b309d89 4d55fc2 643c419 @junaruga

Travis

  • restart testing with Travis CI 93905f5 @mr-c

Misc

  • README: mark F16C as complete 2d87cf5 @mr-c
  • README: Give credit to creator/maintainer of the vcpkg for SIMDe ceb1e73 @mr-c
  • README: related projects: add AvxToNeon 13bf92a @mr-c
  • README: add more background links for supported ISAs c76450d @mr-c
  • README: turn Packit CI link into a deep link e9e1901 @mr-c
  • README: NEON is complete 7412139 @mr-c
  • docs: explain how to target a single test 2158ac7 @mr-c

New Contributors

Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.6...v0.8.0

v0.8.0-rc2

2 months ago

See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes for changes since 0.7.6

What's Changed since RC1

New Contributors

Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.8.0-rc1...v0.8.0-rc2

v0.8.0-rc1

5 months ago

See draft release notes at https://github.com/simd-everywhere/simde/wiki/Release-Notes

New Contributors

Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.6...v0.8.0-rc1

v0.7.6

11 months ago

Summary

See, I knew we should release more often!

Details

Implementation of Arm intrinsics

NEON

  • neon/abd,ext,cmla{,_rot{180,270,90}}: additional wasm128 implementations 3a18dff @mr-c
  • neon/cvtn: basic implementation of a few functions fefc785 @mr-c
  • neon/mla_lane: initial implementation using mla+dup 554ab18 @ngzhian
  • neon/shl,rshl: fix avx include to unbreak amalgamated headers 3748a9f @mr-c
  • neon/shll_n: make vshll_n_u32 test operational 356db0c @mr-c
  • neon/qabs: restore SSE2 impl for vqabsq_s8 f614843 @mr-c

x86 intrinsics

  • mmx: loongson impl promotions over SIMDE_SHUFFLE_VECTOR_ 51bf6f2 @mr-c
  • x86/sse*,avx: add additional SIMD128 implementations e28a87e @mr-c

SSE*

  • sse{,2,3,4.1},avx: more WASM shuffle implementations 097dd12 @mr-c
  • sse*,avx: add additional SIMD128 implementations e28a87e @mr-c
  • sse: allow native _mm_loadh_pi on MSVC x64 314452b @mr-c

AVX512

  • avx512: typo fix for typedef of __mmask64 e8390a3 4a9f01a @mr-c
  • avx512/madd: fix native alias arguments for _mm512_madd_epi16 bcf4adb @mr-c

Arch support

  • simde-arch: #include Hedley for setting F16C for MSVC 2022+ with AVX2 f9cf467 @mr-c

Testing with Docker/Podman & CI

  • tests: simde_assert_equal_{v,}f funcs were silently failing 395efd9 @mr-c
  • tests: Quiet another Clang < v5 warning that resurfaced d9d2b45 @mr-c
  • tests: audit use of HEDLEY_DIAGNOSTIC_PUSH and _POP 284c88a @mr-c
  • test: ignore -Wc99-extensions e264ff5 @mr-c
  • neon/aba: vaba_s32 test was not being run f86346a @mr-c
  • sve/and: the svand_n_s8_m test is incomplete, mark it as such b962f07 @mr-c
  • tests: combine declarations in test functions 76c7d37 @mr-c

Local testing with Docker/Podman

  • docker: add wasm64 target 29db539 @mr-c

Drone.io

  • remove Drone.io fd10911 @mr-c

GitHub Actions

  • gh-actions: confirm that all header files are installed 8d5e05a @mr-c
  • gh-actions: put wasm64 under CI 6702820 @mr-c

Netlify

  • netlify: disable for now caa0929 @mr-c

Misc

  • meson install: arm/neon/ld1 & x86/avx512.h 27836b1 @mr-c
  • Update clang version detection for 14..16 and add link 4957a9e @jan-wassenberg

v0.7.4

1 year ago

SIMDe 0.7.4

Summary

  • Minimum meson version is now 0.54
  • 40 new NEON families implemented, SVE API implementation started (14 families)
  • Initial support for x86 F16C API
  • Initial support for MIPS MSA API
  • Initial support for Arm Scalable Vector Extensions (SVE) API
  • Initial support for WASM SIMD128 API
  • Initial support for the E2K (Elbrus) architecture
  • MSVC has many fixes, now compiled in CI using /ARCH:AVX, /ARCH:AVX2, and /ARCH:AVX512

X86

There are a total of 7470 SIMD functions on x86, 2971 (39.77%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1439 (27.31%).

Newly added function families

Additions to existing families

  • AVX512F: 579 additional, 856 total of 2660 (31.80%)
  • AVX512BW: 178 additional, 335 total of 828 (40.46%)
  • AVX512DQ: 77 additional, 111 total of 399 (27.82%)
  • AVX512_VBMI: 9 additional, 30 total of 30 (100.00%)
  • KNCNI: 113 additional, 114 total of 595 (19.16%)
  • VPCLMULQDQ: 1 additional, 2 total of 2 (100.00%)

Neon

SIMDe currently implements 3745 out of 6670 (56.15%) NEON functions. If you don't count 16-bit floats and poly types, it's 3745 / 4969 (75.37%).

Newly added families

  • addhn
  • bcax
  • cage
  • cmla
  • cmla_rot90
  • cmla_rot180
  • cmla_rot270
  • fma
  • fma_lane
  • fma_n
  • ld2
  • ld4_lane
  • mlal_high_n
  • mlal_lane
  • mls_n
  • mlsl_high_n
  • mlsl_lane
  • mull_lane
  • qdmulh_lane
  • qdmulh_n
  • qrdmulh_lane
  • qrshrn_n
  • qrshrun_n
  • qshlu_n
  • qshrn_n
  • qshrun_n
  • recpe
  • recps
  • rshrn_n
  • rsqrte
  • rsqrts
  • shll_n
  • shrn_n
  • sqadd
  • sri_n
  • st2
  • st2_lane
  • st3_lane
  • st4_lane
  • subhn
  • subl_high
  • xar

MSA

Overall, SIMDe implements 40 of 533 (7.50%) functions from MSA.

Details

Implementation of Arm intrinsics

NEON

  • aarch64 + clang-1[345] fix for "implicit conversion changes signedness" a22c3cc @mr-c
  • neon: Implement f16 types 21496f6 @Glitch18
  • neon: port additional code to new style 1c744fd @nemequ
  • neon: replace some more abs/labs/llabs usage with simde_math_* versions c59853a @nemequ
  • neon: refactor to use different types on all targets c17957a @nemequ
  • neon: test for MMX/SSE instead of x86 when choosing implementation 0366dab @nemequ
  • neon/abd: add much better implementations c3ddbbe @nemequ 220db33 @ngzhian
  • neon/abs: add SSE2 integer abs implementations 6396dc8 @aqrit
  • neon/addhn: initial implementation e9ee066 @nemequ
  • neon/add: Implement f16 functions e69239c @Glitch18
  • neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8} 8b4e375 dfffdde @mr-c
  • neon/{add,sub}w_high: use vmovl_high instead of vmovl + get_high b897331 @nemequ
  • neon/bcax: initial implementation 96ce481 0ed3dea @Glitch18
  • neon/bsl: Implement f16 functions edb75b5 @Glitch18
  • neon/cage: Initial f16 implementations 20df81d @Glitch18
  • neon/cagt: Implement f16 functions 452a6d3 @Glitch18
  • neon/ceq: Implement f16 functions f24ab3d @Glitch18
  • neon/ceqz: Implement f16 functions dd2ebf2 de301cd @Glitch18
  • neon/cge: Implement f16 functions a512986 f3ad0d4 647dc12 @Glitch18
  • neon/cgez: complete implementation of CGEZ family 6d86a20 @Glitch18
  • neon/cgt: Add implementation of remaining functions 9930c43 @Glitch18
  • neon/cgt, simd128: improve some unsigned comparisons on x86 ae6702a @nemequ
  • neon/cgtz: Add implementations of remaining functions 4d749b5 @Glitch18
  • neon/cle: add some x86 implementations 5906cc9 d81c7e7 @nemequ 7894c7d @Glitch18
  • neon/clez: Add implementations of scalar functions bc72880 @Glitch18
  • neon/clt: Add implementations of scalar functions & SSE/AVX512 fallbacks bc636e1 6a19637 @Glitch18
  • neon/cltz: Add scalar functions and natural vector fallbacks 2960ef0 @Glitch18
  • neon/cmla, neon/cmla_rot{90,180,270}: check compiler versions e98152f @nemequ
  • neon/cmla, neon/cmla_rot{90,180,270}: CMLA requires armv8.3+ 280faae @nemequ
  • neon/cmla, neon/cmla_rot{90,180,270}, neon/fma: initial implementation 2aff4f9 @Glitch18
  • neon/cnt: add x86 implementations of vcntq_s8 a558d6d @nemequ
  • neon/cvt: add __builtin_convertvector implementations d06ea5b @nemequ
  • neon/cvt: add out-of-range and NaN tests 7d0e2ac @nemequ
  • neon/cvt: add some faster x86 float->int/uint conversions ceaaf13 @nemequ
  • neon/cvt: Add vcvt_f32_f64 and vcvt_f64_f32 implementations 8398f73 @Glitch18
  • neon/cvt: cast result of float/double comparison dc215cd @ngzhian
  • neon/cvt: disable some code on 32-bit x86 which uses _mm_cvttsd_si64 48edfa9 @nemequ
  • neon/cvt: don't use vec_ctsl on POWER 8f9582a @nemequ
  • neon/cvt: fix a couple of s390x implementations' NaN handling a8bd33d @nemequ
  • neon/cvt: fix compilation with -ffast-math d1d070d @nemequ
  • neon/cvt: Implement f16 functions b6a9882 @Glitch18
  • neon/cvt, relaxed-simd: add work-around for GCC bug #101614 11aa006 @nemequ
  • neon/cvt, simd128: fix compiler errors on PPC 965e68e @nemequ
  • neon/cvt: clang bug 46844 was fixed in clang 12.0 71e03a6 @mr-c
  • neon/dot_lane: add remaining implementation 3f1c1fa 4a9ca8a @Glitch18
  • neon/dup_lane: Complete implementation of function family 12fb731 df320d1 @Glitch18 014ee00 9461557 @nemequ
  • neon/dup_lane: use dup_n 2b4a009 @ngzhian
  • neon/dup_n: Implement f16 functions 14fdf88 @Glitch18
  • neon/dup_n: replace remaining functions with dup_n implementations 27a13b0 @nemequ
  • neon/dupq_lane: native and portable 893db57 @ngzhian
  • neon/ext: add __builtin_shufflevector implementation de8fe89 @ngzhian
  • neon/ext: add _mm_alignr_{,e}pi8 implementations 6d28f04 @nemequ
  • neon/ext: clean up shuffle-based implementation f1de709 @nemequ
  • neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE 13ee902 @mr-c
  • neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions 62834fa @mr-c
  • neon/fma: add a couple x86 and PPC implementations 7a2860b @nemequ
  • neon/fma: add more extensive feature checking e541dd1 @nemequ
  • neon/fma_lane: Implement fmaq_lane functions a77e6ad 555ef3e @Glitch18
  • neon/fma_n: initial implementation 06d5a62 @nemequ dab4342 @nemequ
  • neon/get_high: add __builtin_shufflevector optimizations 4003afa @ngzhian
  • neon/get_low: use __builtin_shufflevector if available ea3f75e @ngzhian
  • neon/hadd,hsub: optimization for Wasm ebe09d8 @ngzhian
  • neon/ld1: add Wasm SIMD implementation a79bc15 @ngzhian
  • neon/ld1_dup: native and portable (64-bit vectors), f64 debb3c8 @ngzhian 6c71aac @Glitch18
  • neon/ld1_dup: split from ld1, dup_n fallbacks, WASM implementations 4c586e0 @nemequ
  • neon/ld1: Implement f16 functions 6e89a9c f26f775 @Glitch18
  • neon/ld1_lane: Implement remaining functions de2de8d @Glitch18 9051a51 @ngzhian
  • neon/ld1q: u8_x2, u8_x3, u8_x4 341006c @ngzhian
  • neon/ld1[q]_*_x2: initial implementation cd14634 @dgazzoni
  • neon/ld{2,3,4}: disable -Wmaybe-uninitialized on all recent GCC e142a59 @nemequ
  • neon/ld{2,3,4}: silence false positive diagnostic on GCC 7 3f737a3 @nemequ
  • neon/ld2: Implement remaining functions e68f728 @Glitch18 3b3014f @ngzhian 078bb00 @nemequ 041b1bd @mr-c
  • neon/ld4_lane: native and portable implementations a973cab @ngzhian 179fb79 @Glitch18 0d1ab79 @nemequ
  • neon/ld4: use conformant array parameters 723a8a8 @nemequ
  • neon/ld4: work around spurious warning on clang < 10 64e9db0 @nemequ
  • neon/min: add SSE2 vminq_u32 & vqsubq_u32 implementation 2cf165e 117de35 @nemequ
  • neon/{min,max}nm: add some headers for -ffast-math ebe5c7d @nemequ
  • neon/{min,max}nm: use simde_math_* prefixed min/max functions c1607d2 @nemequ
  • neon/mlal_high_n: initial implementation d6f75fa @dgazzoni
  • neon/mlal_lane: initial implementation 82e36ed 2168ca0 @nemequ
  • neon/mls: add _mm_fnmadd_* implementations of vmls*_f* 70e0c20 @nemequ
  • neon/mlsl_high_n: initial implementation ca1a4c3 @dgazzoni
  • neon/mlsl_lane: initial implementation de78ae9 @nemequ
  • neon/mls_n: initial implementation 042c6eb @nemequ
  • neon/movl: improve WASM implementation ccffc23 @nemequ
  • neon/mul: add improved SSE2 vmulq_s8 implementation c6c6361 @nemequ
  • neon/mul: implement unsigned multiplication using signed functions 979552a @nemequ
  • neon/mul_lane: Add mul_laneq functions 86b039c 5d2e4bc @Glitch18
  • neon/mull_lane: initial implementation 4dd488d @nemequ
  • neon/neg: Complete implementation of function family 6423a26 @Glitch18
  • neon/padd: Add scalar function implementations fe21dc1 @Glitch18
  • neon/pmax: Add scalar function implementations a287eaa @Glitch18
  • neon/pmin: Add scalar function implementations 38f7499 @Glitch18
  • neon/qabs: add some faster implementations 6cd925e @nemequ
  • neon/qadd: add several improved x86 and vector extension versions 4e48e5c @nemequ
  • neon/qadd: fix warning in ternarylogic call in vaddq_u32 fad2470 @nemequ
  • neon/qadd: improve SSE implementation 8fbe7cd @nemequ
  • neon/qdmulh: add scalar & shuffle-based implementations 8cf3afc @nemequ 68e7a0e @Glitch18
  • neon/qdmulh_lane: native and portable 79dc1ee @ngzhian 1c64794 @Glitch18
  • neon/qdmulh_n: native and portable implementations 55a9c07 @ngzhian
  • neon/qdmull: add WASM implementations 7d7a43b @nemequ
  • neon/qrdmulh_lane: initial implementation dc2ea75 @nemequ 3794620 @ngzhian 9ab1446 @Glitch18
  • neon/qrdmulh: native aliases for scalar functions should be A64 f7820fc @nemequ
  • neon/qrdmulh: steal WASM q15mulr_sat implementation for qrdmulhq_s16 ccacf94 @nemequ
  • neon/qrshrn_n: Add scalar, native and portable function implementations ffa09ca @Glitch18 2595b3e @ngzhian
  • neon/qrshrun_n: Add scalar, native and portable function implementations 49300fa @Glitch18 d5e805b @ngzhian
  • neon/qshlu_n: initial implementation 77af9f1 f7b59a5 @Glitch18
  • neon/qshrn_n: initial implementation d9260dc @nemequ b4eed3e @Glitch18
  • neon/qshrun_n: native, scalar, and portable implementations c29f9fb @ngzhian eeaad75 @Glitch18
  • neon/qsub: add some SSE and vector extension implementations 1cb520a @nemequ
  • neon/recpe: recpe_f32 and recpe_f64, native and portable 629d129 5a27732 @ngzhian eb18b7c @nemequ 9d8e77f @Glitch18
  • neon/recps: recps/recpsq for native, scalar, and portable e8a8a09 7e420a1 @ngzhian 9c67d34 @Glitch18
  • neon/reinterpret: f16_u16 and u16_f16 implementations 9aedd5d @Glitch18 7f9794a @ngzhian
  • neon/rhadd: optimizations for rhaddq_xxx f730009 @aqrit
  • neon/rnd: use correct SVML function for simde_vrndq_f64 f19193b @mr-
  • neon/rndi, sse2: work around several functions missing in GCC 0b6a9c1 @nemequ
  • neon/rndn: Add scalar function implementation d5d6509 d01618a 90c910b @Glitch18 050f935 @nemequ
  • neon/rshl: Add scalar function implementations c641cbd @Glitch18
  • neon/rshr_n: Add scalar function implementations 465c1ec 3a0ef81 @Glitch18
  • neon/rshrn_n: native and portable implementations a703711 @ngzhian
  • neon/rsqrte: Implement remaining functions 75c1495 @Glitch18 990b458 @nemequ 8781eb6 @ngzhian
  • neon/rsqrts: vrsqrts_f32 and vrsqrtsq_f32 native and portable de8c592 @ngzhian ed5e971 @Glitch18
  • neon/rsra_n: Add scalar function implementations 4944075 @Glitch18
  • neon/shl: Add scalar implementations 89fdad8 @Glitch18
  • neon/shll_n: native and portable implementations 98ac861 @ngzhian
  • neon/shl_n: Add scalar function implementations 267ab66 @Glitch18
  • neon/shlu_n: faster WASM implementations 5576d8a @nemequ
  • neon/shr_n: Add scalar function implementations e3e4b8e @Glitch18 e751352 @nemequ
  • neon/shrn_n: s16 s32 s64 u16 u32 u64 portable, native, WASM 8810cdd @ngzhian 40b4549 @ngzhian
  • neon/sqadd: initial implementation eab9d99 @Glitch18 1c0dabf @nemequ
  • neon/sra_n: Add scalar function implementations 272c2cf @Glitch18
  • neon/sri_n: add 128-bit, native, portable & scalar implementations aa832e1 @nemequ dcbcab5 @Glitch18 f6cf839 @ngzhian
  • neon/st1: Add f16 functions f58bd3c @Glitch18
  • neon/st2: Implement remaining functions 43c4b52 @Glitch18
  • neon/st2_lane: portable and native for 8ee1eb4 @ngzhian 4cbed4a @Glitch18
  • neon/st2,st1: use zip + st1 to implement st2 7929406 @ngzhian
  • neon/st2: vst2(q) f32 s8 s16 s32 u8 u16 u32 1e38dcb @ngzhian
  • neon/st3: Add shuffle vector implementations 52da8d4 @Glitch18
  • neon/st3_lane: portable and native ae308b2 @ngzhian 982d2a9 @Glitch18
  • neon/st3q_u8: Wasm optimization 687460c @ngzhian
  • neon/st4_lane: portable and native b231820 @ngzhian 5be1b07 @Glitch18
  • neon/subhn: initial implementation ca62754 @nemequ
  • neon/sub: Implements the two remaining scalar functions 74e5b82 @Glitch18
  • neon/subl_high: initial implementation 36d6d11 @dgazzoni
  • neon/tbl: add WASM implementation of vtbl1_u8 d05fa59 @nemequ
  • neon/tst: implement scalar functions 41c2f8a @Glitch18
  • neon/types: remove duplicate NEON float16_t definitions 7f40f35 @dgazzoni
  • neon/types: reverse logic for SIMDE_ARM_NEON_FORCE_NATIVE_TYPES 7776a8c @nemequ
  • neon/types: use vector extensions for public types when available 790e263 @nemequ
  • neon/vdup: vdupq_lane_f32 native and portable e2ae5dc @ngzhian
  • neon/vld1q_dup: native and portable implementations 650d531 @ngzhian
  • neon/vld2_u8: native and portable implementation 85d2ed2 @ngzhian
  • neon/vld2: vld2_{u16,u32} and vld2q_{u8,u16,u32,f32} b43d434 @ngzhian
  • neon/vld4: Wasm optimization of vld4q_u8 07387bf @ngzhian
  • neon/vmovq: define vmovq_n as aliases for vdup_n ff7472b @ngzhian
  • neon/xar: initial implementation 50cd8af @Glitch18
  • neon/zip1: add armv7 implementations d4ded0a @nemequ
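Two of the families added in this release, bcax and xar, are small enough to describe in scalar form. The sketch below models a single lane of `vbcaxq_u32` (bit clear and XOR) and `vxarq_u64` (XOR, then rotate right) following Arm's documented semantics. The function names `bcax_u32` and `xar_u64` are hypothetical and this is not SIMDe's implementation.

```c
#include <stdint.h>

/* One-lane model of vbcaxq_u32: a XOR (b AND NOT c). */
static uint32_t bcax_u32(uint32_t a, uint32_t b, uint32_t c) {
  return a ^ (b & ~c);
}

/* One-lane model of vxarq_u64: XOR the inputs, then rotate the result
   right by imm bits. imm is reduced to 0..63 and a zero rotation is
   handled separately to avoid shifting by the full width. */
static uint64_t xar_u64(uint64_t a, uint64_t b, unsigned imm) {
  uint64_t x = a ^ b;
  imm &= 63u;
  return imm == 0 ? x : (x >> imm) | (x << (64u - imm));
}
```

For example, `xar_u64(0xF0, 0x0F, 4)` rotates `0xFF` right by four bits, moving the low nibble to the top of the 64-bit lane.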

SVE Intrinsics

  • Initial import of a portable SVE implementation. f8f8382 9fd7d68 7311dd3 @nemequ
  • sve/add: initial implementation 70d5b0a 21b39aa 747e076 dd42b49 971aefb @nemequ
  • sve/and: initial implementation 5c56617 3382f4e @nemequ
  • sve/cmplt: replace vec_and with & for s390 implementations 7c599ea @nemequ
  • sve/dup: add *_m variants b90ae4d bad00e9 1da79a2 @nemequ
  • sve/ptest: simplify svptest_first c7e4699 @nemequ
  • sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017 47bccc7 @mr-c
  • sve/qadd: initial implementation 8aaa62b @nemequ
  • sve/sel: initial implementation 113ec2b a1e423e @nemequ
  • sve/types: add mmask4 functions for 256-bit vectors 33fbaa2 @nemequ
  • sve/whilelt: add svwhilelt_*_{u32,s64,u64} implementations 36927be 2b29fef @nemequ
  • sve/whilelt: correct type-o in __mmask32 initialization a53d550 @mr-c
  • sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017 accce42 @mr-c

WASM intrinsics

  • Add WebAssembly SIMD128 implementation. db758eb 20664a6 57efb02 20682c1 804b833 65db4cf bdc8698 271d1e4 631cf53 7078ab4 5c8d7b3 0e43903 c734535 34b775d 22609d4 f4ee32a 516eb02 1d4075c f73db2d c66df66 c2fda16 06b3462 d45f735 b7b69fb 8a748d7 6c57794 e60f1e0 c37dfd3 fdfa16a c4aa8b4 96226ff 732f519 2890ad4 706de03 fca719e 5638afa d013847 3d4b2ff 783c752 3378ab3 42f0a0b e8da237 22c0dee d9e3615 9848a4c 8a21137 5b1a330 dbd2e5c 09d8f79 e1bc968 @nemequ 2380aa4 @coderzh
  • wasm: load lane memcpy instead of cast to address UBSAN issues 7631312 @wrv
  • wasm: f32x4 and f64x2 nearest roundeven dc75f4c @wrv
  • relaxed-simd: initial support for the WASM relaxed SIMD proposal 083bd2f 3e5515a bf136e7 48954b6 9715924 @nemequ
  • clang wasm: add workaround to fix wasm_i64x2_shl bug 256d9df @Changqing-JING
  • simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2} 7bc774f @mr-c
  • wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type e861f2c @mr-c
  • simd128 extract_lane: set unreachable value as float to appease msvc 0964774 @mr-c
  • simd128: Test some operations more strictly 9ca1f6d @keithw
  • simd128: move unary minus to appease msvc native arm64 86677d9 @mr-c
  • Wasm q15mulr_sat_s: match Wasm spec 366e6d5 @keithw
  • Wasm f32/f64 nearest: match Wasm spec da0c8af @keithw
  • Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec af1ad7c @keithw
  • Wasm f32/f64 abs: match Wasm spec b4ecb3c @keithw
  • Wasm f32/f64 max: match Wasm spec a0e27b9 @keithw
  • Wasm f32/f64 min: match Wasm spec 8091bbb @keithw
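To make "match Wasm spec" concrete for the min case: the WebAssembly specification requires that a NaN in either operand produces NaN and that negative zero counts as smaller than positive zero, which differs from a plain `a < b ? a : b`. A one-lane sketch under those rules follows; `wasm_f32_min` is a hypothetical name and this is not the code from the commit.

```c
#include <math.h>

/* Hypothetical one-lane model of Wasm f32 min semantics. */
static float wasm_f32_min(float a, float b) {
  /* A NaN in either operand propagates to the result. */
  if (isnan(a) || isnan(b)) return NAN;
  /* Both zero: the result is -0 if either operand is -0. */
  if (a == 0.0f && b == 0.0f)
    return (signbit(a) || signbit(b)) ? -0.0f : 0.0f;
  return a < b ? a : b;
}
```

The max case mirrors this, preferring positive zero and the larger operand.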

x86 intrinsics

  • Fix native aliases for amd64-only functions f0e9755 @nemequ
  • Add @aqrit's SSE2 min/max implementations d90e835 @nemeq
  • x86: fix AVX native → SSE4.2 native f6fc25a @mr-c
  • x86: ignore warnings about inefficient functions on lcc 416c243 @makise-homura
  • The fix for GCC bug #95483 wasn't in a release until 11.2 11d95f8 @nemequ
  • fix array size wrong size (caught by GCC 12) c6179cb @Lithrein
  • clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F 10a2f28 @mr-c

SSE*

  • sse: avoid including windows.h when possible 750f20d @boris-kuz
  • sse: don't use armv7 impl of _MM_TRANSPOSE4_PS on armv8 b5fb757 @nemequ
  • sse, mmx: fix clang-11 on POWER a0e9f9f @nemequ
  • sse: prefer SIMDE_SHUFFLE_VECTOR implementation of _mm_shuffle_ps 377e350 @nemequ
  • sse: replace _mm_prefetch implementation 26d515f @nemequ
  • sse: use portable implementation to work around llvm bug #344589 79738de @nemequ
  • sse: Suppress min/max macro definitions from windows.h 3465b57 @quyykk
  • sse: Fixed simde_mm_prefetch warnings 9c3d0dc @Epixu
  • sse: remove errant MMX requirement from simde_mm_movemask_ps 61b9341 @mr-c
  • sse, sse2: clean up several shuffle macros cc6dc18 @nemequ
  • sse, sse2: fix vec_cpsign order test 1465c48 @nemequ
  • sse, sse2: sync clang-12 changes for vec_cpsgn 1ba1596 @simba611
  • sse, sse2: work around GCC bug #100927 80472b7 @nemequ
  • sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH 0c26988 @mr-c
  • sse: Add LoongArch LSX support 31367e2 @XiWeiGu
  • sse{2,} tests: use INT32_MIN to appease MSVC 3ad047b @mr-c
  • sse2: add fast-math WASM implementation of _mm_cvtps_epi32 24c503f @nemequ
  • sse2: add parenthesis around macro arguments b394520 @nemequ
  • sse2: correct typos in simde_x_mm_broadcastlow_pd f8ce9bb @rosbif
  • sse2: don't require constants for _mm_srai_epi{16,32} 8bee92a @????
  • sse2: fix incompatible argument in A32 impl. of _mm_cvtps_epi32 b5fbe39 @jpcima
  • sse2: fix set but not used variable in _mm_cvtps_epi32 f460666 @nemequ
  • sse2: ignore broken _mm_loadu_si{16,32} on GCC 4b7394f @nemequ
  • sse2: prefer shuffle implementation of _mm_shuffle_epi32 to NEON d2ce706 @nemequ
  • sse2: remove AArch64 implementation of _mm_movemask_epi8 c595f6b @nemequ
  • sse2: remove statement expr requirement for NEON srli/srai macros da4d24f @nemequ
  • sse2, sse4.1: pull in improved packs/packus implementations from WASM 7b1df61 @nemequ
  • sse2: use simde_math_{add,sub}s_* for mm{add,sub}s_* functions 09d725d @nemequ
  • sse2: vcvtnq_s32_f32 is armv8-specific 98075d0 @nemequ
  • sse2: workaround missing vcvtnq_s32_f32 on GCC e11258e @jpcima
  • sse2: Fixed parameters to _mm_clflush d46f0e7 @thomasdwu
  • sse2 gcc: bug 99754 was fixed in GCC 12.1 6453f55 @mr-c
  • sse2: msvc arm64: disable false-positive warnings a5a0a9a @mr-c
  • sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512 da2988e @mr-c
  • ssse3: Add SSE2 integer abs implementation 2de8624 @aqrit
  • sse4.1 _mm_insert_ps: incorrect handling of the control 94e7569 @MirJawadMairaj
  • sse4.1: add some casts to make clang -Weverything happy 5f000af @nemequ
  • sse4.1: fix AArch64 implementation of simde_x_mm_blendv_epi64 978d1f7 @milot-mirdita
  • sse4.1: _mm_blendv_epi8: add sse2 and update wasm_simd128 implementations 2dbc124 @aqrit
  • sse4.1: remove statement expr dependency in blend functions 01fb894 @nemequ
  • sse4.1: replace NEON implementations with shuffle-based implementations 29a3cb4 @nemequ
  • sse4.1: use NEON types instead of vector in insert implementations 489e36c @nemequ
  • sse4.2: re-enable native _mm_cmpgt_epi64 7117c48 @aqrit
  • sse4.2: work around more warnings on old clang 3f186a0 @nemequ
  • sse4.1: fix A32V7 version of _mm_test{nz,}c_si128 e7c70a2 @mr-c
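
Several entries above concern saturating arithmetic (the adds/subs families). As a minimal scalar sketch of what saturation means for 16-bit lanes, the helpers below widen to 32 bits and clamp. The names `sat_add_i16` and `sat_sub_i16` are hypothetical and chosen for illustration only; SIMDe's own `simde_math_` helpers may be written differently.

```c
#include <stdint.h>

/* Hypothetical illustration helpers; not SIMDe's actual implementation. */

/* Saturating 16-bit add: widen so the sum cannot overflow, then clamp. */
static int16_t sat_add_i16(int16_t a, int16_t b) {
  int32_t sum = (int32_t) a + (int32_t) b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t) sum;
}

/* Saturating 16-bit subtract, using the same widen-and-clamp approach. */
static int16_t sat_sub_i16(int16_t a, int16_t b) {
  int32_t diff = (int32_t) a - (int32_t) b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t) diff;
}
```

A portable fallback for a vector intrinsic could apply such a helper to each lane in a loop; whether that matches any given SIMDe code path is an assumption.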

AVX

  • avx: work around missing _mm256_{load,store}u_m128{,i,d} on LCC a3a39e2 @nemequ
  • avx: try to detect prior inclusion of AVX header and handle it e8b7a2e @nemequ
  • avx, avx512/cmp: properly handle NaN in _mm{,256,512}_cmp_{ps,pd,ss,sd} 491d3fa @nemequ
  • avx: use internal symbols in clang fallbacks for cmp_ps/pd functions 35b86b7 @nemequ
  • avx: work around incorrect maskload/store definitions on clang < 3.8 a9313de @nemequ
  • avx: add native calls for _mm256_insertf128_{pd,ps,si256} bab30bb @LaurentThomas
  • avx: add test for simde_mm256_permute2f128{_pd,_si256} 04a0497 @mr-c
  • avx{,2}: fix maskload illegal mem access 39f723e @k-dominik
  • avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc 698bc2e @mr-c
  • avx{,2}: some intrinsics are missing from older MSVC versions bb274b8 @mr-c
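
The NaN-handling fix for the cmp functions listed above hinges on the difference between ordered and unordered comparison predicates: an ordered predicate is false whenever either operand is NaN, while an unordered predicate is true in that case. The scalar sketch below illustrates that distinction for a less-than / not-less-than pair. The function names are hypothetical, and this is not a copy of SIMDe's implementation.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical illustration helpers; not SIMDe's actual implementation. */

/* Ordered less-than: false whenever either operand is NaN. */
static bool cmp_lt_ordered(double a, double b) {
  return !isnan(a) && !isnan(b) && (a < b);
}

/* Unordered not-less-than: true whenever either operand is NaN. */
static bool cmp_nlt_unordered(double a, double b) {
  return isnan(a) || isnan(b) || !(a < b);
}
```

For non-NaN inputs the two predicates are exact complements; they only agree (both false, or both true) in the presence of NaN, which is where a naive `!(a < b)` fallback for the ordered case would go wrong.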

AVX2

  • avx2: add vector/shuffle implementation of _mm256_madd_epi16 2c2dd73 @nemequ
  • avx2: fix undefs for many native aliases 2ca5480 @anrodrig
  • avx2: added vector size conditional for unpack 287bda9 @simba611
  • avx2: separate natural vector length for float, int, and double types 6d1896d @nemequ

AVX512

  • avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops 1f8d1d2 @mr-c
  • avx512: work around several bugs in older versions of clang e64231e @nemequ
  • avx512: add several new functions ccc0757 @anrodrig b3535c3 @nemequ
  • avx512: implement mm*_mask(z)compress(storeu)* dab908e @simba611
  • avx512: implement mm_mask(z)_unpack* funcs 7aa3155 @simba611
  • avx512: initial implementation f35090a @simba611
  • avx512/4dpwssd: implement complete function family 5bbf50f @simba611
  • avx512/4dpwssds: initial implementation 22b8b97 @simba611
  • avx512/abs: add SSE2 implementation of _mm_abs_epi64 5c2f423 @aqrit
  • avx512/abs: work around buggy pd functions in GCC 7 - 8.2 605c92a @anrodrig
  • avx512/bitshuffle: initial implementation c92a13b @simba611
  • avx512/cmpeq: implement _mm512_mask_cmpeq_epi8_mask 88d2faf @nemequ
  • avx512/cmpge: finish implementing all functions 9a4d0de 0b5de15 @nemequ
  • avx512/cmp{g,l}e: AVX-512 implementations of non-mask functions ca1812d @nemequ
  • avx512/cmple: finish implementations of all cmple functions 06aa828 @nemequ
  • avx512/cmpneq: initial implementation of 128-bit and 256-bit functions 34194f2 @nemequ
  • avx512/compress: implement _mm256_mask_compress_pd d1223d4 @simba611
  • avx512/compress: implement _mm256_mask(z)_compress(storeu)_p* a7386b5 @simba611
  • avx512/compress: Mitigate poor compressstore performance on AMD Zen 4 54563e4 @mr-c
  • avx512/conflict: implement missing functions b6887ce c8f2755 @simba611
  • avx512/cvt: add _mm512_cvtepu32_ps, _mm{_mask,_maskz}_cvtepi64_pd 292e1e2 @nemequ
  • avx512/cvtt: add _mm{_mask,_maskz}_cvttpd_epi64 d2f518a e842f29 @nemequ
  • avx512/dbsad: initial implementation d659f42 0c76c5e @simba611
  • avx512/dpbf16: initial implementation 18b4e74 0ec8d72 @simba611
  • avx512/dpbusd: initial implementation 913a0a4 ff0d35a @simba611
  • avx512/dpbusds: complete function family 34f2488 @simba611
  • avx512/dpwssd: initial implementation 973df0e @simba611
  • avx512/dpwssds: initial implementation fe93582 @simba611
  • avx512/fixupimm: initial implementation 441339e @simba611
  • avx512/fmsub: implement _fmsub_ functions for AVX512VL b7df811 @simba611
  • avx512/insert: implement inserti{,_mask,maskz}{32x8,64x2} mm512{_mask,_maskz}_insert{f32x8,64x2} 2c8b052 @simba611 8e306d1 @simba611 0ba2085 @nemequ
  • avx512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions f09c61c @mr-c
  • avx512/knot,kxor: native calls not available on MSVC 2017 9a95e7c @mr-c
  • avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019 aa20919 @mr-c
  • avx512/load_pd: initial implementation 8445684 @operasfantom
  • avx512/load_ps: initial implementation d588049 @operasfantom
  • avx512/madd: explicitly promote 16-bit elements to 32-bit e5dd146 @nemequ
  • avx512/madd: fix arguments for native aliases ae545ce @nemequ
  • avx512/mullo: implement mm512_mullo_epi64 with mask(z) 8545d26 @8545d26
  • avx512/multishift: initial implementation 6b125ec @simba611
  • avx512/or, avx512/xor: regenerate tests using 32-bit ints instead of 64 e1de51d @nemequ
  • avx512/or: implement mm512_mask(z)_or_ps/d functions 6cda738 b7933e6 @simba611
  • avx512/permutex2var: hard-code types in casts instead of using typeof 8893116 @nemequ
  • avx512/permutex2var: work around incorrect definition on old clang 647279d @nemequ
  • avx512/popcnt: initial implementation d5ec32a b17b646 @simba611
  • avx512/range: initial implementation mm512_range_ps/d functions d59e3f5 37ab069 8bf0305 @simba611 8bc81ca 8ccb363 6b8d8b8 b8e63b4 @nemequ
  • avx512/range_round,round: move range_round functions out of round d382488 @simba611
  • avx512/rol: implement remaining functions 9a52011 @simba611
  • avx512/rol,ror: more tests within useful range of imm8 088f810 @mr-c
  • avx512/rolv: initial implementation a2e7632 b1745c5 1fa7764 @simba611
  • avx512/round, avx512/roundscale: add shorter vector fallbacks b542b01 @simba611
  • avx512/roundscale: initial implementation e47e703 6ddf1a2 98e6a60 @simba611
  • avx512/roundscale{,_round}: skip many mm{,_mask,_maskz}roundscale_round{ss,sd} testing on MSVC + NATIVE_AVX 3369366 @mr-c
  • avx512/roundscale_round: implement remaining functions db7a52a @simba611
  • avx512/roundscale_round: quiet a false positive MSVC warning 3a6dcf7 @mr-c
  • avx512/scalef: initial implementation 581bf31 482bf32 @simba611 22be4e8 f60c159 @nemequ
  • avx512/set, avx512/popcnt: use _mm512_set_epi8 only when available aa5746f @nemequ
  • avx512/setzero: fix native aliases c900d5e @EleonoreMizo
  • avx512/shldv: initial implementation cddc500 @simba611 9b08cfc @nemequ
  • avx512/ternarylogic: initial implementation 30eb81e @nemequ 7faedd6 @simba611
  • avx512/unpack{hi,lo}: implement mask variants of unpacklo b2c176f @simba611
  • avx512/unpack{hi,lo}: implement mm256_mask(z)_unpack* functions ca8c102 @simba611
  • avx512/unpacklo: added vector size conditional 3924339 @simba611
  • avx512/unpacklo: implement mask variants of unpacklo 0c4775e @simba611
  • avx512/unpacklo: implement mm512_unpacklo_* functions 8582277 @simba611
  • avx512/xor: implement mm512_mask(z)_xor_pd/s functions 854f913 @simba611
  • Properly map __mm functions to __simde_mm 96c963f @psaab
  • simde/scalef: add scalef_ss/sd d9898e5 @simba611
  • gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error b721e9d @mr-c
  • avx512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask} bf1fbae @mr-c
  • avx512: define __mask64 & __mask32 if not yet defined d850b83 @mr-c

GFNI

  • gfni: improve ARM NEON implementation a99a3ec @rosbif
  • gfni: add ARM, PPC and WASM implementations of gf2p8mul intrinsics 61126b3 @rosbif
  • gfni: add cast to work around -Wimplicit-int-conversion warning d066a1c @nemequ
  • gfni: remove unintentional dependency on vector extensions bdfa828 @nemequ
  • gfni: work around clang bug #50932 7d4beba @nemequ
  • gfni: work around error with vec_bperm on clang-10 on POWER 8620bd0 @nemequ
  • gfni: replace vec_and and vec_xor with & and ^ on z/arch f5577dc @nemequ
  • gfni: add many x86, ARM, z/Arch, PPC and WASM implementations 97eb961 @rosbif

XOP

  • xop: fix NEON implementation of maccs functions to use NEON types 6ecc0e3 @nemequ

F16C

  • f16c: initial implementation 62c1087 @nemequ
  • f16c: use __ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support eaeac09 @nemequ
  • msvc 2022: enable F16C if AVX2 present a66cbb0 @mr-c
  • f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph 5d2b53d @mr-c

FMA

  • fma: work around broken implementations of some functions on MCST LCC 269db2a @makise-homura
  • fma: add mls-based NEON implementations of fnmadd functions 55416aa @nemequ
  • fma: drop weird high-priority implementation in _mm_fmadd_ps 20922ff @nemequ
  • fma: use fma/fms instead of mla/mls on NEON 2fe84e5 @nemequ
  • fma: use NEON types in simde_mm_fnmadd_ps NEON implementation 44d38bd @nemequ
  • fma: fix return value of simde_mm_fnmadd_ps on NEON 87198d9 @nemequ
  • Fixed FMA detection macro on msvc 286ba3d @dhbloo

SVML

  • svml: trivial indentation fix 2176652 @nemequ
  • svml: remove some dead stores from cdfnorminv 11d97ba @nemequ
  • svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE 7d16fa6 @mr-c

MIPS MSA intrinsics

  • Begin working on implementing MIPS MSA. e9c002a @nemequ
  • msa/add_a: initial implementation 6b37bb3 @nemequ
  • msa/addvi: initial implementation 8711327 @nemequ
  • msa/subv: initial implementation 75b3b73 @nemequ
  • msa/andi: initial implementation 31b7ce7 @nemequ
  • msa/and: initial implementation 6635520 @nemequ
  • msa/adds: initial implementation c37559c @nemequ
  • msa/adds_a: initial implementation bb84c44 @nemequ
  • msa/madd: initial implementation 1b89ab3 @nemequ
  • Many work-arounds for GCC with MSA, and support in the docker image. e5dbb93 @nemequ

Arch support

  • various: correct PPC and z/Arch versions plus typo ac8d722 @rosbif
  • arch: __ARM_ARCH now (v8.1+) encodes the minor version b0b22d1 @nemequ
  • arch: set SIMDE_ARCH_ARM for AArch64 on MSVC 1d8befc @nemequ
  • arch: Add LoongArch LASX/LSX support d0cc0ab @XiWeiGu

arm64

  • arm64 windows: fix simd128.h build error dad8cad @Changqing-JING
  • mips/msa: fix for Windows ARM64 0f988c9 @Changqing-JING
  • arm/neon: workaround on ARM64 windows bug b54dfcb @Changqing-JING

z/Arch

  • Correctly detect and handle z/Arch and its vector extensions 4a3f466 @nemequ
  • Fix z/Arch without zvector. b8af226 @nemequ
  • sse, sse2: add several z/Arch implementations 4f628ac @nemequ
  • sse2, sse4.1: additional z/Arch implementations for ksw2 ee24439 @milot-mirdita
  • Many additional z/Architecture implementations of x86 functions 5a2b035 @nemequ
  • sse4.1, neon/bsl: z/Arch implementations of blendv/bsl functions 80a8484 @nemequ
  • z/Architecture implementations for remaining min/max functions 694d547 @nemequ
  • neon/cvt: z/Arch implementations 107fab8 @nemequ
  • sse, sse4.1: z/Arch implementations of some rounding functions 9fb1509 @nemequ
  • sse, sse2, neon/dup_n: lots of z/Arch splat-based implementations 874d51f @nemequ
  • gfni: add z/Arch version c12f111 @rosbif
  • x86,arm/neon: Correct z/Arch versions 50fba9b @rosbif
  • features: add z/arch to SIMDE_NATURAL_VECTOR_SIZE d41999b @nemequ
  • arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving 23a2441 @mr-c

Altivec

  • sse, sse2: generate to/from altivec functions for SSE/SSE2 types. dd3ff53 @nemequ
  • docker: power9-clang ignore deprecated-altivec-src-compat warnings b70f1a2 @mr-c
  • sse4.1: PPC AltiVec has no vec_splat_s64 debbf73 @rosbif
  • arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check 8534e64 @nemequ
  • simd128: add AltiVec implementations of any/all_true a3b2630 @nemequ

e2k (Elbrus)

  • e2k: Introduce E2K (Elbrus) architecture 093b2c5 @makise-homura
  • e2k, ppc: Make shifts unsigned 24ddeba @makise-homura

Power

  • gcc power: bugs 1007[012] fixed in GCC 12.1 c23208d @mr-c
  • gcc power: vec_cpsgn argument reversal fixed in 12.0 296362c @mr-c

Testing with Docker/Podman & CI

  • CI: meson newer than 0.56 skips tests d08cb7b @mr-c
  • download-sde: be more tolerant of changes on Intel's web site 87bb927 @nemequ
  • meson: require meson version 0.54 349da2b @makise-homura
  • testing: Require exact matches for abs functions 9085d94 @jpcima
  • test: replace 1e-##precision with to_slop functions 9adcc21 @nemequ
  • test: allow passing INT_MAX for precision for exact comparisons e903b7f @nemequ
  • codecov: ignore test/ directory 65e7903 @nemequ
  • cmake: generate most declare-suites.h files 5d62f0d @nemequ
  • tests: update download-iig.sh to account for Intel changes 2fdc9a5 @nemequ
  • test: fix download script for SDE b3b4975 @nemequ
  • update SDE download link 24338a2 @mr-c
  • meson docs: don't use deprecated syntax 1a1a6eb @mr-c
  • SDE: add -future flag to support all x86 features caa3c6d @wrv
  • check-flags.sh: add lock around installing SDE 373e1e3 @nemequ
  • download-iig: tweak script to fix download location 082a875 @nemequ
  • sde: don't print URL in download-sde script. 55fc0e2 @nemequ
  • Default to -DSIMDE_CONSTRAINED_COMPILATION when building tests 3d14f8e @nemequ
  • test.h: test binary equivalence of f32/f64 when slop is zero 221f3b3 @keithw
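
The last entry above distinguishes a tolerance ("slop") comparison from a bit-exact one. As a rough sketch of how a test helper might fall back to binary equivalence when the slop is zero, the code below compares the raw bits via `memcpy`. The names `f32_bits_equal` and `f32_close` are hypothetical, and SIMDe's test.h may implement this differently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration helpers; not SIMDe's actual test.h code. */

/* Bit-exact comparison: copy both floats into integers and compare those. */
static bool f32_bits_equal(float a, float b) {
  uint32_t ua, ub;
  memcpy(&ua, &a, sizeof ua);
  memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

/* Tolerance comparison that requires identical bits when slop is zero. */
static bool f32_close(float a, float b, float slop) {
  if (slop == 0.0f) {
    return f32_bits_equal(a, b);
  }
  float diff = a - b;
  if (diff < 0.0f) {
    diff = -diff;
  }
  return diff <= slop;
}
```

One consequence of the bit-exact path in this sketch is that `0.0f` and `-0.0f` are treated as different values, even though they compare equal with `==`.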

Appveyor

  • appveyor: use ccache 56b2ff2 @mr-c
  • appveyor: build & test with MSVC 2022 94b6983 @mr-c
  • appveyor: report ccache stats and increase compression 9232671 @mr-c
  • appveyor: add /Z7 flag to improve ccache bb63806 @mr-c
  • appveyor: turn all warnings into errors b207819 @mr-c
  • appveyor: build tests with AVX{,2}, but don't run them bcd9589 @mr-c
  • appveyor: test MSVC with ARCH512 64d7434 @mr-c
  • appveyor: return to normal cache compression 0c3dfa4 @mr-c

Azure

  • Azure: publish test results 51c24d8 @mr-c
  • Azure: Use Ubuntu-20.04 instead of "-latest" 1cf39df @mr-c
  • Azure CI: use newer clang for check 3b663fe @mr-c

Circle CI

  • circle-ci: fix loongson build 3db6d7a @mr-c
  • circle-ci: ccache for non-native builds ee79d7d @mr-c
  • circle-ci: i686 was actually compiling for x86_64 :-( 7670e63 @mr-c
  • circle-ci: test i686 with gcc-11 -O2 0a69604 @mr-c
  • circle-ci: modernize build, especially for i686 & loongson 22b73ba @mr-c
  • circle-ci: i686 gcc, only gcc-11 + O2 for now 7e70d02 @mr-c
  • circle-ci & cirrus: pipx instead of pip for meson 3211797 @mr-c

Cirrus CI

  • cirrus: add -Db_lundef=false to sanitizer build 5a0fc02 @nemequ

Local testing with Docker/Podman

  • docker: add -march=z14 -mzvector to s390x-gcc-10 build. 8f60406 @nemequ
  • docker: use z13 instead of z14 for s390x architecture a524be2 @nemequ
  • docker: install meson from pip df63f88 @nemequ
  • docker: use meson 0.55.0 instead of 0.54.0. 5112bf2 @nemequ
  • docker: add platform dependent fixes for docker 3dd58b9 @Glitch18
  • docker: fix script exiting bug 6770ec0 @Glitch18
  • docker: only rebuild image if older than a week d9b1322 @nemequ
  • docker: fix build when the image doesn't exist yet ab3b509 @nemequ
  • docker: skip date check when building image for the first time a1c4728 @Glitch18
  • docker: allow overriding the BUILD_IMAGE setting ca6f690 @nemequ
  • docker: Add a prompt before rebuilding image c2cff9f @Glitch18
  • docker: Fix BUILD_IMAGE always being set to 'y' 368a777 @Glitch18
  • docker: use -O2 instead of -O3 on emscripten 3173499 @nemequ
  • docker: fix quoting error 830981b @mr-c
  • docker: aarch64-clang ; match drone.io flags bbe4416 @mr-c
  • docker: skip mips64el from cross-building d3f5fae @mr-c
  • docker: tighten libstdc++NN-dev package selection c44539c @mr-c
  • docker: pass -future flag to sde for i686-all-gcc-9 d8658ea @mr-c
  • docker: icc, disable deprecation notice 505f24a @mr-c
  • docker: add Intel ICX testing 4a4eeb6 @mr-c
  • docker: add more cross building profiles for modern compilers 89e2c5b @mr-c
  • docker: qemu package doesn't exist & is unneeded 9ec8375 @mr-c
  • docker: enable use of ccache 4d42b90 @mr-c
  • docker: icx ignore no-tautological-constant-compare warning 97315b8 @mr-c
  • docker: add test with Debian default flags, also for armel 0a44b50 @mr-c
  • docker: sde tigerlake allows for advanced AVX512 testing 54b5d4e @mr-c
  • docker: apt-get update before each other apt command 5560ca0 @nemequ
  • docker: add a bunch of cross files b718597 @nemequ
  • docker: Dockerfile, Use netselect-apt to speed up image build e98cf70 @Glitch18
  • docker emscripten: remove experimental wasm flag for v8 496d88d @wrv
  • docker: use qemu-*-static's elf interpreter prefix to simplify 1921112 @mr-c
  • docker: power: meson cpu_family is just powerpc64 0643db3 @mr-c
  • docker i686: cap at prescott, not =native e12eab6 @mr-c
  • docker: ppc64el, riscv64, s390x cross compiling 3fa1d18 @mr-c
  • docker emscripten: fix v8 path, install ccache 8537dd9 @mr-c

Drone.io

Currently non-functional. Jobs queue, but are eventually killed before they start running. Assistance fixing that is welcome!

  • drone: read testlog.txt if tests fail eb71d89 @nemequ
  • drone: configure apt to retry failed downloads 1c442b4 @nemequ

GitHub Actions

  • gh-actions: add some bionic-era GCC builds ccdd24b @nemequ
  • gh-actions: add several clang builds e4b4646 @nemequ
  • gh-actions: temporarily disable emscripten build 71ea291 @nemequ
  • codeql: analyze the merge commit d3a40e1 @mr-c
  • gh-actions: automatically detect whether to use SDE bb69b54 @nemequ
  • gh-actions: disable clang-3.9 build 7fcb64d @nemequ
  • gh-actions: use ctest to run CMake tests so we can output on failure 03f6ebe @nemequ
  • gh-actions: try commit message without quotes on implementation-status 3f81cac @nemequ
  • gh-actions: add action to update the implementation-status repo 333f077 @nemequ
  • gh-actions: use -O2 instead of -O3 on emscripten 636f145 @nemequ
  • gh actions: Add Windows ARM64 CI f12fd00 @tommyvct
  • gh-actions: only run MSVC Arm checks on msvc-arm branch 3d8a516 @nemequ
  • gh-actions: switch emscripten build to Meson bde2cb1 @nemequ
  • gh-actions: ubuntu-16.04 has been retired, migrate to ubuntu-18.04 6d0c65c @mr-c
  • gh-actions: pin to macos-10.15 instead of -latest d64de8c @mr-c
  • gh-actions: trim flags for icx/icpc 201dcdb @mr-c
  • gh-actions, circleci: debian testing gcc: -Wno-error=stringop-overread af24d0c @mr-c
  • gh-actions, docker: turn off emscripten's -Wunsafe-buffer-usage for the tests 3caf71d @mr-c
  • gh-actions: test using Intel® oneAPI DPC++/C++ Compiler instead of ICC df144ff @mr-c
  • gh-actions: Ubuntu 22.04 + system meson dd0b662 @mr-c
  • gh-actions: Update codecov to v3 for Node 16 support bd7f8df @wrv
  • gh-actions: Update macos build to 11 c30a29b @wrv
  • gh-actions: Comment out Ubuntu 18.04 build as it will be unsupported in April 2023 6cefe47 @wrv
  • gh-actions: Update to actions/checkout@v3 to avoid Node 12 warning 511b5b7 @wrv
  • gh-actions: add -fp-model precise for icx/icpx 7ec32ff @wrv
  • gh-actions: update OSSAR action versions a1a63ac @wrv
  • gh-actions: cancel workflows if there is a newer commit 8c56459 @mr-c
  • gh-actions: test with gcc-12 f6db95d @mr-c
  • gh-actions: remove GCC 4.7 build 3997b8f @nemequ
  • gh-actions: add action to push to the simde-no-tests repository 1b4647f @milot-mirdita
  • gh-actions: move push-to-no-tests.yml into the right directory. 7fbb9c9 @nemequ
  • gh-actions: give up on getting commit ID in message for status repo 05ecb5d @nemequ
  • gh-actions: add missing jobs property ddd453a @nemequ
  • gh-actions, docker: add -fno-lax-vector-conversions to clang flags ccdfca9 @nemequ
  • gh-actions: add -ffast-math builds for GCC and clang de616e7 @nemequ
  • gh-actions: resume testing on aarch64 4d1639a @mr-c
  • gh-actions: cross-build & test powerpc64le, s390x (later) f0f3d09 @mr-c
  • gh-actions: sleef: no ccache due to -march=native c709922 @mr-c
  • gh-actions: use ccache to speed up builds 73dddb7 @mr-c
  • gh-actions: clang 1[45]; gcc 12 on riscv64 with qemu e5c02d4 @mr-c
  • gh-actions: Resume running the msvc arm tests on all branches 782d816 @mr-c
  • gh-actions: Emscripten: temporarily only run "native" tests 1b6bde7 @mr-c
  • gh-actions: actionlint/shellcheck inspired cleanups 8182065 @mr-c
  • gh-actions qemu: resuming build+test on s390x cb6a0da @mr-c
  • gh-actions: drop cmake for meson. 9d69cff @mr-c

Travis

Currently non-functional, partially replaced by the s390x qemu GitHub Actions build. See https://github.com/simd-everywhere/simde/issues/903 for the status of POWER9 (ppc64le).

  • travis: use -march=native and GCC on s390x 5b9b2af @nemequ
  • travis power9: try using all the cores to speed up b91516f @mr-c
  • travis: remove Travis completely 17a27e7 @nemequ
  • travis: bring back some Travis builds 0ec9926 @nemequ

Netlify

Currently broken

  • netlify: build amalgamated SVE header 41898ab @nemequ
  • netlify: deploy wasm/simd128.h aa29a8b @nemequ

Semaphore CI

Currently failing for old GCC-5

  • semaphore CI: fix test execution by using meson 1b05684 @mr-c

Misc

  • Improve abs function performance on SSE/SSE2 093f6ee @jpcima
  • Upgrade Hedley to v15 0d070e1 @nemequ
  • detect-clang: fix version numbers for clang < 4.0 8a2c645 @nemequ
  • align: add MCST LCC to compilers known to support __alignof__ 38e3840 @nemequ
  • common: add an MCST LCC check for vector features. e38fe50 @nemequ
  • complex: fix checks for GCC C complex math support ad8c7e0 @nemequ
  • Fix SIMDe link in no-tests README 21f7a2a @maxbachmann
  • common: enable OpenMP by default on LCC ff34d1b @nemequ
  • README: more thoroughly document OpenMP support 46c65e1 @nemequ
  • Add some files to .gitignore 8381a57 @nemequ
  • check-flags.sh: move download location from ~ to /opt/intel a361527 @nemequ
  • simde-features: fix C&P error 00fd88d @rosbif
  • {neon,simd128,avx512/abs}: provide vector versions of i64 abs d3976e0 @nemequ
  • common: improve check for C11 generic selections 11d2a6d @nemequ
  • common: don't use aligned OpenMP clause on MCST LCC a9a5a0d @nemequ
  • math: use simde_math_-prefixed abs/labs/llabs 813f4f0 @nemequ
  • diagnostic: silence -Wreserved-identifier warning from LLVM 0b6f5b2 @nemequ
  • Fix compilation with clang on POWER 5c43ac0 @nemequ
  • Work around issues preventing compilation on NVCC 3815c04 @nemequ
  • Don't set SIMDE_NO_CHECK_IMMEDIATE_CONSTANT in tests. 0c9fe4c @nemequ
  • common: move conversion functions for u32 <-> f32 into common 37e187c @nemequ
  • Add SIMDE_FAST_EXCEPTIONS option d01d58e @nemequ
  • Use SIMDE_HUGE_FUNCTION_ATTRIBUTES on several functions. 552c202 @nemequ
  • Add -s ENVIRONMENT=shell to emscripten flags 69d7655 @nemequ
  • Fix an assortment of small bugs 8b5d68c @simba611
  • Remove all && 0s in preprocessor macros. b6f21a9 @nemequ
  • Add constrained compilation mode a992f5b @simba611
  • Fix gcc-10 compilation on s/390x a10f12e @nemequ
  • simde-diagnostic: Include simde-arch 61cd8aa @Glitch18
  • Add many fast floating point to integer conversion functions 1fbe712 @nemequ
  • common: Use AArch64 intrinsics if _M_ARM64EC is defined 2a9e7b7 @tommyvct
  • Add -Wdeclaration-after-statement to the list of ignored warnings. bba815d @nemequ
  • Work around compound literal warning with clang 90523a2 @dgazzoni
  • Various fixes for -fno-lax-vector-conversions 39d902e @nemequ
  • Fix warnings with -fno-lax-vector-conversions e5ff228 @ngzhian
  • Improve widening pairwise addition implementations 3b950bb @nemequ
  • Wrap static assertions in code to disable -Wreserved-identifier d1fc7b5 @nemequ
  • Add missing static const in simde-math.h. NFC 6bd6562 @sbc100
  • wasm128, sse2: disable -Wvector-conversion when calling vgetq_lane_s64 679b970 @nemequ
  • test: skip NAN producing (sub-)tests for -ffast-math eb99f7c @mr-c
  • README: add CodeCov.io badge; freshen chat link 1c48030 @mr-c
  • emscripten; don't use __builtin_roundeven{f,} even if defined 51b9941 @mr-c
  • __builtin_signbit: add cast to double for old Clang versions 160c161 @mr-c

New Contributors

Full Changelog: https://github.com/simd-everywhere/simde/compare/v0.7.2...v0.7.4

v0.7.4-rc1

1 year ago

v0.7.2

3 years ago

Summary

Post v0.7.0 fixes; more portable implementations of neon intrinsics

Details

  • common: fix SIMDE_FLOAT64_C macro when SIMDE_FLOAT64_TYPE is defined 1d28a5d @rosbif
  • complex: split complex math out into separate header 0678336 @nemequ
  • diagnostic: silence a few -Weverything diagnostics on clang < 5 6f8d285 @nemequ

Implementation of NEON intrinsics:

  • neon/ceq: implement vceq{s_f32,d_f64} f4f42dc @nemequ
  • neon/abd: trivial formatting fix 0b8c8ca @nemequ
  • neon/abd: add missing scalar functions 517a613 @nemequ
  • neon/abs: add vabsd_s64 4091e3e @nemequ
  • neon/abs: vabsd_s64 wasn't added to GCC until 9.1.0 52051cb @nemequ
  • neon/add: implement vaddd_s64 and vaddd_u64 03d4d1b @nemequ
  • neon/cagt: implement vcagt{s_f32,d_f64} 731cf71 @nemequ
  • neon/c{ge,gt,le,lt}: some improved 64-bit comparisons 97f4dfb @nemequ
  • neon/ext: work around bug in GCC prior to 9.0 0c29a5f @nemequ
  • neon/padd: vpadd_f32 was buggy in older clang versions 623cbf7 @nemequ
  • neon/rnd: add NaN and ties to test suite fa950a2 @nemequ
  • neon/rndm: initial implementation 5bf93ad @nemequ
  • neon/rndn: initial implementation 2c624b5 @nemequ
  • neon/rndp: initial implementation 7f1f499 @nemequ
  • neon/uqadd: clang prior to 9 used incorrect types for the scalar funcs fa0eca0 @nemequ
  • neon/uzp1,neon/uzp2: change some dependencies from SSE to SSE2 c00a0e5 @rosbif

x86 intrinsics

SSE*

  • sse: fix overflow handling for simde_mm_cvt_ss2si a4658d8 @mr-c
  • sse: add SIMDE_MM_{GET,SET}_FLUSH_ZERO_MODE 340bf13 @nemequ
  • sse, sse2: add range checks to several conversion functions c3d7abf @nemequ
  • sse2: update test for simde_mm_set1_epi32 8854ede @nemequ
  • sse2: fix armv7 NEON implementation for simde_mm_shufflehi_epi16 338dac0 @nemequ
  • sse2: change some dependencies from SSE to SSE2 c00a0e5 @rosbif
  • sse2: fix potentially unused variable in loadu functions f43bfed @nemequ
  • sse2: use void* for destinations of loadu functions 98c63ae @nemequ
  • sse4.1: check for SHUFFLE_VECTOR before using it in _mm_cvtepu32_epi64 cb73aec @nemequ
  • sse4.2: some improved 64-bit comparisons 97f4dfb @nemequ

AVX

  • avx: use void* for destinations of loadu functions 98c63ae @nemequ

AVX512

  • permutex2var: fix some signed/unsigned mismatch warnings 951caa1 @nemequ
  • avx512/s{r,l}li: the imm8 parameters should be unsigned ecc388d @nemequ

XOP

  • xop: initial implementation 6cc0cef @nemequ
  • xop: add a bunch of NEON implementations b602fbc @nemequ
  • xop: fix NEON implementation of simde_mm_maccsd_epi16 8d499b5 @nemequ

Testing with Docker/Podman & CI

  • docker: add gdb and valgrind to installed packages 4500040 @nemequ
  • ci: move icc build from Travis to GitHub Actions 712f01a @nemequ
  • gh-actions: run on pull requests 43e7053 @mr-c
  • drone: re-organize drone builds 73fe36a @nemequ
  • drone: adjust branch triggers 9eba966 @nemequ
  • README: update CI information ca440ae @nemequ
  • circleci: add Circle CI 5d5350c @nemequ
  • circleci: actually build in 32-bit mode 4267926 @nemequ
  • cirrus: add Cirrus CI support 0212a07 @nemequ
  • cirrus: run asan/ubsan instead of just another GCC build a1c9f1d @nemequ
  • docker: allow for an optional persistent build directory 610fa3d @nemequ
  • gh-actions, semaphore: move GCC and clang builds to Semaphore 49d0d82 @nemequ
  • ci: disable ci/* builds for various providers 28f8775 @nemequ
  • travis: disable all builds 687851b @nemequ

Misc

  • cmake: don't explicitly list source files in the x86 directory 88c6f7e @nemequ
  • meson: link to libm if available 251bc0d @nemequ
  • simde-align: allow alignment > 8 on MSVC ≥ 19.16 (VS 2017) 0968271 @jsbache
  • README: fix a couple of outdated links 6001182 @nemequ

v0.7.0

3 years ago

Version 0.7.0 Summary

  • Portable implementation of the NEON intrinsics: 57% finished
  • Some more WASM implementations of x86 intrinsics
  • Various SSE*, AVX*, and SVML enhancements
  • Various new and improved implementations for AltiVec, Neon, POWER architectures.
  • The "new" SSE2 _mm_{load,store}u_si{16,32,64} intrinsics are now implemented along with the SSE _MM_HINT_* defines.
  • All of the CLMUL intrinsics have been implemented. "CLMUL_instruction_set" Wikipedia; CLMUL @ Intel Intrinsics Guide.

Please see the 0.7-rc-1 and 0.7.0-rc2 release notes for more details.

Changes since 0.7.0-rc2

Implementation of NEON intrinsics:

  • neon/orn: add AVX-512VL (ternarylogic) implementations d667aa8 @nemequ
  • neon/ld3, neon/ld4: disable -Wmaybe-uninitialized on GCC eaaa71f @nemequ

x86 intrinsics

SSE*

  • sse: cast _MM_HINT_* values to enum _mm_hint on GCC 3f7e6f7 @nemequ

AVX512

  • avx512/permutex2var: add remaining intrinsics and translations 5d8d9d2

Misc

  • math: add modf 580e401 @nemequ
  • Cleanups of SIMDE_BUG_* definitions e090746 @mr-c