BLAS-like Library Instantiation Software Framework
This release contains several new features and optimizations related to threaded execution, as well as internal changes that improve maintainability and lay the groundwork for future refactoring. The build system and kernel sets saw lots of new code and tweaks to old code, and of course there were many bugfixes.
bli_pthread_switch_t API. (Field Van Zee, Devin Matthews)
bli_init() to use TLS where feasible. (Field Van Zee, Edward Smyth, Minh Quan Ho)
gemm, gemmt, and trmm macrokernels. (Field Van Zee, Devin Matthews, Leick Robinson, Minh Quan Ho)
thrcomm_t fields to avoid false sharing of cache lines. (Leick Robinson)
rntm_t management code. (Field Van Zee, Devin Matthews)
rntm_t nt/ways fields with 1 (not -1). (Field Van Zee, Jeff Diamond, Leick Robinson, Devin Matthews)
invscalv, invscalm, invscald operations.
NaN/Inf handling in sumsqv. (Devin Matthews)
rntm_t), and mem_t (from the cntl_t) to the thrinfo_t object.
thrinfo_t tree. (Devin Matthews)
bli_packm_blk_var1.c. (Devin Matthews)
bli_l3_determine_kc(). (Devin Matthews)
cntx_t pointer caching in gks. (Field Van Zee, Harihara Sudhan S)
const keyword to pointers in kernel APIs. (Field Van Zee, Nisanth M P)
void* pointers.
BLIS_ONE_I, BLIS_MINUS_ONE_I, BLIS_NAN. (Devin Matthews)
gemmsup kernels. (Devin Matthews)
lt, lte, gt, gte operations and other miscellaneous updates.
INSERT_ macro sets via variadic macros. (Devin Matthews)
gemmt, trmm, and trsm to match that of gemm. (Devin Matthews)
bli_l3_sup_var1n2m.c and unified _sup_packm_a/b(). (Devin Matthews)
herk/her2k/syrk/syr2k. (Devin Matthews)
trmm[3]/trsm performance bug introduced in cf7d616. (Field Van Zee, Leick Robinson)
hemm/symm. (Field Van Zee, Nisanth M P)
bli_cntx_set_ukr_prefs(). (Field Van Zee, Leick Robinson, Devin Matthews, Jeff Diamond)
sizeof(type) in edge case macros. (@moon-chilled)
bli_pool.c. (Devin Matthews)
VEXTRACTF64X2 in bli_x86_asm_macros.h. (Harsh Dave)
bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was misspelled. (Devin Matthews)
printf() args in bli_thread_range_tlb.c to avoid compiler warnings. (Lee Killough)
bli_l3_check.c.
const to all interfaces above the (micro)kernels. (Devin Matthews)
xpbys in gemm macrokernel.
[cz]symv_(), [cz]syr_(), [cz]rot_(). (Field Van Zee, James Foster)
BLIS_DISABLE_BLAS_DEFS is defined. (Field Van Zee, Edward Smyth, Devin Matthews)
bli_config.h before bli_system.h in cblas.h so that BLIS_ENABLE_SYSTEM is defined in time for proper OS detection. (Edward Smyth)
armsve kernels. (RuQing Xu)
dgemmsup with extended MR and NR. (RuQing Xu)
packm kernels are stored within the cntx_t so that BLIS only stores two packm kernels per datatype: one for MRxk upanels and one for kxNR upanels. (Devin Matthews)
scal2v reference kernel when alpha == 1.
haswell gemmsup kernels. (Daniël de Kok, Bhaskar Nallani, Madeesh Kannan)
power10 microkernels. (Nisanth M P)
power10 kernels other than sgemm, dgemm. (Nisanth M P)
bli_gemm_small() prototype mismatch. (Jeff Diamond)
gemmlike sandbox.
power10 sandbox. (Nisanth M P)
gemmlike sandbox bug that stems from reuse of bli_thrinfo_sup_grow().
altra and altramax. (Jeff Diamond, Leick Robinson)
-mabi= during RISC-V builds. (Lee Killough)
sifive_x280 subconfig and kernel set. (Aaron Hutchinson, Lee Killough, Devin Matthews, and Angelika Schwarz)
configure. (Devin Matthews)
--disable-tls. (Field Van Zee, Nick Knight)
-lrt on Android with Bionic libraries. (Lee Killough)
-fPIC option when shared library build is disabled. (Field Van Zee, Nick Knight)
-fPIC option insertion to subconfigs' make_defs.mk files. (Field Van Zee, Nick Knight)
INCDIR prefix so that user can #include "blis.h" instead of #include <blis/blis.h> (and/or "cblas.h" instead of <blis/cblas.h> if CBLAS is enabled). (Field Van Zee, Jed Brown, Devin Matthews, Mo Zhou)
nvc) in configure. (Ajay Panyala)
zen3 subconfig to support NVHPC compilers. (Abhishek Bagusetty)
kernels subdirs in addons. (AMD, Mithun Mohan)
power umbrella configuration family (which currently includes power9 and power10 subconfigs). (Nisanth M P)
BLIS_VERSION_STRING in blis.h instead of via command line argument during compilation. (Field Van Zee, Mohsen Aznaveh, Tim Davis)
regen-symbols.sh as gen-libblis-symbols.sh. (Field Van Zee)
clang targeting MinGW. (Isuru Fernando)
/proc/cpuinfo) for POWER7, POWER9 and POWER10 microarchitectures. (Alexander Grund)
#line directives to flattened blis.h to facilitate easier debugging. (Devin Matthews)
--nosup and --sup shorthand options to configure.
configure --help output. (Lee Killough)
configure to pass all shellcheck checks. (Lee Killough)
.dir-locals.el to enhance emacs formatting of C files. (Lee Killough)
power10 subconfig. (Field Van Zee, Nicholai Tukanov)
#include <io.h> for Windows. (@h-vetinari)
firestorm (Apple M1) subconfig. (Devin Matthews)
znver3, which needs gcc >= 10.3. (Jed Brown)
configure --help text. (Lee Killough)
grep.
output.testsuite to .gitignore.
test/3 drivers to take parameters via command line arguments. (Field Van Zee, Jeff Diamond, Leick Robinson)
arm64 entry to .travis.yml so that Travis CI will compile/test ARM builds. (Field Van Zee, RuQing Xu)
gemmlike sandbox via AppVeyor. (Jeff Diamond)
-q quiet mode option to testsuite.
test/3 drivers. (Field Van Zee, Leick Robinson)
cblat1 or zblat1 are linked with a build of BLIS that was compiled with --complex-return=intel. (Bart Oldeman)
docs/Discord.md) and logo to README.md.
mm_algorithm files (for bp and pb) to docs/diagrams.
README.md.
docs/Multithreading.md.
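The thrcomm_t entry in the list above mentions avoiding false sharing of cache lines. The sketch below illustrates the general technique only and is not the actual thrcomm_t layout: each frequently written field of a hypothetical struct is aligned to its own cache line, so that threads updating one field do not invalidate the line holding another. The struct name, the field names, the helper function, and the 64-byte line size are all assumptions made for illustration.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof, size_t */

/* Assumed cache line size; 64 bytes is common but not universal. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical communicator struct; NOT the real thrcomm_t definition.
   Each field starts on its own cache line. */
typedef struct
{
    alignas(SKETCH_CACHE_LINE) void*        sent_object;
    alignas(SKETCH_CACHE_LINE) volatile int barrier_sense;
    alignas(SKETCH_CACHE_LINE) volatile int threads_arrived;
} sketch_comm_t;

/* Compile-time check that the two barrier fields do not share a line. */
static_assert( offsetof( sketch_comm_t, threads_arrived ) -
               offsetof( sketch_comm_t, barrier_sense ) >= SKETCH_CACHE_LINE,
               "barrier fields must sit on distinct cache lines" );

/* Returns the index of the cache line (relative to the start of the
   struct) that contains the given byte offset. */
static size_t sketch_line_of( size_t offset )
{
    return offset / SKETCH_CACHE_LINE;
}
```

The cost of this layout is memory: the struct grows to three full cache lines even though its payload is only a few bytes, which is usually acceptable for a small number of shared synchronization objects.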
This release contains a slew of improvements, new kernels and APIs, bugfixes, and more (including lots of code reduction). It also contains foundational support for an exciting new class of expert functionality: creating new operations without the need to duplicate the middleware that sits between the API and kernels.
obj_t that relate to storing function pointers to custom packm kernels, microkernels, etc. as well as accessor functions to set and query those fields. (Devin Matthews)
packm microkernels and variants via the aforementioned new obj_t fields. (Devin Matthews)
gemm and gemmtrsm microkernels. This also required updating of APIs and definitions of all existing microkernels in kernels directory. Edge-case handling functionality is now facilitated via new preprocessor macros found in bli_edge_case_macro_defs.h. (Devin Matthews)
gemmsup thread barriers when not packing A or B. This boosts performance for many small multithreaded problems. (Field Van Zee, AMD)
herk, her2k, syrk, syr2k in terms of gemmt. (Devin Matthews)
setijv and getijv to set/get vector elements.
eqsc, eqv, and eqm operations to test equality between two scalars, vectors, or matrices.
setijm and getijm to prevent use of negative indices.
membrk files/variables/functions to pba.
err_t* "return" parameter to bli_malloc_*() and friends.
sba and pba to static initialization.
bli_pack_get_pack_a(), bli_pack_get_pack_b().
bli_init() to be called more than once (without segfaulting). (@lschork2, Minh Quan Ho, Devin Matthews)
bli_pool_finalize() that prevented BLIS from being re-initialized. (AMD)
pool_t-growing logic in bli_pool.c, and always allocate at least one element in .block_ptrs array. (Minh Quan Ho)
bli_error.c. (Minh Quan Ho)
bli_macro_defs.h to a new header, bli_lang_defs.h.
BLIS_SIMD_NUM_REGISTERS to BLIS_SIMD_MAX_NUM_REGISTERS and BLIS_SIMD_SIZE to BLIS_SIMD_MAX_SIZE for improved clarity. (Devin Matthews)
?axpby_() and ?gemm_batch_(). (Meghana Vankadari, AMD)
gemm3m APIs to BLAS and CBLAS layers. (Bhaskar Nallani, AMD)
?gemm_() invocations where m or n is unit by calling ?gemv_(). (Dipal M Zambare, AMD)
bli_slamch() and bli_dlamch() to use constants from standard C library rather than values computed at runtime. (Devin Matthews)
a64fx subconfiguration that uses empirically-tuned blocksizes. (Stepan Nassyr, RuQing Xu)
armsve subconfig that computes blocksizes via an analytical model. (Stepan Nassyr)
gemm kernels for Arm SVE. (Stepan Nassyr)
gemmsup kernels to the armv8a kernel set for use in new Apple Firestorm subconfiguration. (RuQing Xu)
dpackm kernels (16xk and 10xk) with in-register transpose. (RuQing Xu)
dpackm kernels by Linaro Ltd. to 512-bit for size 12xk. (RuQing Xu)
bli_gemm_armv8a_asm_d6x8.c to accommodate clang. (RuQing Xu)
saxpyf/daxpyf/caxpyf kernels to zen kernel set. (Dipal M Zambare, AMD)
vzeroupper instruction to haswell microkernels. (Devin Matthews)
beta == 0 handling in s/d armsve and armv7a gemm microkernels. (Devin Matthews)
kappa_i in the two assembly cpackm kernels in haswell kernel set. (Devin Matthews)
gemmsup haswell kernels whereby the vhaddpd instruction is used with uninitialized registers. (Devin Matthews)
power10 microkernel I/O. (Nicholai Tukanov)
gemmlike sandbox to allow rapid prototyping of gemm-like operations.
power10 sandbox, including a new testsuite. (Nicholai Tukanov)
configure option --[en|dis]able-amd-frame-tweaks that allows BLIS to compile certain framework files (each with the _amd suffix) that have been customized by AMD for improved performance (provided that the targeted configuration is eligible). By default, the more portable counterparts to these files are compiled. (Field Van Zee, AMD)
is_win) for Windows in configure. (Devin Matthews)
-march=haswell instead of -march=skylake-avx512 on Windows. (Devin Matthews, @h-vetinari)
configure breakage on MacOSX by accepting either clang or LLVM in vendor string. (Devin Matthews)
armsve subconfig.
configure option to control whether or not to use @rpath. (Devin Matthews)
configure. (Devin Matthews)
@path-based install name on MacOSX and use relocatable RPATH entries for testsuite binaries. (Devin Matthews)
CC, CXX, FC, PYTHON, AR, and RANLIB, configure will now print an error message and abort if a user specifies a specific tool and that tool is not found. (Field Van Zee, Devin Matthews)
blis.pc.in for out-of-tree builds. (Andrew Wildman)
copyv, setv, and swapv kernels in zen subconfig. (Dipal M Zambare, AMD)
firestorm. (RuQing Xu)
armsve subconfig to arm64 configuration family. (RuQing Xu)
thunderx2 subconfiguration. (Devin Matthews)
configure. (Chengguo Sun)
blis.h file for the BLIS and BLAS testsuite objects. (Devin Matthews)
xerbla_() as a "weak" symbol on MacOSX. (Devin Matthews)
common.mk whereby the header path to cblas.h was omitted from the compiler flags when compiling CBLAS files within BLIS.
sed script to build directory.
configure, common.mk, and others.
make check. (Devin Matthews)
make install in Travis CI. (Devin Matthews)
blis.h is C++-compatible. (Devin Matthews)
test/3 to be robust against missing datasets as well as to fix a few minor issues.
test_axpbyv.c and test_gemm_batch.c test driver files to test directory. (Meghana Vankadari, AMD)
her, her2, herk, and her2k drivers in test directory. (Madan mohan Manokar, AMD)
setijv, getijv, eqsc, eqv, eqm.
docs/Addons.md.
README.md.
README.md.
docs/Sandboxes.md.
docs/Multithreading.md. (Devin Matthews)
docs/KernelHowTo.md.
docs/Performance.md to report Fujitsu A64fx (512-bit SVE) results. (RuQing Xu)
docs/Performance.md to report Graviton2 Neoverse N1 results. (Nicholai Tukanov)
docs/FAQ.md with new questions.
docs/FAQ.md. (Gaëtan Cassiers)
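The bli_slamch()/bli_dlamch() entry in the list above refers to taking machine parameters from standard C library constants rather than computing them at runtime. The sketch below shows roughly what that idea can look like using <float.h>; it is not the BLIS implementation. The function name sketch_dlamch is hypothetical, the query letters are assumed to follow the reference LAPACK dlamch convention, and the code assumes a binary floating-point format with round-to-nearest.

```c
#include <assert.h> /* static_assert */
#include <ctype.h>  /* toupper */
#include <float.h>  /* DBL_EPSILON, DBL_MIN, DBL_MAX, ... */

/* Assumption: binary floating point (true for IEEE 754 doubles). */
static_assert( FLT_RADIX == 2, "sketch assumes a binary floating-point format" );

/* Hypothetical stand-in for a dlamch-style query; NOT bli_dlamch().
   Every value comes from a compile-time constant in <float.h>. */
static double sketch_dlamch( char cmach )
{
    switch ( toupper( (unsigned char)cmach ) )
    {
        case 'E': return DBL_EPSILON * 0.5;             /* relative machine epsilon */
        case 'S': return DBL_MIN;                       /* safe minimum (IEEE 754 assumed) */
        case 'B': return (double)FLT_RADIX;             /* base of the machine */
        case 'P': return DBL_EPSILON * 0.5 * FLT_RADIX; /* eps * base */
        case 'N': return (double)DBL_MANT_DIG;          /* digits in the mantissa */
        case 'R': return 1.0;                           /* rounding occurs in addition */
        case 'M': return (double)DBL_MIN_EXP;           /* minimum exponent */
        case 'U': return DBL_MIN;                       /* underflow threshold */
        case 'L': return (double)DBL_MAX_EXP;           /* maximum exponent */
        case 'O': return DBL_MAX;                       /* overflow threshold */
        default:  return 0.0;                           /* unrecognized query */
    }
}
```

Compared with probing the hardware in a loop at runtime, a lookup like this has no startup cost and cannot be perturbed by compiler optimizations or extended-precision registers, at the price of trusting the values the C library headers report.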