Library for specialized dense and sparse matrix operations, and deep learning primitives.
This release accumulated more than 1200 changes since the last release and is a major preparation for the future v2 of the library. Besides stability improvements, refinements of existing functionality, and bug fixes, several pieces of new functionality were introduced: packed/compact data-layout functions for solving linear equations, new flavors of SMM kernels along with relaxed limitations (TransB), and overall support for low precision based on the Bfloat16 floating-point format.
The Deep Learning (DL) domain is still under active research and development, including co-design. The API, however, is rather stable (DLv2 since v1.8), with an implementation that continues to receive major development. Towards LIBXSMM v2, the DL domain will undergo a major code reduction (implementation) while providing the same or more functionality (a first sign is the removal of the Winograd code in this release).
THANK YOU FOR YOUR CONTRIBUTION - we had again several direct (and indirect) contributions, reports, and involvement from people who came across the project. We would like to thank you all for the effort and time you spent working on Open Source!
INTRODUCED
- TransB=T is now allowed (in addition to TransB=N).
- LIBXSMM_DUMP_BUILD=1 (…).
IMPROVEMENTS
CHANGES
FIXES
Note about platform support: an explicit compile error (error message) is generated on platforms other than Intel (or compatible) processors, since upstreamed code was reported to produce a "compilation failure". Aside from this artificial error, any platform is supported with generic code (tested with an ARM cross-compiler). Of course, any Open Source contribution adding JIT support is welcome.
Note about binary compatibility: LIBXSMM's API for Small Matrix Multiplications (SMMs) is stable, and all major known applications (e.g., CP2K, EDGE, NEK5K, and SeisSol) either rely on SMMs or are able (and willing) to benefit from an improved API in the other domains (e.g., DL). Until at least v2.0, binary compatibility is not maintained (the SONAME version follows the semantic version).
Development accumulated many changes since the last release (v1.9), as this version (v1.10) kept slipping because validation could not keep up and was restarted several times. On the positive side, this may allow calling it the "Supercomputing 2018 Edition", complemented by an updated list of references including the SC'18 paper "Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures". Among several external articles, the Parallel Universe Magazine published "LIBXSMM: An Open Source-Based Inspiration for Hardware and Software Development at Intel".
The intense development of LIBXSMM brought many improvements and detailed features across domains, as well as end-to-end support for Bfloat16 in LIBXSMM's Deep Learning (DL) domain. The latter can already be exercised with the GxM framework, which was added to the collection of sample codes. Testing and validation were updated for the latest compilers and upcoming Linux distributions. FreeBSD is now formally supported (previously it was only tested occasionally). RPM, Debian, and FreeBSD package updates will benefit from the smoothed default build targets and compiler flags.
LIBXSMM supports "one build for all" while exploiting the existing instruction set extensions (CPUID-based code dispatch). Developers may enjoy support for pkg-config (.pc files in the lib folder) for easier linkage when using the Classic ABI (e.g., PKG_CONFIG_PATH=/path/to/libxsmm/lib pkg-config libxsmm --libs).
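The pkg-config based linkage described above can be sketched as follows; this is a build fragment, not part of the release notes, and the install prefix /path/to/libxsmm and the source file my_app.c are placeholders:

```shell
# Point pkg-config at the .pc files shipped in LIBXSMM's lib folder
# (/path/to/libxsmm is a hypothetical install prefix).
export PKG_CONFIG_PATH=/path/to/libxsmm/lib

# Compile and link an application against the Classic ABI using the
# flags reported by pkg-config (my_app.c is a placeholder source file).
cc my_app.c $(pkg-config libxsmm --cflags --libs) -o my_app
```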
THANK YOU FOR YOUR CONTRIBUTION - we had several direct (and indirect) contributions, reports, and involvement from people who came across the project. We would like to thank you all for the effort and time you spent working on Open Source!
INTRODUCED
IMPROVEMENTS / CHANGES
- … (make install).
FIXES
This release enables JIT code generation of small matrix multiplications for SSE3 targets. Previously, only AVX and beyond had been supported by JIT code. SSE JIT code generation is only supported for the MM domain (matrix multiplication). The compatibility of the library has been further refined and fine-tuned. The application binary interface (ABI) narrowed from more than 500 functions down to roughly half due to adjusted symbol visibility. This revision prepares for a smooth transition to v2.0 and internalizes low-level details (descriptor handling, etc.), and two deprecated functions have been removed. More prominently, prefetch enumerators have been renamed, e.g., LIBXSMM_PREFETCH_AL2 became LIBXSMM_GEMM_PREFETCH_AL2.
INTRODUCED
IMPROVEMENTS / CHANGES
FIXES
Overview: while v1.9 is in the works, this release fixes two issues and pushes for improved (OSX with the Intel Compiler) and wider OS/compiler coverage (MinGW, BSD; see Compatibility). Among the minor or exotic issues resolved in this release, the stand-alone JIT-generated matrix transposes (out-of-place) are now limited to matrix shapes such that only reasonable amounts of code are generated. There was also a rare synchronization issue, reproduced with CP2K/smp in LIBXSMM v1.8.1 (and likely earlier), which has been resolved since the previous release (v1.8.2).
JIT code generation/dispatch performance: JIT-generating code (non-transposed GEMMs) is known to be blazingly fast, which this release (re-)confirms with the extended dispatch microbenchmark. Single-threaded (uncontended) code generation of matrix kernels with M,N,K := 4...64 (equally distributed random numbers) takes less than 25 µs on typical systems. Non-cached code dispatch takes less than 50x longer than calling a function that does nothing, whereas cached code dispatch takes less than 15x longer than an empty function; in other words, code dispatch is roughly three orders of magnitude faster than code generation (nanoseconds vs. microseconds).
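The generate-then-dispatch path being measured can be sketched with LIBXSMM's classic double-precision dispatch API; this is a minimal illustration, not the benchmark itself: the shapes and input values are made up, and the program assumes LIBXSMM is installed and linked (e.g., with -lxsmm plus a BLAS library or -lxsmmnoblas for the fallback):

```c
#include <libxsmm.h>
#include <stdio.h>

int main(void) {
  /* Illustrative kernel shape (column-major); NULL arguments select the
     defaults: tight leading dimensions, alpha=1, beta=1, no prefetch. */
  const libxsmm_blasint m = 16, n = 16, k = 16;
  double a[16*16], b[16*16], c[16*16];
  int i;
  for (i = 0; i < 16*16; ++i) { a[i] = 1.0; b[i] = 1.0; c[i] = 0.0; }

  /* The first dispatch for a given shape JIT-generates the kernel
     (microseconds); subsequent dispatches of the same shape hit the
     code registry/cache (nanoseconds). */
  const libxsmm_dmmfunction kernel = libxsmm_dmmdispatch(m, n, k,
    NULL/*lda*/, NULL/*ldb*/, NULL/*ldc*/, NULL/*alpha*/, NULL/*beta*/,
    NULL/*flags*/, NULL/*prefetch*/);

  if (NULL != kernel) { /* NULL when no JIT target is available */
    kernel(a, b, c); /* C += A * B */
    printf("c[0]=%f\n", c[0]);
  }
  return 0;
}
```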
INTRODUCED
IMPROVEMENTS / CHANGES
FIXES
This last release of the 1.8.x line (before 1.9) accumulated a large number of changes to tweak interfaces and to generally improve usability. The documentation was vastly improved and extended, is more structured, and is also available on Read the Docs (with online full-text search). In preparation for a fully revised implementation of the DNN API (rewrite), the interface of the DNN domain (Tensor API) changed in an incompatible way (our policy should have delayed this to v1.9). However, the current main user of the DNN API has been updated (integration with TensorFlow). Also notable, v1.8.2 introduces JIT code generation with the Windows calling convention (support is limited to 4-argument kernels, i.e., no prefetch signature for the MM domain and no support for DNN/convolution kernels).
INTRODUCED
CHANGES
FIXES
This release brings some new features (matcopy/2d-copy and tcopy based on JIT-generated code) as well as a number of bug fixes (TGEMM), improvements (KNM), and refinements (LIBXSMM_GEMM_WRAP control, etc.). Given the completed copy/transpose support, this release prepares for complete stand-alone GEMM routines.
INTRODUCED
CHANGES
FIXES
This set of changes brings the Padding API to life and implements the necessary mechanisms to cover a wider range of cases. This may allow running a larger variety of TensorFlow workloads with LIBXSMM. The implementation also brings Winograd-based convolutions (chosen automatically when using LIBXSMM_DNN_CONV_ALGO_AUTO). Moreover, support for the Intel Xeon Phi processor code-named "Knights Mill" ("KNM") has been added (QFMA and VNNI instructions can be executed using the Intel SDE).
INTRODUCED
CHANGES
FIXES
This release finishes the memory allocation interface and documents the two memory allocation domains (default and scratch). Otherwise this release focuses on code quality (sample code) with no fixes or breaking changes when compared to version 1.7.
INTRODUCED
CHANGES
FIXES
This version releases a revised DNN API to better suit an upcoming TensorFlow integration. There is also some foundation laid to distinguish scratch memory from regular/default memory buffers.
INTRODUCED
CHANGES
FIXES
This is a bug-fix release with focus on the SPMDM domain. There are also a number of code-quality improvements. This is potentially the last 1.6.x release, with a number of API changes scheduled for the DNN domain (v1.7).
INTRODUCED
CHANGES
FIXES