Library for specialized dense and sparse matrix operations, and deep learning primitives.
This release ports master/main's build system back to v1.16. The necessary code changes have been minimized. However, since some non-trivial code changes are required, the release is labeled v1.17. The release became necessary because of the aging 1.16 line of code and the new compilers that have emerged since then. For example, issue #562 is one of several similar issues that occur when using, e.g., GNU GCC 10.x or 11.x.
Note: version v1.17 leverages the same code base as version v1.16.x. All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can rely on the latter to leverage new features, fixes, and development progress.
INTRODUCED
IMPROVEMENTS / CHANGES
The build system controls several options, and the set of options has generally evolved since v1.16, which is the main reason for the code changes. A positive side effect of these changes is a thorough (re-)validation. This release was adjusted to LIBXSMM's evolved test environment (1.16.x cannot be revalidated). Code validation of v1.17 again reaches the level of the original v1.16 and further covers new compilers that have become available since then.
This update backports fixes from LIBXSMM's master/main branch and resolves two CVEs. Version 1.16.3 continues to leverage the same code base as versions 1.16.2 and 1.16.1. All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can rely on the latter to leverage new features, fixes, and development progress.
IMPROVEMENTS / CHANGES / FIXES
This minor update resolves an issue where the OS installation (on a legacy system) does not signal that it saves the state of contexts using instruction set extensions such as SSE. The problem had already been resolved in LIBXSMM's main development branch a long time ago. It was discovered in certain Virtual Machine (VM) installations as well as on some OS installations (e.g., here).
INTRODUCED
IMPROVEMENTS / CHANGES / FIXES
Note: version 1.16.2 leverages the same code base as version 1.16.1 (except for a single line of code applying the above-mentioned fix). All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can rely on the latter to leverage new features, fixes, and development progress.
This (minor) release fixes the issues mentioned below and improves platform support.
THANK YOU to the Department of Chemistry at the University of Zurich for generously covering Cray system access.
IMPROVEMENTS / CHANGES
77blas.h (issue)
FIXES
libxsmm_xdiff
and removed _Bool dependency (issue).
This is a maintenance release meant to capture the project's continuous development in a stable release. A validated release allows our users to benefit from several improvements and fixes (see below), especially in light of upcoming new features.
THANK YOU FOR YOUR CONTRIBUTION - your contribution matters! This project received several contributions, whether as pull requests, issue reports, feature suggestions, or informal inquiries. We would like to thank you for the effort and time you spent on Open Source software!
INTRODUCED
IMPROVEMENTS / CHANGES
FIXES
Version 2.0 was our anticipated next release. With v1.15, the goal is to upstream LIBXSMM smoothly into OS distributions that will soon start building packages with GNU GCC 10 (further details).
Beyond new compiler support, LIBXSMM received a slight but consistent performance improvement even for core functionality, namely SMM kernels including batch-reduce. The DNN domain was developed the most and continues to be delivered like a rolling release. The DNN backend broadened support for low/mixed-precision kernels and kernel fusion (batch-reduce plus-X, as used by convolutional neural networks).
INTRODUCED
FP64, FP32, bfloat16, int16, and int8. Low-precision support exists in several type combinations with respect to input and accumulation type, leveraging AVX-512 extensions (VNNI and Bfloat16).
IMPROVEMENTS / CHANGES
FIXES
This release brings notable fixes and improvements (see below) prior to merging our reworked DL backend. This version is likely the last release of our 1.x series. For the upcoming major release of LIBXSMM, the API remains compatible for core functionality except for the DL domain. Even in the DL domain, the API changes are adjustments (straightforward or minor) rather than big changes.
THANK YOU FOR YOUR CONTRIBUTION: jewillco, yurivict, antoscha, breuera, jeremylt, HiSPEET, and legrosbuffle. We would like to thank all direct contributors as well as the people who informally spent effort and time on this Open Source software!
INTRODUCED
IMPROVEMENTS / CHANGES
FIXES
This release delivers improvements to the build system and internal structures. The main purpose is to continuously deliver a smooth build and run experience on the latest OS environments.
THANK YOU FOR YOUR CONTRIBUTION - your contribution matters! This project received direct (and indirect) contributions, whether as issue reports, feature suggestions, or involvement from people who came across the project. We would like to thank you all for the effort and time you spent on Open Source software!
IMPROVEMENTS / CHANGES
FIXES
This release fixes issues related to the prefix directory inside the pkg-config files, which affected maintainer builds (Linux and FreeBSD packages), the Spack package manager, and people using pkg-config to determine build/linker flags. In addition, some presets were added to smooth maintainer builds under FreeBSD.
IMPROVEMENTS / CHANGES
FIXES
This release aims to improve usability along with resolving several non-critical bugs. Beyond this, an implementation of the BLAS(-like) batched GEMM has been added (?GEMM_BATCH). The interface is currently only available for C/C++. However, it can be called implicitly (Fortran 77 style) or used by intercepting existing calls (static and dynamic linkage).
LIBXSMM has offered an interface for batched GEMMs for several versions, supporting pointers as well as arrays of indexes plus Byte-sized strides to extract data from arrays of structures (AoS). The new BLAS interface only supports plain arrays of pointers to operand matrices, but allows multiple groups of homogeneous batches. All batch interfaces are implemented in sequential (ST) and multi-threaded (MT) form, including synchronization in the MT case.
INTRODUCED
IMPROVEMENTS / CHANGES
FIXES