Libxsmm Versions

Library for specialized dense and sparse matrix operations, and deep learning primitives.

1.17

2 years ago

This release ports the build system of master/main back to v1.16. The necessary code changes have been minimized; however, since some non-trivial code changes were required, the release is labeled v1.17. The release became necessary because the 1.16 line of code has aged and new compilers have emerged since then. For example, #562 is one of several similar issues that occur when using, e.g., GNU GCC 10.x or 11.x.

Note: version v1.17 leverages the same code base as version v1.16.x. All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can take the latter to leverage new features, fixes, and development progress.

INTRODUCED

  • Validated with compilers released after the original v1.16 (GNU GCC 10.x, 11.x, and several Clang releases).

IMPROVEMENTS / CHANGES

  • Improved default for static code-paths using certain ISA extensions (no need to adjust INTRINSICS setting).

The build system controls several options, and the set of options has generally evolved since v1.16, which is the main reason for the code changes. A positive side effect of the additional changes is thorough (re-)validation. This release was adjusted to LIBXSMM's evolved test environment (1.16.x cannot be revalidated). Code validation of v1.17 again reaches the level of the original v1.16 and further includes new compilers available since then.

1.16.3

2 years ago

This update promotes fixes in LIBXSMM's master/main branch and resolves two CVEs. Version 1.16.3 continues to leverage the same code base as versions 1.16.2 and 1.16.1. All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can rely on the latter to leverage new features, fixes, and development progress.

IMPROVEMENTS / CHANGES / FIXES

  • CVE-2021-39535
  • CVE-2021-39536

1.16.2

2 years ago

This minor update resolves an issue where the OS installation (on a legacy system) does not signal that it saves the state for contexts using instruction set extensions like SSE. The problem was resolved in LIBXSMM's main development branch long ago. It was discovered in certain virtual machine (VM) installations as well as on some OS installations (e.g., here).

INTRODUCED

  • New functionality and new features continue to remain with LIBXSMM's main revision (under development).

IMPROVEMENTS / CHANGES / FIXES

  • Adopt code-path even if OS does not properly signal support for an ISA extension.

Note: version 1.16.2 leverages the same code base as version 1.16.1 (except for a single line of code applying above mentioned fix). All new features, fixes, and development progress remain unreleased. As per LIBXSMM's policy to keep the master/main branch stable, one can take the latter to leverage new features, fixes, and development progress.

1.16.1

3 years ago

This (minor) release fixes the issues mentioned below and improves platform support.

THANK YOU to the Department of Chemistry at University of Zurich for generously covering Cray system access.

IMPROVEMENTS / CHANGES

  • Muted compiler warnings caused by separate OpenMP runtime (Clang based tool chains).
  • Sample code: prevent OpenBLAS undefined type when including f77blas.h (issue).

FIXES

  • Fixed compilation and runtime issues with Clang-based Cray Compiler as well as Cray Classic Compiler.
  • Revised Fortran implementation of libxsmm_xdiff and removed _Bool dependency (issue).

1.16

3 years ago

This is a maintenance release which is meant to capture the project's continuous development into a stable release. A validated release allows our users to leverage several improvements and fixes (see below), especially in light of upcoming new features.

THANK YOU FOR YOUR CONTRIBUTION - your contribution matters! This project received several contributions whether as pull request, issue report, feature suggestion, or as an informal inquiry. We would like to thank you for your effort and time spent for Open Source software!

INTRODUCED

  • Zero-config for all platforms with absolutely no configuration needed for header-only. Simplifies using Visual Studio as no up-front configuration or in-build custom steps are needed. Simplifies 3rd-party build systems incorporating LIBXSMM for both header-only and classic ABI.
  • Updated Hello LIBXSMM, added code examples for C/C++ and Fortran, and included minimal "support" for Bazel (request). The latter is not meant to change our Makefile-based build setup but can help people who prefer Bazel to get started.
  • Fortran interface for user-data dispatch and a Fortran code sample using this interface to dispatch multiple kernels at once. The C interface was introduced earlier (v1.15).
  • Experimental: element-wise kernels with matrix elements (meltw), e.g., to scale, reduce, type-convert, etc.

IMPROVEMENTS / CHANGES

  • Extended list of applications and projects using LIBXSMM (https://libxsmm.readthedocs.io/#applications). Our documentation also lists applications among popular categories (at the bottom of the left-hand side menu).
  • Fixed performance bug in matcopy routine; added microbenchmarks.
  • Improved verbose output (watermarks, additional warnings).
  • Disabled memory wrapper at compile-time (opt-in only).
  • Fully moved to Python3 shebang (fallback to Python2).
  • Improved Fortran interface (overloads, etc.).
  • Further improved support for GNU GCC 10.
  • Extended sparse functionality.

FIXES

  • Avoid manipulating GNU's feature flags (improves header-only library).
  • Fixed detecting Intel VTune 2020 (SYM=1 with source'd profiler).
  • Consistently emit unaligned LD/ST (intrinsics based code).

1.15

4 years ago

Version 2.0 was our anticipated next release. With v1.15 the goal is to flawlessly upstream LIBXSMM with OS-distributions that soon start building packages with GNU GCC 10 (further details).

Beyond new compiler support, LIBXSMM received a slight but consistent performance improvement even for core-functionality, namely SMM-kernels including batch-reduce. The DNN domain was developed the most and continues to deliver like a rolling release. The DNN backend broadened support for low/mixed-precision kernels and kernel-fusion (batch-reduce plus-X as used by convolutional neural networks).

INTRODUCED

  • Small matrix multiplication and batch-reduce kernels are available for the following input types: FP64, FP32, bfloat16, int16, and int8. Low-precision support exists in several type combinations with respect to input and accumulation type, leveraging AVX-512 extensions (VNNI and Bfloat16).
  • New C-APIs (Fortran to follow): (1) kernel introspection, takes kernel-function pointer, fills info-structure with FLOPS-count, code-size, and more; no search overhead, (2) register user-defined data with LIBXSMM's fast key-value database/query, e.g., to lower dispatch overhead for multiple kernels used in one task.
  • Fortran API: more flavors of certain generic procedures; can potentially avoid temporary values due to exact match (procedure overload).
  • Example vectorizing along finite elements (DGFEM) using LIBXSMM for sparse weight matrices.
  • Example showing sparse weight matrix multiplication (deep learning).
  • Reproducer for next-gen. CP2K/collocate implementation.
  • Module file generated during build (module av).

IMPROVEMENTS / CHANGES

  • Allow omitting the full configure step under Windows; improved VS build support. Note, support for the Windows calling convention is still pending but in the works. Necessary state is currently not call-preserved, which may or may not work (as a workaround it may help to use wrapper functions for LIBXSMM's kernels).
  • Dropped code generation for convolutions, which are now based on batch-reduce kernels, and revised batch-reduce API to support (1) absolute addresses like in previous releases, (2) relative offsets/indexes, and (3) constant/identical offset/stride.
  • LIBXSMM/EXT: OpenMP support under macOS (w/ Apple's LLVM based compiler).
  • Entire code base of LIBXSMM uses SPDX-License-Identifier (BSD-3-Clause).
  • Verbose message about timer accuracy (virtualized platforms).
  • Generally improved verbosity (insight/detail, and accuracy).
  • New instructions support in backend.
  • Slightly lowered dispatch overhead.
  • NUMA-aware GxM framework.
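
The three addressing modes of the revised batch-reduce API listed above can be pictured with a small reference loop. This is only a sketch under stated assumptions, not LIBXSMM's API or generated code: the names `br_mode`, `br_operand`, `br_resolve`, and `ref_batch_reduce` are hypothetical, matrices are assumed dense and column-major, and offsets and strides are counted in elements here for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of LIBXSMM. */
typedef enum { BR_ADDRESS, BR_OFFSET, BR_STRIDE } br_mode;

typedef struct {
  br_mode mode;
  const double *const *ptrs; /* BR_ADDRESS: one absolute pointer per operand */
  const double *base;        /* BR_OFFSET, BR_STRIDE: common base pointer */
  const size_t *offsets;     /* BR_OFFSET: per-operand offset (in elements) */
  size_t stride;             /* BR_STRIDE: constant distance (in elements) */
} br_operand;

/* Returns the address of the i-th matrix according to the addressing mode. */
const double *br_resolve(const br_operand *op, size_t i) {
  assert(op != NULL);
  switch (op->mode) {
    case BR_ADDRESS: return op->ptrs[i];
    case BR_OFFSET:  return op->base + op->offsets[i];
    default:         return op->base + i * op->stride;
  }
}

/* Reference batch-reduce: C (m x n) += sum over i of A_i (m x k) * B_i (k x n).
 * All matrices are dense and column-major (assumption of this sketch). */
void ref_batch_reduce(const br_operand *a, const br_operand *b,
                      double *c, int m, int n, int k, size_t count) {
  size_t i;
  int row, col, p;
  assert(c != NULL && m >= 0 && n >= 0 && k >= 0);
  for (i = 0; i < count; ++i) {
    const double *const ai = br_resolve(a, i);
    const double *const bi = br_resolve(b, i);
    for (col = 0; col < n; ++col) {
      for (row = 0; row < m; ++row) {
        double acc = 0.0;
        for (p = 0; p < k; ++p) {
          acc += ai[row + (size_t)p * m] * bi[p + (size_t)col * k];
        }
        c[row + (size_t)col * m] += acc; /* accumulate into the single C */
      }
    }
  }
}
```

Only the way each operand is located differs between the modes; the reduction into a single C matrix is identical in all three cases.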

FIXES

  • Issues #334, #347, #371, #368, and #369.
  • Zero defects as of Synopsys Coverity.
  • Rebuild issue (build system).
  • Library initialization.

1.14

4 years ago

This release brings notable fixes and improvements (see below) prior to merging our reworked DL backend. This version is likely the last release of our 1.x series. For the upcoming major release of LIBXSMM, the API remains compatible for core functionality except for the DL domain. Even for the DL domain, there are only API adjustments rather than big changes (straightforward or minor).

THANK YOU FOR YOUR CONTRIBUTION: jewillco, yurivict, antoscha, breuera, jeremylt, HiSPEET, and legrosbuffle. We would like to thank all direct contributors as well as people who informally spent effort and time for this Open Source software!

INTRODUCED

  • Native PROCEDURE types for generic 3-/6-arguments (arity) functions (Fortran interface).
  • Intercepted memory allocation for applications based on LIBXSMM's scratch memory.
  • LIBXSMM has guaranteed non-NULL kernels for valid requests for several versions.
    Empty shape requests are now considered valid (SMM, MCOPY, and TCOPY).
  • Getting Started section added to documentation ("Hello LIBXSMM").

IMPROVEMENTS / CHANGES

  • Termination statistic now distinguishes SMMs from degenerate SMMs (GEMV).
  • Support Immintrin-debug (https://github.com/intel/Immintrin-debug).
  • Emit warning if compiler support only enables low-resolution timers.
  • Support PGI Compiler based on GNU GCC settings; still some issues.
  • Generally enable ISA extensions even if not permitted by OS (XSAFE).
  • Enforce AVX-512 under OSX i/Mac Pro (OSX: XSAFE/ZMM disabled).
  • VTUNE=0: disables profiler support (even if detected and SYM=1).
  • Memory info to handle foreign pointers (not allocated by library).
  • Scratch memory allocation: avoid unnecessary warning (verbose).
  • Improved scratch memory allocation statistics (watermark, etc.).
  • Implemented exit-handler for Fortran programs using STOP.
  • Avoid compiler warnings previously suppressed by flags.
  • Make: only permit matching static/shared library builds.
  • Accommodate Clang based compiler under Windows.
  • Improved RNG performance for very short sequences.
  • Updated Visual Studio projects and setup (VS2019).
  • Updated and revised documentation.
  • Updated articles and applications.
  • Contribution #355 incorporated.
  • Lowered dispatch overhead.

FIXES

  • Fixed issue (2019/02/24) dispatching compiler-generated code (affected SpMDM and DL).
  • Fixed casting literal -1 to an unsigned integer when 64-bits were intended.
  • Resolved issue related to structure alignment/padding/copy (CCE).
  • Potentially invalid kernel cache with concurrently finalized library.
  • Potentially treated non-OpenMP lock as OpenMP lock.
  • Avoid potentially recursive locking at termination.
  • Fixed potential hang with header-only.
  • Incorrect LDC for intercepted GEMV.
  • Issues fixed: #340 and #347.
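
The class of bug behind the "-1 cast" fix above can be illustrated in a few lines of C. This is a general sketch of the pitfall, not the affected LIBXSMM code (which is not shown in these notes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pitfall: -1 is first converted to a 32-bit unsigned value (all ones),
 * which is then zero-extended when widened to 64 bits. */
uint64_t mask_narrow(void) { return (uint32_t)-1; }

/* Intended: convert -1 directly to the 64-bit type, yielding all 64 bits set. */
uint64_t mask_wide(void) { return (uint64_t)-1; }

/* Self-check documenting the difference between the two conversions. */
void check_masks(void) {
  assert(mask_narrow() == UINT64_C(0x00000000FFFFFFFF));
  assert(mask_wide() == UINT64_C(0xFFFFFFFFFFFFFFFF));
}
```

Fixed-width types are used here so the result does not depend on the platform's width of `unsigned int`.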

1.13

4 years ago

This release delivers improvements made to the build system and internal structures. The main purpose is to continuously deliver a smooth build and run experience on the latest OS environments.

THANK YOU FOR YOUR CONTRIBUTION - your contribution matters! This project received direct (and indirect) contributions whether as issue report, feature suggestion, or involvement from people who came across the project. We would like to thank you all for the effort and time you spent for Open Source software!

IMPROVEMENTS / CHANGES

  • Fortran: enabled libxsmm_ptr* to eventually return C_NULL_PTR.
  • Avoid treating Spack environment as maintainer build (apply SSE4 flags).
  • Renamed structure-of-array (SOA) dense routines into "packed".
  • Internal preparation for upcoming features (memory allocation).
  • Improved build system (most recent OS environments).

FIXES

  • Precondition for working around missing _Float128 definition (#339).
  • Conceptually avoid accessing a zero-sized array (Fortran interface).
  • Corrected number of scratch-memory pools (LIBXSMM_VERBOSE).

1.12.1

4 years ago

This release fixes issues related to the prefix directory inside of the pkg-config files which affected maintainer builds (Linux and FreeBSD package), the package manager Spack, or people using pkg-config to determine build/linker flags. In addition, some presets are made to smooth maintainer builds under FreeBSD.

IMPROVEMENTS / CHANGES

  • Building samples: detect Intel MKL (when installed by a package manager).
  • Improved build system under FreeBSD (detect BLAS library, etc).

FIXES

  • Issue #331, issue #333, issue #334, and spack/spack#11413.
  • OpenMP build issue in one of the code samples (GCC 9.1).

1.12

4 years ago

This release aims to improve usability along with resolving several non-critical bugs. Beyond this, an implementation of the BLAS(-like) batched GEMM has been added (?GEMM_BATCH). The interface currently only supports the C/C++ language. However, it can be called implicitly (Fortran 77 like) or used by intercepting existing calls (static and dynamic linkage).

LIBXSMM has had an interface for batched GEMMs for several versions, supporting pointers as well as arrays of indexes plus Byte-sized strides to extract data from arrays of structures (AoS). The new BLAS interface only supports straight arrays of pointers to operand matrices but allows multiple groups of homogeneous batches. All batch interfaces are implemented in sequential (ST) and multi-threaded (MT) form, with synchronization in the MT case.
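
To make the "arrays of pointers with multiple homogeneous groups" layout concrete, the following plain-C reference loop may help. It is a sketch only and neither LIBXSMM's implementation nor its interface: the name `ref_gemm_batch_groups` and its parameter order are hypothetical, and it assumes double precision, column-major storage, and no transposition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of LIBXSMM): computes
 * C[i] = alpha * A[i] * B[i] + beta * C[i] for every matrix of every group.
 * Each group g shares one shape (m[g] x n[g] x k[g]), its scalars, and its
 * leading dimensions; the pointer arrays hold all groups back to back. */
void ref_gemm_batch_groups(
  int ngroups, const int group_size[],
  const int m[], const int n[], const int k[],
  const double alpha[], const double beta[],
  const double *const a[], const int lda[],
  const double *const b[], const int ldb[],
  double *const c[], const int ldc[])
{
  size_t base = 0; /* index of the first matrix of the current group */
  int g, i, row, col, p;
  assert(ngroups >= 0);
  for (g = 0; g < ngroups; ++g) {
    assert(lda[g] >= m[g] && ldb[g] >= k[g] && ldc[g] >= m[g]);
    for (i = 0; i < group_size[g]; ++i) {
      const double *const ai = a[base + (size_t)i];
      const double *const bi = b[base + (size_t)i];
      double *const ci = c[base + (size_t)i];
      for (col = 0; col < n[g]; ++col) {
        for (row = 0; row < m[g]; ++row) {
          double acc = 0.0;
          for (p = 0; p < k[g]; ++p) {
            acc += ai[row + (size_t)p * lda[g]] * bi[p + (size_t)col * ldb[g]];
          }
          ci[row + (size_t)col * ldc[g]] =
            alpha[g] * acc + beta[g] * ci[row + (size_t)col * ldc[g]];
        }
      }
    }
    base += (size_t)group_size[g]; /* advance to the next group's pointers */
  }
}
```

Within a group, the matrix products are independent of each other as long as the C pointers do not alias; aliasing C matrices is one reason a multi-threaded variant needs synchronization.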

INTRODUCED

  • Interface and implementation of batched GEMMs (GEMM_BATCH).
  • TensorFlow wrapper code for LSTM operation.
  • Interceptor for GEMM_BATCH and GEMV.

IMPROVEMENTS / CHANGES

  • LSTM: enabled additional tensor formats for Bfloat16.
  • Validated with GNU GCC 9.1 release.

FIXES