rocBLAS Versions

Next-generation BLAS implementation for the ROCm platform

rocm-6.1.0

1 month ago

Additions

  • Level 1 and Level 1 Extension functions now have an additional ILP64 API for both C and Fortran (functions with the _64 name suffix) that takes int64_t function arguments.
  • Cache flush timing for gemm_ex.
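The point of the ILP64 (_64) entry points is that sizes and increments are carried as int64_t, so index arithmetic such as n * |incx| cannot wrap around at 2^31. The sketch below is a hypothetical host-side reference (the name ref_saxpy_64 is made up for illustration; it is not a rocBLAS function and says nothing about the real _64 prototypes in rocblas.h). It follows the legacy BLAS convention that a negative increment walks the vector from its far end.

```c
#include <stdint.h>

/* Hypothetical host reference of an ILP64-style saxpy: y := alpha*x + y.
 * n, incx and incy are int64_t, mirroring the _64 naming convention.
 * Illustrative sketch only; this is not rocBLAS code. */
void ref_saxpy_64(int64_t n, float alpha,
                  const float *x, int64_t incx,
                  float *y, int64_t incy)
{
    if (n <= 0)
        return;
    /* Legacy BLAS convention: with a negative increment, traversal starts
     * at the far end of the vector. Keeping these products in int64_t avoids
     * the wrap-around a 32-bit int would hit for very large n * |inc|. */
    int64_t ix = (incx < 0) ? (1 - n) * incx : 0;
    int64_t iy = (incy < 0) ? (1 - n) * incy : 0;
    for (int64_t i = 0; i < n; ++i) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

For example, with x = {1, 2, 3}, y = {10, 20, 30}, alpha = 2, incx = -1 and incy = 1, x is read in reverse order and y becomes {16, 24, 32}.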

Changes

  • Some Level 2 function argument names have changed from 'm' to 'n' to match legacy BLAS; there was no change in implementation.
  • Standardized the use of non-blocking streams for copying results from device to host.

Fixes

  • Fixed host-pointer mode reductions for non-blocking streams.

rocm-6.0.2

3 months ago

rocBLAS code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.

rocm-6.0.0

5 months ago

Added

  • Added beta API functions rocblas_gemm_batched_ex3 and rocblas_gemm_strided_batched_ex3
  • Added input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and gemv_strided_batched
  • Added rocblas_status_excluded_from_build, to be used when calling functions that require Tensile in a rocBLAS build without Tensile
  • Added a system for asynchronous kernel launches that sets a failure rocblas_status based on a hipPeekAtLastError discrepancy

Optimized

  • Improved trsm performance for small sizes (m < 32 and n < 32)

Deprecated

  • In a future release, atomic operations will be disabled by default so that results are repeatable. Atomic operations can always be enabled or disabled using the function rocblas_set_atomics_mode. Enabling atomic operations can improve performance.
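Why atomics make results non-repeatable: floating-point addition is not associative, and with atomic accumulation the order in which partial results arrive can change from run to run. The sketch below is illustrative only (sum_in_order is a hypothetical helper, not rocBLAS code); it simulates two different arrival orders on the host and shows that the same three values can sum to different results.

```c
#include <stddef.h>

/* Accumulate x[order[0]], x[order[1]], ... in the given order. The order
 * array stands in for the unpredictable arrival order of atomic additions
 * from different GPU threads. Illustrative sketch, not rocBLAS code. */
float sum_in_order(const float *x, const size_t *order, size_t n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < n; ++k)
        acc += x[order[k]];
    return acc;
}
```

With x = {1e20f, -1e20f, 1.0f}, the order {0, 1, 2} yields 1.0f, while the order {0, 2, 1} yields 0.0f because 1e20f + 1.0f rounds back to 1e20f. A deterministic reduction fixes the order and therefore the result.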

Removed

  • The rocblas_gemm_ext2 API function is removed.
  • The in-place trmm API from legacy BLAS is removed. It is replaced by an API that supports both in-place and out-of-place trmm.
  • int8x4 support is removed. int8 support is unchanged.
  • The #define __STDC_WANT_IEC_60559_TYPES_EXT__ has been removed from rocblas-types.h. Users who want ISO/IEC TS 18661-3:2015 functionality must define __STDC_WANT_IEC_60559_TYPES_EXT__ before including float.h, math.h, and rocblas.h.
  • The default build removes device code for the gfx803 architecture from the fat binary.
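A minimal sketch of the include order described above, assuming a C translation unit. The ISO/IEC TS 18661-3 feature macro is spelled __STDC_WANT_IEC_60559_TYPES_EXT__ and only takes effect if it is defined before the first inclusion of float.h or math.h. The rocBLAS include is left as a comment so the sketch builds without ROCm; has_flt16_limits is a hypothetical helper, and whether FLT16_MANT_DIG appears at all depends on your compiler and C library.

```c
/* Must come before any standard header for the TS 18661-3 types to appear. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1
#include <float.h>
#include <math.h>
/* rocblas.h would be included here, after float.h and math.h
 * (the exact include path depends on your ROCm installation). */

/* Returns 1 if this toolchain exposes the _Float16 limits from TS 18661-3
 * (FLT16_MANT_DIG), 0 otherwise. */
int has_flt16_limits(void)
{
#ifdef FLT16_MANT_DIG
    return 1;
#else
    return 0;
#endif
}
```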

Fixed

  • Made offset calculations for rocBLAS functions 64-bit safe. Fixes for very large leading dimensions or increments potentially causing overflow:
    • Level 2: gbmv, gemv, hbmv, sbmv, spmv, tbmv, tpmv, tbsv, tpsv
  • Lazy loading to support heterogeneous architecture setups and to load the appropriate Tensile library files based on the device's architecture
  • Guard against no-op kernel launches resulting in a potential hipGetLastError error
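The "64-bit safe offset" fixes concern index expressions such as i + j * lda for a column-major matrix: with a large leading dimension the product exceeds INT32_MAX, and evaluating it in 32-bit int arithmetic overflows (undefined behavior in C). A minimal illustrative sketch, assuming column-major storage; colmajor_offset and offset_overflows_int32 are hypothetical helpers, not rocBLAS internals.

```c
#include <stdint.h>

/* Column-major offset of element (i, j) with leading dimension lda,
 * evaluated entirely in 64-bit arithmetic. Illustrative, not rocBLAS code. */
int64_t colmajor_offset(int64_t i, int64_t j, int64_t lda)
{
    return i + j * lda;
}

/* Returns 1 if the offset does not fit in a signed 32-bit int, i.e. the
 * case where computing i + j * lda with int operands would overflow. */
int offset_overflows_int32(int64_t i, int64_t j, int64_t lda)
{
    return colmajor_offset(i, j, lda) > INT32_MAX;
}
```

For example, a square matrix with lda = 70000 already needs offset 69999 * 70000 = 4,899,930,000 for its last column, well beyond INT32_MAX (2,147,483,647).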

Changed

  • Reduced the default verbosity of rocblas-test. To see all tests, set the environment variable GTEST_LISTENER=PASS_LINE_IN_LOG.

rocm-5.7.1

7 months ago

rocBLAS code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.

rocm-5.7.0

8 months ago

Added

  • YAML lock-step argument scanning for the rocblas-bench and rocblas-test clients. See the Programmer's Guide for details.
  • rocblas-gemm-tune, which finds the best-performing GEMM kernel for each of a given set of GEMM problems.

Fixed

  • Made offset calculations for rocBLAS functions 64-bit safe. Fixes for very large leading dimensions or increments potentially causing overflow:
    • Level 1: axpy, copy, rot, rotm, scal, swap, asum, dot, iamax, iamin, nrm2
    • Level 2: gemv, symv, hemv, trmv, ger, syr, her, syr2, her2, trsv
    • Level 3: gemm, symm, hemm, trmm, syrk, herk, syr2k, her2k, syrkx, herkx, trsm, trtri, dgmm, geam
    • General: set_vector, get_vector, set_matrix, get_matrix
    • Related fixes: internal scalar loads with offsets larger than 32 bits
    • Fixed in-place functionality for all trtri sizes

Changed

  • dot with rocblas_pointer_mode_host is now synchronous to match legacy BLAS, as it stores results in host memory
  • Enhanced reporting of installation issues caused by runtime libraries (Tensile)
  • Standardized the internal rocBLAS C++ interface across most functions

Deprecated

  • The __STDC_WANT_IEC_60559_TYPES_EXT__ define will be removed in a future release

Dependencies

  • Optional use of AOCL BLIS 4.0 on Linux for clients
  • Optional build-tool-only dependency on Python psutil

rocm-5.6.1

8 months ago

rocBLAS code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

rocm-5.6.0

10 months ago

Optimizations

  • Improved performance of Level 2 rocBLAS GEMV on gfx90a GPUs for non-transposed problems with small matrices and larger batch counts, specifically when m and n <= 32 and batch_count >= 256.
  • Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.

Added

  • Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
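"bf16 inputs and f32 compute" means the data is stored as bfloat16 (the upper 16 bits of an IEEE-754 float) but each element is widened to float for the arithmetic and rounded back only once at the end. The sketch below illustrates that pattern on the host; the bf16 typedef, the conversion helpers, and ref_axpy_bf16_f32 are hypothetical and do not reproduce the rocblas_axpy_ex interface. The float-to-bf16 conversion uses round-to-nearest-even, with NaN handling deliberately omitted.

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 kept as raw bits: the upper 16 bits of an IEEE-754 float. */
typedef uint16_t bf16;

static float bf16_to_float(bf16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* float -> bf16 with round-to-nearest-even (NaN handling omitted). */
static bf16 float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t lsb = (bits >> 16) & 1u;   /* ties round toward an even result */
    bits += 0x7FFFu + lsb;
    return (bf16)(bits >> 16);
}

/* Hypothetical axpy with bf16 storage and f32 compute: y := alpha*x + y.
 * Operands are widened to float, combined in float, rounded once to bf16. */
void ref_axpy_bf16_f32(int64_t n, float alpha, const bf16 *x, bf16 *y)
{
    for (int64_t i = 0; i < n; ++i) {
        float acc = alpha * bf16_to_float(x[i]) + bf16_to_float(y[i]);
        y[i] = float_to_bf16(acc);
    }
}
```

Computing in float and rounding once keeps the bf16 rounding error to a single step per element instead of one per intermediate operation.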

Deprecated

  • In-place trmm is deprecated. It will be replaced by a trmm that has both in-place and out-of-place functionality.
  • rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
  • rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
  • rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
  • rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release

Removed

  • The is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
  • The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
  • rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
  • rocblas_get_int8_type_for_hipblas was deprecated and is now removed.

Dependencies

  • Added a build-only dependency on Python joblib, as used by the Tensile build
  • Fix for CMake install on some operating systems when performed by install.sh -d --cmake_install

Fixed

  • Made trsm offset calculations 64-bit safe

Changed

  • Refactored rotg test code

rocm-5.5.1

11 months ago

rocBLAS code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

rocm-5.5.0

1 year ago

Added

  • Added rocblas_geam_ex for matrix-matrix minimum operations
  • Added HIP Graph support as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions
  • Added a beta features API, exposed using the compiler define ROCBLAS_BETA_FEATURES_API
  • Added support for vector initialization with negative increments in the rocBLAS test framework
  • Added Windows build documentation for forthcoming support using the ROCm HIP SDK
  • Added scripts to plot performance for multiple functions
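One common form of "matrix-matrix minimum operation" is the min-plus (tropical) product, D[i,j] = min over k of (A[i,k] + B[k,j]), which appears for example in shortest-path computations. The sketch below is a generic host reference of that product in column-major layout, offered only to illustrate the idea. It is an assumption that this matches rocblas_geam_ex: the rocBLAS documentation defines the exact operations, scaling factors, and arguments, and ref_min_plus is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>

/* Generic min-plus product, column-major:
 *   D[i,j] = min over k of (A[i,k] + B[k,j])
 * A is m x kd (lda), B is kd x n (ldb), D is m x n (ldd).
 * Hypothetical reference; not the rocblas_geam_ex definition. */
void ref_min_plus(int64_t m, int64_t n, int64_t kd,
                  const float *A, int64_t lda,
                  const float *B, int64_t ldb,
                  float *D, int64_t ldd)
{
    for (int64_t j = 0; j < n; ++j) {
        for (int64_t i = 0; i < m; ++i) {
            float best = INFINITY; /* identity element for min */
            for (int64_t k = 0; k < kd; ++k) {
                float v = A[i + k * lda] + B[k + j * ldb];
                if (v < best)
                    best = v;
            }
            D[i + j * ldd] = best;
        }
    }
}
```

For A = [[1, 3], [4, 2]] and B = [[0, 2], [5, 1]], the result is D = [[1, 3], [4, 3]].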

Optimizations

  • Improved performance of Level 2 rocBLAS GEMV for float and double precision. Performance enhanced by 150-200% for certain problem sizes when m == n, measured on a gfx90a GPU.
  • Improved performance of Level 2 rocBLAS GER for float, double, and complex float precisions. Performance enhanced by 5-7% for certain problem sizes, measured on a gfx90a GPU.
  • Improved performance of Level 2 rocBLAS SYMV for float and double precisions. Performance enhanced by 120-150% for certain problem sizes, measured on both gfx908 and gfx90a GPUs.

Fixed

  • Fixed setting of executable mode on the client script rocblas_gentest.py to avoid potential permission errors with the clients rocblas-test and rocblas-bench
  • Fixed deprecated API compatibility with the Visual Studio compiler
  • Fixed test framework memory exception handling for Level 2 functions when the host memory allocation exceeds the available memory

Changed

  • install.sh internally runs rmake.py (also used on Windows), and developers may use rmake.py directly on Linux (use --help)
  • All rocBLAS client executables now begin with the rocblas- prefix

Removed

  • install.sh removed the -o/--cov options because Tensile now uses the default code object version (COV) format, set by the CMake define Tensile_CODE_OBJECT_VERSION=default

rocm-5.4.4

1 year ago

rocBLAS code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.