ISPC Versions

Intel® Implicit SPMD Program Compiler

v1.23.0

2 months ago

ISPC release with bug fixes and a few language improvements. The release is based on patched LLVM 16.0.6.

Language changes:

  • Improved const variables initialization:

    1. Variables with const qualifiers can be initialized using the values of previously initialized const variables, including arithmetic operations on them.
    2. Enum values can be used as constants.
  • The result of the selection operator can now be used as an lvalue.
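ISPC's syntax closely follows C, so the three changes above have direct C++ counterparts. The sketch below is C++, not ISPC, and only illustrates the analogous semantics; the names `Tile`, `kLanes`, `kPadded`, `kTileArea`, and `store_selected` are hypothetical.

```cpp
#include <cassert>

// Hypothetical enum whose values serve as compile-time constants.
enum Tile { TileSmall = 4, TileLarge = 16 };

// A const initialized from a literal, then consts derived from it.
const int kLanes = 8;
const int kPadded = kLanes * 2 + 1;        // arithmetic on an earlier const
const int kTileArea = TileLarge * kLanes;  // enum value used as a constant

// static_assert accepts only constant expressions, so these lines compile
// only if the initializers above are treated as constants.
static_assert(kPadded == 17, "const derived from another const");
static_assert(kTileArea == 128, "const derived from an enum value");

// The conditional (selection) operator yields an lvalue when both operands
// are lvalues of the same type, so it may appear on the left of '='.
void store_selected(bool first, int value, int &a, int &b) {
    (first ? a : b) = value;           // writes to a or to b
    assert((first ? a : b) == value);  // the selected variable was updated
}
```

Calling `store_selected(true, 10, a, b)` writes to `a` and leaves `b` untouched; passing `false` does the opposite.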

Compiler switches behavior:

  • --dump-file=<dir> now dumps the whole IR module after each pass.

ISPC Runtime improvements:

  • Added the ISPCRT_GPU_DRIVER environment variable, which selects a specific GPU driver. If more than one supported GPU is present in the system, the GPUs may be managed by several GPU drivers, and this variable lets the user choose which driver to use.

Infrastructure/build changes:

  • Removed the build dependency on llvm-dis.
  • Locked the time zone to UTC to fix build reproducibility.

Bug fixes:

  • Fixed ABI compatibility of bool types returned to C/C++ code.
  • Fixed build error when bison emulates POSIX Yacc.
  • Fixed target definition for neon-i16x8, sse2-i32x8 and ps5.
  • Fixed ICE when generating unwind info for aarch64 code on Windows.

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

Components revisions used in GPU-enabled build:

v1.22.0

5 months ago

ISPC release with template operators support; improved debugging experience of ISPC code on Windows; multiple stability and performance fixes and more. The release is based on patched LLVM 16.0.6.

ISPC distribution changes:

  • ISPC binaries are now compiled with LTO by the Clang/LLVM toolchain on all supported platforms and architectures using superbuild. As a result, ISPC binaries became a few percent faster on average.
  • Examples were excluded from ISPC archives. They are placed alongside as separate archives ispc-examples-v1.22.0.zip and ispc-examples-v1.22.0.tar.gz.

Language changes:

  • Added support for template operators.
  • Revised the usage of function specifiers with templates. For more details, please refer to the Function Templates section of the documentation.
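As a rough illustration of what a template operator is, here is the C++ analogue: one operator definition shared by several struct types. This is C++, not ISPC, and the ISPC rules (including how function specifiers combine with templates) are the ones given in the Function Templates section of the documentation. The `sketch` namespace and the `Float2` and `Int2` types are hypothetical.

```cpp
#include <cassert>

namespace sketch {

struct Float2 { float x, y; };
struct Int2 { int x, y; };

// One operator template covers every type that has x and y members.
template <typename T>
T operator+(const T &a, const T &b) {
    return T{a.x + b.x, a.y + b.y};
}

// Usage: the same operator definition serves both struct types.
inline void demo() {
    Float2 f = Float2{1.0f, 2.0f} + Float2{3.0f, 4.0f};
    Int2 i = Int2{1, 2} + Int2{3, 4};
    assert(f.x == 4.0f && f.y == 6.0f);
    assert(i.x == 4 && i.y == 6);
}

} // namespace sketch
```

Keeping the operator inside a namespace next to the types it serves limits the template to those types through argument-dependent lookup.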

Infrastructure changes:

  • Release built with LTO (except aarch64 Linux).
  • Building ISPC with LLVM 17 is supported, although GPU support was not tested.

New compiler switches:

  • --dwarf-version switch now accepts DWARF version 5.
  • --dwarf-version switch forces DWARF format debug info generation on Windows. This makes it possible to debug ISPC code linked with MinGW-generated code (#2129).

Bug fixes:

  • Fixed performance regression on GPU caused by missed memory effects for genx intrinsics declarations.
  • Fixed performance regression caused by change in the loop unswitch LLVM pass.
  • Fixed C compatibility of ISPC generated headers (#2650, #2652).
  • Added unwind table to ISPC generated functions for Windows targets. It fixed issues with incorrect backtrace during debugging and profiling (#2345, #1318).
  • Fixed emitted code for negate of short float vectors (#2628).
  • Fixed several issues that were related to the usage of bool in different cases (#2272, #2333, #2367, #2689).

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

Components revisions used in GPU-enabled build:

v1.21.1

6 months ago

A minor ISPC update with interop-related fixes for ISPCRT needed for the Intel® oneAPI Rendering Toolkit release.

This update contains only Linux oneAPI x86, macOS universal and Windows binaries. Use v1.21.0 binaries for other platforms.

v1.21.0

8 months ago

ISPC release with template function specializations support; changed rules for signed integer overflow, which match C/C++ behavior and lead to more aggressive optimizations; an enhanced ISPC Runtime; multiple stability and performance fixes and more. The release is based on patched LLVM 15.0.7.

Language changes:

  • Added support for function template specializations with explicit template arguments.
    // Primary template
    template <typename T, typename C> noinline int goo(T argGooOne, C argGooTwo);

    // Specialization with explicit template arguments
    template <> noinline int goo<int, float>(int argGooOne, float argGooTwo);

    // Not supported yet: specialization with implicit template arguments (requires template arguments type deduction)
    template <> noinline int goo(int argGooOne, float argGooTwo);
  • Modified behavior for signed integer overflow.

In case of signed integer overflow, ispc now assumes undefined behavior, as C and C++ do. This change may cause compatibility issues. The behavior is controlled by the --[no-]wrap-signed-int compiler switch. The pre-1.21.0 default can be preserved with --wrap-signed-int, which keeps defined wraparound behavior for signed integers, though it may limit some compiler optimizations.
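Code that relied on wraparound either needs --wrap-signed-int or has to avoid signed overflow altogether. As an illustration of the second option, the C++ sketch below shows the usual pattern in C and C++, whose overflow rules ISPC now shares: do the arithmetic in an unsigned type, or test for overflow before adding. It is C++, not ISPC, and the helper names `wrapping_add` and `add_overflows` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Wraparound addition that never relies on signed overflow: the sum is
// computed in unsigned arithmetic, which is defined to wrap modulo 2^32.
// Converting the result back to int32_t is modular since C++20 and
// implementation-defined (modular on mainstream compilers) before that.
inline int32_t wrapping_add(int32_t a, int32_t b) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}

// Reports whether a + b would overflow, without performing the addition.
inline bool add_overflows(int32_t a, int32_t b) {
    const int32_t max = std::numeric_limits<int32_t>::max();
    const int32_t min = std::numeric_limits<int32_t>::min();
    return (b > 0 && a > max - b) || (b < 0 && a < min - b);
}

// Usage: the largest value plus one wraps to the smallest value.
inline void overflow_demo() {
    const int32_t max = std::numeric_limits<int32_t>::max();
    assert(add_overflows(max, 1));
    assert(wrapping_add(max, 1) == std::numeric_limits<int32_t>::min());
}
```

The unsigned detour keeps the wrapping explicit at the one place that needs it, so the rest of the code can still benefit from the more aggressive optimizations.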

New hardware support:

Added support of Intel Meteor Lake Xe-LPG graphics:

  • added two new ISPC targets: xelpg-x16 and xelpg-x8
  • added two new device names: mtl-m and mtl-p

Infrastructure changes:

  • ISPC now uses LLVM's new pass manager. The optimization pipeline was modified by introducing an early LoopFullUnrollPass, which in many cases makes ISPC-unrolled loops match manually unrolled loops.
  • Introduced ISPC superbuild, which facilitates building ISPC with Xe dependencies (LLVM, L0, vc-intrinsics, SPIRV-Translator). It can generate an archive with dependencies or consume a pre-built archive to build ISPC only. It also enables generating LTO or LTO+PGO enabled builds of LLVM and ISPC.
  • Supported building ISPC with LLVM 16.

New compiler switches:

  • --mcmodel switch, which accepts the values small and large. The definition is similar to gcc/clang. The large model enables programs larger than 2 GB.
  • --opt=disable-gathers and --opt=disable-scatters options, which disable generation of gather and scatter instructions on platforms that support them (for performance experiments).
  • --[no-]wrap-signed-int switches, which preserve (or do not preserve) wraparound behavior on signed integer overflow.

ISPC Runtime improvements:

  • Added ispcrtSetTaskingCallbacks to the ISPCRT API, allowing the override of default implementations of ISPCLaunch, ISPCAlloc, and ISPCSync.
  • Removed compile-time Level Zero dependency from ISPCRT, no longer necessary after the ISPCRT split into CPU and GPU parts.
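To make the role of the three overridable functions concrete, here is a minimal serial task system in C++ (C++17 for std::aligned_alloc). It is a sketch under stated assumptions: the parameter lists mirror the ISPCLaunch, ISPCAlloc, and ISPCSync signatures used by ISPC's example task system as far as I know them, the `serial_*` names and `SerialGroup` are hypothetical, and the exact way to register such functions through ispcrtSetTaskingCallbacks should be taken from the ISPCRT headers, not from this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Assumed task entry-point signature, following ISPC's example task system.
using TaskFunc = void (*)(void *data, int threadIndex, int threadCount,
                          int taskIndex, int taskCount,
                          int taskIndex0, int taskIndex1, int taskIndex2,
                          int taskCount0, int taskCount1, int taskCount2);

// Per-launch-group bookkeeping: memory handed out by serial_alloc.
struct SerialGroup {
    std::vector<void *> allocations;
};

static SerialGroup *group_for(void **handlePtr) {
    if (*handlePtr == nullptr)
        *handlePtr = new SerialGroup;
    return static_cast<SerialGroup *>(*handlePtr);
}

// Stand-in for ISPCAlloc: aligned memory that lives until the group is synced.
void *serial_alloc(void **handlePtr, int64_t size, int32_t alignment) {
    assert(size >= 0 && alignment > 0 && (alignment & (alignment - 1)) == 0);
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    const std::size_t align = static_cast<std::size_t>(alignment);
    const std::size_t bytes = (static_cast<std::size_t>(size) + align - 1) / align * align;
    void *p = std::aligned_alloc(align, bytes == 0 ? align : bytes);
    group_for(handlePtr)->allocations.push_back(p);
    return p;
}

// Stand-in for ISPCLaunch: runs every task immediately on the calling thread.
void serial_launch(void **handlePtr, void *f, void *data, int count0, int count1, int count2) {
    group_for(handlePtr); // make sure the later sync receives a valid handle
    TaskFunc task = reinterpret_cast<TaskFunc>(f);
    const int total = count0 * count1 * count2;
    int index = 0;
    for (int k = 0; k < count2; ++k)
        for (int j = 0; j < count1; ++j)
            for (int i = 0; i < count0; ++i)
                task(data, 0, 1, index++, total, i, j, k, count0, count1, count2);
}

// Stand-in for ISPCSync: all tasks already ran, so only release the memory.
void serial_sync(void *handle) {
    SerialGroup *group = static_cast<SerialGroup *>(handle);
    if (group == nullptr)
        return;
    for (void *p : group->allocations)
        std::free(p);
    delete group;
}

// Demo task: records that the task with this linear index has run.
static void mark_task(void *data, int, int, int taskIndex, int, int, int, int, int, int, int) {
    static_cast<int *>(data)[taskIndex] = 1;
}

// Launches a 3x2x1 grid through the stand-ins and returns how many tasks ran.
int run_serial_demo() {
    int ran[6] = {};
    void *handle = nullptr;
    void *scratch = serial_alloc(&handle, 100, 64);
    assert(reinterpret_cast<std::uintptr_t>(scratch) % 64 == 0);
    serial_launch(&handle, reinterpret_cast<void *>(&mark_task), ran, 3, 2, 1);
    serial_sync(handle);
    int total = 0;
    for (int v : ran)
        total += v;
    return total;
}
```

A serial implementation like this is mainly useful for debugging, because every task runs in a deterministic order on one thread.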

Recommended versions of Runtime Dependencies when targeting GPU:

Linux:

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html

Windows:

Components revisions used in GPU-enabled build:

v1.20.0

11 months ago

ISPC release with compile time improvements, enhancements in the ISPC Runtime, and a number of code generation fixes. The release is based on patched LLVM 15.0.7.

ISPC distribution changes

ISPC binaries got smaller and faster: approximately one third smaller and a few percent faster. The macOS distribution now includes x86_64, arm64, and Universal Binaries. On Linux, a snap package with the latest ISPC is available.

ISPC Runtime:

  • ispcrt was split under the hood into GPU and CPU parts, which are loaded dynamically. This means you don't need GPU dependencies when running CPU-only code using ispcrt.
  • ispcrt got support for fences to enable CPU/GPU asynchronous computations.
  • ispcrt does not depend on OpenMP runtime anymore, but requires TBB.

New targets

For finer tuning when targeting old platforms, sse4 targets were split into sse4.1 and sse4.2 targets. All changes are backward compatible: sse4 targets are aliased to sse4.2, and multi-target compilation allows only one sse4 target, so build systems are not confused.

Improvements for contributors

We got a brand new GitHub Codespaces config, so you are welcome to start hacking on ISPC in the browser. Give it a try!

Linux:

Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/releases/stable_602_20230323.html

Windows:

Components revisions used in GPU-enabled build:

  • https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/855eb27
  • https://github.com/intel/vc-intrinsics/commit/29fe787
  • https://github.com/oneapi-src/level-zero/commit/0d56d8e (v1.10.0)
  • https://github.com/llvm/llvm-project/commit/8dfdcc7 (llvmorg-15.0.7) + patches from llvm_patches folder

UPDATE: macOS packages were updated on June 12, 2023. The *dylib files had not been signed and notarized; this was fixed.

UPDATE 2: The ispc-v1.20.0-linux-oneapi.tar.gz Linux package was added for use with the oneAPI distribution. The only difference from ispc-v1.20.0-linux.tar.gz is the version of the TBB library being used.

v1.19.0

1 year ago

ISPC release with long-awaited function templates technical preview; new hardware support for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids) CPUs, Intel® Data Center GPU Max (codename Ponte Vecchio), and updated support for Intel® Arc™ GPUs; improved performance and compile time; an enhanced ISPC Runtime; a bunch of stability fixes and more. The release is based on patched LLVM 14.0.6.

Language changes:

Function templates support was introduced in ISPC. It is currently in technical preview, meaning that the current language definition might change in future versions. For more details, please refer to the Function Templates section of the documentation.

ISPC has got several other language changes needed for ISPC/SYCL interoperability (an experimental feature):

  1. Support of __regcall attribute.
  2. A new language construct, invoke_sycl, which is used to call a SYCL function from ISPC. The function must be declared on the ISPC side with extern "SYCL" __regcall qualifiers.
  3. Support of extern "C" function definitions.

New hardware support:

  1. Targets for 4th generation Intel® Xeon® Scalable (codename Sapphire Rapids) CPUs were introduced: avx512spr-x4, avx512spr-x8, avx512spr-x16, avx512spr-x32, avx512spr-x64. The key difference from other AVX512 targets is native support for FP16.
  2. New xehpc-x16/xehpc-x32 targets were added for Intel® Data Center GPU Max (codename Ponte Vecchio). A new pvc device name was introduced.
  3. New device names acm-g10, acm-g11, and acm-g12 were added for Intel® Arc™ Graphics. The dg2 device name has been removed.
  4. Support for Aarch64 targets was enabled on Windows.

ISPC Runtime:

  1. A chunking allocator was introduced that can be enabled with ISPCRT_MEM_POOL (see details here).
  2. An API was added to link input modules through ispcrtStaticLinkModules (using linking on vISA level under the hood) and ispcrtDynamicLinkModules (using binary linking under the hood).
  3. Support for creating multiple devices within a single context was added, and an API was added to get a function pointer from a module. It's also possible to construct ISPC RT objects from native handlers now.
  4. ISPC RT verbose mode was added that can be enabled through ISPCRT_VERBOSE.
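The general idea behind a chunking allocator can be sketched in a few lines of C++: requests are carved out of large chunks, so most allocations avoid a separate trip to the underlying allocator. This is a generic illustration under my own assumptions, not the ISPCRT implementation; the `ChunkPool` class, its interface, and the chunk size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump-style chunking allocator. Memory is released only when
// the pool itself is destroyed.
class ChunkPool {
  public:
    explicit ChunkPool(std::size_t chunkBytes) : chunkBytes_(chunkBytes) {}

    // Returns `bytes` of storage aligned to `align`, a power of two that
    // does not exceed alignof(std::max_align_t).
    void *allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || offset + bytes > currentSize_) {
            // Start a new chunk; oversized requests get a chunk of their own size.
            currentSize_ = bytes > chunkBytes_ ? bytes : chunkBytes_;
            chunks_.push_back(std::make_unique<unsigned char[]>(currentSize_));
            offset = 0;
        }
        used_ = offset + bytes;
        return chunks_.back().get() + offset;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

  private:
    std::size_t chunkBytes_;      // preferred size of each chunk
    std::size_t currentSize_ = 0; // size of the chunk currently being filled
    std::size_t used_ = 0;        // bytes consumed in the current chunk
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

With a 64-byte chunk, two 16-byte requests share one chunk, while a request that no longer fits starts a new one.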

Performance:

There's a significant performance boost on Xe targets caused by updates in the ISPC optimization pipeline and the usage of the new spill-cost IGC finalizer function, which dramatically reduces spill size.

Utilities:

  1. ISPC link mode has been introduced. It links several LLVM bitcode or SPIR-V files and outputs the result as LLVM bitcode or SPIR-V. For example:
    ispc link test_a.bc test_b.bc --emit-spirv -o test.spv
    
  2. CMake utilities were improved, and support was added for building an ISPC GPU target from multiple ISPC files, linking them with ispc link. An application's ISPC CMakeLists would look like this:
    add_ispc_library(my_ispc_lib filea.ispc fileb.ispc)
    ispc_target_include_directories(my_ispc_lib <some directory path>)
    ispc_target_compile_definitions(my_ispc_lib -DMY_DEFINE=1)
    
    add_ispc_library(my_ispc_kernel filec.ispc)
    ispc_target_link_libraries(my_ispc_kernel my_ispc_lib)
    

Runtime Dependencies when targeting GPU:

Linux:

Windows:

Components revisions used in GPU-enabled build:

UPDATE: macOS packages were updated on June 12, 2023. The *dylib files had not been signed and notarized; this was fixed.

v1.18.1

1 year ago

A minor ISPC update with a security fix: the zlib dependency was removed. We previously shipped:

  • Windows binaries without zlib support.
  • Linux binaries with zlib support, zlib was statically linked.
  • macOS binaries with zlib support, zlib was a dynamic dependency.

This update contains release binaries for Linux only; use v1.18.0 binaries for other platforms.

v1.18.0

1 year ago

An ISPC release with a bunch of stability and performance fixes, improvements for ISPC Runtime, and complete stdlib support for float16 type. This release is based on patched LLVM 13.0.1.

The -E switch was introduced to run the preprocessor only. An old bug that prevented the compiler from failing on a preprocessor error was fixed, so the compiler now fails properly. Because some users found the old behavior convenient in some cases, the --ignore-preprocessor-errors switch was introduced to keep it.

Target naming was changed for targets with native masking support: the "base type" was dropped from the naming scheme. The old names are still accepted for compatibility. This affects AVX512 target names; the new names are avx512skx-x4, avx512skx-x8, avx512skx-x16, avx512skx-x32, avx512skx-x64, and avx512knl-x16.

For debugging, and for those who are interested in understanding compiler internals, the --ast-dump switch was introduced. The produced dump of the AST (Abstract Syntax Tree) is intentionally made to look like a clang AST dump for convenience.

Standard library gained full support for float16 type. Note that it is fully supported only on the targets with native hardware support. On the other targets emulation is still not guaranteed but may work in some cases.

Among other fixes, it is worth mentioning the following:

  • fixed a bug #1308 affecting multi-target compilation
  • a bunch of fixes to make it easier to build ISPC on FreeBSD, even though FreeBSD is not officially supported

Improvements for the ISPC Runtime in this release:

  • flexible task system selection during build
  • support of ISPCRT build separate from ISPC
  • support of ISPCRT build for CPU only
  • version check in CMake
  • new API to get the type of allocated memory (ispcrtGetMemoryViewAllocType and ispcrtGetMemoryAllocType)
  • new API for memory copy on device (ispcrtCopyMemoryView)
  • support of device-only memory without corresponding application memory.

Performance on Xe targets was significantly improved in this release due to optimizations in ISPC and Vector Backend.

Runtime Dependencies when targeting GPU:

Linux:

Windows:

Components revisions used in GPU-enabled build:

v1.17.0

2 years ago

An ISPC release with a massive update of Xe targets, including support for forthcoming XeHPG GPUs, improvements for the double type on AVX512 targets, and multiple standard library improvements. Windows and Linux binaries in this release support both CPU and GPU targets, while the macOS binary supports only CPU. This release is based on patched LLVM 12.0.1.

Improvements for CPU targets:

  • Performance improvements for double type on AVX512 targets - better use of gather/scatter instructions, 2-5x improvements for rsqrt() and rcp() standard library functions.
  • New avx512skx-i32x4 target.
  • aos_to_soa and soa_to_aos performance improvements for -x8 and -x16 targets on CPU.
  • --math-lib=svml mode was fixed and extended - it requires Intel® C++ Compiler (icc or icx) to link the binary.
  • zen1, zen2, and zen3 CPU definitions were added.
  • Added experimental support for PS5 platform.
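For readers unfamiliar with the aos_to_soa and soa_to_aos functions mentioned above, the underlying data-layout conversion can be sketched as scalar C++. This is only an illustration of the array-of-structures to structure-of-arrays transformation for 3-component data, not the ISPC standard library code (which works on vector registers); `PointsSoA` and `aos_to_soa3_scalar` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays view of 3-component points.
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Splits interleaved x0 y0 z0 x1 y1 z1 ... data into three separate arrays.
inline PointsSoA aos_to_soa3_scalar(const std::vector<float> &aos) {
    assert(aos.size() % 3 == 0);
    const std::size_t n = aos.size() / 3;
    PointsSoA soa;
    soa.x.reserve(n);
    soa.y.reserve(n);
    soa.z.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        soa.x.push_back(aos[3 * i + 0]);
        soa.y.push_back(aos[3 * i + 1]);
        soa.z.push_back(aos[3 * i + 2]);
    }
    return soa;
}
```

After the conversion, each component sits in contiguous memory, which is the layout that vector loads prefer.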

The ISPC language gained experimental support for the IEEE 754 half-precision data type, float16. Not all library functions are supported yet with this type. The key focus in this release was on hardware natively supporting this type.
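As a reminder of what the IEEE 754 half-precision format holds, the following C++ sketch decodes a 16-bit pattern (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) into a float. It illustrates the format only and says nothing about how ISPC implements float16; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes an IEEE 754 binary16 bit pattern into a float.
inline float half_bits_to_float(uint16_t bits) {
    const int sign = bits >> 15;
    const int exponent = (bits >> 10) & 0x1F;
    const int fraction = bits & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: fraction * 2^-24.
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else if (exponent == 0x1F) {
        // All exponent bits set: infinity or NaN.
        magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1024 + fraction) * 2^(exponent - 15 - 10).
        magnitude = std::ldexp(static_cast<float>(1024 + fraction), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Usage: a few well-known bit patterns.
inline void half_demo() {
    assert(half_bits_to_float(0x3C00) == 1.0f);     // one
    assert(half_bits_to_float(0xC000) == -2.0f);    // minus two
    assert(half_bits_to_float(0x7BFF) == 65504.0f); // largest finite half
}
```

The narrow exponent range (largest finite value 65504) is the main thing to keep in mind when moving computations from float to float16.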

This update includes breaking changes in compiler switches for Xe targets:

  • Graphics targets genx-x8 and genx-x16 were renamed to gen9-x8 and gen9-x16.
  • Compiler architectures for graphics target were renamed from genx32 and genx64 to xe32 and xe64.
  • Xe targets were renamed from uppercase to lowercase (so instead of SKL/TGLLP it is now skl/tgllp).
  • A new --device switch (which is an alias for the existing --cpu switch) was introduced. Now the recommended way to specify the required platform for CPU and GPU is: --device=<platform>

This release also changes the definition of export and task functions on GPU. A GPU kernel is now an ISPC task function only; export functions can no longer be invoked from the host (i.e. called from ISPC Runtime/L0 Runtime). Instead, export functions can be linked with and called from other GPU modules. Currently, ISPC experimentally supports such interoperability with Explicit SIMD SYCL* Extension (ESIMD).

New Xe targets were added:

  • xelp-x8 and xelp-x16. XeLP refers to XeLP generation of hardware (TigerLake chips and alike).
  • xehpg-x8 and xehpg-x16. XeHPG is the architecture name for the forthcoming Intel® Arc™ GPUs, codename Alchemist.

The GPU part has a bunch of stability, performance, and usability improvements, including but not limited to alloca() with constant parameter support, assume() support, and improved performance for double math functions and integer division.

ISPC Runtime performance was improved severalfold by fixing the setting of local group size for kernels, using events as a synchronization mechanism, and utilizing HW compute and copy engines. There is also a new structure, ISPCRTModuleOptions, to pass additional options to the VC backend if needed. Currently, ISPCRTModuleOptions allows setting the stack size for the VC backend, which is used to compile SPIR-V.

Runtime Dependencies when targeting GPU:

Linux:

Windows:

Components revisions used in GPU-enabled build:

  • KhronosGroup/SPIRV-LLVM-Translator@ed25f1b
  • intel/vc-intrinsics@3a5f4b4
  • oneapi-src/level-zero@2824c1f (v1.7.4)
  • llvm/llvm-project@fed4134 (llvmorg-12.0.1) + patches from llvm_patches folder

UPDATE: Linux binary was updated on 01/28/2022 to fix a problem with GPU support.

v1.16.1

2 years ago

A minor ISPC update, which has a bug fix for issue #2111 and is based on a patched version of LLVM 12.0.1.

The bug fix affects x86 targets only and shows up as incorrect code generation for the sequence of shuffle() and reduce_add() stdlib functions.

If you are building ispc from the sources, note that the fix is implemented as a patch for the LLVM backend, and LLVM must be built with this patch applied for the fix to take effect. A stock build of LLVM 12.0.1 will not contain this bug fix.