ISPC Versions

Intel® Implicit SPMD Program Compiler

v1.16.0

2 years ago

An ISPC release with language extensions for performance fine-tuning, CPU definitions for AlderLake and SapphireRapids targets, support for macOS ARM targets, and a major update to Intel GPU support. The Windows and Linux binaries in this release support both CPU and GPU targets, while the macOS binary supports CPU only. This release is based on a patched LLVM 12.0.0.

The language changes include the following:

The ability to directly call LLVM intrinsics from ISPC source. This should be handy for performance fine-tuning and for reaching hardware instructions not yet covered by the standard library. Note that this is an experimental feature and is enabled only with the --enable-llvm-intrinsics switch. Please refer to the LLVM Intrinsic Functions section of the user manual for more details.
assume() optimization hint, which can be used to communicate assumptions to the optimizer. Unlike assert() calls, it does not generate a runtime check. It is intended for optimizations such as removing null pointer checks, removing loop remainders, and communicating alignment information to the optimizer. Please refer to the Compiler Optimization Hints section of the user manual for more details.
Support for stack memory allocations through alloca() calls.
trunc() standard library functions.

Changes for CPU targets:

CPU definitions for AlderLake and SapphireRapids were added: alderlake and sapphirerapids respectively.
CPU definitions for Apple ARM chips were added: apple-a7, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14.
Support for macOS ARM targets was added.

Using GPU-enabled binaries you can build ISPC programs and run them on Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) and Gen12 graphics (TigerLake mobile CPU) using --target options (genx-x8 and genx-x16) and --cpu option for specifying particular platform (e.g. --cpu=TGLLP).

The main GPU feature of the current release is Windows support. There are also a bunch of stability and performance improvements. Here are some of them:

ISPC Runtime gained support for unified shared memory and multiple GPUs. There is also a new TaskQueue::submit() method, which starts execution without waiting for completion.
Thread private memory was mapped to SVM in the VC backend. This greatly improves the stability of the current release. It may affect performance on Gen9 graphics, but we do not expect any significant changes on Gen12.
L0 binary generation was reworked through libocloc. Supported on Linux only.

More details about the current state of GPU support are available here: https://ispc.github.io/ispc_for_gen.html

For build instructions check our docker recipe: https://github.com/ispc/ispc/blob/main/docker/ubuntu/xpu_ispc_build/Dockerfile

GPU support is still in the Beta stage, so you may experience some issues, but we strongly encourage you to try it out and give us feedback! You can reach us through GitHub discussions and issues, or on Twitter (@ispc_updates).

Runtime Dependencies when targeting GPU:

Linux:

Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/21.21.19914
Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3
OpenMP Runtime. Consult your Linux distribution's documentation for OpenMP runtime installation instructions. No specific version is required.

Windows:

Intel(R) Graphics - BETA Windows(R) 10 DCH Drivers 30.0.100.9667 https://downloadcenter.intel.com/download/30522/Intel-Graphics-BETA-Windows-10-DCH-Drivers
Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.2.3

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@0592c4f
intel/vc-intrinsics@2d0795c
oneapi-src/level-zero@0d30b1f (v1.2.3)
llvm/llvm-project@d28af7c (llvmorg-12.0.0) + patches from llvm_patches folder

v1.15.0

3 years ago

An ISPC release with several improvements for CPU and Beta support of Intel graphics hardware architectures. The binaries in this release include CPU versions for Windows, Linux, and macOS, and a GPU-enabled Linux binary, which supports both CPU and GPU. The CPU binaries are based on a patched LLVM 11.0.0; the GPU binary is based on a patched LLVM 10.0.1.

CPU changes include:

New loop unroll pragmas: the #pragma unroll and #pragma nounroll directives provide loop unrolling optimization hints to the compiler. These pragmas may be used immediately before a loop statement. Currently, this functionality is limited to uniform for and do-while loops.
More efficient implementation of the packed_[load|store]_active() stdlib functions (up to 2.5x faster), which now support 64-bit types.
New CPUs: icelake-server, tigerlake, alderlake, sapphirerapids.
Several stability fixes related to SOA types, bool varying type initialization, broken alignment information, type scoping.
Compile time improvements.
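
To make the packed_[load|store]_active() item above more concrete, here is a minimal scalar sketch in C++ of what a packed store and a packed load compute for 64-bit values. It is only a model of the semantics, not ISPC's implementation: the function names, the use of std::vector to stand in for a varying value, and the explicit `active` mask are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a packed store (illustration only, hypothetical name):
// values held by active lanes are written contiguously to `out`, in lane
// order. Returns the number of values written.
std::size_t packed_store_active_model(const std::vector<std::int64_t>& lanes,
                                      const std::vector<bool>& active,
                                      std::int64_t* out) {
    assert(lanes.size() == active.size());
    std::size_t written = 0;
    for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
        if (active[lane]) {
            out[written++] = lanes[lane];
        }
    }
    return written;
}

// Scalar model of the matching packed load (hypothetical name): consecutive
// values from `in` are handed to the active lanes in lane order. In this
// model, inactive lanes keep their previous contents. Returns the number of
// values consumed from `in`.
std::size_t packed_load_active_model(const std::int64_t* in,
                                     const std::vector<bool>& active,
                                     std::vector<std::int64_t>& lanes) {
    assert(lanes.size() == active.size());
    std::size_t consumed = 0;
    for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
        if (active[lane]) {
            lanes[lane] = in[consumed++];
        }
    }
    return consumed;
}
```

With four lanes of which three are active, the store model writes three values back to back and the load model reads the same three values back into the active lanes only.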

ISPC support was added to CMake 3.19 so now you can use the standard CMake approach to find ISPC on the system and use it in your build. https://cmake.org/cmake/help/latest/release/3.19.html#languages

Using GPU-enabled Linux binary you can build ISPC programs and run them on Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) and Gen12 graphics (TigerLake mobile CPU) using --target options (genx-x8 and genx-x16) and --cpu option for specifying particular platform (e.g. --cpu=TGLLP).

Stability and performance were significantly improved in this release. Here is the list of new features:

Initial support for ahead-of-time compilation to the oneAPI Level Zero binary format using the --emit-zebin switch. You can use this binary from ISPC Runtime by setting the ISPCRT_USE_ZEBIN environment variable to 1. Please note that the SPIR-V format is still the recommended and default way.
Initial function pointers implementation.
Global atomics support.
Double math functions support.
Memory functions support.
Reworked masking approach. The genx hardware mask is now disabled by default, and a software mask is used instead.
Improved address spaces differentiation.
Initial debug support.
TGLLP (TigerLake mobile CPU) support (--cpu=TGLLP).

We also added examples to demonstrate interoperability with the oneAPI DPC++ Compiler. More details about the current state of GPU support are available here: https://ispc.github.io/ispc_for_gen.html

For build instructions check our docker recipe: https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

GPU support is in the Beta stage, so you may experience some issues, but we strongly encourage you to try it out and give us feedback! You can reach us through GitHub discussions and issues, ISPC mailing list ([email protected]), or on Twitter (@ispc_updates).

Runtime Dependencies:

Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/20.50.18716
Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.0.22
OpenMP Runtime. Consult your Linux distribution's documentation for OpenMP runtime installation instructions. No specific version is required.

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@ab5e12a
intel/vc-intrinsics@2de2dd4
oneapi-src/level-zero@c6fa2cd (v1.0.22)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

UPDATE: macOS build was updated on 21 Dec 2020.

v1.14.1

3 years ago

A minor ISPC update with a fix for an AVX512 detection problem on macOS (for more details see issue #1854) and an update of the GPU version to use Level0 v1.0. CPU binaries are based on a patched LLVM 10.0.1.

Runtime Dependencies for GPU-enabled build:

Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/20.33.17675
Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.0
OpenMP Runtime. Consult your Linux distribution's documentation for OpenMP runtime installation instructions.

Components revisions used in GPU-enabled build:

KhronosGroup/SPIRV-LLVM-Translator@1a5c52f
intel/vc-intrinsics@f39ff1e
oneapi-src/level-zero@fcc7b7a (v1.0)
llvm/llvm-project@ef32c61 (llvmorg-10.0.1) + patches from llvm_patches folder

v1.14.0

3 years ago

An ISPC release with several improvements for CPU and initial support of Intel graphics hardware architectures. The binaries in this release include CPU versions for Windows, Linux, and macOS, as in previous releases, plus a GPU-enabled Linux binary, which supports both CPU and GPU. CPU binaries are based on a patched LLVM 10.0.1.

CPU changes include:

new avx2-i8x32, avx2-i16x16, avx512skx-i8x64, avx512skx-i16x32 targets.
"generic" targets were removed.
several stability fixes, including bugs discovered during fuzzing ISPC by YARPGen.
integer division performance improvements.
support for __vectorcall calling convention on Windows x64 (enabled by '--vectorcall')

Using the GPU-enabled Linux binary you can build ISPC programs and run them on Intel(R) Core(tm) Processors with Gen9 graphics (formerly Skylake, Kaby Lake, Coffee Lake) using the new '--target' options: 'genx-x8' and 'genx-x16'. For code generation, ISPC uses the Vector Compute backend, which is part of 'Intel(R) Graphics Compute Runtime', through the SPIR-V interface. This release also includes ISPC Runtime, based on 'oneAPI Level Zero' for GPU and 'OpenMP Runtime' for CPU, which provides a unified abstraction for executing ISPC code on CPU and GPU.

More details are available here: https://ispc.github.io/ispc_for_gen.html

For build instructions check our docker recipe: https://github.com/ispc/ispc/blob/main/docker/ubuntu/gen/Dockerfile

The stability and performance of the GPU part of this release are not mature yet, but we strongly encourage you to try it out and give us feedback! You can reach us through GitHub issues, ISPC mailing list ([email protected]), or on Twitter (@ispc_updates).

Runtime Dependencies

Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/20.29.17408
Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v0.91.21
OpenMP Runtime. Consult your Linux distribution's documentation for OpenMP runtime installation instructions.

Components revisions used in this build:

KhronosGroup/SPIRV-LLVM-Translator@1e661b2
intel/vc-intrinsics@a0b66f2
oneapi-src/level-zero@317bc0d (v0.91.21)
llvm/llvm-project@d32170d (llvmorg-10.0.0)

v1.13.0

4 years ago

An ISPC update that graduates cross-compilation support to production and brings multiple code generation improvements and bug fixes. AVX512 targets may get the biggest performance boost, thanks to a changed internal representation of masks (we observed speedups of up to 5%) and a new switch, --opt=disable-zmm, which disables the use of zmm registers in favour of ymm for the avx512skx-i32x16 target. All targets will benefit from the LLVM 10.0 backend used in this release.

Here is the list of other changes:

new switch --support-matrix was added to display information about supported cross-compilation targets, which are managed by --target-os=<os>, --target=<ispc-target>, and --arch=<arch> switches.
representation of 'bool' type in storage was changed to match C/C++ (i.e. one bool occupies one byte) for better interoperability.
type aliases for unsigned types were added: uint8, uint16, uint32, uint64, and uint. To detect if these types are supported you can check if ISPC_UINT_IS_DEFINED macro is defined.
extract()/insert() for boolean arguments, and abs() for all integer and FP types were added to standard library.
FreeBSD was added to the list of supported target OSes, but it's not well tested.
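
The bool storage change above matters mostly on the host side, where C++ code shares data structures with ISPC code. The following is a minimal C++ sketch of the layout that such host code sees. The struct name and its fields are hypothetical, and the sketch assumes a mainstream ABI in which a C++ bool occupies one byte, which the static_assert checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record that host C++ code might share with an ISPC kernel.
struct ParticleFlags {
    float weight;
    bool alive;      // one byte
    bool visible;    // the next byte, directly after `alive`
    std::int32_t id;
};

// This sketch assumes a one-byte C++ bool (true on mainstream ABIs).
static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

// Returns true when the two bool fields sit in adjacent bytes, which is the
// layout ISPC code has to agree with for the struct to be shared safely.
bool flags_are_byte_packed() {
    return offsetof(ParticleFlags, visible) - offsetof(ParticleFlags, alive) == 1;
}
```

A check like this can live next to the shared struct definition so that a layout mismatch shows up at build or test time rather than as corrupted data.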

Supported platforms in this release are below. Rows are hosts, columns are targets. x86 and arm are both 32 and 64 bits, where appropriate.

	Windows	Linux	macOS	Android	iOS	PS4	FreeBSD
Windows	x86	x86, arm	x86	x86, arm		x86	x86, arm
Linux		x86, arm	x86	x86, arm			x86, arm
macOS		x86, arm	x86	x86, arm	arm		x86, arm

v1.12.0

4 years ago

This ISPC update includes experimental cross OS compilation support, ARM and AARCH64 support and a bunch of language features and stability fixes.

Here are the details:

ISPC is now a cross-OS compiler: you can build ISPC programs for Windows, Linux, macOS, iOS, Android and PS4 targets from Windows, Linux and macOS hosts.
ARM and AARCH64 support has been enabled for ISPC. ARM support currently exists for neon-i32x4, neon-i8x16 and neon-i16x8 targets. AARCH64 is supported for neon-i32x4 as well as for a new "double-pumped" 8-wide target: neon-i32x8.
A new 128-bit AVX2 target (avx2-i32x4) was added.
Added a CPU definition for Ice Lake client CPUs (--cpu=icl). Note that there is no special target for the new instructions in the Ice Lake flavor of AVX512 yet. For now, you can use SKX targets (avx512skx-i32x8 and avx512skx-i32x16) with --cpu=icl.
Removed the generic targets for KNC and KNL, so ISPC does not have KNC support anymore. KNL is still supported through native target (avx512knl-i32x16).
Removed AVX1.1 (IvyBridge) targets (use AVX1 targets instead).
Introduced new language features:
- noinline function qualifier.
- rsqrt_fast() and rcp_fast() functions.
- Static initialization for varying.
A new command line option --emit-llvm-text was added to dump LLVM IR in text format.

An ISPC top-of-trunk build is now available in the Compiler Explorer

The release is based on a patched LLVM 8.0.0 backend.

v1.11.0

4 years ago

An ISPC update with a bunch of new features and stability bug fixes based on a patched LLVM 8.0.0 backend.

Notable new features are:

A new 256-bit AVX512 target (avx512skx-i32x8).
Modified -O1 switch to optimize for size.
#pragma once in auto-generated headers.
Better debugging support with -O0.

Also we resumed support for PS4 build.

To efficiently write ISPC programs you can now use the ISPC plug-in for VSCode.

v1.10.0

4 years ago

An ISPC update that brings several new features, a bunch of stability and performance bug fixes, and infrastructure improvements for those who are interested in hacking on the ISPC trunk. We are also deprecating KNC support and the KNL-generic target (in favor of the native KNL target, i.e. avx512knl-i32x16).

We've added:

a streaming store and load implementation (see "Streaming Load and Store Operations" section in documentation)
support for 64-bit wide types in aos_to_soa/soa_to_aos intrinsics
an option to specify assembler style (see --x86-asm-syntax switch documentation in the help message)
a pragma to disable warnings locally (search for #pragma ignore in documentation)
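
For readers unfamiliar with the aos_to_soa/soa_to_aos conversions mentioned above, here is a minimal scalar sketch in C++ of a three-component conversion on 64-bit (double) data. It only models the data movement and is not the ISPC intrinsics themselves: the Soa3 type and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container for three components.
struct Soa3 {
    std::vector<double> x, y, z;
};

// Array-of-structures to structure-of-arrays: the interleaved input
// x0 y0 z0 x1 y1 z1 ... is split into separate x, y and z arrays.
Soa3 aos_to_soa3_model(const std::vector<double>& aos) {
    assert(aos.size() % 3 == 0);
    const std::size_t n = aos.size() / 3;
    Soa3 soa{std::vector<double>(n), std::vector<double>(n), std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        soa.x[i] = aos[3 * i + 0];
        soa.y[i] = aos[3 * i + 1];
        soa.z[i] = aos[3 * i + 2];
    }
    return soa;
}

// The inverse conversion: re-interleaves the three arrays.
std::vector<double> soa_to_aos3_model(const Soa3& soa) {
    assert(soa.x.size() == soa.y.size() && soa.y.size() == soa.z.size());
    std::vector<double> aos(3 * soa.x.size());
    for (std::size_t i = 0; i < soa.x.size(); ++i) {
        aos[3 * i + 0] = soa.x[i];
        aos[3 * i + 1] = soa.y[i];
        aos[3 * i + 2] = soa.z[i];
    }
    return aos;
}
```

Applying the two functions in sequence returns the original interleaved array, which is a convenient sanity check for any implementation of this pair.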

Our examples include a new SGEMM example, which demonstrates different versions of matrix multiply with various levels of optimality. It is useful for learning how to start from a naive implementation and then add various optimizations. Also, our build system is now based on CMake, as are the examples, so you can use them as a reference for integrating ISPC into your CMake-based project.

For those who are interested in hacking ISPC or trying a bleeding edge development version, we have CI on Linux (Travis-CI) and Windows (Appveyor), including automatic package builds on Windows. We also have Dockerfiles, which demonstrate bringing up your environment for ISPC development.

The release is based on a patched LLVM 5.0.2 backend.

v1.9.2

4 years ago

An ISPC update that brings out-of-the-box debug support on Windows, better performance on most targets, and a bunch of stability and performance bug fixes.

The release is based on a patched LLVM 5.0 backend.

The Windows build now supports only VS2015 and newer. If you are using an earlier version, the only known problem that you may encounter is with the print ISPC library function.

AVX512 targets are the main beneficiaries of the newer LLVM backend and demonstrate the biggest performance improvements. SVML support is also now available on these targets (requires linking with the ICC compiler).

v1.9.1

4 years ago

An ISPC update with a new native AVX512 target for future Xeon CPUs and improvements for debugging, including a new switch, --dwarf-version, to support debugging on old systems.

The release is based on patched LLVM 3.8.