VkFFT Versions

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

v1.3.4

3 months ago

- Bugfixes of v1.3.3 combined into a stable version increment (no new functionality).
- Tests reference: https://github.com/vincefn/pyvkfft/issues/32#issuecomment-1904238557

v1.3.3b

4 months ago

Multi-upload R2C and R2R algorithms

- This update removes the limit of ~2^12 for R2C and R2R systems: they can all now be done in up to three uploads with coverage of ~2^32 for all dimensions, same as C2C.
- Added versions of all R2C and R2R algorithms, implemented as load/store callbacks. This functionality will be enhanced in the future to support arbitrary user callbacks (I just need to find out how this can be done for a multiple-API user interaction).
- Bugfixes

v1.3.3

4 months ago

Multi-upload R2C and R2R algorithms

- This update removes the limit of ~2^12 for R2C and R2R systems: they can all now be done in up to three uploads with coverage of ~2^32 for all dimensions, same as C2C.
- Added versions of all R2C and R2R algorithms, implemented as load/store callbacks. This functionality will be enhanced in the future to support arbitrary user callbacks (I just need to find out how this can be done for a multiple-API user interaction).
- Bugfixes

v1.3.2

6 months ago

- Added double-double support in VkFFT. Requires CPU initialization in full quad precision, so it only supports gcc with the quadmath dependency for now. It may be possible to add full FP128 support or some other FP128 library (like mpir) in the future.
- Data has to be stored in double-double before VkFFT kernel calls (no fp128<->double-double conversion on the GPU yet).
- Full 1e-32 precision, but the same range as FP64. See Library for Double-Double and Quad-Double Arithmetic by Y Hida for more information on double-double.
- Double-double requires FMA contraction to be disabled (due to ab-cd contraction rounding mismatch). It doesn't work on Vulkan, as I haven't found how to do that yet.
- Added DST I-IV support.
- Fixed warnings (https://github.com/DTolm/VkFFT/issues/138)
- Added a proper check for app to be zero before the initializeVkFFT call, and zeroing on deletion (https://github.com/DTolm/VkFFT/issues/134)
- Added an option to provide a staging buffer in the application and VkGPU handle (https://github.com/DTolm/VkFFT/issues/129)
- Added guards for build type (https://github.com/DTolm/VkFFT/issues/128)
- Changed the default innermost stride for real buffers in out-of-place R2C from size[0]+2 to size[0] (https://github.com/DTolm/VkFFT/issues/139)
- Allow specifying the glslang version (https://github.com/DTolm/VkFFT/pull/135)
- Improved instruction count and accuracy for radix-7.
- Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
- Refactored the code generator and container struct layout for better handling of complex numbers (-5k loc).
- Added more precision tests and benchmarks.
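For readers unfamiliar with the double-double format mentioned above, here is a minimal CPU-side sketch of the idea: a value is stored as an unevaluated sum of two doubles, and an error-free addition (Knuth's two-sum) recovers the rounding error of each operation. The names `dd_t`, `two_sum` and `dd_add_d` are hypothetical and chosen for illustration; this is not VkFFT's internal code. The sketch uses only additions, so FMA contraction does not affect it; double-double multiplication is where contraction (for example gcc's `-ffp-contract=off`) becomes relevant.

```c
/* Hypothetical type for illustration: value = hi + lo, with |lo| much smaller than |hi|. */
typedef struct {
    double hi;
    double lo;
} dd_t;

/* Knuth's two-sum: returns s (hi) and e (lo) such that s + e == a + b exactly,
 * barring overflow. Relies on strict IEEE-754 evaluation (no -ffast-math). */
static dd_t two_sum(double a, double b) {
    double s = a + b;
    double bv = s - a;                      /* the part of b that made it into s */
    double e = (a - (s - bv)) + (b - bv);   /* the rounding error of a + b */
    dd_t r = { s, e };
    return r;
}

/* Add a plain double to a double-double value and renormalize the result. */
static dd_t dd_add_d(dd_t x, double y) {
    dd_t s = two_sum(x.hi, y);
    double lo = s.lo + x.lo;                /* accumulate the low-order parts */
    return two_sum(s.hi, lo);               /* renormalize so hi carries the leading bits */
}
```

As a usage example, adding 1e-20 to 1.0 and then subtracting 1.0 returns 0.0 in plain FP64, while the same sequence through `dd_add_d` keeps the 1e-20 term, because it is carried in the `lo` component.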

v1.3.1

9 months ago

- Major library design change: from a single-header to a multiple-header approach, which improves structure and maintainability. Now, instead of copying a single file, the user has to copy the vkFFT folder contents.
- VkFFT has been rewritten to follow the multiple-level platform structure described in the VkFFT whitepaper. All algorithms have been split into respective files, which should make the library design easier to understand. Multiple places with code duplication have been restructured and unified (mainly the read/write part of kernels and pre/post-processing).
- All math operations and most variables have been abstracted to a union container approach that can contain either numbers or variable names. Not a full compiler, but the code generated is close to machine-like. There are no math sprintf calls in the actual code generator now. More details can be found here: https://youtu.be/lHlFPqlOezo
- VkFFT now supports an arbitrary number of dimensions. By defining VKFFT_MAX_FFT_DIMENSIONS, it is now possible to mimic the fftw guru interface. The default is 4. The innermost stride is always fixed to be 1, but there can be an arbitrary number of outer strides. To achieve innermost batching, initialize an N+1 dim FFT and omit the innermost dimension using omitDimension[0] = 1.
- Enabled fp16 for all backends.
- Accuracy verification of the new version can be found here: https://github.com/vincefn/pyvkfft/issues/25
- The new code structure will facilitate the implementation of many new features and performance improvements, so stay tuned.
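The union container idea described above can be illustrated with a small sketch: an operand is either a compile-time number or the name of a variable in the generated kernel, and an operation either folds constants or emits a line of kernel code. All names here (`val_t`, `val_number`, `val_name`, `gen_mul`) are hypothetical and made up for this example. The real VkFFT containers are more elaborate, and this is only an assumption about the general shape of the approach.

```c
#include <stdio.h>

/* Hypothetical operand container: either a known number or a kernel variable name. */
typedef enum { VAL_NUMBER, VAL_NAME } val_kind;

typedef struct {
    val_kind kind;
    union {
        double number;   /* valid when kind == VAL_NUMBER */
        char name[32];   /* valid when kind == VAL_NAME */
    } data;
} val_t;

static val_t val_number(double x) {
    val_t v;
    v.kind = VAL_NUMBER;
    v.data.number = x;
    return v;
}

static val_t val_name(const char *s) {
    val_t v;
    v.kind = VAL_NAME;
    snprintf(v.data.name, sizeof v.data.name, "%s", s);
    return v;
}

/* Render an operand as kernel source text. */
static void val_print(const val_t *v, char *buf, size_t cap) {
    if (v->kind == VAL_NUMBER)
        snprintf(buf, cap, "%.17g", v->data.number);
    else
        snprintf(buf, cap, "%s", v->data.name);
}

/* Multiply two operands. If both are numbers, fold the product on the host and
 * emit no code. Otherwise write "dst = a * b;" into code and return dst as a name. */
static val_t gen_mul(char *code, size_t cap, const char *dst, val_t a, val_t b) {
    char sa[40], sb[40];
    if (a.kind == VAL_NUMBER && b.kind == VAL_NUMBER) {
        code[0] = '\0';
        return val_number(a.data.number * b.data.number);
    }
    val_print(&a, sa, sizeof sa);
    val_print(&b, sb, sizeof sb);
    snprintf(code, cap, "%s = %s * %s;\n", dst, sa, sb);
    return val_name(dst);
}
```

With this shape, the call sites in the generator look the same whether an operand is known at generation time or only at kernel run time. The formatting is confined to one place instead of being spread across per-operation string building.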

v1.2.31

1 year ago

Important bugfix to FP64 usage in FP32 kernels in some backends. Performance improvements.

release: main benchmark version. Vulkan should be included in the graphics driver

Vulkan_cuFFT: requires CUDA 9. The only difference from the release version is that the cuFFT benchmark is enabled. These executables require Vulkan 1.1.

CUDA_cuFFT: requires CUDA 9. CUDA backend of VkFFT

OpenCL: requires OpenCL 1.2 (lower versions are also supported but performance is not tested). OpenCL backend of VkFFT

v1.2.30

1 year ago

- Rader FFT and Mult algorithms for primes >13
- Precision improvements
- Metal backend

release: main benchmark version. Vulkan should be included in the graphics driver

Vulkan_cuFFT: requires CUDA 9. The only difference from the release version is that the cuFFT benchmark is enabled. These executables require Vulkan 1.1.

CUDA_cuFFT: requires CUDA 9. CUDA backend of VkFFT

OpenCL: requires OpenCL 1.2 (lower versions are also supported but performance is not tested). OpenCL backend of VkFFT

v1.2.26

1 year ago

- Radix 6, 8, 9, 10, 12, 14, 15, 16, 32 support
- Bluestein tuning
- LUT optimizations
- Level Zero support
- Bugfixes

release: main benchmark version. Vulkan should be included in the graphics driver

Vulkan_cuFFT: requires CUDA 9. The only difference from the release version is that the cuFFT benchmark is enabled. These executables require Vulkan 1.1.

CUDA_cuFFT: requires CUDA 9. CUDA backend of VkFFT

OpenCL: requires OpenCL 1.2 (lower versions are also supported but performance is not tested). OpenCL backend of VkFFT

v1.2.17

2 years ago

Some bugfixes and accuracy improvements.

release: main benchmark version. Vulkan should be included in the graphics driver

Vulkan_cuFFT: requires CUDA 9. The only difference from the release version is that the cuFFT benchmark is enabled. These executables require Vulkan 1.1.

CUDA_cuFFT: requires CUDA 9. CUDA backend of VkFFT

OpenCL: requires OpenCL 1.2 (lower versions are also supported but performance is not tested). OpenCL backend of VkFFT

v1.2.12

2 years ago

Full DCT coverage: DCT-I, II, III and IV.

release: main benchmark version. Vulkan should be included in the graphics driver

Vulkan_cuFFT: requires CUDA 9. The only difference from the release version is that the cuFFT benchmark is enabled. These executables require Vulkan 1.1.

CUDA_cuFFT: requires CUDA 9. CUDA backend of VkFFT

OpenCL: requires OpenCL 1.2 (lower versions are also supported but performance is not tested). OpenCL backend of VkFFT