Portable and vendor-neutral framework for parallel programming on heterogeneous platforms.
@simd_length attribute. [#726]
memory::size() returns the number of dtype entries instead of the byte length. [#711]
OCCA_<MODE>_ENABLED are set in the parent scope. [#720]
ENABLE_<OPTION> have been renamed OCCA_ENABLE_<OPTION>. [#733]
memoryPool has graduated from an experimental feature and is now in the main occa namespace. [#741]
device::finishAll(). [#723]
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
Full Changelog: https://github.com/libocca/occa/compare/v1.6.0...v2.0.0
Full Changelog: https://github.com/libocca/occa/compare/v1.5.0...v1.6.0
A device memory pool implementation is available in the occa::experimental namespace. It targets applications that frequently allocate and deallocate memory. See Example 17 for more details.
Provide feedback or share your use cases for this feature in the Experimental discussion category.
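The benefit of a pool for allocation-heavy workloads can be sketched generically in plain C++. This is a conceptual illustration, not occa::experimental::memoryPool. SimplePool and its methods are hypothetical names. One large buffer stands in for a single device allocation, and requests are served as offsets into it. Released blocks are recycled by size instead of going back to the device allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical pool for illustration only; NOT OCCA's memoryPool API.
class SimplePool {
 public:
  explicit SimplePool(std::size_t capacityBytes) : buffer_(capacityBytes) {}

  // Returns an offset into the reserved buffer, or -1 if it is full.
  long reserve(std::size_t bytes) {
    auto it = freeBlocks_.find(bytes);
    if (it != freeBlocks_.end() && !it->second.empty()) {
      long offset = it->second.back();  // reuse a released block of equal size
      it->second.pop_back();
      return offset;
    }
    if (top_ + bytes > buffer_.size()) return -1;  // out of space
    long offset = static_cast<long>(top_);
    top_ += bytes;  // bump-allocate from the untouched tail
    return offset;
  }

  // Returns a block to the pool without freeing the underlying buffer.
  void release(long offset, std::size_t bytes) {
    assert(offset >= 0);
    freeBlocks_[bytes].push_back(offset);
  }

  // Highest offset ever handed out (a simple high-watermark).
  std::size_t highWaterMark() const { return top_; }

 private:
  std::vector<unsigned char> buffer_;  // stands in for one device allocation
  std::size_t top_ = 0;
  std::map<std::size_t, std::vector<long>> freeBlocks_;
};
```

Reserving, releasing, and reserving again the same size returns the recycled offset without growing the high-water mark.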
An unwrap function has been added to the core classes (device, memory, stream, and streamTag). It returns a void* pointer to the mode-specific object used to implement each class.
This advanced feature is intended to facilitate interoperability between occa and other accelerated libraries. Application developers are responsible for casting the returned pointer to the correct mode-specific type.
In the future, a type-safe interface will be provided for the C++ API.
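The caller-side responsibility described above follows the usual void* type-erasure pattern. The sketch below uses hypothetical stand-in types, not OCCA classes. FakeCudaStream and Stream are invented for illustration. It shows that the cast is only correct if the caller knows which mode-specific type is actually behind the pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT OCCA types.
struct FakeCudaStream {  // pretend mode-specific (native) object
  int id;
};

class Stream {  // pretend backend-agnostic wrapper
 public:
  explicit Stream(int id) : native_{id} {}
  // Mirrors the idea behind unwrap: expose the native object as void*.
  void* unwrap() { return &native_; }

 private:
  FakeCudaStream native_;
};

// Caller-side cast: valid only if the stream really wraps a FakeCudaStream.
int nativeStreamId(Stream& stream) {
  auto* native = static_cast<FakeCudaStream*>(stream.unwrap());
  return native->id;
}
```

Casting to the wrong type would compile but produce undefined behavior. This is why a type-safe interface is planned.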
occa::json kernel properties now take precedence over the corresponding environment variables. [#622]
Full Changelog: https://github.com/libocca/occa/compare/v1.4.0...v1.5.0
stream::finish() allows for synchronization with a specific stream.
device::finishAll() synchronizes all streams on a device, in contrast to device::finish(), which only synchronizes the current stream.
Full Changelog: https://github.com/libocca/occa/compare/v1.3.0...v1.4.0
OCCA now provides CMake package files, which are configured during installation. These package files define an imported target, OCCA::libocca, and look for all required dependencies.
For example, the CMakeLists.txt of a downstream project using OCCA would include:
find_package(OCCA REQUIRED)
add_executable(downstream-app ...)
target_link_libraries(downstream-app PRIVATE OCCA::libocca)
add_library(downstream-lib ...)
target_link_libraries(downstream-lib PUBLIC OCCA::libocca)
In the case of a downstream library, linking OCCA using the PUBLIC specifier ensures that CMake will automatically forward OCCA's dependencies to applications which use the library.
During installation, the Env Modules file
To use this modulefile, add the following line to your .modulerc file
module use -a <occa-install-prefix>/modulefiles
then call
module load occa
The CUDA and HIP backends now support the creation of non-blocking streams. An example has been added demonstrating how to enable this feature.
Additionally, a new API has been added to wrap native backend streams. [#525]
An interface has been added for logging the memory high watermark. [#522] OCCA preprocessor error messages have also been improved. [#572]
A new attribute, @nobarrier, prevents the automatic addition of barriers to @inner loop blocks. [#544]
When @inner loop ranges are known at compile time, compiler optimization directives are added to translated kernel code for the CUDA, HIP, OpenCL, and SYCL backends.
If @inner loop ranges are passed as a kernel argument, the OKL translator will not automatically add optimization directives. In this case, the @max_inner_dims attribute can be used to achieve the same effect.
Full Changelog: https://github.com/libocca/occa/compare/v1.2.0...v1.3.0
We introduce 3 awesome new features to the OCCA library. They are still in the experimental stage, mainly for performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.
Future work includes profiling and optimizing the build and launch of inlined lambdas. How we cache kernel builds and fetch them from the cache is still up in the air, but we're looking forward to tackling this fun problem.
Here we generate a for-loop that iterates over [0, N), tiled by tileSize:
occa::forLoop()
.tile({N, tileSize})
.run(scope, OCCA_FUNCTION([=](const int index) -> void {
// ...
}));
We can also do it manually by calling .outer and .inner:
occa::forLoop()
.outer(occa::range(0, N, tileSize))
.inner(tileSize)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int innerIndex) -> void {
const int index = outerIndex + innerIndex; // outerIndex already advances in steps of tileSize
// ...
}));
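The manual .outer/.inner decomposition can be modeled on the host in plain C++. This is a sketch, not OCCA's generated code. It assumes occa::range(0, N, tileSize) yields 0, tileSize, 2 * tileSize, and so on, as in the occa::range examples in these notes. Under that assumption the global index is outerIndex + innerIndex. A bounds check is needed when N is not a multiple of tileSize. visitTiled is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Host-side model of a tiled loop over [0, N); not OCCA's implementation.
std::vector<int> visitTiled(int N, int tileSize) {
  assert(tileSize > 0);
  std::vector<int> visited;
  for (int outerIndex = 0; outerIndex < N; outerIndex += tileSize) {    // outer: tile starts
    for (int innerIndex = 0; innerIndex < tileSize; ++innerIndex) {     // inner: offsets in a tile
      const int index = outerIndex + innerIndex;
      if (index >= N) continue;  // skip the tail of a partial last tile
      visited.push_back(index);
    }
  }
  return visited;
}
```

With N = 10 and tileSize = 4, the tiles start at 0, 4, and 8. Indices 10 and 11 are skipped, so each value in [0, 10) is visited exactly once.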
We give an example where an index array is passed rather than a simple occa::range. Additionally, this @inner loop has 2 dimensions, so the OCCA_FUNCTION should take an int2 for the inner indices:
occa::array<int> indices;
// ...
occa::forLoop()
.outer(indices)
.inner(X, Y)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int2 innerIndex) -> void {
// ...
}));
We introduce a simple typed wrapper on occa::memory, which contains some of the core map and reduce functional methods.
Example
const double dynamicValue = 10;
const double compileTimeValue = 100;
occa::scope scope({
// Passed as arguments
{"dynamicValue", dynamicValue}
}, {
// Passed as compile-time #defines
{"defines/compileTimeValue", compileTimeValue}
});
// values is an existing occa::array<int>
occa::array<double> doubleValues = values.map(
OCCA_FUNCTION(scope, [](int value) -> double {
return compileTimeValue + (dynamicValue * value);
})
);
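The map example computes one output per input element. A reduce then folds the outputs into a single value. The sketch below is a host-side analogue only. std::transform and std::accumulate stand in for map and reduce, and nothing runs on a device. mapValues and sumValues are hypothetical names. compileTimeValue is hard-coded to mirror the #define, and dynamicValue is passed in like the kernel argument.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Host-side analogue of values.map(...): one double per input int.
std::vector<double> mapValues(const std::vector<int>& values, double dynamicValue) {
  constexpr double compileTimeValue = 100;  // mirrors the compile-time #define
  std::vector<double> out(values.size());
  std::transform(values.begin(), values.end(), out.begin(),
                 [=](int value) { return compileTimeValue + (dynamicValue * value); });
  return out;
}

// Host-side analogue of a sum reduction over the mapped values.
double sumValues(const std::vector<double>& values) {
  return std::accumulate(values.begin(), values.end(), 0.0);
}
```

With inputs {0, 1, 2} and dynamicValue = 10, the mapped values are {100, 110, 120} and their sum is 330.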
We also include a helper, occa::range, which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful when there is no specific input or output but you still need to call a generic map or reduce function.
// Iterates through [0, 1, 2]
occa::range(3).map(...);
// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);
// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);
// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);
// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);
// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);
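The iteration orders listed above follow a simple rule. The end value is exclusive. With no step, the range moves by +1 or -1 toward the end. A step pointing away from the end yields nothing. The sketch below encodes that rule in plain C++ to make it checkable. rangeValues is a hypothetical helper, not OCCA's implementation.

```cpp
#include <cassert>
#include <vector>

// Models the occa::range iteration orders listed above (illustrative only).
std::vector<long> rangeValues(long start, long end, long step) {
  assert(step != 0);
  std::vector<long> out;
  if (step > 0) {
    for (long i = start; i < end; i += step) out.push_back(i);  // ascending, end exclusive
  } else {
    for (long i = start; i > end; i += step) out.push_back(i);  // descending, end exclusive
  }
  return out;
}

// Two-argument form: step by +1 or -1 toward end.
std::vector<long> rangeValues(long start, long end) {
  return rangeValues(start, end, start <= end ? 1 : -1);
}

// One-argument form: [0, end).
std::vector<long> rangeValues(long end) { return rangeValues(0, end); }
```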
forEach
mapTo
map
reduce
every
max
min
some
reverse
shiftLeft
shiftRight
cast
clamp
clampMax
clampMin
concat
dot
fill
slice
findIndex
find
includes
indexOf
lastIndexOf
It's still in its experimental stage, but OKL now allows for basic atomic operations!
@atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!
GPU modes (HIP, CUDA, OpenCL) don't have general atomics implemented. They only support the following basic updates:
@atomic value += update;
@atomic value -= update;
@atomic value &= update;
@atomic value |= update;
@atomic value ^= update;
Inline @atomic applies to a single update statement:
@atomic *ptr += value;
If you prefer, you can use blocks, which are equivalent to inlined @atomic use where possible:
@atomic {
*ptr += value;
}
However, generic @atomic blocks are also possible:
@atomic {
*ptr += value;
*ptr2 += value2;
}
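The two forms map onto familiar CPU concepts, shown here by rough analogy only. This is not the code OCCA generates for any mode. An inline @atomic update behaves like one atomic read-modify-write. A generic block that updates several locations needs them applied together, which a mutex provides on the host. Counters and runWorkers are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// CPU analogy only; NOT OCCA's translation of @atomic.
struct Counters {
  std::atomic<long> value{0};  // target of an inline atomic update
  long a = 0;                  // a and b are updated together, like a block
  long b = 0;
  std::mutex blockMutex;
};

void runWorkers(Counters& c, int threads, int iterations) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&c, iterations] {
      for (int i = 0; i < iterations; ++i) {
        c.value.fetch_add(1);  // like: @atomic value += 1;
        std::lock_guard<std::mutex> lock(c.blockMutex);
        c.a += 1;              // like: @atomic { a += 1;
        c.b += 2;              //                 b += 2; }
      }
    });
  }
  for (auto& worker : workers) worker.join();
}
```

Without the atomic and the lock, concurrent += updates could be lost and the final counts would vary from run to run.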
The DPC++ backend was added thanks to work completed jointly by ALCF and Intel, with contributions from:
Currently, only building with CMake is supported.
A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information is in PR #404.
Additionally, the occa::lang::expr class makes it easy to create expressions without having to worry about pointers or underlying node objects. More information is in PR #407.
This is potentially a breaking change: in a series of commits, we finally split up the public and private APIs!
occa::properties is now deprecated and replaced with occa::json.
occa::properties wasn't adding much on top of occa::json. It also made auto-casting harder, since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are, but have transitioned the types to occa::json.
We still have a
typedef json properties;
so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is cast to json/properties:
std::string → occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
std::string → occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1}, then the occa::json value will be a string whose value is "{key: 1}".
Details about the refactor:
- [C++] The only breaking change is that property strings now need the surrounding braces ({}) to be valid JSON
- [C] All property methods have been removed and should be replaced with the Json methods
- [Fortran] All property methods have been removed and should be replaced with the Json methods
umalloc + UVA since it's only adding extra overhead and introduces a 3rd way to manage memory along with occa::memory and occa::array.
host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
statementArray and exprNodeArray, which make it easy to:
forEach or nestedForEach (recursive))
filter or flatFilter (recursive))
exprNode) through exprNodeArray::inplaceMap
occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
sourceCodeStatement to inject non-standard source code when needed.
@atomic support (TODO: Finish most base implementations)
occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
occa::getDeviceCount (Thanks @noelchalmers!)
device.wrapMemory to wrap native pointers into occa::memory objects
occaKernelRunWithArgs, which takes an occaType pointer
occa::memory tracks objects to properly handle slicing (Thanks @noelchalmers 🎉 )
0
sysctl since it was deprecated and later removed from glibc
if statement's condition (Thanks @noelchalmers!)
strncmp (Thanks @MalachiTimothyPhillips!)
I'm super excited to announce the Fortran API! This was single-handedly designed and built by @awehrfritz, so huge thanks!! The API is not finalized, but it will most likely not change much since its design matches our other language APIs.
For more information, the initial PR can be found here: #341
For much of OCCA's development, most of the work was done by a very small group of people. Over the last few years, the project has grown from a research project into a library used by a few organizations.
During this release, we added CMake support. It doesn't directly add any development features, but it opens the OCCA library to a broader audience. Some might say that is even more impactful than adding features. What makes this even more exciting is how many unrelated collaborators took part in this work!
Lots of PRs that made this happen: #310, #313, #319, #323, #329, #344, #345, #357
Many thanks to
[de598e65036b444bbf700b05e3981b31254144a2] OCCA now compiles with C++11. C++ projects will need to add the -std=c++11 flag (for most compilers) when compiling.
[f4fea623f09f12249ab2c783a5be6730f5497681] Renamed occa::hash_t
methods
hash_t::toString()
→ hash_t::getString()
hash_t::toFullString()
→ hash_t::getFullString()
[#322] Updates occaFree to take its argument by reference (a pointer) rather than by value
occaFree(value)
↓
occaFree(&value)
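One plausible motivation for passing the handle by pointer is our reading, not something stated in these notes. Taking the handle by pointer lets the free function also reset the caller's copy. Later misuse then hits a null check instead of freed memory. The sketch uses a hypothetical Handle type and freeHandle function, not OCCA's occaType or occaFree.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical handle type; NOT OCCA's occaType.
struct Handle {
  void* impl;
};

// Frees the underlying object and clears the caller's handle in place.
void freeHandle(Handle* handle) {
  if (handle == nullptr || handle->impl == nullptr) return;  // already freed: no-op
  std::free(handle->impl);
  handle->impl = nullptr;  // only possible because we received a pointer
}
```

Calling freeHandle twice on the same handle is safe, because the second call sees the cleared pointer and returns.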
[#341] The Fortran API
[08b3a684cdfa378df3588c67249a786b841730ce] Adds the OCCA_JIT and OCCA_JIT_WITH_SCOPE macros. Examples for C++ and C can be found:
For example:
OCCA_JIT(
(entries, a, b, ab),
(
for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
ab[i] = 100 * (a[i] + b[i]);
}
)
);
[0a776961f420e0eb7c46d8cefdc35644a93b43e0] Adds okl-mode.el for editing OKL kernels in Emacs 🎉
[f813c34bd1846c2250a7e5375c177bc16bb4c208] Adds a templated malloc for easier use while keeping backwards compatibility
occa::memory mem = occa::malloc(10 * sizeof(float), src);
↓
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
↓
occa::memory mem = occa::malloc<float>(10, src);
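The three forms above converge on the same allocation: 10 entries times sizeof(float) bytes. A templated overload can simply forward to the byte-count version while recording the entry count. The sketch below shows that forwarding pattern with hypothetical types. Buffer, mallocBytes, and mallocTyped are invented for illustration and are not OCCA's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer type standing in for an allocation.
struct Buffer {
  std::vector<unsigned char> bytes;
  std::size_t entries;  // element count (equals byte count for untyped allocations)
};

// Untyped allocation: the caller computes the byte count.
Buffer mallocBytes(std::size_t byteCount) {
  return Buffer{std::vector<unsigned char>(byteCount), byteCount};
}

// Typed convenience: entries * sizeof(T) bytes, remembering the entry count.
template <typename T>
Buffer mallocTyped(std::size_t entries) {
  Buffer buffer = mallocBytes(entries * sizeof(T));
  buffer.entries = entries;
  return buffer;
}
```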
[92ffb5874533252dbbed458c7c3db8a8525ce304] Adds a templated umalloc for easier use while keeping backwards compatibility
float *a = (float*) occa::umalloc(10, occa::dtype::float_, src);
void *b = occa::umalloc(10 * sizeof(float), src);
↓
float *a = occa::umalloc<float>(10, src);
void *b = occa::umalloc(10 * sizeof(float), src);
[c61d636fafd35014aac6f26ef8490c7655c2d6ba] Adds a templated ptr for easier use. It defaults to returning void* for backwards compatibility.
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
float *ptr = (float*) mem.ptr();
↓
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
float *ptr = mem.ptr<float>();
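Keeping ptr() backwards compatible while adding ptr<float>() is a standard C++ technique: a member template whose parameter defaults to void. The sketch below shows that technique on a hypothetical Memory class. Only the default-template-argument idea mirrors the notes. The class itself is not OCCA's occa::memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory wrapper; only the template-default trick mirrors the notes.
class Memory {
 public:
  explicit Memory(std::size_t bytes) : storage_(bytes) {}

  // T defaults to void, so existing calls to ptr() still return void*.
  template <typename T = void>
  T* ptr() {
    return static_cast<T*>(static_cast<void*>(storage_.data()));
  }

 private:
  std::vector<unsigned char> storage_;
};
```

Old call sites such as `void *p = mem.ptr();` keep compiling, while new code can write `mem.ptr<float>()` and skip the explicit cast.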
[c61d636fafd35014aac6f26ef8490c7655c2d6ba] Adds use_host_pointer to memory props to auto-wrap source pointers during malloc calls
float *hostPtr = new float[10];
occa::memory mem = occa::malloc<float>(10, hostPtr, "use_host_pointer: true");
mem.ptr<float>() == hostPtr; // true: the memory wraps hostPtr rather than copying it
Adds polyfills to test compilation of locally unsupported modes
[284aff89bd692f3254772c9263347a1e2c58ba09] Adds a method to get the kernel hash
C++: occa::kernel::hash(), which returns an occa::hash_t object
C: occaKernelGetHash and occaKernelGetFullHash, which return the hash as a const char*
[f2f21a3cad27c9e0ff49041e15f14401785187d4] Adds Metal backend for GPGPU in MacOS
double or long types
typedef due to missing address space qualifiers
[386bc4c6849441e0caf2accc995b546b7a969cf4] Adds occa translate --launcher to get the host code needed to launch the device kernels (CUDA, HIP, OpenCL, Metal modes)
[#246] Adds the @directive preprocessor attribute to add directives inside macros, such as OCCA_JIT
@directive("#pragma ivdep")
↓
#pragma ivdep
[#265] Adds the OCCA_CONFIG config file to set defaults. There is a config.defaults.json file that explains the properties that can be set, including mode-specific properties.
[#266] Allows HIP to compile CUDA kernels (Thanks @noelchalmers!)
[#270] Adds occa::null for passing a NULL equivalent to occa::kernels (occaNull in C)
[#284] Adds OCCA_LDFLAGS along with kernel/compiler_linker_flags (Thanks @stgeke!)
[#308] Adds OCCA_SHARED_FLAGS along with kernel/compiler_shared_flags
[#308] Adds support to build native C kernels, by setting okl/enabled to false (disabling OKL) and kernel/compiler_language to C (it defaults to C++) (Thanks @amikstcyr!)
[#346] Supports #include of standard C and C++ headers in OKL kernels. Note that this prints warnings, since adding these headers is not portable across the supported backends.
[#347] Adds some standard defines to OKL kernels so users can check whether the kernel is being processed by OKL. This is useful when reusing source code for OCCA kernels and non-OCCA kernels.
[#349] Keeps some comments around after applying OKL transformation for cleaner generated kernels.
[#354] Adds the OKL_KERNEL_HASH define to help debug which kernel is currently being run (Tip: printf and std::cout are available in Serial and OpenMP modes!)
[#349][#355][#358][#364] Keeps comments around when transpiling kernels
.dylib instead of .so on MacOS (Thanks @thilinarmtb!)
PREFIX (Thanks @thilinarmtb!)
const qualifier → __constant
json -> properties unsafe cast (Thanks for pointing it out @stgeke!)
kernelBuilder
[beec0865857f5fbdf55554deb596396b40614357] Added struct support to OKL
There are still a few missing features when using structs, such as:
typedef-ing structs
typedef struct { } foo;
Expanding @attributes on struct variables
struct mat3 {
int *values @dim(3, 3);
};
mat3 m;
// Error since the parser right now doesn't "know" `values` is a @dim(3, 3)
m.values(0, 0);
Access level modifiers are not supported at the moment
struct foo {
private: ...
};
[bf1dd169dfba475ab7ac35a4529fe4c6890d38cd] @restrict expands to __restrict__ by default
OpenCL mode overrides it to restrict
Setting the property options/restrict overrides either of those two values. For example:
disable makes @restrict get ignored
Any other value is used instead (e.g. setting it to '__declspec(restrict)' would be preferred in Windows)
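The precedence rule for @restrict can be written as a small decision function. An explicit options/restrict setting wins, and "disable" drops the qualifier. Otherwise the mode decides. This sketch only restates the rule described above. restrictExpansion is a hypothetical name, and representing "ignored" as an empty string is an assumption for illustration.

```cpp
#include <cassert>
#include <string>

// Restates the @restrict precedence rule; hypothetical helper, not OCCA code.
std::string restrictExpansion(const std::string& mode,
                              const std::string& optionsRestrict) {
  if (!optionsRestrict.empty()) {  // options/restrict was set explicitly
    return optionsRestrict == "disable" ? "" : optionsRestrict;
  }
  return mode == "OpenCL" ? "restrict" : "__restrict__";  // mode defaults
}
```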
[897f60043f3ed75d7e643ed0e1ff54f00b8f0efb] Defaults compiler flags to optimize compilation (e.g. -O3)
occa::memory object
Check it out at libocca/occa.py or install it by running
pip install occa
occa.memory objects
@okl.kernel
def py_add_vectors(a: Const[List[np.float32]],
b: Const[List[np.float32]],
ab: List[np.float32]) -> None:
for i in okl.range(entries).tile(16):
ab[i] = a[i] + b[i]
↓
@kernel void py_add_vectors(const float *a,
const float *b,
float *ab) {
for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
ab[i] = a[i] + b[i];
}
}
[54f400321cf129f5cdc0768f53b1e4da8a3cd0fe] Added dtypes, which can optionally be used for runtime type checking
occa::dtype_t
occa::memory allocation
occa::malloc(10 * sizeof(float)); // Regular malloc
occa::malloc(10, occa::dtype::float_); // Typed malloc
occa::malloc(10, occa::dtype::get<float>()); // Templated typed malloc
occaMalloc(10 * sizeof(float), NULL, occaDefault); // Regular malloc
occaTypedMalloc(10, occaDtypeFloat, NULL, occaDefault); // Typed malloc
occa::dtype_t vec3; // { float x, y, z }
vec3.addField("x", occa::dtype::float_);
vec3.addField("y", occa::dtype::float_);
vec3.addField("z", occa::dtype::float_);
[994eb2a5fed533d16392b96465036a2c07fa1459] Added more kernel methods for the C API
void occaKernelPushArg(occaKernel kernel,
occaType arg);
void occaKernelClearArgs(occaKernel kernel);
void occaKernelRunFromArgs(occaKernel kernel);
void occaKernelVaRun(occaKernel kernel,
const int argc,
va_list args);
[f6333f2c808ddd29399b1f7ea0def461050d9d0c] Custom kernel library paths, for example:
// Application code
occa::io::addLibraryPath("mylibrary", "./path/to/kernels/dir");
occa::io::addLibraryPath("mylibrary", "${MY_LIBRARY_DIR}");
// Kernel
#include "occa://mylibrary/kernel.okl"
[0c975e18088df5495bc4d63f4f3000c5415cf9a0] Modes can run a check before registering themselves, so that badly installed OpenCL/CUDA/HIP modes are disabled at runtime
[f5f0e81d271837b2132dc34c90bd4163ed58f95e] OpenCL in newer MacOS versions doesn't seem to support loading program binaries. We now avoid storing the binary in these cases
[cd687082554d630259b8a3d52c51a4c0f9c6f0f5] Updated wrapMemory to take in an occa::device and occa::properties
Before
occa::cpu::wrapMemory(void* ptr, const udim_t bytes)
After
occa::cpu::wrapMemory(occa::device device, void* ptr, const udim_t bytes, occa::properties props)
[959ec4a7e1d28876d7dc81804cfcd99cd356adb7] Renamed occaSetDeviceFromInfos to fit the rest of the methods
Before
occaSetDeviceFromInfos(const char *info)
After
occaSetDeviceFromString(const char *info)
[7735c669eb001b7c00d4b56ec2cbb61d1a4f8b27] Removed some redundant stream methods
Before
occa::device::freeStream(occa::stream) // C++
occaDeviceFreeStream(occaStream) // C
After (Not new)
occa::stream::free() // C++
occaFree(occaStream) // C
[f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::opencl::event() and moved it to occa::opencl::streamTag::clEvent
[f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::cuda::event() and moved it to occa::cuda::streamTag::cuEvent
[f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::streamTag::tagTime. Tags can only be used for:
make build and added make info
@v-dobrev
NULL out existing device/kernel/memory objects when one is freed. This turns SEGFAULT issues into occa::exception errors that can be more easily debugged.
occaJson
occaCreateDeviceFromString
occa::stream class can now be extended
occa::streamTag class can now be extended
verbose)
__global, __local, and __kernel are properly inserted in the beginning
memory::slice was improperly freeing UVA pointers in
verbose property was being overwritten in CUDA mode