Occa Versions

Portable and vendor neutral framework for parallel programming on heterogeneous platforms.

v2.0.0

1 month ago

Features

  • The maximum number of kernel arguments can be adjusted at build time. [#718]
  • SYCL subgroup size can be set via kernel property or @simd_length attribute. [#726]
  • Initial support for compiler attribute statements. [#729]

Breaking Changes

  • memory::size() returns the number of dtype entries instead of byte-length. [#711]
  • Memory copies are now datatype aware for consistency. [#728]
  • The CMake variables OCCA_<MODE>_ENABLED are set in parent scope. [#720]
  • CMake build options ENABLE_<OPTION> have been renamed OCCA_ENABLE_<OPTION>. [#733]
  • memoryPool has graduated from an experimental feature and is now in the main occa namespace. [#741]
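
The change in meaning of memory::size() can be pictured with a small standalone model. This is only a sketch, not OCCA code: TypedBuffer, entries(), and bytes() are hypothetical names used to contrast an entry count with a byte length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a typed allocation (not the OCCA API).
template <class T>
struct TypedBuffer {
  std::vector<T> data;

  explicit TypedBuffer(std::size_t count) : data(count) {}

  // Number of elements of type T: the quantity size() reports as of v2.0.0.
  std::size_t entries() const { return data.size(); }

  // Length in bytes: the quantity size() reported before v2.0.0.
  std::size_t bytes() const { return data.size() * sizeof(T); }
};
```

For a buffer of 10 doubles, entries() is 10 while bytes() is 10 * sizeof(double). Code that derived loop bounds or host array lengths from the old byte count would likely need to switch to the entry count.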

Bugfixes

  • Correctly sync all streams in device::finishAll(). [#723]
  • Corruption of memory datatypes when using slices. [#727]

Contributors

OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!

  • @nsoborin
  • @rakesroy
  • @barracuda156
  • @AljenU
  • @deukhyun-cha
  • @noelchalmers
  • @kris-rowe

Full Changelog: https://github.com/libocca/occa/compare/v1.6.0...v2.0.0

v1.6.0

8 months ago

Features

  • Devices can be shared by multiple host threads [#672]
  • Pass general objects to kernels by value [#676]
  • Quick return from some memory functions for zero sized allocations [#678]
  • OKL support for typedef enums and unions [#705]

Bugfixes

  • Correctly set source and binary filenames when building a launchedKernel [#666]

Contributors

OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!

  • @ooreilly
  • @BenWibking
  • @topazus
  • @cjatin
  • @mkbosmans
  • @TejaX-Alaghari
  • @kian-ohara
  • @deukhyun-cha
  • @amikstcyr
  • @thilinarmtb
  • @noelchalmers
  • @kris-rowe

Full Changelog: https://github.com/libocca/occa/compare/v1.5.0...v1.6.0

v1.5.0

1 year ago

Features

Memory Pools

A device memory pool implementation is available in the occa::experimental namespace, targeting applications that frequently allocate/deallocate memory. See Example 17 for more details.

Provide feedback or share your use cases for this feature in the Experimental discussion category.

Outward Interoperability

An unwrap function has been added to the core classes—device, memory, stream, and streamTag—which returns a void* pointer to the mode-specific object used to implement each class.

This advanced feature is intended to facilitate interoperability between occa and other accelerated libraries. Application developers are responsible for casting the returned pointer to the correct mode-specific type.

In the future, a type-safe interface will be provided for the C++ API.
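
The casting responsibility described above can be sketched with a standalone model. Everything here is hypothetical and for illustration only: FakeNativeStream stands in for a mode-specific object, and StreamWrapper mimics a class that exposes an unwrap()-style accessor returning void*. None of these names come from OCCA.

```cpp
// Hypothetical mode-specific object; a real backend exposes its own native type.
struct FakeNativeStream {
  int id;
};

// Hypothetical wrapper mimicking a class with an unwrap()-style accessor.
class StreamWrapper {
 public:
  explicit StreamWrapper(FakeNativeStream* native) : native_(native) {}

  // Returns an untyped pointer to the underlying object.
  void* unwrap() const { return native_; }

 private:
  FakeNativeStream* native_;
};

// The caller must know which mode is active and cast to the matching type.
// Casting to the wrong type is undefined behavior; nothing here checks it.
inline FakeNativeStream* asNativeStream(const StreamWrapper& s) {
  return static_cast<FakeNativeStream*>(s.unwrap());
}
```

Keeping the cast in one small helper such as asNativeStream limits the number of places that depend on the mode-specific type.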

Breaking Changes

  • Compiler flags set via occa::json kernel properties now take precedence over the corresponding environment variables. [#622]

Bugfixes

  • Dynamic @exclusive sizes [#121]
  • Build artifacts (e.g., binaries for kernel + launcher) are not durable [#515]
  • Missing initial index value on @inner/@outer loops causes a segfault during translation. [#610]
  • A segfault is encountered when destroying an occa::kernel that was created via a multi-kernel OKL file in CUDA, HIP, or DPC++ modes. [#624]

Contributors

OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!

  • @MalachiTimothyPhillips
  • @stgeke
  • @SFrijters
  • @noelchalmers
  • @kris-rowe

Full Changelog: https://github.com/libocca/occa/compare/v1.4.0...v1.5.0

v1.4.0

1 year ago

Features

Stream and device synchronization

  • The member function stream::finish() allows for synchronization with a specific stream.
  • The member function device::finishAll() synchronizes all streams on a device.
    • This is in contrast to device::finish(), which only synchronizes the current stream.
  • Related C and Fortran interfaces have been included for both functions.
  • Example 07 streams has been updated to demonstrate intended usage.
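
The difference between the three synchronization calls can be modeled with a toy device in which each stream is just a counter of pending tasks. This is a sketch of the semantics described above, not OCCA code: ToyDevice and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy device: pending[i] is the number of queued tasks on stream i.
struct ToyDevice {
  std::vector<int> pending;
  std::size_t current = 0;  // index of the current stream

  explicit ToyDevice(std::size_t streams) : pending(streams, 0) {}

  void enqueue(std::size_t stream, int tasks) {
    assert(stream < pending.size());
    pending[stream] += tasks;
  }

  // Analogue of stream::finish(): wait on one specific stream.
  void finishStream(std::size_t stream) {
    assert(stream < pending.size());
    pending[stream] = 0;
  }

  // Analogue of device::finish(): wait on the current stream only.
  void finish() { finishStream(current); }

  // Analogue of device::finishAll(): wait on every stream of the device.
  void finishAll() {
    for (int& p : pending) p = 0;
  }
};
```

With tasks queued on two streams, finish() leaves the non-current stream untouched, while finishAll() drains both.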

Breaking Changes

  • All MPI related functionality has been removed from OCCA.

Bugfixes

  • A race condition that occurs when writing kernel caches. [#594]
  • Reported CPU frequencies are now scaled correctly. [#601]
  • Redundant OpenMP and library flags have been removed from the CMake build. [#607]

Contributors

OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!

  • @SFrijters
  • @noelchalmers
  • @kris-rowe

Full Changelog: https://github.com/libocca/occa/compare/v1.3.0...v1.4.0

v1.3.0

1 year ago

Features

CMake Package Files [#533]

OCCA now provides CMake package files which are configured during installation. These package files define an imported target, OCCA::libocca, and look for all required dependencies.

For example, the CMakeLists.txt of downstream projects using OCCA would include

find_package(OCCA REQUIRED)

add_executable(downstream-app ...)
target_link_libraries(downstream-app PRIVATE OCCA::libocca)

add_library(downstream-lib ...)
target_link_libraries(downstream-lib PUBLIC OCCA::libocca)

In the case of a downstream library, linking OCCA using the PUBLIC specifier ensures that CMake will automatically forward OCCA's dependencies to applications which use the library.

Environment Module [#580]

During installation, the Env Modules file /modulefiles/occa is generated. When this module is loaded, paths to the installed bin, lib, and include directories are appended to environment variables such as PATH and LD_LIBRARY_PATH.

To use this modulefile, add the following line to your .modulerc file

module use -a <occa-install-prefix>/modulefiles

then call

module load occa

Non-blocking Streams [#498]

The CUDA and HIP backends now support the creation of non-blocking streams. An example has been added demonstrating how to enable this feature.

Additionally, a new API has been added to wrap native backend streams. [#525]

Profiling and Debugging

An interface has been added for logging the memory high watermark. [#522] OCCA preprocessor error messages have also been improved. [#572]

OKL

A new attribute, @nobarrier, prevents the automatic addition of barriers to @inner loop blocks. [#544]

Kernel Loop Ranges [#531]

When @inner loop ranges are known at compile-time, compiler optimization directives are added to translated kernel code for the CUDA, HIP, OpenCL, and SYCL backends.

If @inner loop ranges are passed as a kernel argument, the OKL translator will not automatically add optimization directives. In this case, the attribute @max_inner_dims can be used to achieve the same effect.

Dependency Changes

  • The minimum version of CMake required is now v3.17 [#528]

Bugfixes

  • Compilation on MacOS [#485]
  • streamTag timings for OpenCL were corrected to give meaningful results [#518]
  • Git ignore build dir if it's a symlink [#536]
  • Use mpi_f08 module to fix Intel compiler warnings [#539]
  • Use correct directory in run_examples script [#540]
  • HIP compiler error and warnings [#547]
  • Examples JSON [#549]
  • Broken caching for "output" file [#554]

Contributors

OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!

  • @Luthaf
  • @AljenU
  • @deukhyun-cha
  • @wjhorne
  • @MalachiTimothyPhillips
  • @stgeke
  • @SFrijters
  • @noelchalmers
  • @kris-rowe

Full Changelog: https://github.com/libocca/occa/compare/v1.2.0...v1.3.0

v1.2.0

2 years ago

:fire: Exciting News

We introduce 3 awesome new features to the OCCA library. They are still in the :black_joker: experimental stage, mainly due to performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.

Future work includes profiling and optimizing build + launch of the inlined lambdas. How we cache kernel builds and fetch from the cache is still up in the air, but we are looking forward to tackling this fun problem :smiley:.

:black_joker: occa::forLoop and inlined kernels

Basic Example

Here we generate a for-loop that goes through [0, N) and is tiled by tileSize

occa::forLoop()
  .tile({N, tileSize})
  .run(scope, OCCA_FUNCTION([=](const int index) -> void {
    // ...
  }));

We can do it manually by calling .outer and .inner

occa::forLoop()
  .outer(occa::range(0, N, tileSize))
  .inner(tileSize)
  .run(scope, OCCA_FUNCTION([=](const int outerIndex, const int innerIndex) -> void {
    // Assuming outerIndex is the tile start produced by occa::range(0, N, tileSize)
    const int index = outerIndex + innerIndex;
    // ...
  }));
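
The outer/inner decomposition can be checked on the host with a plain C++ sketch. It assumes the outer index takes the tile-start values 0, tileSize, 2*tileSize, and so on, and that the last tile may be partial; tiledIndices is a hypothetical helper, not part of OCCA.

```cpp
#include <cassert>
#include <vector>

// Visits [0, N) with an outer loop over tile starts and an inner loop over
// offsets within a tile. Returns the indices in visiting order.
inline std::vector<int> tiledIndices(int N, int tileSize) {
  assert(tileSize > 0);
  std::vector<int> visited;
  for (int outerIndex = 0; outerIndex < N; outerIndex += tileSize) {
    for (int innerIndex = 0; innerIndex < tileSize; ++innerIndex) {
      const int index = outerIndex + innerIndex;
      if (index < N) {  // guard the last, possibly partial, tile
        visited.push_back(index);
      }
    }
  }
  return visited;
}
```

Each index in [0, N) is visited exactly once, including when N is not a multiple of tileSize.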

Indices + Multiple Dimensions

We give an example where an index array is passed rather than a simple occa::range. Additionally, this @inner loop has 2 dimensions, so the OCCA_FUNCTION is expected to take an int2 for the inner indices.

occa::array<int> indices;
// ...
occa::forLoop()
  .outer(indices)
  .inner(X, Y)
  .run(scope, OCCA_FUNCTION([=](const int outerIndex, const int2 innerIndex) -> void {
    // ...
  }));

:black_joker: occa::array and Functional Programming

We introduce a simple wrapper on occa::memory which is typed and contains some of the core map and reduce functional methods.

Example

const double dynamicValue = 10;
const double compileTimeValue = 100;

occa::scope scope({
  // Passed as arguments
    {"dynamicValue", dynamicValue}
  }, {
  // Passed as compile-time #defines
    {"defines/compileTimeValue", compileTimeValue}
});

occa::array<double> doubleValues = (
  values.map(OCCA_FUNCTION(scope, [](int value) -> double {
    return compileTimeValue + (dynamicValue * value);
  }))
);

We also include a helper method occa::range which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful if there is no specific input/output but you still need to call a generic map or reduce function.

// Iterates through [0, 1, 2]
occa::range(3).map(...);

// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);

// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);

// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);

// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);

// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);

Core methods

  • forEach
  • mapTo
  • map
  • reduce

Reduction

  • every
  • max
  • min
  • some

Re-indexing

  • reverse
  • shiftLeft
  • shiftRight

Utility methods

  • cast
  • clamp
  • clampMax
  • clampMin
  • concat
  • dot
  • fill
  • slice

Search

  • findIndex
  • find
  • includes
  • indexOf
  • lastIndexOf
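
The iteration sets listed for occa::range can be reproduced on the host with a small sketch. rangeValues is a hypothetical helper written for illustration; in particular, the default step of +1 or -1 for the two-argument form is inferred from the [3, 4, 5] and [6, 5, 4] examples rather than taken from the OCCA source.

```cpp
#include <vector>

// Values visited for (start, end, step); the end point is excluded.
inline std::vector<int> rangeValues(int start, int end, int step) {
  std::vector<int> out;
  if (step > 0) {
    for (int i = start; i < end; i += step) out.push_back(i);
  } else if (step < 0) {
    for (int i = start; i > end; i += step) out.push_back(i);
  }
  // A zero step, or a step pointing away from `end`, yields nothing.
  return out;
}

// Two-argument form: assumed to step by +1 or -1 toward `end`.
inline std::vector<int> rangeValues(int start, int end) {
  return rangeValues(start, end, start <= end ? 1 : -1);
}

// One-argument form: [0, end).
inline std::vector<int> rangeValues(int end) {
  return rangeValues(0, end);
}
```

Under these assumptions, rangeValues(6, 0, -2) gives 6, 4, 2 and rangeValues(6, 0, 1) gives an empty sequence, matching the no-op case.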

:black_joker: Atomics

It's still in its :black_joker: experimental stage, but OKL now allows for basic atomic operations!

:information_source:   @atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!

:warning:   GPU modes (HIP, CUDA, OpenCL) don't have general atomics implemented; they only support the following basic updates:

  • @atomic value += update;
  • @atomic value -= update;
  • @atomic value &= update;
  • @atomic value |= update;
  • @atomic value ^= update;

Inlined @atomic

@atomic *ptr += value;

Block @atomic

If you prefer, you can use blocks which will be equivalent to inlined @atomic use if possible

@atomic {
  *ptr += value;
}

However, generic @atomic blocks are also possible

@atomic {
  *ptr  += value;
  *ptr2 += value2;
}

:black_joker: DPC++ Backend

The DPC++ backend was added by the great work completed jointly by ALCF and Intel, with contributions from:

  • Anoop Madhusoodhanan Prabha (Intel)
  • Cedric Andreolli (Intel)
  • Kris Rowe (ALCF)
  • Phillipe Thierry (Intel)
  • Saumil Patel (ALCF)

Notes

Currently only building with CMake is supported.

Code Transformation Rewrite

The way statement and expression code transformations are done has been fully rewritten!

A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information in PR #404.

Additionally, the occa::lang::expr class helps create expressions easily without having to worry about pointers or underlying node objects. More information in PR #407.

:warning: Breaking Changes

  • This is more of a potential breaking change but in a series of commits, we finally split up the public/private API!

    • [973780b369d35216a43f58b0a26cf1466ce1df5a]
    • [1aeeb86fc782d07db22e1d9a750981b1de07e592]
    • [cb9a2214ea33c73586c7e5ac66c4144949a414ef]
    • [76ea69518ad399e58bc111bc17d331e596f22312]
  • occa::properties is now deprecated and replaced with occa::json

occa::properties wasn't adding much on top of occa::json, instead making auto-casting harder since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are but have transitioned the types to occa::json.

We still have a

typedef json properties;

so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is being cast to json/properties:

  • std::string -> occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
  • std::string -> occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1} then the occa::json value will be a string whose value is "{key: 1}".

Details about the refactor:

  • [C++] The only breaking change is that property strings now need the surrounding braces ({}) to be valid JSON
  • [C] All property methods have been removed and should be replaced with the Json methods
  • [Fortran] All property methods have been removed and should be replaced with the Json methods

  • [#475] We're removing umalloc + UVA since it only adds extra overhead and introduces a 3rd way to manage memory along with occa::memory and occa::array.

:star: Features

  • [#376] Adds host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
  • [#404] Code transformation just got easier with the introduction of the very functional statementArray and exprNodeArray which makes it easy to:
    • Iterate through statements (forEach or nestedForEach (recursive))
    • Filter statements (filter or flatFilter (recursive))
    • Transform expressions (exprNode) through exprNodeArray::inplaceMap
  • [#407] Introduces the occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
  • [#408] Adds okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
  • [#409] Adds sourceCodeStatement to inject non-standard source code when needed.
  • [#410] Adds @atomic support (TODO: Finish most base implementations)
  • [#411] Updates bash autocomplete
  • [#420] Adds occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
  • [#421] Handle new HIP output formats in our builds (Thanks @dmcdougall!)
  • [#427] Adds occa::getDeviceCount (Thanks @noelchalmers!)
  • [#425] We now test CMake builds in our Github Actions (Thanks @noelchalmers!)
  • [#435] Adds device.wrapMemory to wrap native pointers into occa::memory objects
  • [#459] Adds occaKernelRunWithArgs which takes an occaType pointer
  • [#494] Adds DPC++ backend (Thanks @kris-rowe :tada: :tada: :tada:)
  • [#490] Changes how occa::memory tracks objects to properly handle slicing (Thanks @noelchalmers 🎉 )

:bug: Bugs Fixed

  • [#405][#412][#413][#414] Lots of fixes from @SFrijters, thank you! :tada:
  • [#431] Avoid calling kernels when the run sizes are 0
  • [#432] Removed use of sysctl since it was deprecated and later removed from glibc
  • [#437] Fix HIP pointer reference
  • [#449] Replacing locks with temporary files to avoid race conditions (Thanks @SFrijters + @jedbrown!!)
  • [#456] Iterates through an if statement's condition (Thanks @noelchalmers!)
  • [#477] Generate fixed arity interface (Thanks @SFrijters!)
  • [#495] Fixes unsafe use of strncmp (Thanks @MalachiTimothyPhillips!)

:tada: Contributors

  • @dmcdougall
  • @jedbrown
  • @kris-rowe
  • @MalachiTimothyPhillips
  • @noelchalmers
  • @SFrijters

v1.1.0

3 years ago

:fire: Exciting News

Fortran API

I'm super excited to announce the Fortran API! This was single-handedly designed and built by @awehrfritz, so huge thanks!! The API is not finalized but most likely not changing much in the future since the design matches our other language APIs.

For more information, the initial PR can be found here: #341

Collaboration

Join our Slack workspace!

For much of OCCA's development, most of the work was done by a very small group of people. Over the last few years, the project has grown from a research project into a library used by a few organizations.

During this release, we added CMake support. While it doesn't directly add any development features, it opens the OCCA library to a wider audience, which some might say is even more impactful than adding features. What makes this even more exciting is how many unrelated collaborators took part in this work!

Lots of PRs that made this happen: #310, #313, #319, #323, #329, #344, #345, #357

Many thanks to

  • @SFrijters
  • @JamesEggleton
  • @noelchalmers
  • @amikstcyr
  • @gionellef

⚠️ Breaking Changes

  • [de598e65036b444bbf700b05e3981b31254144a2] OCCA now compiles with C++11. With most compilers, C++ projects will need to add the -std=c++11 flag to their compilation.

  • [f4fea623f09f12249ab2c783a5be6730f5497681] Renamed occa::hash_t methods

    • hash_t::toString() -> hash_t::getString()
    • hash_t::toFullString() -> hash_t::getFullString()
  • [#322] Updates occaFree to take in the argument by reference rather than value

    Before

    occaFree(value)

    After

    occaFree(&value)
    

🃏Experimental

  • [#341] The Fortran API

  • [08b3a684cdfa378df3588c67249a786b841730ce] Adds the OCCA_JIT and OCCA_JIT_WITH_SCOPE macros, with examples available for C++ and C.

    For example:

      OCCA_JIT(
        (entries, a, b, ab),
        (
          for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
            ab[i] = 100 * (a[i] + b[i]);
          }
        )
      );
    
  • [0a776961f420e0eb7c46d8cefdc35644a93b43e0] Adds okl-mode.el for editing OKL kernels in Emacs 🎉

⭐️ Features

  • [f813c34bd1846c2250a7e5375c177bc16bb4c208] Adds templated malloc for easier use while keeping backwards compatibility

    Original malloc

    occa::memory mem = occa::malloc(10 * sizeof(float), src);
    

    Initial dtype malloc

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
    

    New malloc

    occa::memory mem = occa::malloc<float>(10, src);
    
  • [92ffb5874533252dbbed458c7c3db8a8525ce304] Adds templated umalloc for easier use while keeping backwards compatibility

    float *a = (float*) occa::umalloc(10, occa::dtype::float_, src);
    void *b = occa::umalloc(10 * sizeof(float), src);
    

    float *a = occa::umalloc<float>(10, src);
    void *b = occa::umalloc(10 * sizeof(float), src);
    
  • [c61d636fafd35014aac6f26ef8490c7655c2d6ba] Adds templated ptr for easier use. Defaults to the return value of void* for backwards compatibility.

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
    float *ptr = (float*) mem.ptr();
    

    occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
    float *ptr = mem.ptr<float>();
    
  • [c61d636fafd35014aac6f26ef8490c7655c2d6ba] Adds use_host_pointer to memory props to auto-wrap source pointers during malloc calls

    float *hostPtr = new float[10];
    occa::memory mem = occa::malloc<float>(10, hostPtr, "use_host_pointer: true");
    mem.ptr<float>() == hostPtr;
    
  • Adds polyfills to test compilation of locally unsupported modes

    • [a377db86f5a7bf642230ff9cd6c6082178d1e8f1] CUDA polyfill
    • [58fadc23de7a5c2beb436603c3715b470a498a56] HIP polyfill
    • [756005e42371d9c91f0e49b01730dccc9c3b4feb] OpenCL polyfill
    • [0a7253ef1468d796b10c773c615b5fb67d9c62cc] Metal polyfill
  • [284aff89bd692f3254772c9263347a1e2c58ba09] Adds method to get the kernel hash

    • C++ occa::kernel::hash() which returns an occa::hash_t object
    • C occaKernelGetHash and occaKernelGetFullHash which return hash as a const char*
  • [f2f21a3cad27c9e0ff49041e15f14401785187d4] Adds Metal backend for GPGPU in MacOS

    • Requires MacOS to be at least 10.14 (Mojave)
    • Requires XCode version to be at least 10.2.1
    • Metal does not support double or long types
    • Issues with global typedef due to missing address space qualifiers
  • [386bc4c6849441e0caf2accc995b546b7a969cf4] Adds occa translate --launcher to get the host code needed to launch the device kernels (CUDA, HIP, OpenCL, Metal modes)

  • [#246] Adds the @directive preprocessor attribute to add directives inside macros, such as OCCA_JIT

    @directive("#pragma ivdep")
    

    #pragma ivdep
    
  • [#265] Adds OCCA_CONFIG config file to set defaults. There is a config.defaults.json file with explanation of possible properties that can be set, including mode-specific properties.

  • [#266] Allows HIP to compile CUDA kernels (Thanks @noelchalmers!)

  • [#270] Adds occa::null for passing a NULL equivalent to occa::kernels (occaNull in C)

  • [#284] Adds OCCA_LDFLAGS along with kernel/compiler_linker_flags (Thanks @stgeke!)

  • [#308] Adds OCCA_SHARED_FLAGS along with kernel/compiler_shared_flags

  • [#308] Adds support to build native C kernels (disabling OKL with okl/enabled set to false and setting kernel/compiler_language to C which defaults to C++) (Thanks @amikstcyr!)

  • [#346] Supports #include of standard C and C++ headers in OKL kernels. Note this will print warnings since adding these headers is not a portable solution across supported backends.

  • [#347] Adds some standard defines on OKL kernels so users can check whether the source is being processed as an OKL kernel or not. This is useful when reusing source code for OCCA kernels and non-OCCA kernels.

  • [#349] Keeps some comments around after applying OKL transformation for cleaner generated kernels.

  • [#354] Adds OKL_KERNEL_HASH define to help debug which kernel is currently being run (Tip: printf and std::cout are available in Serial and OpenMP modes!)

  • [#349][#355][#358][#364] Keeps comments around when transpiling kernels

🐛 Bugs Fixed

  • [ebdb659c804f91f1e0f32fd700f9fe229458033c] Updates to HIP backend (Thanks @noelchalmers!)
  • [ac117fb29a8ce1265d5b8aaaa72ce259bf11f3b2] Fixed caching bugs (Thanks Nigel Nunn!)
  • [542000557c2ca0c21f5ae8f8db07e14dc90546ac] Use .dylib instead of .so on MacOS (Thanks @thilinarmtb!)
  • [ce4df262600bfc06ade8812cb67e8da1ec1c37a1] Properly copy over artifacts when building with PREFIX (Thanks @thilinarmtb!)
  • [#243] Properly avoid overriding and duplicating compiler shared flags (Thanks @noelchalmers!)
  • [f23ce885b8de33b26a0bfcef3555fcd0be915217] Avoids writing lockfile when checking compiler vendor
  • [3df3955f6765b5a5f7a59b7ef9e9ead4a43895aa] Properly fixed untyped umalloc in C
  • [4d5d5bc0dbfab92b86b4a8cbc4f29823e6b3a69e] Kernels from strings were badly generating the launcher kernel
  • [27a7420ae544600f5c39c9fcae385307e193d3eb] OpenCL translation was converting the const qualifier of const pointer typedefs to __constant
  • [#261] Invalid read in json -> properties unsafe cast (Thanks for pointing it out @stgeke!)
  • [#265] Fixes object/mode specific properties from not propagating
  • [86dead2584f4d2479e092935a26350a7bd6b2b25] OpenCL timing was done backwards, resulting in negative times. (Thanks @tcew!)
  • [#293] Fixed some reference counting issues with the kernelBuilder
  • [#400] CUDA context was not being set in a few places (Thanks @amikstcyr!)

🎉 Contributors

  • @amikstcyr
  • @awehrfritz
  • @gionellef
  • @hknibbe2
  • @JamesEggleton
  • @jeremylt
  • Nigel Nunn
  • @noelchalmers
  • @SFrijters
  • @suniljoshi82
  • @stgeke
  • @tcew
  • @thilinarmtb

v1.0.9

4 years ago

⭐️ Features

  • [beec0865857f5fbdf55554deb596396b40614357] Added struct support to OKL

    There are still a few missing features when using structs, such as:

    • typedef-ing structs

      typedef struct { } foo;
      
    • Expanding @attributes on struct variables

      struct mat3 {
        int *values @dim(3, 3);
      }
      mat3 m;
      // Error since the parser right now doesn't "know" `values` is a @dim(3, 3)
      m.values(0, 0); 
      
    • Access level modifiers are not supported at the moment

      struct foo {
        private: ...
      }
      
  • [bf1dd169dfba475ab7ac35a4529fe4c6890d38cd] @restrict expands to __restrict__ by default

    • OpenCL mode overrides it to restrict

    • Setting the property options/restrict overrides either of those two values. For example:

      • disable will make it so @restrict is ignored

      • Any other value will be used instead (e.g. setting it to '__declspec(restrict)' would be preferred in Windows)

  • [897f60043f3ed75d7e643ed0e1ff54f00b8f0efb] Defaults compiler flags to optimize compilation (e.g. -O3)

🐛 Bugs Fixed

  • [e21962dcf5205e81a3780f0c2419d31c45a34102] CPU wrapped memory was being freed by the occa::memory object

v1.0.8

5 years ago

:loudspeaker: Announcement

Python API released!

Check it out at libocca/occa.py or install it by running

pip install occa

  • Most of the core API is ported to Python
  • NumPy arrays are used seamlessly with occa.memory objects
  • First steps to supporting JIT-compiled Python functions as OKL kernels:

@okl.kernel
def py_add_vectors(a: Const[List[np.float32]],
                   b: Const[List[np.float32]],
                   ab: List[np.float32]) -> None:
    for i in okl.range(entries).tile(16):
        ab[i] = a[i] + b[i]

@kernel void py_add_vectors(const float *a,
                            const float *b,
                            float *ab) {
  for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
    ab[i] = a[i] + b[i];
  }
}

⭐️ Features

  • [54f400321cf129f5cdc0768f53b1e4da8a3cd0fe] Added dtypes which can be optionally used for runtime type checking

    • New class occa::dtype_t
    • Optional typed occa::memory allocation

    occa::malloc(10 * sizeof(float));            // Regular malloc
    occa::malloc(10, occa::dtype::float_);       // Typed malloc
    occa::malloc(10, occa::dtype::get<float>()); // Templated typed malloc

    occaMalloc(10 * sizeof(float), NULL, occaDefault);      // Regular malloc
    occaTypedMalloc(10, occaDtypeFloat, NULL, occaDefault); // Typed malloc

    • API for creating custom dtypes, for example:

     occa::dtype_t vec3; // { float x, y, z }
     vec3.addField("x", occa::dtype::float_);
     vec3.addField("y", occa::dtype::float_);
     vec3.addField("z", occa::dtype::float_);
    
  • [994eb2a5fed533d16392b96465036a2c07fa1459] Added more kernel methods for the C API

    void occaKernelPushArg(occaKernel kernel,
                           occaType arg);
    
    void occaKernelClearArgs(occaKernel kernel);
    
    void occaKernelRunFromArgs(occaKernel kernel);
    
    void occaKernelVaRun(occaKernel kernel,
                         const int argc,
                         va_list args);
    
  • [f6333f2c808ddd29399b1f7ea0def461050d9d0c] Custom kernel library paths, for example:

    // Application code
    occa::io::addLibraryPath("mylibrary", "./path/to/kernels/dir");
    occa::io::addLibraryPath("mylibrary", "${MY_LIBRARY_DIR}");
    
    // Kernel
    #include "occa://mylibrary/kernel.okl"
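
The way a registered library name can map an occa:// include to a directory is sketched below. This is a standalone illustration under stated assumptions, not OCCA's internal resolver: LibraryPaths, add, and resolve are hypothetical names, and a later registration for the same name is assumed to replace the earlier one.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry mapping library names to kernel directories.
class LibraryPaths {
 public:
  void add(const std::string& name, const std::string& dir) { dirs_[name] = dir; }

  // Resolves "occa://<library>/<file>" to "<dir>/<file>".
  // Returns the input unchanged if it is not an occa:// path
  // or if the library name has not been registered.
  std::string resolve(const std::string& path) const {
    const std::string scheme = "occa://";
    if (path.compare(0, scheme.size(), scheme) != 0) return path;
    const std::size_t slash = path.find('/', scheme.size());
    if (slash == std::string::npos) return path;
    const std::string name = path.substr(scheme.size(), slash - scheme.size());
    const auto it = dirs_.find(name);
    if (it == dirs_.end()) return path;
    return it->second + path.substr(slash);
  }

 private:
  std::map<std::string, std::string> dirs_;
};
```

After registering "mylibrary" with "./path/to/kernels/dir", resolving "occa://mylibrary/kernel.okl" in this model gives "./path/to/kernels/dir/kernel.okl".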
    

🐛 Bugs Fixed

  • [0c975e18088df5495bc4d63f4f3000c5415cf9a0] Modes can run a check before registering themselves, letting badly installed OpenCL/CUDA/HIP become disabled at runtime

  • [f5f0e81d271837b2132dc34c90bd4163ed58f95e] OpenCL in newer MacOS versions doesn't seem to support loading program binaries. We now avoid storing the binary in these cases

v1.0.7

5 years ago

⚠️ Breaking Changes

  • [cd687082554d630259b8a3d52c51a4c0f9c6f0f5] Updated wrapMemory to take in an occa::device and occa::properties

    Before

    occa::cpu::wrapMemory(void* ptr, const udim_t bytes)
    

    After

    occa::cpu::wrapMemory(occa::device device, void* ptr, const udim_t bytes, occa::properties props)
    
  • [959ec4a7e1d28876d7dc81804cfcd99cd356adb7] Renamed occaSetDeviceFromInfos to fit the rest of the methods

    Before

    occaSetDeviceFromInfos(const char *info)
    

    After

    occaSetDeviceFromString(const char *info)
    
  • [7735c669eb001b7c00d4b56ec2cbb61d1a4f8b27] Removed some redundant stream methods

    Before

    occa::device::freeStream(occa::stream) // C++
    occaDeviceFreeStream(occaStream)       // C
    

    After (Not new)

    occa::stream::free() // C++
    occaFree(occaStream) // C
    
  • [f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::opencl::event() and moved it to occa::opencl::streamTag::clEvent

  • [f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::cuda::event() and moved it to occa::cuda::streamTag::cuEvent

  • [f81054dfacc83b4897e459c55edb2badd85b4caf] Removed occa::streamTag::tagTime. Tags can only be used for:

    • Waiting for queued tasks to finish (e.g. launched kernels or memory copies)
    • Time gaps between 2 tags

⭐️ Features

  • [daf030050355415888b2c10b01300f817dfabebd] Faster make build and added make info (Thanks @v-dobrev!)
  • [1024a62072c3ee7bcbcf6e8991eb7be852165ec9] Switched garbage collection strategy to NULL out existing device/kernel/memory objects when one is freed. This switches SEGFAULT issues to occa::exception errors that can be more easily debugged.
  • [527494c949e95451f491c710592e9cd91e52b56f] Linalg methods reuse device buffers for reductions
  • [ce46013ba795f4c1863d63140aa05c4fa8e013a2] Loading cached kernels is sped up by avoiding locks if possible
  • [e27b29e9ce64be2289075ecd522a5b0d48ba1e67] Added occaJson
  • [fdd2d7c342cd82b5084a5edd626b8e94d44a85e8] Added occaCreateDeviceFromString
  • [fdd2d7c342cd82b5084a5edd626b8e94d44a85e8] Added CLI to C exampleOpenCL mode
  • [959ec4a7e1d28876d7dc81804cfcd99cd356adb7] Added UVA methods to C API
  • [7735c669eb001b7c00d4b56ec2cbb61d1a4f8b27] The occa::stream class can now be extended
  • [f81054dfacc83b4897e459c55edb2badd85b4caf] The occa::streamTag class can now be extended

🐛 Bugs Fixed

  • [99ce6fb54fed6e28abb2db2e006565c05a9edbd2] Linalg properly deletes array allocations (Thanks @jdahm!)
  • [b7384bcc0eb64edfd74211a99a5024c94c291c7d] Kernel hashes are generated only from needed props (e.g. ignores verbose)
  • [780a06ae273ea037bf21452600ffdb17b93a312c] OpenCL __global, __local, and __kernel are properly inserted in the beginning
  • [dba0db9741f627baed9f13f245e2ca791515d1d5] memory::slice was improperly freeing UVA pointers in
  • [3260a053eedb53867391610895f03d03a4e8599c] The verbose property was being overwritten in CUDA mode

🎉 Contributors

  • @v-dobrev
  • @jdahm