High-level C++ for Accelerator Clusters
Right on time for the holidays, we bring you a new major release with several new features, quality-of-life improvements, and debugging facilities.
Thanks to everybody who contributed to this release: @fknorr, @GagaLP, @PeterTh, @psalz!
- The `distr_queue::fence` and `buffer_snapshot` APIs introduced in Celerity 0.4.0 are now stable (#225).
- The new `experimental::constrain_split` API can be used to limit how a kernel can be split (#212).
- The new `experimental::hint` API can be used to control how a kernel is split across worker nodes (#227).

See our platform support guide for the SYCL versions we recommend using with this release, as well as a complete list of all officially supported configurations.
- `CELERITY_PRINT_GRAPHS` environment variable to control whether task and command graphs are logged (#197, #236)
- `for_each_item` utility to iterate over a celerity range (#199)
- `CELERITY_HORIZON_STEP` and `CELERITY_HORIZON_MAX_PARALLELISM` environment variables to control horizon generation (#199)
- `CELERITY_ACCESSOR_BOUNDARY_CHECK` (#211)
- `debug::set_task_name` utility for naming tasks to aid debugging (#213)
- `experimental::constrain_split` API to limit how a kernel can be split (#212)
- `distr_queue::fence` and `buffer_snapshot` are now stable, subsuming the `experimental::` APIs of the same name (#225)
- `experimental::hint` API for providing the runtime with additional information on how to execute a task (#227)
- `experimental::hints::split_1d` and `experimental::hints::split_2d` task hints for controlling how a task is split into chunks (#227)

This is a small bugfix release primarily restoring Celerity's compatibility with the most recent versions of hipSYCL and DPC++.
See our platform support guide for the SYCL versions we recommend using with this release, as well as a complete list of all officially supported configurations.
- Fixed dry runs (`CELERITY_DRY_RUN_NODES`) in the presence of fences or graph horizons (#196, 069f5029)

We are back with a major release that touches all aspects of Celerity, bringing considerable improvements to its APIs, usability and performance.
Thanks to everybody who contributed to this release: @almightyvats @BlackMark29A @facuMH @fknorr @PeterTh @psalz!
Objects used within `host_task`s, such as file handles for I/O operations, can now be explicitly managed by the runtime through a new experimental declarative API: a `host_object` encapsulates an arbitrary host-side object, while `side_effect`s are used to read and/or mutate it, analogously to `buffer` and `accessor`. Embracing this new pattern will guarantee correct lifetimes and synchronization around these objects (#68).

The new experimental `fence` API allows accessing buffer and host-object data from the main thread without manual synchronization, and reimagines SYCL's host accessors in a way that is more compatible with Celerity's asynchronous execution model (#151).

`CELERITY_ACCESSOR_BOUNDARY_CHECK` can be set to enable out-of-bounds buffer access detection at runtime inside device kernels, catching errors such as incorrectly specified range mappers at the cost of some runtime overhead. This check is enabled by default for debug builds of Celerity (#178).

See our platform support guide for the SYCL versions we recommend using with this release, as well as a complete list of all officially supported configurations.
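To illustrate how these pieces are intended to fit together, here is a rough sketch. The names follow the pull requests referenced above, but these APIs are experimental and their exact signatures have changed between releases, so treat this as an illustration rather than a complete, verified program:

```cpp
#include <celerity.h>
#include <fstream>

int main() {
    celerity::distr_queue q;
    celerity::buffer<float, 1> buf{celerity::range<1>{1024}};

    // A host_object wraps a host-side resource (here: an output file stream)
    // whose lifetime and synchronization the runtime manages.
    celerity::experimental::host_object<std::ofstream> log_file{std::ofstream{"out.log"}};

    q.submit([&](celerity::handler& cgh) {
        // A side_effect declares that this host task reads and/or mutates the
        // wrapped object, analogously to how an accessor declares buffer access.
        celerity::experimental::side_effect log{log_file, cgh};
        cgh.host_task(celerity::on_master_node, [=] { *log << "hello\n"; });
    });

    // The fence API makes buffer contents available to the main thread without
    // manual synchronization; the returned future yields a buffer_snapshot.
    auto snapshot = celerity::experimental::fence(q, buf).get();
}
```

The key design point is that the runtime, not the user, sequences accesses to the wrapped object, so host tasks touching the same `host_object` are ordered correctly across the cluster.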
- `host_object` and `side_effect` APIs to express non-buffer dependencies between host tasks (#68, 7a5326a4c5edf348deedd6f9a7f6e9360ea22684)
- `CELERITY_GRAPH_PRINT_MAX_VERTS` config options (#80, d3dd722c984e5a4c897f056774dd1863552dd128)
- `distr_queue` constructor (#113, 556b6f2b28d794006f76b28c544abb0a8ea2f1cb)
- `CELERITY_DRY_RUN_NODES` environment variable to simulate the scheduling of an application on a large number of nodes (without execution or data transfers) (#125, 299ebbf3661574dc260510ac02686e)
- `fence` API for accessing buffer and host-object data from the main thread (#151, 6b803f8140cbf294b77b6def621b4004ecc18786)
- `CELERITY_USE_MIMALLOC` CMake configuration option to use the mimalloc allocator (enabled by default) (#170, 234e3d23f328944225a5aefd0c50ba20a9b09c8b)
- `CELERITY_ACCESSOR_BOUNDARY_CHECK` CMake option to detect out-of-bounds buffer accesses inside device kernels (enabled by default for debug builds) (#178, 2c738c846a70aee034c568782fbdf349be2)
- `CELERITY_*` environment variables (#158, b2ced9b3b72eedb72141cf4526269efe6c64c273)
- Celerity now provides its own `range` and `id` types instead of aliasing SYCL types (#163, 0685d944768bde29239c31066eb6d4fa1329e2cf)
- Passing a `sycl::device` to the `distr_queue` constructor is no longer supported (use a device selector instead) (#113, 556b6f2b28d794006f76b28c544abb0a8ea2f1cb)
- `allow_by_ref` is no longer required to capture references into command group functions (#173, 0a743c7bec9091f2c85eb87bedfca7104e33db9a)
- `host_memory_layout` (use `buffer_allocation_window` instead) (#187, f5e6510ac3bbde7fe882d9dbb7fa1e9043ef5a50)
- `one_to_one`, `fixed` and `all` range mappers (#187, 40a12a4004c7a5fcca2f30af4b443356aa728500)
- `sycl::item` (use `celerity::item` instead); this was already broken in 0.3.2 (#163, 67ccacce703553d1056bfaf3ae2ddb342372bf88)
- `operator[]` now complies with the SYCL 2020 spec by returning a const-reference instead of a value (#156, 5011dedfe0f8c60103d972d316d4aae7fadf772d)

This release fixes several bugs discovered since the v0.3.1 release. It also improves SYCL backend support and adds minor debugging features.
- `sycl::atomic` from DPC++ (39dacdf5)

This release contains several fixes for bugs discovered since the v0.3.0 release.
- `task_manager::notify_horizon_executed` (only in debug builds) (f641bcb)
- `user_benchmarker` (d1c9e51)
- `wave_sim` example: avoid a host-side race condition for certain `--sample-rate` configurations (d226b95)

The 0.3.0 release brings with it many new features and improvements, including several changes that further align Celerity with the new SYCL 2020 API.
See the changelog for a full summary of all changes.
- `parallel_for` alongside `local_accessor` and `group` functions.
- The `celerity` namespace now covers most supported SYCL features (e.g. `celerity::access_mode`, `celerity::item` and so on). This means that Celerity code no longer has to mix names from the `sycl` and `celerity` namespaces. While mixing is still possible for the most part, we recommend sticking to `celerity::`! For example: `celerity::accessor acc{my_buffer, cgh, celerity::access::one_to_one{}, celerity::write_only}`.

Note: see our platform support guide for the SYCL versions we recommend using with this release.
This release contains a few fixes for bugs introduced in version 0.2.0.
It's been a while since our last release, but we have been busy!
See the changelog for a full summary of all changes.
`with_master_access` has been replaced by a much more powerful `host_task` API, which can schedule host-side code both on the master node as well as on worker nodes. It also features experimental support for integration with collective operations, such as parallel HDF5 I/O.

See our docs on how to get started with Celerity (available either as markdown or on our website).
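For illustration, a minimal host task submission might look roughly like the sketch below. It is based on the API names mentioned above (`host_task` plus a master-node tag); exact signatures have varied between releases, so this is a sketch rather than a complete, verified program:

```cpp
#include <celerity.h>
#include <cstdio>

int main() {
    celerity::distr_queue q;
    q.submit([&](celerity::handler& cgh) {
        // Schedules host-side code on the master node, replacing the
        // former with_master_access API.
        cgh.host_task(celerity::on_master_node, [] {
            std::puts("running on the master node");
        });
    });
}
```

Host tasks can similarly be scheduled across worker nodes, which is what enables the experimental collective-operation integration (e.g. parallel HDF5 I/O) mentioned above.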
Note: If you are using hipSYCL, make sure to use the newest version from the `develop` branch.
In light of our recently released website, we think it is the perfect time to also cut our first official release! The code base has reached a point where, while not completely pain-free, we think it is ready to be compiled and experimented with by interested (and somewhat adventurous) external users!
If you encounter any problems with the documentation, build process or the examples, or if you have any other feedback you'd like to share, please don't hesitate to open an issue!
Note that, as mentioned in the readme and on the website, Celerity is a research project first and foremost, and it's still in its early stages. It will most likely not yet fully support your use case, and things will break!