A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Weave now targets Nim 1.2.0 instead of devel. This is the first Nim release that supports all requirements of Weave.
Weave now provides an experimental "dataflow parallelism" mode. Dataflow parallelism is also known under several other names.
Concretely this allows delaying tasks until a condition is met. This condition is called a Pledge.
Programs can now create a "computation graph" or a pipeline of tasks ahead of time that depends on one or more Pledges.
For example, a game engine might want to associate a pipeline of transformations with each frame and, once the frame prerequisites are met, set the Pledge to fulfilled.
A Pledge can be combined with parallel loops, and programs can wait on specific iterations or even iteration ranges, for example to implement parallel video processing as soon as a subset of the frame is ready instead of waiting for the whole frame.
This exposes significantly more parallelism opportunities.
Dataflow parallelism cannot be used with the C++ backend at the moment.
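To make the idea of a task delayed on a Pledge more concrete, here is a minimal conceptual sketch in Python. It is not Weave's Nim API: the `Pledge` class, `spawn_delayed` and `demo` below are hypothetical stand-ins built on the standard library, and they only model the behaviour described above (a task is held back until every Pledge it depends on is fulfilled).

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Pledge:
    """Hypothetical model of a pledge: a one-shot condition with waiting callbacks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._fulfilled = False
        self._waiters = []  # callbacks to run once the pledge is fulfilled

    def on_fulfilled(self, callback):
        with self._lock:
            if not self._fulfilled:
                self._waiters.append(callback)
                return
        callback()  # already fulfilled: run immediately, outside the lock

    def fulfill(self):
        with self._lock:
            if self._fulfilled:
                return
            self._fulfilled = True
            waiters, self._waiters = self._waiters, []
        for callback in waiters:
            callback()


def spawn_delayed(pool, pledges, fn, *args):
    """Submit fn(*args) to the pool only once all pledges are fulfilled."""
    result = Future()
    lock = threading.Lock()
    remaining = [len(pledges)]

    def run():
        try:
            result.set_result(fn(*args))
        except Exception as exc:  # forward failures to the caller
            result.set_exception(exc)

    def one_dependency_done():
        with lock:
            remaining[0] -= 1
            ready = remaining[0] == 0
        if ready:
            pool.submit(run)

    if not pledges:
        pool.submit(run)
    for pledge in pledges:
        pledge.on_fulfilled(one_dependency_done)
    return result


def demo():
    """Game-engine style example: transform a frame once it is ready."""
    log = []
    frame_ready = Pledge()

    def blur():
        log.append("blur")
        return "blurred frame"

    with ThreadPoolExecutor(max_workers=2) as pool:
        blurred = spawn_delayed(pool, [frame_ready], blur)  # declared ahead of time
        log.append("frame loaded")
        frame_ready.fulfill()  # prerequisites met: the delayed task may now run
        output = blurred.result(timeout=5)
    return log, output
```

In this model the delayed task never occupies a worker while it waits: it is only handed to the pool when the last dependency is fulfilled, which is one plausible way to avoid blocking threads on unmet conditions.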
Weave now provides the 3 main parallelism models: task parallelism, data parallelism and dataflow parallelism.
Weave scalability has been carefully measured and improved.
On matrix multiplication, the traditional benchmark used to rank the top 500 supercomputers of the world, Weave reaches a 17.5x speedup on an 18-core CPU, while the state-of-the-art Intel implementation using OpenMP reaches a 15.5x to 16x speedup.
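One way to read these numbers is as parallel efficiency, that is, speedup divided by core count. The short Python calculation below uses only the figures quoted above; the helper name `parallel_efficiency` is introduced here purely for illustration.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / cores


CORES = 18
weave = parallel_efficiency(17.5, CORES)
openmp_low = parallel_efficiency(15.5, CORES)
openmp_high = parallel_efficiency(16.0, CORES)

print(f"Weave:  {weave:.1%}")                           # 97.2%
print(f"OpenMP: {openmp_low:.1%} to {openmp_high:.1%}")  # 86.1% to 88.9%
```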
⚠️ usual disclaimer: pre-release, bugs, don't use in critical systems without thorough testing.
This release focused on fixing data parallelism performance bugs and proving Weave's relevance as an alternative High-Performance Computing runtime.
Weave can now be compiled with Microsoft Visual Studio in C++ mode.
sync(Weave) has been renamed syncRoot(Weave) to highlight that it is only valid on the root task in the main thread. In particular, a procedure that uses syncRoot should not be called in a multithreaded section. This is a breaking change. In the future such changes will have a deprecation path, but the library is only 2 weeks old at the moment.
parallelFor, parallelForStrided, parallelForStaged and parallelForStagedStrided now support an "awaitable" statement to allow fine-grain sync.
Fine-grained data dependencies are under research (for example, launching a task when the first 50 iterations of a 100-iteration loop are done). "awaitable" may change to get a unified syntax for delayed tasks that depend on a task, a whole loop or a subset of it.
If possible, it is recommended to use "awaitable" instead of syncRoot() to allow composable parallelism: syncRoot() can only be called in a serial section of the code.
"LastVictim" and "LastThief" WV_Target policies have been added. The default is still "Random"; pass "-d:WV_Target=LastVictim" to explore performance on your workload with an alternate steal policy.
"StealEarly" has been implemented. The default is not to steal early; pass "-d:WV_StealEarly=2", for example, to allow workers to initiate a steal request when 2 tasks or fewer are left in their queue.
Weave has been thoroughly tested and tuned on a state-of-the-art matrix multiplication implementation, against competing pure Assembly, hand-tuned BLAS implementations, to reach High-Performance Computing scalability standards.
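The effect of the "StealEarly" threshold mentioned above can be illustrated with a toy model. The Python sketch below is not Weave's scheduler: `run_worker` is a hypothetical single-worker simulation that assumes a threshold of 0 stands for the default "do not steal early" behaviour, and it only reports how many tasks were still queued when the worker would have sent its steal request.

```python
from collections import deque


def run_worker(tasks, steal_early=0):
    """Drain a local queue and return the queue length at which a steal
    request would be sent (assumed rule: request once len(queue) <= steal_early)."""
    queue = deque(tasks)
    requested_at = None
    while queue:
        if requested_at is None and len(queue) <= steal_early:
            requested_at = len(queue)  # ask for work before running dry
        queue.popleft()  # "execute" the next task
    if requested_at is None:
        requested_at = 0  # no early request: steal only once the queue is empty
    return requested_at


print(run_worker(range(5)))                 # 0: request sent with an empty queue
print(run_worker(range(5), steal_early=2))  # 2: request sent with 2 tasks left
```

With a non-zero threshold the request overlaps with the remaining local work, which is the latency-hiding motivation for stealing early; whether that helps depends on the workload.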
3 cases can trigger loop splitting in Weave:
Fixed strided loop iteration rounding
Fixed compilation with metrics
Executing a loop now counts as a single task for the adaptive steal policy. This prevents short loops from hindering the steal-half strategy, as it depends on the number of tasks executed per steal request interval.
Weave now supports Windows in addition to Linux, macOS and all platforms with Pthreads.
Furthermore, Weave's Backoff system has been reworked and formally verified to be deadlock-free. It is now enabled by default, without any noticeable performance impact. It allows Weave to park idle threads to save power.
Side-story: in the process, a critical bug in the glibc and musl implementations of condition variables was found.
This is a preview of Weave.
Linux is well tested. Not compatible with Windows. OSX might work.
Discussion thread: https://github.com/nim-lang/RFCs/issues/160
Forum: Coming soon