YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.
YASK is a framework to rapidly create high-performance stencil code including optimizations and features such as
YASK contains a domain-specific compiler to convert stencil-equation specifications to optimized code for Intel(R) Xeon(R) processors, Intel(R) Xeon Phi(TM) processors, and Intel(R) graphics processors.
librt and libnuma.
indent or gindent utility, used automatically during the build process to make the generated code easier for humans to read. You'll get a warning when running make if one of these doesn't exist. Everything will still work, but the generated code will be difficult to read. Reading the generated code is only necessary for debugging, performance analysis, etc.
numpy package for running Python interface tests.
Included with Intel(R) oneAPI HPC Toolkit.
void* {set,get}_elements_in_slice() APIs and provides safer float* and double* versions.
A(t+1, x, y) EQUALS B(t, x, y+1).
(-1 is used for less-common reverse-time stencils.)
yk_solution::get_var() API now throws an exception if the named var does not exist. (Used to return nullptr.)
-eq_bundles option, and an exception is now thrown from output_solution() if the format string is unrecognized.
make offload=1 ... . This will create a kernel library and executable with an "arch" field containing "offload" and the OpenMP device target name.
Use make offload=1 offload_arch=<target> to change the OpenMP target; the default is spir64, for GPUs with Intel(R) Architecture (e.g., Gen12).
Use make offload_usm=1 to use the OpenMP Unified Shared Memory model.
make YK_CXX=<compiler> ... for the kernel, and/or make YC_CXX=<compiler> ... for the YASK compiler, or make CXX=<compiler> for both. A C++ compiler that supports C++17 is now required.
get_region_size() and set_region_size() APIs have been removed.
The -r and -sb options, e.g., -rx and -sbx, have also been removed.
-block_threads is deprecated. The option -thread_divisor has been removed. See the -help documentation for new options -outer_threads and -inner_threads. The -max_threads option remains.
yask.sh by passing -outer_threads <N> to the executable, where <N> is the number of cores on the node divided by the number of MPI ranks. Consequently, the default number of inner threads is now one (1) to use one core per block.
This change was made based on observed performance on newer Intel(R) Xeon(R) Processors. Previous versions used two threads per block by default and used both hyper-threads if they were enabled. To configure two hyper-threads to work cooperatively on each block, use the option -inner_threads 2.
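The default thread arithmetic described above can be sketched as follows. This is a minimal illustration, not code from YASK: the function name `default_thread_args` and its parameters are hypothetical, and rounding down when the core count is not evenly divisible by the rank count is an assumption the text does not state. Processor-specific defaults are not modeled.

```python
def default_thread_args(cores_per_node, ranks_per_node, inner_threads=1):
    """Build the thread-related options described in the text.

    Hypothetical helper for illustration only; it is not part of YASK.
    """
    if cores_per_node < 1 or ranks_per_node < 1:
        raise ValueError("core and rank counts must be positive")
    # <N> = cores on the node divided by MPI ranks.
    # Assumption: round down, but never go below one outer thread.
    outer_threads = max(1, cores_per_node // ranks_per_node)
    return [
        "-outer_threads", str(outer_threads),
        "-inner_threads", str(inner_threads),
    ]


if __name__ == "__main__":
    # 48 cores shared by 2 ranks: 24 outer threads, 1 inner thread each.
    print(" ".join(default_thread_args(48, 2)))
    # Same node, with two hyper-threads cooperating on each block.
    print(" ".join(default_thread_args(48, 2, inner_threads=2)))
```

With one inner thread, each block is handled by a single core; passing `inner_threads=2` corresponds to the -inner_threads 2 option mentioned above.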
These changes do not apply to Intel(R) Xeon Phi(TM) x200-family processors (KNL), which continue to use all 4 hyper-threads per core and 8 inner threads by default (because 2 cores share an L2 cache).
yk_solution and yk_var to allow getting or setting multiple dimensions in one API call.
new_relative_var_point() API is deprecated.
yk_var class have been removed.
-use_shm to true.
Use -no-use_shm to disable shared-memory inter-rank communication.
auto_tune_each_pass changed to auto_tune_each_stage.
-trace and -msg_rank options from the kernel library to the kernel utility, so those options may no longer be set via yk_solution::apply_command_line_options(). APIs to set the corresponding options are now in yk_env. This allows configuring the debug output before a yk_solution is created.
SolutionBase and GridValue and undocumented macros such as MAKE_GRID was replaced with an expanded version of the documented YASK compiler API. Canonical v2 DSL code should still work using the Soln.hpp backward-compatibility header file. To convert v2 DSL code to v3 format, use the ./utils/bin/convert_v2_stencil.pl utility.
Conversion is recommended.
utils/bin/yask_log_to_csv.pl script.
3axis, 3plane, etc.) to calculate simple averages like those in the MiniGhost benchmark. This reduced the number of floating-point operations but not the number of points read for each stencil.
yk_grid::get_element() and similar APIs. Previously, invalid values silently "wrapped" around to valid values. Now, by default, the step index must be valid when reading, and the valid step indices are updated when writing. The old behavior of silent index wrapping may be restored via set_step_wrap(true).
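The difference between strict and wrapped step indices can be illustrated with a toy model. This sketch does not use or reproduce the YASK API: the class `StepBuffer` and all of its members are hypothetical, and the way the valid range slides on a write is an assumption made for illustration.

```python
class StepBuffer:
    """Toy model of a variable that stores a fixed number of steps.

    Hypothetical class for illustration only; it is not part of YASK.
    """

    def __init__(self, alloc_steps, step_wrap=False):
        self.alloc_steps = alloc_steps
        self.step_wrap = step_wrap            # True mimics the old silent wrapping
        self.slots = [None] * alloc_steps
        self.first_valid = 0                  # assumed initial valid range
        self.last_valid = alloc_steps - 1

    def set(self, t, value):
        # Writing moves the window of valid step indices so it includes t.
        if t > self.last_valid:
            self.last_valid = t
            self.first_valid = t - self.alloc_steps + 1
        elif t < self.first_valid:
            self.first_valid = t
            self.last_valid = t + self.alloc_steps - 1
        self.slots[t % self.alloc_steps] = value

    def get(self, t):
        # Strict mode: reading outside the valid window is an error.
        if not self.step_wrap and not (self.first_valid <= t <= self.last_valid):
            raise IndexError(f"step index {t} is not currently valid")
        # Wrapped mode: any index silently maps onto a stored slot.
        return self.slots[t % self.alloc_steps]


if __name__ == "__main__":
    strict = StepBuffer(alloc_steps=2)
    for t, v in enumerate(["a", "b", "c"]):
        strict.set(t, v)
    print(strict.get(2))                      # most recent step
    try:
        strict.get(0)                         # step 0 was overwritten by step 2
    except IndexError as err:
        print("error:", err)
```

In the wrapped mode, the same read of step 0 would quietly return the value written for step 2, which is the kind of programming error the stricter default is meant to expose.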
The default for all strict_indices API parameters is now true to catch more programming errors and increase consistency of behavior between "set" and "get" APIs. Also, the advanced share_storage() APIs have been replaced with fuse_grids().
-auto_tune_each_pass.
make and bin/yask.sh and number of MPI ranks in bin/yask.sh.
This changed the old behavior of make defaulting to snb architecture and bin/yask.sh requiring -arch and -ranks. Those options are still available to override the host-based default.
utils/bin/yask_log_to_csv.pl.
yc_grid::set_dynamic_step_alloc(true) to allow changing the allocation in the step (time) dimension at run-time for grid variables created at YASK compile-time.
yask.sh) to run trials for a given amount of time instead of a given number of steps.
As of version 2.13.08, use the -trial_time option to specify the number of seconds to run. To force a specific number of steps per trial, as in previous versions, use the -trial_steps option.
yk_stats APIs affected.
== operator for asserting equality between a grid point and an equation. Use EQUALS instead.
src/compiler/gen and src/kernel/gen directories.
.hpp to .cpp files. See the notes in https://github.com/intel/yask/releases/tag/v2.09.00 if you have any new or modified code in src/stencils.