oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
histogram algorithms for generating a histogram from an input sequence into an output sequence representing either equally spaced or user-defined bins. These algorithms are currently only available for device execution policies.
transform algorithm.
permutation_iterator as input to oneDPL algorithms for a variety of source iterator and permutation types which caused issues.
zip_iterator to be SYCL device copyable for trivially copyable source iterator types.
ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
oneapi::dpl::reverse_iterator or std::reverse_iterator as input to oneDPL algorithms with device execution policies.
radix_sort and radix_sort_by_key algorithms residing in the oneapi::dpl::experimental::kt::esimd namespace. These algorithms are the first in the family of kernel templates that allow configuring a variety of parameters, including the number of elements to process by a work item and the size of a workgroup.
The algorithms only work with Intel® Data Center GPU Max Series.
transform_if algorithm for applying a transform function conditionally based on a predicate, with overloads provided for one and two input sequences that use, respectively, unary and binary operations and predicates.
esimd::radix_sort and esimd::radix_sort_by_key kernel templates fail to compile when a program is built with the -g, -O0, or -O1 compiler options.
esimd::radix_sort_by_key kernel template produces wrong results with the following combinations of kernel_param and types of keys and values:
sizeof(key_type) + sizeof(val_type) == 12, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 96
sizeof(key_type) + sizeof(val_type) == 16, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 64
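The two affected combinations above can be checked at compile time before choosing kernel parameters. The following is a minimal sketch in standard C++ only; the helper name hits_known_radix_sort_by_key_issue is hypothetical and not part of oneDPL, and the workgroup size and data-per-work-item values are passed as plain integers rather than through kernel_param.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a oneDPL API): returns true when the combined
// key/value size and the two kernel parameters match one of the two
// combinations listed in the known issue above.
template <typename KeyT, typename ValT>
constexpr bool hits_known_radix_sort_by_key_issue(std::size_t workgroup_size,
                                                  std::size_t data_per_workitem)
{
    return (sizeof(KeyT) + sizeof(ValT) == 12 && workgroup_size == 64 &&
            data_per_workitem == 96) ||
           (sizeof(KeyT) + sizeof(ValT) == 16 && workgroup_size == 64 &&
            data_per_workitem == 64);
}

// 4-byte keys with 8-byte values: 12 bytes in total, first listed combination.
static_assert(hits_known_radix_sort_by_key_issue<std::uint32_t, std::uint64_t>(64, 96),
              "12-byte pair with 64/96 is affected");
// 8-byte keys with 8-byte values: 16 bytes in total, second listed combination.
static_assert(hits_known_radix_sort_by_key_issue<std::uint64_t, std::uint64_t>(64, 64),
              "16-byte pair with 64/64 is affected");
// 4-byte keys with 4-byte values do not match either listed combination.
static_assert(!hits_known_radix_sort_by_key_issue<std::uint32_t, std::uint32_t>(64, 96),
              "8-byte pair is not listed");
```

A check of this kind only mirrors the list above; it says nothing about combinations that are not listed.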
select, submit and submit_and_wait, and several selection policies: fixed_resource_policy, round_robin_policy, dynamic_load_policy, and auto_tune_policy.
unseq and par_unseq policies now enable vectorization also for Intel® oneAPI DPC++/C++ Compiler.
reduce_by_segment, exclusive_scan_by_segment, and inclusive_scan_by_segment.
merge, sort, stable_sort, sort_by_key, reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
reduce_async function to not ignore the provided binary operation.
-fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
-fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use the env/vars.sh to configure the working environment to avoid the issue.
exclusive_scan_by_segment on Windows.
set_intersection with a DPC++ execution policy, where elements are copied from the second input range rather than the first input range.
transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
sort, stable_sort, sort_by_key, partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options.
To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux.
To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
reduce and transform_reduce with 64-bit types and std::multiplies, sycl::multiplies operations when compiled by Intel® C++ Compiler 2021.3 and newer and executed on GPU devices.
sort_by_key algorithm for key-value sorting.
reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
reduce_by_segment, inclusive_scan_by_segment, and exclusive_scan_by_segment algorithms for binary operators with known identities when using DPC++ execution policies.
value_type to all views in oneapi::dpl::experimental::ranges.
oneapi::dpl::experimental::ranges::sort to support projections applied to the range elements prior to comparison.
oneDPLIntelLLVMConfig.cmake to resolve issues using CMake 3.20+ on Windows for icx and icx-cl.
sort and stable_sort algorithms when performing a descending sort on signed numeric types with negative values.
reduce_by_segment algorithm when a non-commutative predicate is used.
sort and stable_sort algorithms for integral types wider than 4 bytes.
unseq and par_unseq policies on CPUs with the Intel® C++ Compiler 2021.8.
sort algorithm performance for the arithmetic data types with std::less or std::greater comparison operator and DPC++ policy.
transform_reduce, minmax_element, and related algorithms when run on CPU devices.
transform_reduce, minmax_element, and related algorithms on FPGAs.
permutation_iterator to support C-style array as a permutation map.
generate, generate_n, transform algorithms to Tested Standard C++ API.
inclusive_scan, exclusive_scan, reduce and max_element algorithms with DPC++ execution policies.
TBB headers not found issue occurring with libstdc++ version 9 when oneTBB headers are not present in the environment. The workaround requires inclusion of the oneDPL headers before the libstdc++ headers.
exclusive_scan algorithm when the output iterator is equal to the input iterator (in-place scan).
<complex> and more APIs from <cmath> and <limits> standard headers to Tested Standard C++ API.
sort and stable_sort algorithms on GPU devices when using Radix sort*.
oneapi::dpl::experimental::ranges::guard_view and oneapi::dpl::experimental::ranges::zip_view when using operator[] with an index exceeding the limits of a 32-bit integer type.
upper_bound, lower_bound and binary_search algorithms.
Removed support of C++11 and C++14.
Changed the size and the layout of the discard_block_engine class template.
For further details, please refer to 2022.0 Changes
*The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with std::less or std::greater, otherwise Merge sort.
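The selection rule in the footnote can be written down as a small compile-time trait. This is an illustrative sketch in standard C++ only; the trait name would_use_radix_sort is hypothetical and does not reflect how oneDPL implements the choice internally. It covers only std::less<T> and std::greater<T> with a matching T; treating every other comparator (including transparent ones such as std::less<>) as the Merge sort case is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical trait (illustration only): by default, any comparator falls
// into the "otherwise Merge sort" case from the footnote.
template <typename T, typename Compare>
struct would_use_radix_sort : std::false_type {};

// std::less<T>: Radix sort only when T is an arithmetic type.
template <typename T>
struct would_use_radix_sort<T, std::less<T>> : std::is_arithmetic<T> {};

// std::greater<T>: Radix sort only when T is an arithmetic type.
template <typename T>
struct would_use_radix_sort<T, std::greater<T>> : std::is_arithmetic<T> {};

// A user-defined comparator, used to exercise the default case.
struct by_absolute_value
{
    bool operator()(int a, int b) const { return (a < 0 ? -a : a) < (b < 0 ? -b : b); }
};

static_assert(would_use_radix_sort<int, std::less<int>>::value,
              "arithmetic type with std::less");
static_assert(would_use_radix_sort<double, std::greater<double>>::value,
              "arithmetic type with std::greater");
static_assert(!would_use_radix_sort<int, by_absolute_value>::value,
              "custom comparator");
static_assert(!would_use_radix_sort<std::string, std::less<std::string>>::value,
              "non-arithmetic type");
```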
is_heap, is_heap_until, make_heap, push_heap, pop_heap, is_sorted, is_sorted_until, partial_sort, partial_sort_copy. Please refer to Tested Standard C++ API Reference.
dpl = oneapi::dpl into all public headers.
reduce_by_segment algorithm.
upper_bound, lower_bound and binary_search algorithms.
seq, unseq, par, par_unseq). C++17 is the minimal required version going forward.
reduce_by_segment used with a device_policy object that has no explicit kernel name.
CL_OUT_OF_RESOURCES issue for Radix sort algorithm executed on CPU devices.
exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment algorithms applied to device-allocated USM.
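For readers unfamiliar with the segmented algorithms mentioned throughout these notes, the sketch below shows what a segmented reduction computes, assuming the usual meaning of a segment as a run of consecutive equal keys. It is a sequential reference written in standard C++ only; the function name reduce_by_segment_reference is hypothetical, it uses operator== and operator+ instead of a user-supplied predicate and binary operation, and it is not the oneDPL implementation or its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential reference (not the oneDPL API): for each run of
// consecutive equal keys, emit the key once together with the sum of the
// values that belong to that run.
template <typename Key, typename Value>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_segment_reference(const std::vector<Key>& keys, const std::vector<Value>& values)
{
    std::vector<Key> out_keys;
    std::vector<Value> out_values;
    for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i)
    {
        if (!out_keys.empty() && out_keys.back() == keys[i])
        {
            // Same segment as the previous element: accumulate into its sum.
            out_values.back() = out_values.back() + values[i];
        }
        else
        {
            // Key changed (or first element): start a new segment.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}
```

With keys {1, 1, 2, 2, 2, 1} and values {1, 2, 3, 4, 5, 6}, this sketch yields keys {1, 2, 1} and values {3, 12, 6}: the trailing key 1 forms its own segment because it is not adjacent to the first run of 1s.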