The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C++ Compiler and provides high-productivity APIs designed to minimize the programming effort of C++ developers who create efficient heterogeneous applications.
- Removed the ONEDPL_DEVICE_TYPE and ONEDPL_DEVICE_BACKEND CMake options. Use the ONEAPI_DEVICE_SELECTOR environment variable or compiler options for device selection instead.
- Removed the ONEDPL_USE_AOT_COMPILATION, ONEDPL_AOT_ARCH, and ONEDPL_FPGA_STATIC_REPORT CMake options. Use the appropriate compiler flags to control ahead-of-time compilation.
- The exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment, and histogram algorithms are added to the <oneapi/dpl/numeric> header, which is now the recommended header for these algorithms.
- Improved performance of sort, stable_sort, sort_by_key, and stable_sort_by_key when using Radix sort [1] and device policies.
- Moved philox_engine to the oneapi::dpl namespace and fixed incorrect results for instantiations with non-predefined word size values and std::uint_fast64_t.
- Added experimental radix_sort and radix_sort_by_key algorithms in the oneapi::dpl::experimental::kt::gpu namespace. The implementation has been verified on Intel® Arc B580 Graphics and Intel® Data Center GPU Max Series.
- The experimental::ranges::zip_view range adaptor can now be used with the oneDPL parallel range algorithms and C++20 random access ranges.
- Improved performance of multiple (30+) algorithms with par and par_unseq execution policies and data sizes from 50K to 4M elements when built with the OpenMP backend and Intel® oneAPI DPC++/C++ Compiler.
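The reduce_by_segment semantics mentioned above can be illustrated with a short serial sketch in standard C++. It is not the oneDPL implementation and does not use its API (oneapi::dpl::reduce_by_segment takes an execution policy and iterators); the function name reduce_by_segment_sketch is hypothetical, and summation is assumed here as the reduction operation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial sketch of reduce-by-segment semantics (not oneDPL code):
// each run of consecutive equal keys forms one segment, and the values in that
// segment are summed. Returns (one key per segment, one sum per segment).
// Assumes keys and values have the same length.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_segment_sketch(const std::vector<K>& keys, const std::vector<V>& values) {
    std::vector<K> out_keys;
    std::vector<V> out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            out_keys.push_back(keys[i]);   // start of a new segment
            out_vals.push_back(values[i]);
        } else {
            out_vals.back() = out_vals.back() + values[i];  // extend the current segment
        }
    }
    return {out_keys, out_vals};
}
```

For keys {1, 1, 2, 2, 2, 1} and values {1, 2, 3, 4, 5, 6}, the sketch yields keys {1, 2, 1} and sums {3, 12, 6}: the trailing 1 starts a new segment because only consecutive equal keys are grouped.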
Support for FPGA devices in oneDPL algorithms, the corresponding execution policies fpga_policy and dpcpp_fpga,
the make_fpga_policy function, and the enabling macros ONEDPL_FPGA_DEVICE and ONEDPL_FPGA_EMULATOR
are deprecated and will be removed in a future release.
- Fixed validation of minimal requirements for range-based algorithms. They require Clang 16 or newer instead of the corresponding libc++ versions.
- Fixed ranges::unique_copy to allow output ranges of any size.
- Fixed the default template argument for the new value type in ranges::replace and ranges::replace_if to not use projections.
- Removed excessive data copying when using device policies and host-allocated data with several algorithms: fill, generate, transform_if, binary_search, lower_bound, upper_bound, histogram, unique_copy, uninitialized_copy, uninitialized_move, uninitialized_fill, uninitialized_value_construct, uninitialized_default_construct, and destroy.
- ranges::unique_copy with an output size smaller than the input size may have reduced performance on GPU devices.
- The kt::gpu::radix_sort_by_key function may produce incorrect results on RHEL 10 or earlier when run on Intel® Data Center GPU Max Series with a SYCL buffer passed as input data and no optimization flags passed to the device compiler.
See oneDPL Guide for other restrictions and known limitations.
- ranges::copy_if with an output size smaller than the input size may have reduced performance on GPU devices.
- The set_union, set_intersection, set_difference, and set_symmetric_difference range algorithms require the output range to have sufficient size to hold all resulting elements.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
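For reference, an in-place call passes the same iterator as both the input start and the destination, so the two trivially compare equal. The sketch below shows that call shape with the C++17 standard library's sequential std::exclusive_scan (not a oneDPL policy call); the helper name prefix_sums_in_place is hypothetical.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: computes exclusive prefix sums in place.
// The destination iterator is the same object as the input iterator,
// so "input == destination" holds, which is the in-place pattern described above.
std::vector<int> prefix_sums_in_place(std::vector<int> data) {
    std::exclusive_scan(data.begin(), data.end(), data.begin(), 0);
    return data;
}
```

Calling prefix_sums_in_place({1, 2, 3, 4}) returns {0, 1, 3, 6}. Iterators built separately (for example, two independently constructed adaptor iterators over the same data) may not compare equal, which is the situation the requirement guards against.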
- Added tools for easier customization of policies and backends for the experimental dynamic selection feature; see the documentation for more details.
- Simplified the public submission API for the experimental dynamic selection feature, removing oneapi::dpl::experimental::select and adding oneapi::dpl::experimental::try_submit.
- Enabled list-initialization for algorithms. This aligns the API with C++26 but is supported for C++17 and above.
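List-initialization support of this kind is typically enabled by giving the value parameter's template type a default, so that a braced list such as {3, 4}, which cannot drive template argument deduction by itself, falls back to the iterator's value type. The sketch below shows that general technique on a hypothetical find_value function; it is not taken from oneDPL source.

```cpp
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical find_value: T defaults to the iterator's value_type.
// A braced-init-list argument is a non-deduced context, so T falls back to
// the default and {3, 4} is copy-list-initialized into that type.
template <typename It,
          typename T = typename std::iterator_traits<It>::value_type>
It find_value(It first, It last, const T& value) {
    for (; first != last; ++first)
        if (*first == value) return first;
    return last;
}
```

With std::vector<std::pair<int, int>> v{{1, 2}, {3, 4}}, the call find_value(v.begin(), v.end(), {3, 4}) compiles and finds the second element; without the default template argument, the same call would fail to deduce T.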
- Fixed ranges::copy_if to allow output ranges of any size.
- Fixed a compilation error that occurs with device policies when calling oneapi::dpl::reduce multiple times with differing parameter types.
- Removed the requirement that ranges passed to range-based algorithms support the subscript operator. This requirement did not comply with the C++ standard.
- ranges::copy_if with an output size smaller than the input size may have reduced performance on GPU devices.
See oneDPL Guide for other restrictions and known limitations.
- The unique_copy, set_union, set_intersection, set_difference, and set_symmetric_difference range algorithms require the output range to have sufficient size to hold all resulting elements.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
The ONEDPL_USE_AOT_COMPILATION and ONEDPL_AOT_ARCH CMake options are deprecated and will be removed in a future
release. Please use the relevant compiler flags to enable this feature.
- Added parallel range algorithms in namespace oneapi::dpl::ranges: set_intersection, set_union, set_difference, set_symmetric_difference, includes, unique, unique_copy, destroy, uninitialized_fill, uninitialized_move, uninitialized_copy, uninitialized_value_construct, uninitialized_default_construct, reverse, reverse_copy, swap_ranges. These algorithms operate with C++20 random access ranges.
- Improved performance of the gpu::inclusive_scan kernel template and added support for binary operator and type combinations that do not have a SYCL known identity.
- Improved performance of inclusive_scan_by_segment, exclusive_scan_by_segment, set_union, set_difference, set_intersection, and set_symmetric_difference when using device policies.
- Improved performance of search operations (e.g., find, all_of, equal, search), is_heap, and is_heap_until algorithms on Intel® Arc™ B-series GPU devices.
- Removed the requirement of GPU double-precision support to use set_union, set_difference, set_intersection, and set_symmetric_difference on Windows operating systems.
- Removed default-constructible requirements from the value type for the reduce and transform_reduce algorithms, as well as copy-constructible requirements when these algorithms are used with a native ("host") policy.
- Fixed an issue with ranges::merge when projections of the two input ranges were not the same.
- Fixed equal returning false for empty input sequences; it now returns true.
- Fixed the compilation error "SYCL kernel cannot use exceptions" occurring with libstdc++ version 10 when calling the adjacent_find, is_sorted, and is_sorted_until range algorithms with device policies.
- Fixed an issue with the PSTL_USE_NONTEMPORAL_STORES macro having no effect.
- Fixed a bug where unique called with a device policy returned an incorrect result iterator.
- Fixed a bug in the exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, and inclusive_scan_by_segment algorithms when using device policies with different input and output value types.
- Fixed a bug in the return value types of the minmax_element and mismatch range algorithms.
- Fixed compile errors in set_union and set_symmetric_difference when using device policies with different second-input and output value types.
- The copy_if, unique_copy, set_union, set_intersection, set_difference, and set_symmetric_difference range algorithms require the output range to have sufficient size to hold all resulting elements.
See oneDPL Guide for other restrictions and known limitations.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Added parallel range algorithms in namespace oneapi::dpl::ranges: fill, move, replace, replace_if, remove, remove_if, mismatch, minmax_element, min, max, find_first_of, find_end, is_sorted_until. These algorithms operate with C++20 random access ranges.
- Improved performance of set operation algorithms when using device policies: set_union, set_difference, set_intersection, set_symmetric_difference.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform, and 30+ other algorithms with device policies on GPUs when using std::reverse_iterator.
- Added the ADL-based customization point is_onedpl_indirectly_device_accessible, which can be used to mark iterator types as indirectly device accessible. Added the public trait oneapi::dpl::is_directly_device_accessible[_v] to query if types are indirectly device accessible.
- Eliminated runtime exceptions encountered in code that calls inclusive_scan, copy_if, partition, unique, reduce_by_segment, and related algorithms with device policies when compiled with the open source oneAPI DPC++ Compiler without specifying an optimization flag.
- Fixed a compilation error in reduce_by_segment regarding return type deduction when called with a device policy.
- Eliminated multiple compile-time warnings throughout the library.
- The set_intersection, set_difference, set_symmetric_difference, and set_union algorithms with a device policy require GPUs with double-precision support on Windows, regardless of the value type of the input sequences.
See oneDPL Guide for other restrictions and known limitations.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- histogram may provide incorrect results with device policies in a program built with the -O0 option when the driver version is 2448.13 or older.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- With libstdc++ version 10, the compilation error "SYCL kernel cannot use exceptions" occurs when calling the range-based adjacent_find, is_sorted, or is_sorted_until algorithms with device policies.
- Added support for host policies in the histogram algorithms.
- Added support for an undersized output range in the range-based merge algorithm.
- Improved performance of the merge and sorting algorithms (sort, stable_sort, sort_by_key, stable_sort_by_key) that rely on Merge sort [1], with device policies for large data sizes.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform, and 30+ other algorithms with device policies on GPUs.
- Improved oneDPL use with SYCL implementations other than Intel® oneAPI DPC++/C++ Compiler.
- Fixed an issue with drop_view in the experimental range-based API.
- Fixed compilation errors in find_if and find_if_not with device policies where the user-provided predicate is device copyable but not trivially copyable.
- Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with -O0 and executed on a GPU device.
- Fixed an issue preventing inclusion of the <numeric> header after the <execution> and <algorithm> headers.
- Fixed several issues in the sort, stable_sort, sort_by_key, and stable_sort_by_key algorithms. The fixes:
  - Allow the use of non-trivially-copyable comparators.
  - Eliminate duplicate kernel names.
  - Resolve incorrect results on devices with sub-group sizes smaller than four.
  - Resolve synchronization errors that were seen on Intel® Arc™ B-series GPU devices.
- Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data sizes of 4-8 million elements on Windows. This issue is resolved in Intel® oneAPI DPC++/C++ Compiler 2025.1 or later and Intel® Graphics Driver 32.0.101.6647 or later.
- The sort, stable_sort, sort_by_key, and stable_sort_by_key algorithms fail to compile when using Clang 17 and earlier versions, as well as compilers based on these versions, such as Intel® oneAPI DPC++/C++ Compiler 2023.2.0.
- When compiling code that uses device policies with the open source oneAPI DPC++ Compiler (clang++ driver), synchronous SYCL runtime exceptions regarding unfound kernels may be encountered unless an optimization flag is specified (for example, -O1) rather than relying on the compiler's default optimization level.
See oneDPL Guide for other restrictions and known limitations.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with an FPGA policy.
- histogram may provide incorrect results with device policies in a program built with the -O0 option.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Improved performance of the adjacent_find, all_of, any_of, copy_if, exclusive_scan, equal, find, find_if, find_end, find_first_of, find_if_not, inclusive_scan, includes, is_heap, is_heap_until, is_partitioned, is_sorted, is_sorted_until, lexicographical_compare, max_element, min_element, minmax_element, mismatch, none_of, partition, partition_copy, reduce, remove, remove_copy, remove_copy_if, remove_if, search, search_n, stable_partition, transform_exclusive_scan, transform_inclusive_scan, unique, and unique_copy algorithms with device policies.
- Improved performance of the sort, stable_sort, and sort_by_key algorithms with device policies when using Merge sort [1].
- Added the stable_sort_by_key algorithm in namespace oneapi::dpl.
- Added parallel range algorithms in namespace oneapi::dpl::ranges: all_of, any_of, none_of, for_each, find, find_if, find_if_not, adjacent_find, search, search_n, transform, sort, stable_sort, is_sorted, merge, count, count_if, equal, copy, copy_if, min_element, max_element. These algorithms operate with C++20 random access ranges and views while also taking an execution policy similarly to other oneDPL algorithms.
- Added support for operators ==, !=, << and >> for RNG engines and distributions.
- Added experimental support for the Philox RNG engine in namespace oneapi::dpl::experimental.
- Added the <oneapi/dpl/version> header containing oneDPL version macros and new feature testing macros.
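As a rough illustration of what a stable sort-by-key computes, the serial sketch below sorts keys and applies the same permutation to the values, keeping elements with equal keys in their original relative order. It uses std::stable_sort on an index array and is not how oneDPL implements the algorithm; the name stable_sort_by_key_sketch is hypothetical, and oneDPL's stable_sort_by_key itself operates on iterators with an execution policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical serial sketch of stable sort-by-key semantics (not oneDPL code).
// Assumes keys and values have the same length.
template <typename K, typename V>
void stable_sort_by_key_sketch(std::vector<K>& keys, std::vector<V>& values) {
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Order indices by key; stable_sort keeps equal keys in input order.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<K> sorted_keys;
    std::vector<V> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : idx) {  // apply the same permutation to both sequences
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

For keys {3, 1, 3, 2} and values {'a', 'b', 'c', 'd'}, the result is keys {1, 2, 3, 3} and values {'b', 'd', 'a', 'c'}; the two values with key 3 keep their input order ('a' before 'c'), which is the guarantee sort_by_key does not always provide.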
- Fixed unused variable and unused type warnings.
- Fixed memory leaks when using the sort and stable_sort algorithms with the oneTBB backend.
- Fixed a build error for the oneapi::dpl::begin and oneapi::dpl::end functions used with the Microsoft* Visual C++ standard library and with C++20.
- Reordered the template parameters of the histogram algorithm to match its function parameter order. For affected histogram calls, we recommend removing the explicit specification of template parameters and instead adding explicit type conversions of the function arguments as necessary.
- The gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates now throw std::bad_alloc if they fail to allocate global memory.
- Fixed a potential hang occurring with the gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates.
- Fixed the documentation for the sort_by_key algorithm, which was mistakenly described as stable despite being possibly unstable for some execution policies. If stability is required, use stable_sort_by_key instead.
- Fixed an error when calling sort with device execution policies on CUDA devices.
- Allowed passing C++20 random access iterators to oneDPL algorithms.
- Fixed issues caused by initialization of SYCL queues in the predefined device execution policies. These policies have been updated to be immutable (const) objects.
- histogram may provide incorrect results with device policies in a program built with the -O0 option.
- Inclusion of <oneapi/dpl/dynamic_selection> prior to <oneapi/dpl/random> may result in compilation errors. Include <oneapi/dpl/random> first as a workaround.
- Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with no predefined template parameters and with word_size values other than 64 and 32.
- Incorrect results or a synchronous SYCL exception may be observed with the following algorithms built with the -O0 option and executed on a GPU device: exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, copy_if, remove, remove_copy, remove_copy_if, remove_if, partition, partition_copy, stable_partition, unique, unique_copy, and sort.
- The value type of the input sequence should be convertible to the type of the initial element for the following algorithms with device execution policies: transform_inclusive_scan, transform_exclusive_scan, inclusive_scan, and exclusive_scan.
- The following algorithms with device execution policies may exceed the C++ standard requirements on the number of applications of user-provided predicates or equality operators: copy_if, remove, remove_copy, remove_copy_if, remove_if, partition_copy, unique, and unique_copy. In all cases, the predicate or equality operator is applied O(n) times.
- The adjacent_find, all_of, any_of, equal, find, find_if, find_end, find_first_of, find_if_not, includes, is_heap, is_heap_until, is_sorted, is_sorted_until, mismatch, none_of, search, and search_n algorithms may cause a segmentation fault when used with a device execution policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and the -O0 -g compiler options.
See oneDPL Guide for other restrictions and known limitations.
- The histogram algorithm requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, stable_sort_by_key, and partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a device execution policy on a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and the -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on a GPU device. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added the oneAPI DPC++ Library Manual Migration Guide to simplify the migration of Thrust* and CUB* APIs from CUDA*.
- The radix_sort and radix_sort_by_key kernel templates were moved into the oneapi::dpl::experimental::kt::gpu::esimd namespace. The former oneapi::dpl::experimental::kt::esimd namespace is deprecated and will be removed in a future release.
- The for_loop, for_loop_strided, for_loop_n, and for_loop_n_strided algorithms in namespace oneapi::dpl::experimental are now made to fail deliberately when used with device execution policies.
- Added the experimental inclusive_scan kernel template algorithm residing in the oneapi::dpl::experimental::kt::gpu namespace.
- The radix_sort and radix_sort_by_key kernel templates are extended with overloads for out-of-place sorting. These overloads preserve the input sequence and sort data into the user-provided output sequence.
- Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned, lexicographical_compare, binary_search, lower_bound, and upper_bound algorithms with device policies.
- The sort, stable_sort, and sort_by_key algorithms now use Radix sort [1] for sorting sycl::half elements compared with std::less or std::greater.
- Fixed compilation errors when using reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare with Intel oneAPI DPC++/C++ compiler 2023.0 and earlier.
- Fixed possible data races in the following algorithms used with device execution policies: remove_if, unique, inplace_merge, stable_partition, partial_sort_copy, rotate.
- Fixed excessive copying of data in std::vector allocated with a USM allocator for standard library implementations which have allocator information in the std::vector::iterator type.
- Fixed an issue where checking std::is_default_constructible for transform_iterator with a functor that is not default-constructible could cause a build error or an incorrect result.
- Fixed handling of SYCL device copyability for internal and public oneDPL types.
- Fixed handling of std::reverse_iterator as input to oneDPL algorithms using a device policy.
- Fixed set_intersection to always copy from the first input sequence to the output, where previously some calls would copy from the second input sequence.
- Fixed compilation errors when using oneapi::dpl::zip_iterator with the oneTBB backend and C++20.
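The set_intersection fix concerns which input the matching elements come from, which matters when elements compare equal but are not identical. The standard std::set_intersection always copies from the first range, as this sketch shows with pairs compared by key only; the helper intersect_by_key and the Item type are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical element type: (key, payload). Only the key takes part in comparisons.
using Item = std::pair<int, char>;

// Intersection of two key-sorted sequences, comparing keys only.
// Per the C++ standard, each matching element is copied from the FIRST range.
std::vector<Item> intersect_by_key(const std::vector<Item>& a,
                                   const std::vector<Item>& b) {
    std::vector<Item> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out),
                          [](const Item& x, const Item& y) { return x.first < y.first; });
    return out;
}
```

Intersecting {{1, 'a'}, {2, 'b'}} with {{2, 'x'}, {3, 'y'}} yields {{2, 'b'}}: the payload 'b' comes from the first sequence, not 'x' from the second.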
- The histogram algorithm requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
See oneDPL Guide for other restrictions and known limitations.
- When compiled with the -fsycl-pstl-offload option of Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, and partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and the -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on GPU devices. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added new histogram algorithms for generating a histogram from an input sequence into an output sequence representing either equally spaced or user-defined bins. These algorithms are currently only available for device execution policies.
- Added support for zip_iterator in the transform algorithm.
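A serial sketch of the equally-spaced-bin case, written in standard C++ rather than with oneDPL's device-policy histogram, may help make the bin mapping concrete. Assumptions to note: bins cover the half-open range [min_val, max_val), samples outside that range are ignored, and the counts use a 32-bit unsigned type (in line with the four-byte output limit noted for some configurations). The function name even_histogram is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical serial sketch of an equally-spaced-bin histogram (not oneDPL code).
// Assumption: bins cover [min_val, max_val); samples outside it are skipped.
std::vector<std::uint32_t> even_histogram(const std::vector<double>& samples,
                                          std::size_t num_bins,
                                          double min_val, double max_val) {
    std::vector<std::uint32_t> counts(num_bins, 0);
    const double scale = static_cast<double>(num_bins) / (max_val - min_val);
    for (double v : samples) {
        if (v < min_val || v >= max_val) continue;  // outside the binned range
        auto bin = static_cast<std::size_t>((v - min_val) * scale);
        if (bin >= num_bins) bin = num_bins - 1;    // guard against rounding at the top edge
        ++counts[bin];
    }
    return counts;
}
```

With 5 bins over [0, 10), each bin is 2 wide: the samples {0, 1, 2, 3, 4, 5, 9, 10, -1} produce counts {2, 2, 2, 0, 1}, with 10 and -1 dropped as out of range.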
- Fixed handling of permutation_iterator as input to oneDPL algorithms for a variety of source iterator and permutation types which caused issues.
- Fixed zip_iterator to be SYCL device copyable for trivially copyable source iterator types.
- Added a workaround for reduction algorithm failures with 64-bit data types. Define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
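A minimal sketch of applying the workaround: the macro has to be visible before the first oneDPL header is processed. The header list is illustrative; include whichever oneDPL headers the program actually uses. Defining the macro on the compiler command line (for example, -DONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION=1) should have the same effect, since that is ordinary preprocessor behavior.

```cpp
// Must appear before any oneDPL header is included in this translation unit.
#define ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION 1

// Illustrative set of oneDPL headers; use the ones your program needs.
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/numeric>
```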
- Crashes or incorrect results may occur when using oneapi::dpl::reverse_iterator or std::reverse_iterator as input to oneDPL algorithms with device execution policies.
See oneDPL Guide for other restrictions and known limitations.
- When compiled with the -fsycl-pstl-offload option of Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- Incorrect results may be produced by set_intersection with a DPC++ execution policy, where elements are copied from the second input range rather than the first input range.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterators must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, and partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and the -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on GPU devices. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added experimental radix_sort and radix_sort_by_key algorithms residing in the oneapi::dpl::experimental::kt::esimd namespace. These algorithms are the first in a family of kernel templates that allow configuring a variety of parameters, including the number of elements processed by a work item and the size of a workgroup. The algorithms only work with Intel® Data Center GPU Max Series.
- Added a new transform_if algorithm for applying a transform function conditionally based on a predicate, with overloads for one and two input sequences that use unary and binary operations and predicates, respectively.
- Optimizations used with the Intel® oneAPI DPC++/C++ Compiler are expanded to the open source oneAPI DPC++ compiler.
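The one-input form of transform_if can be pictured as a conditional element-wise transform. The serial sketch below uses a hypothetical name and std::vector instead of iterators and an execution policy. It assumes that output positions whose predicate is false are left unchanged; consult the oneDPL documentation for the exact signatures and behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial sketch of one-input transform_if:
// out[i] = op(in[i]) only where pred(in[i]) is true.
// Assumption: positions where the predicate is false are left unchanged.
template <typename In, typename Out, typename UnaryOp, typename UnaryPred>
void transform_if_sketch(const std::vector<In>& in, std::vector<Out>& out,
                         UnaryOp op, UnaryPred pred) {
    assert(out.size() >= in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (pred(in[i])) {
            out[i] = op(in[i]);  // only selected positions are written
        }
    }
}
```

The two-input overload described above would follow the same pattern with a binary predicate and a binary operation applied to in1[i] and in2[i].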
- The esimd::radix_sort and esimd::radix_sort_by_key kernel templates fail to compile when a program is built with the -g, -O0, or -O1 compiler options.
- The esimd::radix_sort_by_key kernel template produces wrong results with the following combinations of kernel_param and types of keys and values:
  - sizeof(key_type) + sizeof(val_type) == 12, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 96
  - sizeof(key_type) + sizeof(val_type) == 16, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 64
- Added an experimental feature to dynamically select an execution context, e.g., a SYCL queue. The feature provides selection functions such as select, submit, and submit_and_wait, and several selection policies: fixed_resource_policy, round_robin_policy, dynamic_load_policy, and auto_tune_policy.
- The unseq and par_unseq policies now also enable vectorization with the Intel oneAPI DPC++/C++ Compiler.
- Added support for passing zip iterators as segment value data in reduce_by_segment, exclusive_scan_by_segment, and inclusive_scan_by_segment.
- Improved performance of the merge, sort, stable_sort, sort_by_key, reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
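The segment-based algorithms (reduce_by_segment, exclusive_scan_by_segment, inclusive_scan_by_segment) operate on runs of consecutive equal keys. Below is a minimal serial sketch of what reduce_by_segment computes, written in standard C++ under a hypothetical name. It hard-codes == for key comparison and + for reduction, whereas the oneDPL algorithm takes an execution policy and iterators and can accept a custom key predicate and reduction operation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical serial sketch of reduce_by_segment: consecutive equal keys form
// a segment; one key and the sum of that segment's values are emitted.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_segment_sketch(const std::vector<K>& keys, const std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !(keys[i] == keys[i - 1])) {
            out_keys.push_back(keys[i]);  // a new segment starts here
            out_values.push_back(values[i]);
        } else {
            out_values.back() = out_values.back() + values[i];  // extend current segment
        }
    }
    return {out_keys, out_values};
}
```

Note that keys {1, 1, 2, 2, 2, 1} produce three segments, not two: the trailing 1 is not adjacent to the leading run, so it is reduced separately.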
- Fixed the reduce_async function so that it no longer ignores the provided binary operation.
- When compiled with the -fsycl-pstl-offload option of the Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device, similarly to std::execution::par_unseq, in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of the Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered because the directory containing libpstloffload.so is not included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- Incorrect results may be produced by set_intersection with a DPC++ execution policy, where elements are copied from the second input range rather than the first input range.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) with an execution policy of unseq or par_unseq, the provided input and destination iterators must be equality comparable, and comparing the input iterator with the destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, and partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device and built on Linux with the Intel® oneAPI DPC++/C++ Compiler and the -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment with the unseq or par_unseq policy when compiled by the Intel® oneAPI DPC++/C++ Compiler with the -fiopenmp, -fiopenmp-simd, -qopenmp, or -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by the Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on GPU devices.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added the sort_by_key algorithm for key-value sorting.
- Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
- Improved performance of the reduce_by_segment, inclusive_scan_by_segment, and exclusive_scan_by_segment algorithms for binary operators with known identities when using DPC++ execution policies.
- Added value_type to all views in oneapi::dpl::experimental::ranges.
- Extended oneapi::dpl::experimental::ranges::sort to support projections applied to the range elements prior to comparison.
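Key-value sorting reorders a value sequence according to the sorted order of a key sequence. The serial sketch below (hypothetical name sort_by_key_sketch, standard C++ only, not the oneDPL implementation) sorts an index permutation by key and applies it to both vectors. It uses std::stable_sort so that the example output is deterministic; since oneDPL provides stable_sort_by_key as a separate algorithm, sort_by_key itself should not be assumed to preserve the order of equal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical serial sketch: sort keys ascending and move each value with its key.
template <typename K, typename V>
void sort_by_key_sketch(std::vector<K>& keys, std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps equal keys in their original relative order
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<K> sorted_keys;
    std::vector<V> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t idx : order) {  // gather both sequences in sorted-key order
        sorted_keys.push_back(keys[idx]);
        sorted_values.push_back(values[idx]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```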
- The minimum required CMake version is raised to 3.11 on Linux and 3.20 on Windows.
- Added the new CMake package oneDPLIntelLLVMConfig.cmake to resolve issues using CMake 3.20+ on Windows for icx and icx-cl.
- Fixed an error in the sort and stable_sort algorithms when performing a descending sort on signed numeric types with negative values.
- Fixed an error in the reduce_by_segment algorithm when a non-commutative predicate is used.
- Fixed an error in the sort and stable_sort algorithms for integral types wider than 4 bytes.
- Fixed an error for some compilers where the OpenMP or SYCL backend was selected by CMake scripts without full compiler support.
- Incorrect results may be produced with in-place scans using the unseq and par_unseq policies on CPUs with the Intel® C++ Compiler 2021.8.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Improved sort algorithm performance for arithmetic data types with the std::less or std::greater comparison operator and a DPC++ policy.
- Fixed an error that caused segmentation faults in transform_reduce, minmax_element, and related algorithms when run on CPU devices.
- Fixed a compilation error in transform_reduce, minmax_element, and related algorithms on FPGAs.
- Fixed permutation_iterator to support a C-style array as a permutation map.
- Fixed a radix-sort issue with 64-bit signed integer types.
- Added the generate, generate_n, and transform algorithms to Tested Standard C++ API.
- Improved performance of the inclusive_scan, exclusive_scan, reduce, and max_element algorithms with DPC++ execution policies.
- Added a workaround for the "TBB headers not found" issue occurring with libstdc++ version 9 when oneTBB headers are not present in the environment. The workaround requires including the oneDPL headers before the libstdc++ headers.
- When possible, oneDPL CMake scripts now enforce C++17 as the minimum required language version.
- Fixed an error in the exclusive_scan algorithm when the output iterator is equal to the input iterator (in-place scan).
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added the functionality from <complex> and more APIs from the <cmath> and <limits> standard headers to Tested Standard C++ API.
- Improved performance of the sort and stable_sort algorithms on GPU devices when using Radix sort [1].
- Fixed permutation_iterator to work with C++ lambda functions for index permutation.
- Fixed an error in oneapi::dpl::experimental::ranges::guard_view and oneapi::dpl::experimental::ranges::zip_view when using operator[] with an index exceeding the limits of a 32-bit integer type.
- Fixed errors when the data size is 0 in the upper_bound, lower_bound, and binary_search algorithms.
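For the zero-size case, the expected answers are simple: searching an empty sequence places every value at position 0, and a binary search reports false for every value. The sketch below illustrates a batched search over several values in standard C++. The helper names are hypothetical, and the assumption that results are reported as positions (indices) into the input sequence is ours, not something these notes state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each query, the index of its lower bound in `sorted`.
std::vector<std::size_t> batch_lower_bound(const std::vector<int>& sorted,
                                           const std::vector<int>& queries) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::vector<std::size_t> result;
    result.reserve(queries.size());
    for (int q : queries) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), q);
        result.push_back(static_cast<std::size_t>(it - sorted.begin()));
    }
    return result;
}

// Hypothetical helper: for each query, whether it occurs in `sorted`.
std::vector<bool> batch_binary_search(const std::vector<int>& sorted,
                                      const std::vector<int>& queries) {
    std::vector<bool> result;
    result.reserve(queries.size());
    for (int q : queries) {
        result.push_back(std::binary_search(sorted.begin(), sorted.end(), q));
    }
    return result;
}
```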
Removed support for C++11 and C++14.
Changed the size and the layout of the discard_block_engine class template. For further details, please refer to 2022.0 Changes.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Added the possibility to construct a zip_iterator out of a std::tuple of iterators.
- Added 9 more serial-based versions of algorithms: is_heap, is_heap_until, make_heap, push_heap, pop_heap, is_sorted, is_sorted_until, partial_sort, partial_sort_copy. Please refer to Tested Standard C++ API.
- Added the namespace alias dpl = oneapi::dpl to all public headers.
- Fixed an error in the reduce_by_segment algorithm.
- Fixed wrong results in algorithm calls with permutation iterators.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Deprecated support for C++11 for the Parallel API with host execution policies (seq, unseq, par, par_unseq). C++17 is the minimum required version going forward.
- Fixed a kernel name definition error in range-based algorithms and reduce_by_segment used with a device_policy object that has no explicit kernel name.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
- Fixed compilation errors with C++20.
- Fixed the CL_OUT_OF_RESOURCES issue for the Radix sort algorithm executed on CPU devices.
- Fixed crashes in the exclusive_scan_by_segment, inclusive_scan_by_segment, and reduce_by_segment algorithms applied to device-allocated USM.
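To make the segmented scans concrete, here is a minimal serial sketch of what exclusive_scan_by_segment computes, using a hypothetical name, int data, and + as the operation. The oneDPL algorithm itself takes an execution policy and iterators. The scan restarts from the initial value wherever the key changes; an inclusive variant would add the current element before writing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial sketch of exclusive_scan_by_segment with operator+.
std::vector<int> exclusive_scan_by_segment_sketch(const std::vector<int>& keys,
                                                  const std::vector<int>& values,
                                                  int init) {
    assert(keys.size() == values.size());
    std::vector<int> result(values.size());
    int running = init;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            running = init;  // new segment: restart the exclusive scan
        }
        result[i] = running;  // exclusive: the current value is not yet included
        running += values[i];
    }
    return result;
}
```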
- No new issues in this release.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- Added a new implementation for the par and par_unseq execution policies based on OpenMP* 4.5 pragmas. It can be enabled with the ONEDPL_USE_OPENMP_BACKEND macro. For more details, see the Macros page in the oneDPL Guide.
- Added the range-based version of the reduce_by_segment algorithm and improved performance of the iterator-based reduce_by_segment APIs. Note that the reduce_by_segment algorithm requires C++17.
- Added the following algorithms (serial versions) to Tested Standard C++ API: for_each_n, copy, copy_backward, copy_if, copy_n, is_permutation, fill, fill_n, move, move_backward.
- Fixed the param_type API of random number distributions to satisfy C++ standard requirements. The new definitions of param_type are not compatible with the incorrect definitions in previous library versions. Recompilation is recommended for all code that might use param_type.
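As a reminder of what the C++ standard requires of param_type, the sketch below uses std::uniform_int_distribution. A param_type object can either be passed to a single call without changing the distribution's stored parameters, or installed with param(). It uses only std:: types and hypothetical helper names; that the oneDPL distributions follow the same pattern after this fix is an assumption based on the note's reference to standard requirements.

```cpp
#include <cassert>
#include <random>

using die_dist = std::uniform_int_distribution<int>;

// Hypothetical helper: draws one value with per-call parameters; the
// distribution's own stored parameters are not modified.
int draw_with_params(std::mt19937& gen, int lo, int hi) {
    assert(lo <= hi);  // uniform_int_distribution requires a <= b
    die_dist dist;
    return dist(gen, die_dist::param_type(lo, hi));
}

// Hypothetical helper: builds a distribution whose stored parameters are
// replaced through param().
die_dist make_six_sided_die() {
    die_dist dist;
    dist.param(die_dist::param_type(1, 6));
    return dist;
}
```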
- Fixed hangs and errors when oneDPL is used together with oneAPI Math Kernel Library (oneMKL) in Data Parallel C++ (DPC++) programs.
- Fixed possible data races in the following algorithms used with DPC++ execution policies: sort, stable_sort, partial_sort, nth_element.
- No new issues in this release.
See the oneDPL Guide for other restrictions and known limitations.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- Added new random number distributions: exponential_distribution, bernoulli_distribution, geometric_distribution, lognormal_distribution, weibull_distribution, cauchy_distribution, extreme_value_distribution.
- Added the following algorithms (serial versions) to Tested Standard C++ API: all_of, any_of, none_of, count, count_if, for_each, find, find_if, find_if_not.
- Improved performance of the search and find_end algorithms on GPU devices.
- Fixed SYCL* 2020 features deprecation warnings.
- Fixed some corner cases of normal_distribution functionality.
- Fixed a floating-point exception occurring on CPU devices when a program uses many oneDPL algorithms and DPC++ kernels.
- Fixed possible hangs and data races in the following algorithms used with DPC++ execution policies: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce.
- The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros that make it different for the host and the device. Otherwise, the behavior is undefined.
- The exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when a program is built with GCC 10 and the -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and a large number of elements is to be processed.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- The std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, and std::sqrt(std::complex<float>)) require device support for double precision.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- Added the range-based versions of the following algorithms: any_of, adjacent_find, copy_if, none_of, remove_copy_if, remove_copy, replace_copy, replace_copy_if, reverse, reverse_copy, rotate_copy, swap_ranges, unique, unique_copy.
- Added new asynchronous algorithms: inclusive_scan_async, exclusive_scan_async, transform_inclusive_scan_async, transform_exclusive_scan_async.
- Added structured binding support for zip_iterator::value_type.
- Fixed an issue with asynchronous algorithms returning future<ptr> with unified shared memory (USM).
- With the Intel® oneAPI DPC++/C++ Compiler, the unseq and par_unseq execution policies do not use OpenMP SIMD pragmas due to compilation issues with the -fopenmp-simd option, possibly resulting in suboptimal performance.
- The oneapi::dpl::experimental::ranges::reverse algorithm does not compile with the -fno-sycl-unnamed-lambda option.
- The exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when a program is built with GCC 10 and the -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and a large number of elements is to be processed.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- The std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, and std::sqrt(std::complex<float>)) require device support for double precision.
- Added the range-based versions of the following algorithms: all_of, any_of, count, count_if, equal, move, remove, remove_if, replace, replace_if.
- Added the following utility ranges (views): generate, fill, rotate.
- Improved performance of discard_block_engine (including the ranlux24, ranlux48, ranlux24_vec, and ranlux48_vec predefined engines) and normal_distribution.
- Added two constructors to transform_iterator: the default constructor and a constructor from an iterator without a transformation. A transform_iterator constructed in these ways uses a transformation functor of the type passed in the template arguments.
- transform_iterator can now work on top of forward iterators.
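For comparison with the standard library, the sketch below shows that std::ranlux24 is defined as a discard_block_engine adaptor over the subtract-with-carry engine std::ranlux24_base, keeping 23 values out of every block of 223. The C++ standard requires the 10000th output of a default-constructed std::ranlux24 to be 9901578, and of std::ranlux48 to be 249142670248501. The sketch uses only std:: types and a hypothetical helper name; whether the oneDPL engines reproduce these values is not stated in these notes.

```cpp
#include <random>
#include <type_traits>

// std::ranlux24 is discard_block_engine<ranlux24_base, 223, 23>.
static_assert(std::is_same<std::ranlux24,
                           std::discard_block_engine<std::ranlux24_base, 223, 23>>::value,
              "std::ranlux24 is a discard_block_engine");

// Hypothetical helper: returns the 10000th value produced by a
// default-constructed engine.
template <typename Engine>
typename Engine::result_type value_10000() {
    Engine engine;
    engine.discard(9999);  // skip the first 9999 outputs
    return engine();
}
```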
- Fixed execution of the swap_ranges algorithm with the unseq and par execution policies.
- Fixed an issue causing memory corruption and double freeing in scan-based algorithms compiled with the -O0 and -g options and run on CPU devices.
- Fixed incorrect behavior in the exclusive_scan algorithm that occurred when the input and output iterator ranges overlapped.
- Fixed error propagation for async runtime exceptions by consistently calling sycl::event::wait_and_throw internally.
- Fixed the warning: local variable will be copied despite being returned by name [-Wreturn-std-move].
- No new issues in this release.
- The exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when a program is built with GCC 10 and the -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and a large number of elements is to be processed.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- The std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, and std::sqrt(std::complex<float>)) require device support for double precision.
- Added support for parallel, vector, and DPC++ execution policies for the following algorithms: shift_left, shift_right.
- Added the range-based versions of the following algorithms: sort, stable_sort, merge.
- Added experimental asynchronous algorithms: copy_async, fill_async, for_each_async, reduce_async, sort_async, transform_async, transform_reduce_async. These algorithms are declared in the oneapi::dpl::experimental namespace and implemented only for DPC++ policies. To make these algorithms available, include the <oneapi/dpl/async> header. Use of the asynchronous API requires C++11.
- The utility function wait_for_all enables waiting for completion of an arbitrary number of events.
- Added the ONEDPL_USE_PREDEFINED_POLICIES macro, which enables predefined policy objects and the make_device_policy and make_fpga_policy functions without arguments. It is turned on by default.
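C++20 specifies shift_left and shift_right in <algorithm>; shift_left moves elements n positions toward the beginning and returns the end of the shifted range. The sketch below (hypothetical name, C++17, serial, no execution policy) reproduces that behavior for illustration, including the edge cases where n is 0 or not less than the sequence length.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical serial sketch of shift_left following C++20 semantics.
template <typename ForwardIt>
ForwardIt shift_left_sketch(ForwardIt first, ForwardIt last,
                            typename std::iterator_traits<ForwardIt>::difference_type n) {
    assert(n >= 0);
    if (n == 0) return last;                            // nothing to shift
    if (n >= std::distance(first, last)) return first;  // every element is shifted out
    // Move [first + n, last) down to first; the destination starts before the source.
    return std::move(std::next(first, n), last, first);
}
```

Elements past the returned iterator are left in a valid but unspecified (moved-from) state, as with the standard algorithm.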
- Improved performance of the following algorithms: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce, and sort and stable_sort when using Radix sort [1].
- Improved performance of the linear_congruential_engine RNG engine (including the minstd_rand, minstd_rand0, minstd_rand_vec, and minstd_rand0_vec predefined engines).
- Fixed runtime errors occurring with the find_end, search, and search_n algorithms when a program is built with the -O0 option and executed on CPU devices.
- Fixed the majority of unused parameter warnings.
- The exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when a program is built with GCC 10 and the -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and a large number of elements is to be processed.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- The std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, and std::sqrt(std::complex<float>)) require device support for double precision.
- This version implements the oneDPL Specification v1.0, including parallel algorithms, DPC++ execution policies, special iterators, and other utilities.
- oneDPL algorithms can work with data in DPC++ buffers as well as in unified shared memory (USM).
- For several algorithms, an experimental API that accepts ranges (similar to C++20) is additionally provided.
- A subset of the standard C++ libraries for Microsoft* Visual C++, GCC, and Clang is supported in DPC++ kernels, including <array>, <complex>, <functional>, <tuple>, <type_traits>, <utility>, and other standard library APIs. For the detailed list, please refer to the oneDPL Guide.
- Standard C++ random number generators and distributions for use in DPC++ kernels.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in oneDPL program code may result in compilation errors with some compilers, including GCC 7 and earlier. Instead of this directive, explicitly use the oneapi::dpl namespace or create a namespace alias.
- The partial_sort_copy, sort, and stable_sort algorithms are prone to CL_BUILD_PROGRAM_FAILURE when a program uses Radix sort [1], is built with the -O0 option, and is executed on CPU devices.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- The std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, and std::sqrt(std::complex<float>)) require device support for double precision.
[1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and sycl::half (since oneDPL 2022.6) compared with std::less or std::greater; otherwise, they use Merge sort.
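Radix sort orders keys by their bit patterns rather than by calling a comparator, which is why it applies only to arithmetic keys compared with std::less or std::greater. The following is a minimal serial LSD radix sort for std::int32_t in ascending (std::less) order. It is a sketch with hypothetical helper names, not the oneDPL implementation. Signed keys need a transformation first: flipping the sign bit makes negative values order below non-negative ones when the bits are read as unsigned. Descending (std::greater) order could be obtained by inverting all bits of the mapped key instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps a signed key to an unsigned key with the same
// ordering by flipping the sign bit.
inline std::uint32_t to_ordered_bits(std::int32_t x) {
    return static_cast<std::uint32_t>(x) ^ 0x80000000u;
}

// Hypothetical helper: LSD radix sort, 8 bits per pass, ascending order.
void radix_sort_int32(std::vector<std::int32_t>& data) {
    std::vector<std::int32_t> buffer(data.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> count{};  // count[d + 1] = occurrences of digit d
        for (std::int32_t x : data) {
            ++count[((to_ordered_bits(x) >> shift) & 0xFFu) + 1];
        }
        for (std::size_t d = 0; d < 256; ++d) {
            count[d + 1] += count[d];  // now count[d] = first output slot for digit d
        }
        for (std::int32_t x : data) {  // stable scatter preserves earlier passes
            buffer[count[(to_ordered_bits(x) >> shift) & 0xFFu]++] = x;
        }
        data.swap(buffer);
    }
}
```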