[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
libcu++ 1.7.0 is a major release. It adds cuda::std::atomic_ref for integral types. cuda::std::atomic_ref may potentially replace uses of CUDA-specific atomicOperator(_Scope) calls and provides a single API for host and device code.
Supported ABI versions: 4 (default), 3, and 2.
- cuda::std::atomic_ref for integral types.
- Fixed <nv/target> when C or pre-C++11 dialects are used.
- Fixed <cuda/std/atomic> for GCC/Clang.
- Fixed <nv/target> on MSVC, where fallback macros would always choose the pre-C++11 backend.
- Fixed <nv/target>.
- __int128 support.
- Fixed _LIBCUDACXX_CUDACC_VER, broken from #217.
- memcpy_async should cache only in L2 when possible.
- Deleted atomic/atomic_ref ctors to prevent copy construction.

libcu++ 1.6.0 is a major release. It changes the default alignment of cuda::std::complex
for better code generation and changes cuda::std::atomic to use <nv/target> as the primary dispatch mechanism.

This release adds cuda::annotated_ptr and cuda::access_property, two APIs that allow associating an address space and an explicit caching policy with a pointer, and the related cuda::apply_access_property, cuda::associate_access_property, and cuda::discard_memory APIs.

This release introduces ABI version 4, which is now the default.

Supported ABI versions: 4 (default), 3, and 2.
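As a rough illustration of the new pointer-annotation APIs, here is a device-code sketch. It is not compilable without the CUDA toolkit, and while the names follow the libcu++ documentation, treat the exact signatures as an assumption rather than a definitive usage.

```cpp
// Hypothetical device-code sketch (requires NVCC and the CUDA toolkit's
// libcu++): view a pointer with a "persisting" L2 caching policy.
#include <cuda/annotated_ptr>

__global__ void bump(int* data) {
    // Associate an explicit access property with the raw pointer.
    cuda::annotated_ptr<int, cuda::access_property::persisting> p{data};
    p[threadIdx.x] += 1;  // accesses hint the L2 cache to retain these lines
}
```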
- <cuda/std/barrier> and <cuda/std/atomic> failed to compile with NVRTC.
- Fixed <nv/target> to be used under NVRTC.
- Fixed <nv/target> to build when compiled under C and C++98.
- Changed cuda::std::complex alignment for enhanced performance.
- Changed cuda::std::chrono literals to double.

libcu++ 1.5.0 is a major release. It adds <nv/target>, the library support header for the new if target target specialization mechanism.
Supported ABI versions: 3 (default) and 2.
Included in: CUDA Toolkit 11.4.
- <nv/target>: Portability macros for NVCC/NVC++ and other compilers.

libcu++ 1.4.1 is a minor bugfix release.

Supported ABI versions: 3 (default) and 2.
Included in: CUDA Toolkit 11.3.

- Added constexpr to synchronization object constructors.
- Fixed cuda::std::complex for NVRTC.
- Worked around __is_convertible, which NVCC treats as a context-sensitive keyword.

libcu++ 1.4.0 adds <cuda/std/complex>
, NVCC + MSVC support for <cuda/std/tuple>, and backports of C++20 <cuda/std/chrono> and C++17 <cuda/std/type_traits> features to C++14.

Supported ABI versions: 3 (default) and 2.

- <cuda/std/complex>. long double is not supported and is disabled when building with NVCC.
- <cuda/std/chrono> backported to C++14.
- <cuda/std/type_traits> backported to C++14.
- cuda::std::byte (in <cuda/std/cstddef>) backported to C++14.
- cuda::std::is_constant_evaluated backported to C++11.
- <cuda/pipeline> and the asynchronous operations API.
- <cuda/std/tuple> now works on a set of the most recent MSVC compilers.
- <cuda/std/chrono>/<cuda/std/type_traits> backports.
libcu++ 1.0.0 is the first release of libcu++, the C++ Standard Library for your entire system. It brings C++ atomics to CUDA: <cuda/[std/]atomic>. It also introduces <cuda/std/type_traits>, <cuda/std/cassert>, <cuda/std/cfloat>, <cuda/std/cstddef>, and <cuda/std/cstdint>.
- <cuda/[std/]atomic>:
  - cuda::thread_scope: An enumeration that specifies which group of threads can synchronize with each other using a concurrency primitive.
  - cuda::atomic<T, Scope>: Scoped atomic objects.
  - cuda::std::atomic<T>: Atomic objects.
- <cuda/std/type_traits>: Type traits and metaprogramming facilities.
- <cuda/std/cassert>: assert, an error-reporting mechanism.
- <cuda/std/cstddef>: Builtin fundamental types.
- <cuda/std/cstdint>: Builtin integral types.
- <cuda/std/cfloat>: Builtin floating-point types.

libcu++ 1.1.0 introduces the world's first implementation of the Standard C++20 synchronization library: <cuda/[std/]barrier>
, <cuda/std/latch>, <cuda/std/semaphore>, cuda::[std::]atomic_flag::test, cuda::[std::]atomic::wait, and cuda::[std::]atomic::notify*. An extension for managing asynchronous local copies, cuda::memcpy_async, is introduced as well. It also adds <cuda/std/chrono>, <cuda/std/ratio>, and most of <cuda/std/functional>.
- <cuda/[std/]barrier>: C++20's cuda::[std::]barrier, an asynchronous thread-coordination mechanism whose lifetime consists of a sequence of barrier phases, where each phase allows at most an expected number of threads to block until the expected number of threads arrive at the barrier. It is backported to C++11. The cuda::barrier variant takes an additional cuda::thread_scope parameter.
- <cuda/barrier>: cuda::memcpy_async, asynchronous local copies. This facility is NOT for transferring data between threads or between host and device; it is not a cudaMemcpyAsync replacement or abstraction. It uses cuda::[std::]barrier objects to synchronize the copies.
- <cuda/std/functional>: common function objects, such as cuda::std::plus, cuda::std::minus, etc. cuda::std::function, cuda::std::bind, cuda::std::hash, and cuda::std::reference_wrapper are omitted.
- Made __cuda_memcmp inline to fix ODR violations when compiling multiple translation units.

libcu++ 1.2.0 adds <cuda/pipeline>
/cuda::pipeline, a facility for coordinating cuda::memcpy_async operations. This release introduces ABI version 3, which is now the default.

Supported ABI versions: 3 (default) and 2.
Included in: CUDA 11.1.
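Pinning an ABI version looks like the following configuration sketch. It requires the CUDA toolkit's libcu++ headers, and the static_assert is only a hypothetical illustration of the API version macros.

```cpp
// Hypothetical configuration sketch: pin ABI version 2 before including
// any libcu++ or CUDA header (version 3 is the default in this release).
#define _LIBCUDACXX_CUDA_ABI_VERSION 2

#include <cuda/std/barrier>
#include <cuda/std/version>

// The version macros from <cuda/std/version> can then be inspected.
static_assert(_LIBCUDACXX_CUDA_API_VERSION_MAJOR >= 1, "libcu++ 1.x API");
```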
- Changed the alignment of cuda::[std::]barrier. Users may define _LIBCUDACXX_CUDA_ABI_VERSION=2 before including any libcu++ or CUDA headers to use ABI version 2, which was the default for the 1.1.0 / CUDA 11.0 release. Both ABI version 3 and ABI version 2 will be supported until the next major CUDA release.
- <cuda/pipeline>: cuda::pipeline, a facility for coordinating cuda::memcpy_async operations.
- <cuda/std/version>: API version macros _LIBCUDACXX_CUDA_API_VERSION, _LIBCUDACXX_CUDA_API_VERSION_MAJOR, _LIBCUDACXX_CUDA_API_VERSION_MINOR, and _LIBCUDACXX_CUDA_API_VERSION_PATCH.
- Define _LIBCUDACXX_CUDA_ABI_VERSION to request a particular supported ABI version. _LIBCUDACXX_CUDA_ABI_VERSION_LATEST is set to the latest ABI version, which is always the default.
- <cuda/latch>/<cuda/semaphore>: <cuda/*> headers added for cuda::latch, cuda::counting_semaphore, and cuda::binary_semaphore. These features were available in prior releases, but you had to include <cuda/std/latch> and <cuda/std/semaphore> to access them.

libcu++ 1.3.0 adds <cuda/std/tuple>
and cuda::std::pair, although they are not supported with NVCC + MSVC. It also adds documentation.
Supported ABI versions: 3 (default) and 2.
Included in: CUDA 11.2.
- <cuda/std/tuple>: cuda::std::tuple, a fixed-size collection of heterogeneous values. Not supported with NVCC + MSVC.
- <cuda/std/utility>: cuda::std::pair, a collection of two heterogeneous values. The only <cuda/std/utility> facility supported is cuda::std::pair. Not supported with NVCC + MSVC.
- Disabled __builtin_is_constant_evaluated usage with NVCC in C++11 mode because it's broken.
- Fixed declarations in __threading_support which had inconsistent qualifiers. Thanks to Gonzalo Brito Gadeschi for this contribution.