NumPy & SciPy for GPU
This is the release note of v12.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
cupyx.scipy APIs (#6823, #6849, #6855, #6890, #6958, #6971)
The coverage of SciPy interpolate, stats & special APIs has increased. (Thanks @khushi-411 & @andoorve!)
Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
- cupy.heaviside api. (#6798)
- cupyx.scipy.special.log_softmax (#6823)
- cupyx.scipy.stats.boxcox_llf (#6849)
- cupyx.scipy.stats.{zmap, zscore} (#6855)
- cupyx.scipy.special.softmax (#6890)
- dtype, fweights, aweights to cupy.cov (#6892)
- cupyx.scipy.interpolate.BarycentricInterpolator (#6958)
- scipy.special.cosm1 to cupyx (#6971)
- __device__ option is missing (#6837)
- augassign target is evaluated twice in JIT (#6844)
- _compile.py (#6859)
- nanvar and nanstd (#6869)
- cupy.array_api (#6871)
- kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6872)
- cupy-wheel for v11 (#6903)
- deg in cupy.angle (#6905)
- cupy.array_api (cont'd) (#6932)
- @pytest.mark.parametrize in some cases (#6984)
- keepdims parameter for average (#6852)
- equal_nan parameter for unique (#6853)
- cupyx.scipy.ndimage utilities (#6953)
- boxcox_llf (#6884)
- cupy.clip to match numpy (#6920)
- argpartition use the kth argument properly (#6921)
- distutils.utils (#7006)
- cupy.win.cuda117 (#6880)
- XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6894)

The CuPy Team would like to thank all those who contributed to this release!
@andfoy @andoorve @asi1024 @BasLaa @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @pri1311 @takagi @tom24d @toslunar @tpkessler
This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
- cupyx.scipy.special.log_softmax (#6966)
- cupy.array_api (#6929)
- kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6951)
- augassign target is evaluated twice in JIT (#6964)
- cupy.array_api (cont'd) (#6973)
- __device__ option is missing (#6991)
- _compile.py (#6993)
- @pytest.mark.parametrize in some cases (#7010)
- keepdims parameter for average (#6897)
- equal_nan parameter for unique (#6904)
- argpartition use the kth argument properly (#7020)
- matmul supports out (#6899)
- XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6963)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar
This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since v11.0.0rc1 release. Check out our blog for highlights in the v11 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
cupy-wheel package
Until now, downstream projects depending on CuPy had a hard time specifying a binary wheel as a dependency, and it was the users' responsibility to install the correct package in their environments. CuPy v10 introduced the experimental cupy-wheel meta-package. In this release, we declare this feature ready for production environments. cupy-wheel examines the users' environment and automatically selects the matching CuPy binary wheel to be installed.
For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).
- deg in cupy.angle (#6909)
- cupy-wheel for v11 (#6913)
- dtype of different size (#6850)
- cupy.win.cuda117 (#6885)

The CuPy Team would like to thank all those who contributed to this release!
@emcastillo @kmaehashi @takagi
This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for the CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes in v11.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117
- cupy.array_api say "cupy" instead of "numpy" (#6795)
- cupy.median for NaN inputs (#6760)
- ndimage.filter tests for ROCm 4.0 (#6676)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi
This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
). See the Upgrade Guide for the list of possible breaking changes.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces the per-CUDA-version binary packages (cupy-cuda112, cupy-cuda113, …, cupy-cuda117) provided in CuPy v10 or earlier.
Note that CUDA 11.1 or earlier still requires per-CUDA-version binary packages: cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.
CuPy v11 provides a cupy-cuda11x binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5. These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
ndarray subclassing (#6720, #6755)
This release allows users to subclass cupy.ndarray, using the same protocol as NumPy:
```python
import cupy

class C(cupy.ndarray):
    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'
```
Note that view casting and new from template mechanisms are also supported as described by the NumPy documentation.
cupyx.distributed for Sparse Matrices
All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install mpi4py in order to perform an efficient metadata exchange.
We would like to give a warm welcome to @khushi-411, who will be working on adding support for the cupyx.scipy.interpolate APIs as part of her GSoC internship!
CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.
- cupy.setxor1d (#6582)
- cupyx.spatial.distance support from pylibraft (#6690)
- cupy.ndarray subclassing - Part 2 - View casting (#6720)
- broadcast (#6758)
- reduce (#6761)
- all_reduce and minor fixes (#6762)
- all_to_all, reduce_scatter, send_recv (#6765)
- cupy.ndarray subclassing - Part 3 - New from template (ufunc) (#6775)
- cupyx.scipy.special.log_ndtr (#6776)
- cupyx.scipy.special.expn (#6790)
- cupy-cuda11x wheel (#6800)
- CUPY_CUDA_VERSION as much as possible (#6810)
- cupy.cuda.compile_with_cache (#6818)
- cupy.poly1d.__pow__ (#6770)
- cupy.median for NaN inputs (#6759)
- _cuda_types.py (#6726)
- ndarray_base (#6782)
- cupy-cuda11x wheel (#6803)

The CuPy Team would like to thank all those who contributed to this release!
@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi
This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)
- ifdef (#6740)
- ifdef for ROCm >= 4.2 (#6751)
- scipy==1.8.1 sparse dot bugfix (#6728)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @kmaehashi @leofang @takagi
This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
einsum backend (#6677) (thanks @leofang!)
A new accelerator for CuPy has been added (CUPY_ACCELERATORS=cutensornet). This feature requires cuquantum-python >= 22.03 and cuTENSOR >= 1.5.0, and is used to accelerate and support large array sizes in the cupy.linalg.einsum API.
CuPy v11 will drop support for ROCm 4.2. We recommend users move to ROCm 4.3 or 5.0 instead.
As per NEP 29, NumPy 1.18/1.19 support was dropped in 2021. The supported SciPy versions are the ones released close to the supported NumPy versions.
- einsum backend (#6677)
- cupy.poly (#6697)
- ifdef (#6739)
- bincount, histogram2d, histogramdd with CUB (#6701)
- ifdef for ROCm >= 4.2 (#6750)
- Dim3 class (#6644)
- scatter_add example (#6696)
- LOBPCG on ROCm 5.0+ (#6603)
- scipy==1.8.1 sparse dot bugfix (#6727)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi
This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
CuPy JIT has been further enhanced thanks to @leofang and @eternalphane! It is now possible to use CUDA cooperative groups and to access the .shape and .strides attributes of ndarrays.
```python
import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)
print(kernel.cached_code)
```
The above program emits the following CUDA code:

```cpp
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
    ptrdiff_t i;
    ptrdiff_t size = thrust::get<0>(x.get_shape());
    unsigned int ntid = (gridDim.x * blockDim.x);
    unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
    for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
        i = __it;
        y[i] = x[i];
    }
    cg::thread_block g = cg::this_thread_block();
    g.sync();
}
```
cupyx.distributed (#6628, #6658)
CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently only send/recv primitives are supported, but we will be adding support for collective calls in the following releases.
Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server otherwise used to exchange CPU values. Moreover, we recommend enabling MPI for sparse matrix communication, since each communication call requires a metadata exchange that leads to device synchronization when MPI is not enabled.
```python
# run with: mpiexec -n N python …
from mpi4py import MPI  # note: `import mpi4py` alone does not expose mpi4py.MPI
import cupyx.distributed

comm = MPI.COMM_WORLD
workers = comm.Get_size()
rank = comm.Get_rank()
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)
```
cupy-wheel (EXPERIMENTAL) (#6012)
We have added a new package on PyPI called cupy-wheel. This meta-package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.
pip install cupy-wheel
This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.
This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are welcome (please visit #6688).
- cupyx.distributed (#6628)
- .shape and .strides (#6668)
- flatten(order) (#6613)
- __repr__ for cupyx.profiler._time._PerfCaseResult (#6617)
- cudaDevAttrMemoryPoolsSupported to hip (#6621)
- kernel.cached_code test (#6643)
- cupyx.distributed (#6658)
- cupy.intersect1d (#6586)
- float16::operator-() only for ROCm 5.0+ (#6624)
- cupy.polyval (#6664)
- memcpy_async on CUDA 11.0 (#6671)
- --pre option to instructions installing pre-releases (#6612)
- jenkins requirements (#6632)
- TestIncludesCompileCUDA for HEAD tests (#6646)
- /test mini (#6653)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar
This is the release note of v10.4.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
cupy-wheel (EXPERIMENTAL) (#6012)
We have added a new package on PyPI called cupy-wheel. This meta-package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel matching the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.
pip install cupy-wheel
This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.
This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are welcome (please visit #6688).
- cudaDevAttrMemoryPoolsSupported to hip (#6626)
- float16::operator-() only for ROCm 5.0+ (#6629)
- cupy.polyval (#6666)
- --pre option to instructions installing pre-releases (#6614)
- jenkins requirements (#6634)
- TestIncludesCompileCUDA for HEAD tests (#6650)
- /test mini (#6655)

The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @emcastillo @kmaehashi @leofang @takagi
This is the release note of v10.3.1. See here for the complete list of solved issues and merged PRs.
This is a hot-fix release for v10.3.0, which contained a regression preventing CuPy from working on older CUDA GPUs (Maxwell or earlier).
The CuPy Team would like to thank all those who contributed to this release!
@kmaehashi @takagi