NumPy & SciPy for GPU
This is the release note of v11.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for the CuPy v11 series. Please start testing your workload with the v12 release candidate to get ready for the final v12 release. To install:
$ pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
This release fixes a critical performance regression with CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
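One rough way to see whether the on-disk cache is being populated is to look at the cache directory before and after running a workload. The following is a minimal sketch, assuming the cache location described in the CuPy documentation (the `CUPY_CACHE_DIR` environment variable, defaulting to `~/.cupy/kernel_cache`). The helper names `kernel_cache_dir` and `kernel_cache_summary` are hypothetical and are not part of CuPy.

```python
import os
from pathlib import Path


def kernel_cache_dir() -> Path:
    """Return the directory assumed to hold CuPy's on-disk kernel cache."""
    # Assumption: CUPY_CACHE_DIR overrides the default ~/.cupy/kernel_cache.
    default = Path.home() / ".cupy" / "kernel_cache"
    return Path(os.environ.get("CUPY_CACHE_DIR", default))


def kernel_cache_summary() -> tuple:
    """Return (file count, total bytes); (0, 0) if the directory does not exist."""
    cache_dir = kernel_cache_dir()
    if not cache_dir.is_dir():
        return 0, 0
    files = [p for p in cache_dir.iterdir() if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    count, size = kernel_cache_summary()
    print(f"{kernel_cache_dir()}: {count} files, {size} bytes")
```

Comparing the summary across two runs of the same script gives only an indication of whether compiled kernels are being kept on disk; it does not by itself prove that they are being reused.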
runtime.getDeviceProperties (#7353)
cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7379)
arange() to raise TypeError in boolean case (#7407)
cupyx.scipy.sparse.eigsh (#7361)
TestRoundHalfway (#7362)
nanargmin/max tests (#7381)
fillvalue overflow in cupyx.scipy.signal test (#7401)
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @kmaehashi @leofang @RisaKirisu
This is the release note of v12.0.0rc1. See here for the complete list of solved issues and merged PRs.
This is a release candidate of the CuPy v12 series. Please start testing your workload with this release to prepare for the final v12 release. To install:
$ pip install -U --pre cupy-cuda11x -f https://pip.cupy.dev/pre
See the Upgrade Guide for the list of possible breaking changes in v12.
cupyx.scipy.interpolate
The following interpolators have been implemented: BPoly, Akima1DInterpolator, PchipInterpolator.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
CuPy is now compatible with DLPack v0.8, which allows importing and exporting bool arrays.
This release fixes a critical performance regression with CUDA 12.0 in which the on-disk kernel cache was ineffective, causing kernels to be recompiled in every Python process. Users on CUDA 12.0 are strongly encouraged to upgrade to this release.
cupy.cuda.Device Behavior (#7427)
The CUDA current device (set via cupy.cuda.Device.use() or the underlying CUDA API cudaSetDevice()) will now be reactivated when exiting a cupy.cuda.Device context manager. This reverts the change introduced in CuPy v10, making the behavior identical to that of CuPy v9 and earlier. Please refer to the Upgrade Guide for the background of this decision.
As per NEP 29, CuPy v12 drops support for Python 3.7 and NumPy 1.20. Support for SciPy 1.6 has been dropped as well.
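A quick pre-upgrade check of an environment against these minimums might look like the sketch below. It assumes that dropping Python 3.7, NumPy 1.20 and SciPy 1.6 makes the next minor release of each the oldest supported one; the names `V12_MINIMUMS`, `parse_version` and `unmet_requirements` are hypothetical helpers, not CuPy APIs.

```python
import sys

# Assumption: the oldest supported versions are the minor releases right after
# the dropped ones (Python 3.7 -> 3.8, NumPy 1.20 -> 1.21, SciPy 1.6 -> 1.7).
V12_MINIMUMS = {"python": (3, 8), "numpy": (1, 21), "scipy": (1, 7)}


def parse_version(text):
    """Turn a string such as '1.21.6' into (1, 21); only major.minor is compared."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def unmet_requirements(versions):
    """Return the names in `versions` whose version is below the assumed minimum."""
    return [
        name
        for name, minimum in V12_MINIMUMS.items()
        if name in versions and parse_version(versions[name]) < minimum
    ]


if __name__ == "__main__":
    current = {"python": "{}.{}".format(*sys.version_info[:2])}
    print(unmet_requirements(current))
```

Packages missing from the input mapping are skipped rather than reported, so the check can be run with whatever version information is at hand.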
Texture reference features (RawModule.get_texref() and TextureReference), which were marked deprecated in CUDA 10.1 and removed in CUDA 12.0, have been removed from CuPy.
cupyx.distributed._array implementation (#7040)
PchipInterpolator to cupyx.scipy.interpolate (#7255)
Akima1DInterpolator to cupyx.scipy.interpolate (#7260)
cached_code to ElementwiseKernel and ReductionKernel (#7265)
RegularGridInterpolator (#7334)
BPoly to cupyx.scipy.interpolate module (#7343)
runtime.getDeviceProperties (#7302)
cupy.cuda.profiler.initialize deprecated as it is removed in CUDA 12 (#7377)
cupy.cuda.Device behavior to v9 (#7427)
ndarray.fill to raise ComplexWarning (#7393)
arange() to raise TypeError in boolean case (#7394)
cupyx.scipy.sparse.eigsh (#7356)
CUPY_INCLUDE_PATH and CUPY_LIBRARY_PATH env vars (#7305)
TestRoundHalfway (#7338)
nanargmin/max tests (#7341)
fillvalue overflow in cupyx.scipy.signal test (#7397)
The CuPy Team would like to thank all those who contributed to this release!
Contributors: @andfoy @asi1024 @emcastillo @ev-br @kmaehashi @leofang @Nordicus @Raghav323 @RisaKirisu @seberg @wstolp
This is the release note of v11.5.0. See here for the complete list of solved issues and merged PRs.
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x
For aarch64:
$ pip install cupy-cuda12x -f https://pip.cupy.dev/aarch64
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
cupy-cuda12x to cupy-wheel (#7327)
array_api (#7321)
broadcast_to (#7291)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @hubertlu-tw @kmaehashi @leofang @takagi
This is the release note of v12.0.0b3. See here for the complete list of solved issues and merged PRs.
CuPy now supports CUDA 12.0 and NVIDIA's latest H100 GPU. Binary packages are available for Linux (x86_64/aarch64) and Windows.
$ pip install cupy-cuda12x --pre -f https://pip.cupy.dev/pre
Note that cuDNN support is unavailable at this time as cuDNN for CUDA 12 has not yet been released.
NVTX support in CuPy is now backed by NVTX3 instead of the legacy NVTX1.
cupyx.scipy.interpolate.make_interp_spline (#7195)
RegularGridInterpolator and interpn from scipy.interpolate (#7197)
PPoly to cupyx.scipy.interpolate (#7204)
uniform() to random generator (#7205)
make_interp_spline(..., bc_type="periodic") (#7206)
CubicHermiteSpline to cupyx.scipy.interpolate (#7242)
cupyx.scipy.sparse.linalg.spsolve: allow two-dimensional right-hand sides in A @ X = B (#7219)
cupyx.scipy.sparse.linalg.eigsh (#7269)
RegularGridInterpolator (#7275)
cupy-cuda12x to cupy-wheel (#7300)
array_api (#7313)
broadcast_to (#7271)
cupyx.scipy.interpolate (#7262)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @ev-br @hubertlu-tw @ideasrule @kmaehashi @leofang @mandal-saswata @oishigyunyu @takagi
This is the release note of v11.4.0. See here for the complete list of solved issues and merged PRs.
lexsort (#7191)
cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7202)
list(kwargs) instead of list(kwargs.keys) (#7213)
The CuPy Team would like to thank all those who contributed to this release!
@emcastillo @hadipash @jjmortensen @kmaehashi @takagi
This is the release note of v12.0.0b2. See here for the complete list of solved issues and merged PRs.
cupyx.scipy.interpolate APIs (#7086, #7190 and #7215)
Increased coverage of cupyx.scipy.interpolate APIs, which now includes BSpline, RBFInterpolator, splantider and splder.
Acknowledgements: This work was done by Edgar Andrés Margffoy Tuay (@andfoy) and Evgeni Burovski (@ev-br) under the support of the Chan Zuckerberg Initiative's Essential Open Source Software for Science program.
cupyx.jit (#7145)
It is now possible to use the CUB reduction classes, cub::WarpReduce and cub::BlockReduce, in kernels written using CuPy JIT.
import cupy, cupyx
from cupy.cuda import runtime
from cupyx import jit

@jit.rawkernel()
def warp_reduce_sum(x, y):
    # Warp-wide reduction over int32 values, using shared temporary storage.
    WarpReduce = jit.cub.WarpReduce[cupy.int32]
    temp_storage = jit.shared_memory(
        dtype=WarpReduce.TempStorage, size=1)
    i, j = jit.blockIdx.x, jit.threadIdx.x
    value = x[i, j]
    aggregator = WarpReduce(temp_storage[0])
    aggregate = aggregator.Reduce(value, jit.cub.Sum())
    # Only the first thread of each block writes the result for its row.
    if j == 0:
        y[i] = aggregate

# One block per row, one thread per column; the row width equals the warp size.
warp_size = 64 if runtime.is_hip else 32
h, w = (32, warp_size)
x = cupy.arange(h * w, dtype=cupy.int32).reshape(h, w)
cupy.random.shuffle(x)
y = cupy.zeros(h, dtype=cupy.int32)
warp_reduce_sum[h, w](x, y)
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
BSpline to interpolate module (#7086)
cub::WarpReduce and cub::BlockReduce (#7145)
cupyx.scipy.interpolate.RBFInterpolator (#7190)
splder and splantider (#7215)
_PerfCaseResult.to_str format (#7152)
lexsort (#7178)
cupyx.scipy.ndimage.zoom for outputs of size 1 when mode is 'opencv' (#7192)
warnings.warn() (#7194)
list(kwargs) instead of list(kwargs.keys) (#7203)
comb() for Python 3.7 (#7221)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @emcastillo @ev-br @hadipash @jjmortensen @kmaehashi @takagi @TsutsuiMasayoshi
This is the release note of v11.3.0. See here for the complete list of solved issues and merged PRs.
This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.
Wheels are now available for Python 3.11.
cutensorPermutation (#7083)
inf/nan in cupy.fuse (#7128)
op.routine including in0_type (#7096)
__del__ in TCPStore (#7111)
cupy.nansum in fusing (#7114)
TypeError in cupy._core.fusion._call_ufunc() (#7130)
/test jenkins to trigger Jenkins only (#7129)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @emcastillo @kmaehashi @leofang @takagi
This is the release note of v12.0.0b1. See here for the complete list of solved issues and merged PRs.
This release adds support for CUDA 11.8 and the latest NVIDIA H100 GPUs. Note that CUDA 11.8 support is included in the cupy-cuda11x wheel.
Wheels are now available for Python 3.11.
ufunc Methods
This release adds ufunc.reduce, ufunc.accumulate, ufunc.reduceat, and ufunc.at methods. See the documentation for more details.
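For orientation, the semantics of three of these methods can be sketched with NumPy, whose ufunc methods the CuPy ones are modeled on. NumPy is used here only so that the snippet runs without a GPU; the assumption is that replacing the import with `import cupy as np` on CuPy v12 gives the same values.

```python
import numpy as np  # assumption: `import cupy as np` behaves the same on CuPy v12

a = np.array([1, 2, 3, 4, 5, 6])

# reduce: collapse the array with the ufunc (here, a plain sum).
total = np.add.reduce(a)

# accumulate: keep every intermediate result (running sums).
running = np.add.accumulate(a)

# reduceat: reduce over the slices a[0:2], a[2:4] and a[4:].
segments = np.add.reduceat(a, [0, 2, 4])

print(total, running, segments)
```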
cupyx.jit (#7054, #7139)
It is now possible to use the Thrust library device functions in kernels written using CuPy JIT.
import cupy, cupyx

@cupyx.jit.rawkernel()
def sort_by_key(x, y):
    # Each thread sorts one row of x and applies the same permutation to the matching row of y.
    i = cupyx.jit.threadIdx.x
    x_array = x[i]
    y_array = y[i]
    cupyx.jit.thrust.sort_by_key(
        cupyx.jit.thrust.device,
        x_array.begin(),
        x_array.end(),
        y_array.begin(),
    )

h, w = (256, 256)
x = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(x)
x = x.reshape(h, w)
y = cupy.arange(h * w, dtype=cupy.int32)
cupy.random.shuffle(y)
y = y.reshape(h, w)
# Launch one block of 256 threads, one thread per row.
sort_by_key[1, 256](x, y)
Currently supported Thrust functions are count, copy, find, mismatch, sort, sort_by_key.
Acknowledgements: This work was done by Tsutsui Masayoshi (@TsutsuiMasayoshi) as a part of the internship program at Preferred Networks.
ndarray.scatter_{add,max,min} (#7097)
cupy.ndarray.scatter_{add,max,min} methods are marked as deprecated. Use the corresponding ufunc methods (cupy.{add,maximum,minimum}.at) instead.
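Migrating away from the deprecated methods is mostly a change of spelling. The sketch below uses NumPy as a stand-in so that it runs without a GPU; it assumes that cupy.add.at and cupy.maximum.at accept the same arguments as their NumPy counterparts. The deprecated CuPy call appears only in a comment, since NumPy arrays have no such method.

```python
import numpy as np  # stand-in; assumed interchangeable with `import cupy as np` on CuPy v12

idx = np.array([0, 1, 1, 3, 1])

hist = np.zeros(4, dtype=np.int64)
# Deprecated CuPy spelling: hist.scatter_add(idx, 1)
np.add.at(hist, idx, 1)  # repeated indices accumulate: index 1 is hit three times

peak = np.zeros(4, dtype=np.int64)
# Deprecated CuPy spelling: peak.scatter_max(idx, values)
values = np.array([5, 2, 9, 4, 7])
np.maximum.at(peak, idx, values)  # keeps the largest value seen at each index

print(hist, peak)
```

Unlike the fancy-indexed assignment `hist[idx] += 1`, the `at` form applies the operation once per occurrence of an index, which is the behavior the scatter methods provided.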
cupyx (#7013)
Previously, CuPy provided high-level wrappers for CUDA libraries as cupy.cudnn, cupy.cusolver, cupy.cusparse, and cupy.cutensor. These modules have now been moved to cupyx as part of the cupy namespace cleanup. The old modules are still available but are marked as deprecated. Note that these modules are still undocumented and may be subject to change.
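Code that has to run on both sides of this move could try the new location first and fall back to the old one. The following is a minimal sketch under that assumption: `import_first_available` and `load_cusolver` are hypothetical helpers rather than CuPy APIs, and the module paths are taken from the description above.

```python
import importlib


def import_first_available(*names):
    """Import and return the first module in `names` that can be imported."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


def load_cusolver():
    # Prefer the new cupyx location; fall back to the deprecated cupy one.
    return import_first_available("cupyx.cusolver", "cupy.cusolver")
```

Because these wrappers are undocumented and may change, a shim like this is best kept in one place so it can be removed once older CuPy versions no longer need to be supported.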
axis to cupy.logspace (#6797)
thrust::count, device in CuPy JIT (#7054)
cupy.ndarray.searchsorted (#7059)
add.at, maximum.at, minimum.at (#7077)
subtract.at, bitwise_and.at, bitwise_or.at, bitwise_xor.at (#7099)
ufunc.reduce and ufunc.accumulate (#7105)
cupy.add.reduceat (#7115)
cupy.min_scalar_type (#7136)
cupy.cudnn cupy.cusolver cupy.cutensor cupy.cusparse to cupyx (#7013)
ndarray.scatter_{add, max, min} (#7097)
inf/nan in cupy.fuse (#7122)
TypeError instead of ValueError in cupy.from_dlpack when CPU tensor is passed (#7133)
__del__ in TCPStore (#6989)
op.routine including in0_type (#7076)
cupy.nansum in fusing (#7102)
TypeError in cupy._core.fusion._call_ufunc() (#7113)
_ufunc_method directory (#7116)
/test jenkins to trigger Jenkins only (#7126)
thrust::sort test (#7162)
The CuPy Team would like to thank all those who contributed to this release!
@anaruse @andfoy @asi1024 @Diwakar-Gupta @emcastillo @IncubatorShokuhou @kmaehashi @MarcoGorelli @takagi @TsutsuiMasayoshi
This is the release note of v12.0.0a2. See here for the complete list of solved issues and merged PRs.
cupyx.scipy APIs (#6773, #6990, #7014, #7015, #7036)
The coverage of SciPy interpolate & special APIs has increased. (Thanks @khushi-411 & @1MrEnot!)
ufunc methods (#7049)
Starting from v12, CuPy will support the corresponding NumPy ufunc methods.
This release adds compatibility with ufunc.outer. Check the tracking issue (#7082) for detailed information.
cupyx.scipy.special.logsumexp (#6773)
cupyx.scipy.interpolate.KroghInterpolator (#6990)
scipy.special.expi and scipy.special.exp1 (#7014)
cupy.byte_bounds (#7015)
cupyx.scipy.special.k0, cupyx.scipy.special.k1, cupyx.scipy.special.k0e, cupyx.scipy.special.k1e (#7036)
ufunc.outer (#7049)
cupy.apply_along_axis for tuple retval (#7068)
cutensorPermutation (#7070)
csrsm2 memory leak (#7039)
_routine_indexing.pyx (#7053)
The CuPy Team would like to thank all those who contributed to this release!
@1MrEnot @andfoy @asi1024 @betatim @khushi-411 @kmaehashi @leofang @maronuu @takagi @wyli
This is the release note of v11.2.0. See here for the complete list of solved issues and merged PRs.
csrsm2 memory leak (#7041)
_routine_indexing.pyx (#7056)
distutils.utils (#7009)
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @asi1024 @betatim @kmaehashi @leofang @takagi @wyli