Releases: cupy/cupy

v14.1.1

01 Jun 08:42
2c7c8a5

CuPy v14.1.1 Release Note

This hotfix release removes an unexpected dependency on pytest in the v14.1.0 release.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

📝 Changes

See here for the complete list of merged PRs.

Bug Fixes

  • BUG: Avoid hard pytest dependency in cupy.testing (and test) (#9968)

Installation

  • Bump version to v14.1.1 (#9971)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @seberg

v14.1.0

23 May 01:34
7ed30be

CuPy v14.1.0 Release Note

This release for the CuPy v14 series introduces new features, enhancements, and bug fixes.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

Support for large sparse matrices

CuPy now supports large sparse matrices, allowing 64-bit sized dimensions and number of nonzero elements. Similar to SciPy, creation functions will automatically choose the larger index dtype for the sparsity pattern. The added functionality mostly uses newly wrapped cuSPARSE calls.
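The index dtype choice can be pictured with a small pure-Python sketch. It follows the general rule SciPy uses for sparse indices (stay with 32-bit indices unless a dimension or the nonzero count exceeds the int32 range). The helper name `choose_index_dtype` is hypothetical, and this is an illustration of the idea rather than CuPy's actual implementation.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def choose_index_dtype(shape, nnz):
    """Return "int32" or "int64" for a sparse matrix's index arrays.

    Hypothetical helper for illustration only: 64-bit indices are chosen
    as soon as a dimension or the number of nonzeros exceeds int32.
    """
    largest = max(max(shape, default=0), nnz)
    return "int32" if largest <= INT32_MAX else "int64"


small = choose_index_dtype((1_000, 1_000), nnz=50_000)
large = choose_index_dtype((3_000_000_000, 10), nnz=50_000)
many = choose_index_dtype((100_000, 100_000), nnz=5_000_000_000)
print(small, large, many)
```

Keeping 32-bit indices whenever they suffice halves the memory used by the sparsity pattern, which is why the larger dtype is only selected on demand.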

Initial support for free-threaded Python

CuPy 14.1 now ships Linux wheels for free-threaded Python 3.14t and includes a number of thread-safety fixes. As threading issues can be intermittent, please report any issues you encounter. A known limitation is that some CUDA graph-capture calls may fail when used from multiple threads.

Support for structured dtypes with fields

CuPy now supports structured dtypes with fields in kernels. This enables previously missing features such as comparisons and casts/copies. Because CUDA requires a larger alignment in some cases, CuPy now includes the make_aligned_dtype helper to create structured dtypes with larger alignments than guaranteed by NumPy’s align=True.

Caching for CUDA C++ template instantiations

CuPy now caches template kernels instantiated using name_expressions with RawModule. This avoids recompilation in cases where CuPy was previously unable to use the on-disk cache.
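For context, a templated kernel is typically requested through `name_expressions` along these lines. This is a minimal sketch that assumes a CUDA-capable GPU and an installed CuPy; the kernel name `scale` and the helper `run_scale_demo` are made up for illustration, and the demo is skipped when CuPy or a device is unavailable.

```python
import importlib.util

# CUDA C++ source containing one templated kernel (hypothetical example).
SOURCE = r"""
template<typename T>
__global__ void scale(T* data, T factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}
"""

# Each entry names one template instantiation to compile.
NAME_EXPRESSIONS = ["scale<float>", "scale<double>"]


def run_scale_demo():
    """Compile both instantiations and launch the float one (needs a GPU)."""
    import cupy as cp

    module = cp.RawModule(
        code=SOURCE,
        options=("-std=c++14",),
        name_expressions=NAME_EXPRESSIONS,
    )
    kernel = module.get_function("scale<float>")
    data = cp.arange(8, dtype=cp.float32)
    # One block of 8 threads; arguments must match the kernel signature.
    kernel((1,), (8,), (data, cp.float32(2.0), cp.int32(data.size)))
    return data


if importlib.util.find_spec("cupy") is not None:
    try:
        print(run_scale_demo())
    except Exception as exc:  # e.g. no CUDA device in this environment
        print(f"demo skipped: {exc}")
```

With the caching described above, running such a script a second time should be able to reuse the compiled instantiations instead of recompiling them.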

Optional faster kernel compilation using PCH

Users can now set the environment variable CUPY_NVRTC_USE_PCH=1 to use NVRTC’s precompiled headers (PCH) with CUDA 12.8+. This can drastically speed up compilation of multiple kernels and should be especially useful when the on-disk cache is cold or not used.
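The variable can also be set from Python instead of the shell. The sketch below sets it before CuPy is imported, which is the conservative choice; whether CuPy reads the variable at import time or later is an assumption not stated in this note.

```python
import importlib.util
import os

# Opt in to NVRTC precompiled headers (requires CUDA 12.8+ per the note above).
# Setting this before importing CuPy is an assumption made for safety.
os.environ["CUPY_NVRTC_USE_PCH"] = "1"

pch_requested = os.environ.get("CUPY_NVRTC_USE_PCH") == "1"
print("PCH requested:", pch_requested)

if importlib.util.find_spec("cupy") is not None:
    try:
        import cupy  # noqa: F401  (kernels compiled from here on may use PCH)
    except ImportError as exc:  # e.g. CUDA libraries missing
        print(f"CuPy import skipped: {exc}")
```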

Support for CUDA 13.2

CuPy now supports CUDA 13.2 and NCCL 2.29.

New API coverage

CuPy now supports cupy.ndarray.byteswap, cupy.isdtype, cupy.matrix_transpose, cupy.linalg.matmul, and the cupyx.scipy.sparse.linalg.bicgstab (BIConjugate Gradient STABilized) solver. In addition, cupy.repeat is now faster and accepts a CuPy array as the repeats argument.
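To illustrate what an array of repeats means, the sketch below models the per-element semantics in pure Python and, when CuPy and a GPU are available, calls `cupy.repeat` with a CuPy array of counts. The helper names `repeat_reference` and `repeat_on_gpu` are hypothetical.

```python
import importlib.util


def repeat_reference(values, repeats):
    """Pure-Python model of repeat with one count per element."""
    if len(values) != len(repeats):
        raise ValueError("values and repeats must have the same length")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


def repeat_on_gpu(values, repeats):
    """Same operation through CuPy, passing the counts as a CuPy array."""
    import cupy as cp

    out = cp.repeat(cp.asarray(values), cp.asarray(repeats))
    return out.tolist()


expected = repeat_reference([10, 20, 30], [2, 0, 3])
print(expected)

if importlib.util.find_spec("cupy") is not None:
    try:
        print(repeat_on_gpu([10, 20, 30], [2, 0, 3]) == expected)
    except Exception as exc:  # e.g. no CUDA device in this environment
        print(f"GPU check skipped: {exc}")
```

Because the output length depends on the values stored in the counts array, the result size is only known once those values are available.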

πŸ“ Changes

See here for the complete list of merged PRs.

New Features

  • Implement ndarray.byteswap() (#9868)
  • Add bicgstab solver for sparse linear systems (#9889)
  • Add int64 index support to cupyx.scipy.sparse (#9914)
  • ENH: (almost) full structured dtype support (#9927)

Enhancements

  • BUG,MAINT: Restructure SingleDeviceMemoryPool and locking (#9802)
  • Always query hipcc for include directories in hiprtc (#9820)
  • Allow cupy.ndarray as repeats argument to cupy.repeat (#9855)
  • Support caching CUBINs generated with name_expressions (#9912)
  • Deprecate sparse matrix APIs removed in SciPy 1.14 (#9921)
  • Add cupy.linalg.matmul, cupy.linalg.matrix_transpose and cupy.matrix_transpose (#9929)

NumPy/SciPy Compatibility

  • Skip test_solve_singular_empty on NumPy >= 2.4 (#9843)
  • Add cupy.isdtype (#9891)

Performance Improvements

  • Allow using PCH via CUPY_NVRTC_USE_PCH=1 and use it for tests (#9783)
  • Use expected precision in cupyx.scipy.ndimage interpolation functions (zoom, shift, rotate, affine_transform, map_coordinates) (#9808)
  • Remove NumericTraits specializations for complex types (#9883)
  • Validate hypergeometric inputs without syncing (#9885)

Bug Fixes

  • fix(hip): cap linear_launch grid dim to prevent AQL work-item overflow (#9747)
  • MAINT: add missing names to linalg.all (#9762)
  • Guard against None conda prefix in _get_conda_cuda_path() (#9784)
  • Fix DistributedArray missing NotImplementedError overrides for mdspan and mT (#9789)
  • cupyx.scipy.sparse.linalg.gmres: report non-convergence when maxiter is not divisible by restart (#9796)
  • BUG: Fix bug with minimum_phase (#9806)
  • BUG: Don't use --device-as-default-execution-space for hip (#9819)
  • BUG: Fix regression for 32bit index flag in .real and broadcast (#9865)
  • BUG: Fix incomplete size guard for CUB segmented reduce and scan (#9869)
  • BUG: Make cutensor bindings threadsafe (and some small fixes) (#9870)
  • MAINT,BUG: cleanup pending, simplify PooledMemory, use pymutex (#9874)
  • sparse: work around cuSPARSE SpMM gridDim.y overflow (#9850) (#9875)
  • BUG: fix cupy.interp returning nan at exact knot when fp contains inf (#9876)
  • fix(hip): use event-based sync for cross-device D2D copies (#9879)
  • Fix ZeroDivisionError when sorting along zero-length axis (#9816) (#9880)
  • Fix remaining floating-point inconsistencies and improve tests for cupyx.scipy.ndimage.interpolation (#9893)
  • Fix hip mask errors (#9897)
  • Fix silent corruption in thrust sort/argsort/lexsort under OOM (#9901)
  • Fix cupy.kron raising ValueError on empty arrays + refactor and performance improvement (#9917)
  • scipy.ndimage.label index overflow (#9919)
  • Use cuda.pathfinder for CUDA component discovery (#9933)
  • BUG: Introduce a "promotion" step and fix integer comparisons (#9935)
  • Fix arguments to get_current_callback_manager for HIP (#9955)

Code Fixes

  • Remove old Python 2 buffer protocol functions to remove warnings (#9726)
  • Implement kernel cache save/load abstraction (#9743)
  • Cython Compilation Warnings of implicit noexcept (#9754)
  • Ignore Cython IF warnings and avoid DEF uses (#9788)
  • TST: Fixup some more tests (mainly cupyx) for free-threading (#9821)
  • Bump SciPy minimum to 1.14 and remove now-dead version handling (#9925)

Documentation

  • DOC: Fix scipy.linalg.* comparison table (#9744)
  • DOC: note in Comparison Table that np.ma is not implemented; suggest alternative (#9844)
  • Document env vars for source builds in Conda envs (#9924)

Installation

  • Bump version to v14.1.0 (#9953)

Tests

  • TST: Protect graph tests from failing when GPU is busy (#9716)
  • CI: Bump windows kernel cache size (#9735)
  • TST: migrate tests from unittest to pytest (#9740)
  • CI: Use GCP-backed kernel cache in Windows CI (#9761)
  • CI: Use GCP-backed kernel cache in Linux CI (#9770)
  • CI: Fix cuda120 CI failures due to FutureWarning (#9778)
  • Update test_assumed_runtime_version for Windows + CUDA >=13.0 (#9786)
  • TST: Skip many tests when running with pytest-run-parallel (#9798)
  • CI: Make sure local cache is warmed up at job start time (#9799)
  • CI: Revert to use NCCL 2.28 in CUDA 13.1 CI (#9836)
  • Small Fix for cusparseLt v0.9.0 (#9837)
  • Cherry pick rocm fixes (#9871)
  • DEV: Make CUPY_TEST_GPU_LIMIT more reliable. (#9882)
  • Remove test_assumed_runtime_version (#9903)
  • CI: Add CUDA 13.2 and NCCL 2.29 support (#9908)
  • TST: Work around NumPy 2.4.5 regression in conj(). (#9931)
  • Advertise free-threading support and add linux CI run (#9934)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@astroboylrx @Bhuvan1527 @eriknw @ev-br @gdaisukesuzuki @gpinkert @grlee77 @ikrommyd @jberg5 @jeremyfirst22 @kmaehashi @larsoner @leofang @ManuCorrea @marco-pas @mdhaber @megha-darda @seberg

v14.0.1

20 Feb 13:39
3df3b7e

CuPy v14.0.1 Release Note

This is a hot-fix release that addresses several issues reported after the v14.0.0 release.

Note

Check out our blog post for the key highlights and major changes in CuPy v14!

πŸ“ Changes

See here for the complete list of merged PRs.

Bug Fixes

  • Fix minimum NCCL version (#9701)
  • Fix cupy_backends.cuda.libs not raising AttributeError (#9724)
  • Fix: relax dtype check in views (including zero-copy array constructors) (#9725)

Documentation

  • Doc: Fix missing API references (#9708)

Installation

  • Bump version to v14.0.1 (#9731)
  • Bump version in Dockerfile to v14.0.1 (#9733)

Tests

  • CI: Remove mpi4py v3 from CI (#9715)
  • CI: Add cupy.win.cuda131 (#9730)

Others

  • CI: Adjust wait/retry in PyPI upload workflow (#9710)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @leofang

v14.0.0

18 Feb 09:39
e0faa4d

CuPy v14.0.0 Release Note

CuPy v14 is our first major update in two years, bringing significant enhancements to the ecosystem, including NumPy v2 semantics, improved installation via CUDA Pip Wheels, bfloat16 and structured dtypes, and expanded platform support.

Note

Check out our blog post for the key highlights and major changes in CuPy v14!

The following part of the release note only covers the changes since the last pre-release (v14.0.0rc1).

πŸ“ Changes

See here for the complete list of merged PRs.

New Features

  • Add %gpu_timeit IPython magic (#9572)
  • Support viewing CuPy arrays as cuda::std::mdspan on host/device (#9639)
  • Add CUDA Stream Protocol support (#9640)
  • ENH: ml_dtypes.bfloat16 support (#9659)

Enhancements

  • Support CUDA 13.1 and NCCL 2.28 (#9550)
  • Update to CCCL v3.1.2 (#9556)
  • Support cuTensor 2.4 (#9601)
  • MAINT: Delete the array api namespace and all related things (#9609)
  • Avoid allocating an intermediate array in cupy.random.choice (#9619)
  • DEP: Deprecate jitify=True support (and jitify=False) (#9671)
  • Support cp.from_dlpack with ml_dtypes.bfloat16 Optionally (#9675)
  • Improve pathfinder support (#9683)
  • Add guard for unsupported CUDA version in cuFFT setJITCallback (#9692)
  • ENH: make broadcast_arrays, meshgrid return a tuple not list (#9599)

Performance Improvements

  • MAINT,ENH: Simplify CScalar handling and ready it for arbitrary dtypes (#9546)
  • Add fast-path for gufunc (specifically matmul) (#9564)
  • MAINT,PERF: Add fast-path and avoid errors in operators/array-ufunc (#9574)
  • MAINT: A few small micro-optimizations (#9679)

Bug Fixes

  • Fix ctk extras (#9553)
  • Fix intersect1d crash with empty arrays (#9563)
  • Add N-D array support to upfirdn (#9565)
  • Fix conda compiler detection (#9577)
  • BUG: Make sure indexing routines are OK with huge arrays (#9580)
  • Add missing ROCm packages in duplicate detection (#9595)
  • ENH: do not unload modules/code that have been used (#9617)
  • BUG: Fix RandomState thread safety (and memory initialization) (#9623)
  • Fix #9568: Handle negative inputs in sawtooth and square waveforms (#9624)
  • Fix compute capability list and remove CUDA 11 related code (#9626)
  • TST: Skip bfloat16 dlpack test on old CUDA versions (#9681)
  • Fill bfloat16 gaps around arange and some reduction (#9694)

Code Fixes

  • Fix CI (mini) (#9578)
  • TYP: Fix small typing issues found in new pre-commit env (#9581)
  • TST: migrate tests from unittest to pytest (#9610)
  • Make FFT config more thread-safe and adapt FFT tests (#9616)

Documentation

  • Update docs for CuPy v15 (#9545)
  • Fix #7828: Remove self from ndarray docstring signature (#9618)
  • Add missing items to migration guide (#9685)
  • Doc: add support for CUDA 13.1 (#9697)

Installation

  • Declare Python 3.14 support in package metadata (#9598)
  • Bump CuPy version in Docker images (#9702)

Tests

  • Update branch configuration (#9544)
  • Fix cuda-python CI & the test status retrieval in Windows CI (#9571)
  • BUG,TST: Fix cupy_tests/core_tests runs for pytest-run-parallel (#9583)
  • TST: Reorganize cublas tests for thread-safety (#9603)
  • Fix TestChoiceReplaceFalse check (#9633)
  • Support NumPy 2.4 (#9635)
  • Bump disk size limit for cuda.head CI (#9637)
  • Fix Windows CI + Tightening up /test logic (#9638)
  • CI: Update cache handling in CI (#9645)
  • CI: Bump disk image size for all ci axes (#9646)
  • CI: Fix expiry (#9652)
  • Avoid caching cuFFT legacy callback artifacts (#9678)
  • Upgrade rocm test environment to 7.1 (#9687)
  • TST: Slightly bump SVD test tolerance (but tighten it for float64) (#9688)

Others

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @EarlMilktea @ev-br @gpinkert @Harishjitu @isVoid @kmaehashi @leofang @megha-darda @seberg @aman-coder03 @yangcal @yujiteshima

v14.0.0rc1

22 Dec 14:36
66a29e2

Pre-release

CuPy v14.0.0rc1 Release Note

This is the release candidate of CuPy v14, which is planned to ship in January 2026. We encourage you to test your workloads with v14.0.0rc1 and report any feedback on our issue tracker. Refer to the Upgrade Guide for the changes you need to be aware of when migrating from CuPy v13 or earlier. Pre-built binary packages are available for testing:

# For CUDA 13.x
pip install cupy-cuda13x --pre -U -f https://pip.cupy.dev/pre

# For CUDA 12.x
pip install cupy-cuda12x --pre -U -f https://pip.cupy.dev/pre

# For ROCm 7.0
pip install cupy-rocm-7-0 --pre -U -f https://pip.cupy.dev/pre

CuPy v14 introduces support for the CUDA Toolkit package distributed on PyPI. If the CUDA Toolkit is not present in your environment, you can install CuPy alongside the necessary toolkit components by using the [ctk] extras, as follows:

pip install 'cupy-cuda13x[ctk]' --pre -U -f https://pip.cupy.dev/pre
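After installing, it can be useful to confirm which CuPy distribution ended up in the environment, since only one of these packages is meant to be installed at a time. The sketch below uses only the standard library; the helper name `installed_cupy_packages` is hypothetical, and the candidate list simply reuses the package names from the commands above plus the source distribution name `cupy`.

```python
from importlib import metadata

# Package names taken from the install commands above, plus the source
# distribution name "cupy".
CANDIDATES = ("cupy", "cupy-cuda12x", "cupy-cuda13x", "cupy-rocm-7-0")


def installed_cupy_packages(candidates=CANDIDATES):
    """Map each installed CuPy distribution name to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


packages = installed_cupy_packages()
if len(packages) > 1:
    print("Multiple CuPy packages installed:", packages)
elif packages:
    print("Found:", packages)
else:
    print("No CuPy package found")
```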

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

CUDA Python Packages

CuPy now supports NVIDIA CUDA packages distributed on PyPI! This enhancement allows users to leverage CuPy without a system-wide CUDA Toolkit installation, and also provides better interoperability with other Python packages that utilize CUDA, such as PyTorch.

AMD ROCm 7 Support

Support for the AMD ROCm 7 platform is now included in CuPy, along with a cupy-rocm-7-0 binary package specifically built for ROCm 7.0.

Enhanced NumPy/SciPy API Coverage

CuPy now offers a greater number of APIs compatible with NumPy and SciPy, including cupy.linalg.eig and cupy.linalg.eigvals.

πŸ› οΈ Breaking Changes

  • CuPy v14 follows NumPy 2 in most of its behavior.
  • Support for CUDA 11 and Python 3.9 has been dropped.
  • All cuDNN-related functionality has been completely removed from CuPy. We recommend users who need to access cuDNN functionality from Python to consider using cuDNN Frontend instead.

Please refer to the Upgrade Guide for details.

πŸ“ Changes

See here for the complete list of merged PRs.

New Features

  • Implement cupy.linalg.eig / cupy.linalg.eigvals (#8854, #8980)
  • Implement cupy.linalg.cond (#9140)
  • Support nD reductions for sparse arrays (#9209)
  • Add cupy.bool, cupy.long and cupy.ulong (#9253)
  • Add function bitwise_count (#9390)
  • ENH: Minimal support for structured dtypes (#9440)
  • Support CUDA wheels (#9444)

Enhancements

  • Releasing the GIL during Thrust sorts (#7760)
  • Support new cuFFT callbacks (#8242)
  • Make cupyx.scipy submodule imports lazy (#8706)
  • Patch for supporting cusparseLt 0.7.1 (#9004)
  • Migrate to pyproject.toml (#9079)
  • Allow build on ROCm 6.4 (#9099)
  • Implements Lapack potrs (#9116)
  • Support NCCL for aarch64 (#9137)
  • Add xsf as submodule for special function scalar kernels (#9159)
  • Support CUDA 12.9 and NCCL 2.26 (#9200)
  • Support loading NCCL from Pip packages (#9201)
  • Update cupyx.scipy.special functions for SciPy 1.16 (#9207)
  • Pin xsf to version 0.1.2 (#9250)
  • Add CUDA 13 & NCCL 2.27 support (#9289)
  • Remove bundled header files (#9295)
  • Support building NVTX on Windows without Nsight Systems (#9301)
  • Change default c++ RTC standard from 11 to 14 (#9334)
  • Support CUDA Array Interface in ROCm build (#9340)
  • Build against cuda-bindings 12.x (#9377)
  • feat(kernels) fix HIPRTC build error ROCm 6/7 (#9382)
  • feat(JIT) Remove CCCL includes dir (#9383)
  • feat(hipCUB) Use c++17 for hipCUB (#9384)
  • Enable rocm7 (#9398)
  • Drop support of CUDA 11.x and NumPy v1.x (#9406)
  • feat(ptds) Add PTDS support and launch_host_func (#9407)
  • [nccl] add nccl comm split (#9411)
  • Fix message of shims removed in NumPy v2 (#9412)
  • Replace fastrlock with C++ recursive_mutex (#9414)
  • ENH,MAINT: Allow cupy-like protocols in setitem and consolidate (#9418)
  • ENH: Accept nested CAI arrays in cupy.asarray and indexing (#9419)
  • Make cuda.is_available() guard against (almost) all errors (#9420)
  • Deprecate cupyx.tools.install_library tool (#9432)
  • Drop support for ROCm 6 or earlier (#9433)
  • ENH,BUG: Allow strides without mem, fix empty byte-bounds (#9453)
  • DOC: Remove experimental on async allocator (#9455)
  • Remove deprecated nvrtc.getNVVM API (#9457)
  • Adopt cuda.bindings new module layout (#9458)
  • Revert typo fix in cupy/_core/include/cupy/complex/complex.h (#9460)
  • patch for supporting cusparseLt 0.8.1 (#9509)
  • Fix csr_matrix.minimum/maximum dtype promotion rule (#8844)
  • ENH: cupyx/signal: add freqz_sos, a preferred alias for sosfreqz (#9114)
  • Fix cp.empty(None) to raise TypeError (#9160)
  • feat: make cupy.nan_to_num broadcast nan, posinf, and neginf kwargs (#9240)
  • Fix resample error message for SciPy 1.16 update (#9241)
  • Fix freqz for complex w (#9243)
  • Fix boxcox_llf for SciPy 1.16 (#9263)
  • Fix return dtype of csr_matrix minmax with scalar (#9409)
  • signal.cspline1d_eval,qspline1d_eval throw exception for empty cj array (#9484)
  • ENH: allow python scalars in the 2nd argument of searchsorted (#9512)

Performance Improvements

  • Add short cut for subsetting along the minor axis (#8468)
  • Implement lazy load for cuquantum (#9102)
  • Accelerate duplicate installation check (#9325)
  • Lazy load the testing module (#9336)
  • Delay all imports of cupyx inside cupy (#9338)
  • Fix cuTENSOR workspace size query (#9399)
  • Invoke thrust with par_nosync (#9497)

Bug Fixes

  • Fix illegal memory access in LinearNDInterpolator (#8983)
  • BUG: cupyx.scipy.signal: make gammatone return arrays (#9117)
  • Support Cython 3.1 (#9131)
  • BUG: fix cupyx.scipy.linalg.expm (#9142)
  • Fix cuSOLVER feature/version detection for eig and eigvals (#9147)
  • Fix overflow in CUB reduction (#9248)
  • Fix lsmr type promotion rule for complex dtype (#9273)
  • Allow host function call during CUDA graph capture (#9279)
  • Fix UnboundLocalError when blocking=True (#9280)
  • CUDA 11.1 or earlier is no longer supported (#9281)
  • [BugFix] Fix upfirdn kernel launch bug for 2D arrays (#9352)
  • Fix tf2sos failing for constant transfer function. (#9395)
  • Fix repeated variable in hilbert2 (#9396)
  • Fix Python version requirements in pyproject.toml (#9421)
  • Change nccl get_unique_id to return a bytes string (#9438)
  • [bug] Include type_traits in filters (#9479)
  • File cache: use os.replace (clarity) and accept PermissionError (#9483)
  • BUG: Fix incorrect initialization in bspline kernel (#9486)
  • Do not pass filter=data for ZIP (cuTENSOR on Windows) (#9492)
  • BUG: Fix typo in advise and prefetch affecting cuda 13 (#9493)
  • Fix Windows directory path for cuTENSOR 2.3+ (#9519)
  • Fix cuTENSOR import libs missing in Windows by cupyx.tools.install_library installation (#9527)

Code Fixes

  • Enable ruff rule UP (#8849)
  • Add ruff rules for static typing (#9154)
  • Remove deprecated modules (#9337)
  • Fix for Ruff UP041 (#9423)
  • Fix for Ruff UP007, UP035, UP045 (#9424)
  • MAINT: Specify texture address precisely (#9471)
  • (small fix) amend generate.py around cuSPARSELt (#9524)

Documentation

  • Add an AI policy to prohibit misuse of the issue tracker (#9062)
  • Update ROCm docs (#9105)
  • Fix missing items in API reference (#9130)
  • Docs: Update build-time requirement of Cython (#9134)
  • Improve API reference list (#9165)
  • Fix WARNING: Inline emphasis start-string without end-string (#9167)
  • Bump supported NumPy version to v2.3 (#9198)
  • Improve RawKernel documentation regarding views (closes #9233) (#9275)
  • Docs only: s/"recoreded"/recorded (#9287)
  • CUDA 13 Update docs (#9294)
  • CI: update docs (#9375)
  • Improve CI docs (#9415)
  • feat(docs) Updating AMD docs (#9451)
  • Fixed some typos in the documentation (#9454)
  • Fix typos in kernels.rst (#9467)
  • [docs] Update README.md (#9478)
  • Prepare for upgrade guide for CuPy v14 (#9485)
  • DOC: Add Nsight Compute profiling tutorial for CuPy kernels (#9514)
  • Remove outdated compiler requirement info in install docs (#9520)
  • Fill in compatibility matrix upper bound for CuPy v13 (#9521)
  • DOC: add support for Python 3.14 (#9530)

Installation

  • Limit Cython version to 3.0 or 3.1 (#9133)
  • Make rebuild faster for development (#9136)
  • Bump supported NumPy version for CuPy v14 (#9164)
  • Fix long_description missing after pyproject.toml migration (#9227)
  • Do not include files listed in MANIFEST.in to wheels (#9230)
  • Drop cuDNN entirely (#9326)
  • Bump CUDA/Ubuntu version in Docker image (#9342)
  • Update conda-build support for conda CUDA 13 packages (#9378)
  • Update install_library.py to support cuTENSOR 2.3 and drop CUDA 11.x (#9439)
  • Fix unnecessary assertion handling in setup.py (#9499)
  • MAINT: Remove -fno-gnu-unique again (#9517)

Tests

  • Fix handling of ROCm self-hosted CIs (#8860)
  • Do gc.collect() in MemoryHook test code to avoid free hook to happen (#9092)
  • np.unique_values may return unsorted data from NumPy 2.3 (#9161)
  • np.sum has numerical change in NumPy 2.3 (#9162)
  • Add test cases for batchwise solve_triangular (as xfail) (#9173)
  • CI: NumPy 2.3 (#9178)
  • Add NumPy 2.3 + windows CI (#9195)
  • Update pre-commit settings (#9199)
  • Add cupy.win.cuda129 CI (#9213)
  • Fix test trigger phrase for cupy.win.cuda129 CI (#9215)
  • CI: Introduce per-PR kernel cache (#9234)
  • Skip some signal...

v13.6.0

18 Aug 09:28
25e552d

This is the release note of v13.6.0. See here for the complete list of solved issues and merged PRs.

🌏 We just launched our LinkedIn page. Follow us for the latest news and updates!

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

This release adds support for CUDA 13.x. Binary packages are available on PyPI: pip install cupy-cuda13x.

πŸ“ Changes

Enhancements

  • Update cupyx.scipy.special functions for SciPy 1.16 (#9246)
  • Add CUDA 13 support (#9300)
  • Support building NVTX on Windows without Nsight Systems (#9304)
  • Remove bundled header files (#9305)
  • Fix freqz for complex w (#9259)
  • Fix resample error message for SciPy 1.16 update (#9262)

Bug Fixes

  • Fix overflow in CUB reduction (#9254)
  • Fix lsmr type promotion rule for complex dtype (#9277)
  • Fix UnboundLocalError when blocking=True (#9282)
  • Allow host function call during CUDA graph capture (#9283)
  • CUDA 11.1 or earlier is no longer supported (#9285)

Documentation

  • Docs only: s/"recoreded"/recorded (#9288)
  • CUDA 13 Update docs (#9299)

Installation

  • [v13] Bump version to v13.6.0 (#9314)

Tests

  • CI: Introduce per-PR kernel cache (#9235)
  • Add test cases for batchwise solve_triangular (as xfail) (#9245)
  • Relax tolerance of test_hilbert (#9255)
  • Skip some signal q dtype tests (#9256)
  • Increase CPU memory limit of linux.cuda{128,129} CIs (#9261)
  • Support nD reductions for sparse arrays (#9268)
  • [v13] Missing backport of special function tests (#9269)
  • [v13] Wrong test skip condition of test_zscore_empty (#9270)
  • Support SciPy 1.16 on Windows (#9276)
  • Support SciPy 1.16 on Linux (#9284)
  • CI: NVTX1 removed from Windows machine image (#9303)
  • Fix CI failure in CUDA 12.4 (#9311)
  • [v13] Fix scipy version condition of COO matrix test (#9312)

Others

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @brycelelbach @Ellecee @emcastillo @kmaehashi @robertmaynard

v13.5.1

11 Jul 04:59
f450813

This is the release note of v13.5.1. This is a hot-fix release to address an issue related to the buffer protocol support for UMP added in v13.5.0 (#9223). See here for the complete list of solved issues and merged PRs.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

📝 Changes

Bug Fixes

  • Fix buffer protocol to raise TypeError when it is not meant to be supported (#9222)

Installation

  • Bump version to v13.5.1 (#9224)
  • Fix long_description missing after pyproject.toml migration (#9231)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@kmaehashi @leofang

v13.5.0

03 Jul 06:59
e30a0cc

Note

2025-07-11: We have marked this release as "yanked" on PyPI to prevent new installations due to unexpected regressions. The hot-fix release v13.5.1 is available.

This is the release note of v13.5.0. See here for the complete list of solved issues and merged PRs.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

  • CuPy now supports NVIDIA CUDA 12.9 and AMD ROCm 6.4 platforms, and NumPy 2.3.
  • Unified Memory Programming support for HMM/ATS-enabled systems (such as NVIDIA Grace Hopper Superchip) has been added. Refer to the documentation for the usage.
  • Binary packages on PyPI (wheels) can now load NCCL packages installed via Pip (e.g., nvidia-nccl-cu12). In addition, Arm (aarch64) wheels are now built with NCCL support enabled.

Request for Comments

We are going to finalize the following RFC issues.

  • Drop support for cuDNN in CuPy v14 (#8215)
  • Update set of supported ROCm versions in CuPy v13/v14 (#8607)
  • Remove cupyx.tools.install_library in CuPy v14 (#9204)

πŸ“ Changes

New Features

  • Support system allocated memory (#9033)

Enhancements

  • Fix rocThrust build for ROCm 6.3 (#9023)
  • Allow discovering cuTENSOR using major version (#9037)
  • Support FIPS enabled machines with MD5 hashing (#9055)
  • Update cutensornet accelerator based on cuquantum-python 25.03 deprecation (#9058)
  • Refactor hashing (#9059)
  • Raise user warning in both {to,from}Dlpack & Update the Interoperability page (#9061)
  • Allow build on ROCm 6.4 (#9100)
  • Migrate to pyproject.toml (#9135)
  • Support NCCL for aarch64 (#9141)
  • Support loading NCCL from Pip packages (#9208)
  • Support CUDA 12.9 and NCCL 2.26 (#9211)
  • Fix cupyx.scipy.stats.zscore for SciPy 1.15 (#9024)

Performance Improvements

  • Implement lazy load for cuquantum (#9104)

Bug Fixes

  • JIT: Support empty return (#9001)
  • API: Revert toDlpack() default to the old unversioned one (#9007)
  • BUG: Hot fix for numpy 2 support in some fusion paths (#9012)
  • Fix compilation error of cupy.inf in fusion2 (#9043)
  • Support Cython 3.1 (#9132)
  • Fix cupyx.scipy.linalg.expm (#9144)

Code Fixes

  • Fix get_typename to emit thrust::complex (#9054)

Documentation

  • Add an AI policy to prohibit misuse of the issue tracker (#9095)
  • Update ROCm docs (#9108)
  • Docs: Update build-time requirement of Cython (#9145)
  • Fix WARNING: Inline emphasis start-string without end-string (#9168)
  • Improve API reference list (#9189)
  • Bump supported NumPy version to v2.3 (#9203)

Installation

  • Limit Cython version to 3.0 or 3.1 (#9146)
  • Bump NumPy version restriction (#9166)
  • Make rebuild faster for development (#9196)
  • Bump version to v13.5.0 (#9212)

Tests

  • CI: Do not run full CI on CUDA 12.0/12.1/12.2 + Windows (#9000)
  • CI: Pin setuptools version on Windows (#9039)
  • Revert "CI: Pin setuptools version on Windows" (#9056)
  • Mark xfails in some spline tests for SciPy 1.15 (#9060)
  • Support SciPy 1.15 (#9063)
  • Skip some dtype checks with NumPy 2.x (#9064)
  • Skip tests for different behavior of integer overflow from NumPy 2 (#9072)
  • Skip some cupyx.scipy.special tests for SciPy 1.15 (#9073)
  • Skip some tests for numerical error from NumPy 2 (#9075)
  • Do gc.collect() in MemoryHook test code to avoid free hook to happen (#9093)
  • np.sum has numerical change in NumPy 2.3 (#9169)
  • Fix cp.empty(None) to raise TypeError (#9174)
  • CI: NumPy 2.3 (#9194)
  • Add NumPy 2.3 + windows CI (#9197)
  • Update pre-commit settings (#9202)
  • Add cupy.win.cuda129 CI (#9214)
  • Fix test trigger phrase for cupy.win.cuda129 CI (#9217)

Others

  • Allow specifying no libraries when generating wheel metadata (#9080)
  • Upgrade pre-commit hooks (#9156)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @Azusachan @EarlMilktea @ev-br @jakirkham @kmaehashi @leofang @MattTheCuber @rongou @seberg @yangcal

v14.0.0a1

04 Apr 08:41
1e8ade1

Pre-release

This is the release note of v14.0.0a1. See here for the complete list of solved issues and merged PRs.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

✨ Highlights

This is the first alpha release of the CuPy v14 series, containing:

  • New type promotion rules and behaviors aligned with the NumPy 2 specification.
  • 42 new NumPy/SciPy-compatible APIs, including cupy.concat, cupyx.scipy.interpolate.CubicSpline, cupyx.scipy.spatial.Delaunay, cupyx.scipy.ndimage.find_objects, and cupyx.scipy.special.lambertw. See the Comparison Table for the detailed coverage.

Binary packages are available for testing. Try installing now by:

$ pip install cupy-cuda12x --pre -U -f https://pip.cupy.dev/pre

πŸ› οΈ Changes without compatibility

  • CuPy v14’s behavior will be aligned with NumPy v2.
  • Support for Python 3.9, NumPy 1.22 and 1.23, SciPy 1.7, 1.8, and 1.9 has been dropped. (#8491)
  • cupy.random.choice may return different results from CuPy v13. (#8483)
  • Building CuPy from source code now requires Cython 3.0. (#8457)
  • cupyx.scipy.linalg.{tri,tril,triu} APIs were removed from CuPy to follow the latest SciPy specification. Use cupy.{tri,tril,triu} instead. (#8499)
  • NumPy fallback mode (cupyx.fallback_mode) has been removed as discussed in #8497. (#8816)
  • Legacy DLPack APIs (cupy.toDlpack and cupy.fromDlpack) are now marked deprecated. Use cupy.from_dlpack instead. See the documentation for the usage. (#8831)

πŸ“ Changes

New Features

  • Add KDTree to cupyx.scipy.spatial (#7671)
  • Add neighbors option to RbfInterpolator (#7864)
  • ENH: cupyx/signal: add sweep_poly (#7873)
  • Add 2D Delaunay triangulation (#7985)
  • Add cupyx.signal.pulse_compression from cuSignal's non SciPy-compat API (#8022)
  • Add LinearNDInterpolator to cupyx.scipy.interpolate (#8035)
  • Add cupyx.signal.convolve1d3o from cuSignal's non SciPy-compat API (#8037)
  • Add cupyx.signal.{firfilter,firfilter_zi,firfilter2} (#8052)
  • Add cupyx.signal.{pulse_doppler, cfar_alpha} (#8057)
  • Add cupyx.signal.{complex_cepstrum,real_cepstrum,inverse_complex_cepstrum,minimum_phase} (#8062)
  • Add cupyx.signal.mvdr (#8077)
  • ENH: signal: add lanczos and kaiser_bessel_derived windows (#8081)
  • Add cupyx.signal.ca_cfar (#8087)
  • Add cupyx.signal.convolve1d2o (#8101)
  • Add cupyx.signal.freq_shift (#8128)
  • Add lambertw function (#8140)
  • Add cupyx.signal.channelize_poly (#8141)
  • Add cupyx.scipy.interpolate.CubicSpline (#8175)
  • Add apply_over_axes API (#8177)
  • Add cupy.put_along_axis API (#8199)
  • Add CloughTocher2DInterpolator to cupyx.scipy.interpolate (#8208)
  • Add NearestNDInterpolator to cupyx.scipy.interpolate (#8220)
  • Add NdBSpline to cupyx.scipy.interpolate (#8223)
  • ENH: cupyx/scipy/interpolate: add *UnivariateSpline for 1D smoothing splines (#8267)
  • Add NdBSpline based interpolation methods to RGI (#8276)
  • ENH: cupyx/interpolate: port interp1d from scipy (#8289)
  • Add batched solve_triangular (#8329)
  • Add Incomplete Elliptic Integrals to special (#8425)
  • Support system allocated memory (#8442)
  • Add CUDA graph debug function (#8502)
  • Add sici and shichi to special for sine and cosine integrals (#8620)
  • Update unique_xxx (nep52) (#8665)
  • Add cupyx.scipy.ndimage.find_objects (#8916)
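To illustrate one of the additions listed above, here is a small sketch of cupy.put_along_axis (#8199), assuming it follows the NumPy signature put_along_axis(arr, indices, values, axis). The NumPy fallback is an assumption added only so the example runs without a GPU.

```python
# Minimal sketch: zero out the largest entry of each row in place.
try:
    import cupy as xp
    xp.zeros(1)  # touches the device; fails if no usable GPU is present
except Exception:
    import numpy as xp  # assumed CPU stand-in with the same API

a = xp.array([[10, 30, 20],
              [60, 40, 50]])

# Index of the maximum in each row, reshaped to (2, 1) so that it
# lines up with `a` along axis 1.
idx = xp.expand_dims(xp.argmax(a, axis=1), axis=1)

# Writes the scalar 0 at the selected position of every row.
xp.put_along_axis(a, idx, 0, axis=1)

print(a)
```

The function modifies `a` in place and returns nothing, which mirrors the NumPy behavior.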

Enhancements

  • Support for break and continue keywords in CuPy JIT (#8010)
  • Make cupyx.signal.radartools private (#8047)
  • Remove usages of numpy.float_ and numpy.complex_ (#8050)
  • Support cusparseLt 0.6.1 (#8074)
  • Add incontiguous support for cutensor functions (#8149)
  • Add complex support for the digamma function (#8163)
  • Fix expm(complex matrix) (#8206)
  • Add CutensorMg support (#8212)
  • Add cudaStreamCreateWithPriority (#8219)
  • Add the nearest method for percentile/quantile estimation (#8224)
  • Various Jitify improvements (#8235)
  • Support fallback algorithm for spgemm (#8252)
  • Bump to cuTENSOR 2.0.1 (#8282)
  • Preload cuTENSORMg (#8283)
  • Use weakref.finalize instead of __del__ for RandomState._generator destruction (#8315)
  • Support ROCm 6 (#8319)
  • cupyx: cleanup use of deprecated NumPy functionality (NumPy 2.0 compatibility) (#8320)
  • Add wright_bessel function to special (#8324)
  • MAINT: fft, linalg: add __all__ lists (#8333)
  • Cuda 12.5 Tests (#8337)
  • Add axes support in ndimage filters module (#8339)
  • MAINT: interpolate: update RBF to scipy 1.13 (#8343)
  • Make CuPy import under NumPy 2.0 (#8346)
  • Lazy-preload NCCL (#8360)
  • Fix map_coordinates recompilation condition (#8378)
  • Disable jitify for cub & Bump CCCL (#8412)
  • Use custom less instead of specializing thrust (#8446)
  • Port to Cython 3.0 (#8457)
  • Avoid using Jitify everywhere inside CuPy (#8467)
  • Get rid of pkg_resources (#8480)
  • Drop support for Python 3.9, NumPy 1.22 and 1.23, SciPy 1.7, 1.8 and 1.9 (#8491)
  • Remove deprecated cupyx.scipy.linalg.{tri,tril,triu} (#8499)
  • Use .toarray() instead of .A attribute (#8508)
  • Support half option in scipy.signal.minimum_phase (#8510)
  • Increase MAX_NDIM to 64 (#8511)
  • Support CUDA 12.6 (#8513)
  • Fallback to system headers for future CUDA 12.x versions (#8518)
  • Extend runtime header search logic to conda (#8519)
  • Support copy=None in cp.array / cp.asarray / cp.asanyarray (#8545)
  • Fix dtype rule of cupy.scipy.stats.entropy for SciPy 1.14 (#8547)
  • Support setuptools 74.0.0 or later (#8583)
  • Add NCCL_ERROR_REMOTE_ERROR to the set of errors from NCCL (#8662)
  • Replace numpy.ComplexWarning with cupy.exceptions.ComplexWarning (#8676)
  • ENH: Implement dlpack v1 (#8683)
  • Fix some NumPy 2.x CI failures (cont.) (#8695)
  • Bump CUDA version in cuda11x-cuda-python CI (#8737)
  • [ROCm 6.2.2] Conditionally define CUDA_SUCCESS only if it's not (#8793)
  • Remove fallback mode (#8816)
  • Raise user warning in both {to,from}Dlpack & Update the Interoperability page (#8831)
  • Use a custom Min/Max instead of specializing CUB (#8846)
  • Updating pylibraft pairwise_distance to cuvs (#8847)
  • add axes support for additional functions in cupyx.scipy.ndimage (from SciPy 1.15.0) (#8858)
  • Raise VisibleDeprecationWarning for wavelet functions (#8865)
  • Support CUDA 12.8 + Blackwell GPUs (sm_100, sm_120) (#8899)
  • Bump library installers for CUDA 12.8 (#8914)
  • Use CCCL 2.8.x branch + Use CUPY_CACHE_KEY in hash keys (#8919)
  • Use NVIDIA CCCL 2.8 latest w/CUDA 12.3 fix (#8924)
  • Use C++17 in JIT compile (#8940)
  • Restore CUB histogram and bincount (#8950)
  • Broaden usage of C++17 (#8952)
  • cupyx.scipy.distance: initialize output array with empty instead of zeros (#8971)
  • cupyx.scipy.spatial.distance.cdist remove explicit zeroing of user-provided output array (#8988)
  • Fix rocThrust build for ROCm 6.3 (#9022)
  • Allow discovering cuTENSOR using major version (#9030)
  • Update cutensornet accelerator based on cuquantum-python 25.03 deprecation (#9045)
  • Support FIPS enabled machines with MD5 hashing (#9053)
  • Refactor hashing (#9057)
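As an example of one enhancement from this list, the sketch below contrasts the default linear interpolation with the nearest method for percentile estimation (#8224). It assumes the option is selected through the `method` keyword, as in NumPy; the NumPy fallback is an assumption added only so the example runs without a GPU.

```python
# Minimal sketch: "linear" interpolates, "nearest" returns an existing sample.
try:
    import cupy as xp
    xp.zeros(1)  # touches the device; fails if no usable GPU is present
except Exception:
    import numpy as xp  # assumed CPU stand-in with the same API

data = xp.array([10.0, 20.0, 30.0, 40.0, 50.0])

# The 40th percentile falls at fractional position 1.6 in the sorted data.
linear = xp.percentile(data, 40)                      # between 20 and 30
nearest = xp.percentile(data, 40, method="nearest")   # closest sample

print(float(linear), float(nearest))
```

The nearest method is useful when the result has to be a value that actually occurs in the data, for example with categorical codes or quantized measurements.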

Enhancements for NumPy & SciPy compatibility:

  • Fix scp.signal.{medfilt,medfilt2d} to raise ValueError for complex64 inputs (#8059)
  • Deprecate cupyx.scipy wavelet functions (#8061)
  • Fix csrmatrix.__pow__ to raise ValueError for non-int other (#8063)
  • Fix cupyx.scipy.special.betainc for invalid inputs (#8065)
  • scipy.special.{btdtr,btdtri} are deprecated since SciPy 1.12 (#8066)
  • Fix boxcox_llf for SciPy 1.12 changes (#8095)
  • NEP50 (#8323)
  • Resolve Ruff NPY errors - fix exception imports and asfarray usage in test code (#8455)
  • Fix sparse.linalg function signatures following SciPy 1.14 (#8526)
  • NumPy 2.0 compatibility: (partially) sync with NEP52 (#8531)
  • Fix dtype rule of special functions for SciPy 1.14 (#8532)
  • Fix cupy.histogram arg order to match NumPy (v1.24+) (#8559)
  • Make cupy.linalg.solve compatible with numpy v2 (#8629)
  • Silence FutureWarning emitted when rcond is missing (#8638)
  • Fix some NumPy 2.x CI failures (#8690)
  • Support kind arg. in sorting methods (#8708)
  • Fix cupy.percentile for NumPy 2.x (#8726)
  • Fix some NumPy 2.x CI failures (cupyx) (#8727)
  • Skip some tests incompatible with NumPy 2.2 (#8817)
  • Fix scipy.spmatrix.sign for complex dtype inputs (#8822)
  • Fix return type of cupy.where for scalar arguments for NumPy 2.0 (#8835)
  • Fix cupyx.scipy.special.logsumexp for NumPy 2.0 (#8836)
  • Fix cupy.cov (#8839)
  • Fix cupy.histogramdd for NumPy 2.x (#8873)
  • Raise ValueError upon attempts to create 3-dim sparse array (#8877)
  • Disable contiguous_check for COO/dense matmul test (#8878)...

v13.4.1

21 Mar 07:28
6e3c9b7

This is the release note of v13.4.1. This is a hot-fix release addressing several issues including DLPack compatibility with existing user code. See here for the complete list of solved issues and merged PRs.

💬 Join the Matrix chat to talk with developers and users and ask quick questions!

🙌 Help us sustain the project by sponsoring CuPy!

📝 Changes

Bug Fixes

  • Revert toDlpack() default to the old unversioned one (#9011)
  • Hot fix for numpy 2 support in some fusion paths (#9016)
  • Fix compilation error of cupy.inf in fusion2 (#9044)

Tests

  • CI: Pin setuptools version on Windows (#9047)

Others

  • Bump version to v13.4.1 (#9051)

👥 Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @kmaehashi @seberg