Illegal memory access in cupy.sum (cupy 13.6.0) #9780
Open
Description
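cupy.sum() crashes with an illegal memory access on an NVIDIA H200 for large arrays (CuPy 13.6.0). The error is reported by the subsequent cp.asnumpy() call: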
Traceback (most recent call last):
File "/nfs/roberts/project/pi_tz324/cgh42/si-gpu/err2.py", line 4, in <module>
cp.asnumpy(b)
File "/nfs/roberts/project/pi_tz324/shared/software/cupy/13.6.0/cupy/__init__.py", line 807, in asnumpy
return a.get(stream=stream, order=order, out=out, blocking=blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "cupy/_core/core.pyx", line 1873, in cupy._core.core._ndarray_base.get
File "cupy/_core/core.pyx", line 1978, in cupy._core.core._ndarray_base.get
File "cupy/cuda/memory.pyx", line 690, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
File "cupy_backends/cuda/api/runtime.pyx", line 636, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
[[email protected] si-gpu]$ vim err2.py
[[email protected] si-gpu]$ python err2.py
[[email protected] si-gpu]$ vim err2.py
[[email protected] si-gpu]$ python err2.py
Traceback (most recent call last):
File "/nfs/roberts/project/pi_tz324/cgh42/si-gpu/err2.py", line 4, in <module>
cp.asnumpy(b)
File "/nfs/roberts/project/pi_tz324/shared/software/cupy/13.6.0/cupy/__init__.py", line 807, in asnumpy
return a.get(stream=stream, order=order, out=out, blocking=blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "cupy/_core/core.pyx", line 1873, in cupy._core.core._ndarray_base.get
File "cupy/_core/core.pyx", line 1978, in cupy._core.core._ndarray_base.get
File "cupy/cuda/memory.pyx", line 690, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
File "cupy_backends/cuda/api/runtime.pyx", line 636, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
To Reproduce
import cupy as cp
j3c = cp.zeros((40000, 400, 200), dtype=cp.float32)
b = j3c.sum(axis=2)
cp.asnumpy(b)
Installation
Built from GitHub source
(Also tried pip install cupy-cuda12x in fresh venv)
Environment
OS : Linux-4.18.0-553.87.1.el8_10.x86_64-x86_64-with-glibc2.28
Python Version : 3.12.3
CuPy Version : 13.6.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 2.4.2
SciPy Version : 1.17.1
Cython Build Version : 3.1.8
Cython Runtime Version : 3.0.12
CUDA Root : /apps/software/2024a/software/CUDA/12.6.0
nvcc PATH : /apps/software/2024a/software/CUDA/12.6.0/bin/nvcc
CUDA Build Version : 12060
CUDA Driver Version : 12080
CUDA Runtime Version : 12060 (linked to CuPy) / 12060 (locally installed)
CUDA Extra Include Dirs : []
cuBLAS Version : 120600
cuFFT Version : 11206
cuRAND Version : 10307
cuSOLVER Version : (11, 6, 4)
cuSPARSE Version : 12502
NVRTC Version : (12, 6)
Thrust Version : 200800
CUB Build Version : 200800
Jitify Build Version : 1a0ca0e
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : 22602
NCCL Runtime Version : 22602
cuTENSOR Version : 20500
cuSPARSELt Build Version : 801
Device 0 Name : NVIDIA H200
Device 0 Compute Capability : 90
Device 0 PCI Bus ID : 0000:CB:00.0
Additional Information
Build process:
export CUPY_NVCC_GENERATE_CODE="arch=compute_90,code=sm_90"
python -m build --wheel --no-isolation
With CUPY_ACCELERATORS=cutensor,cub, CuPy uses cuTENSOR rather than CUB for the reduction, and this bug doesn't occur.
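For context on what "large" means here, the sizes in the reproducer can be worked out without a GPU. The sketch below only does the arithmetic for the shape (40000, 400, 200) with float32. The comparison against 32-bit index limits is an assumption about where to look, not a confirmed cause: nothing in this report establishes that an index overflow is involved.

```python
import math

# Limits of 32-bit signed and unsigned integers, for comparison only.
# Assumption: comparing against these is a guess at a possible trigger,
# not something this report confirms.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

shape = (40000, 400, 200)  # shape of j3c in the reproducer
itemsize = 4               # bytes per float32 element

n_in = math.prod(shape)        # elements read by j3c.sum(axis=2)
n_out = math.prod(shape[:2])   # elements in the result b
nbytes = n_in * itemsize       # size of j3c in bytes

print(f"input elements : {n_in:,}")
print(f"output elements: {n_out:,}")
print(f"input size     : {nbytes / 2**30:.2f} GiB")
print(f"input count  > INT32_MAX : {n_in > INT32_MAX}")
print(f"input count  > UINT32_MAX: {n_in > UINT32_MAX}")
print(f"output count > INT32_MAX : {n_out > INT32_MAX}")
```

The input has 3.2 billion elements, which is above the signed 32-bit limit but below the unsigned one, while the output has only 16 million. If the crash does depend on that threshold, shrinking the first dimension until the input element count drops below 2**31 would be one way to check.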