Illegal memory access in cupy.sum (cupy 13.6.0) #9780

@chillenb

Description

cupy.sum() crashes with an illegal memory access (cudaErrorIllegalAddress) on an NVIDIA H200 when reducing a large array (CuPy 13.6.0).

Traceback (most recent call last):
  File "/nfs/roberts/project/pi_tz324/cgh42/si-gpu/err2.py", line 4, in <module>                                        
    cp.asnumpy(b)
  File "/nfs/roberts/project/pi_tz324/shared/software/cupy/13.6.0/cupy/__init__.py", line 807, in asnumpy               
    return a.get(stream=stream, order=order, out=out, blocking=blocking)                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                
  File "cupy/_core/core.pyx", line 1873, in cupy._core.core._ndarray_base.get                                           
  File "cupy/_core/core.pyx", line 1978, in cupy._core.core._ndarray_base.get                                           
  File "cupy/cuda/memory.pyx", line 690, in cupy.cuda.memory.MemoryPointer.copy_to_host_async                           
  File "cupy_backends/cuda/api/runtime.pyx", line 636, in cupy_backends.cuda.api.runtime.memcpyAsync                    
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status                   
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered      

To Reproduce

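import cupy as cp
# Allocate a large float32 array on the device.
j3c = cp.zeros((40000, 400, 200), dtype=cp.float32)
# Reduce over the last axis.
b = j3c.sum(axis=2)
# Copy to host. CUDA kernels run asynchronously, so a fault in an earlier kernel can be reported here.
cp.asnumpy(b)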
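
For scale, here is a quick check of how big the reproducer's input is. It is only a sketch using the standard library. Whether the element count crossing the signed 32-bit limit is related to the crash is an assumption on my part, not something confirmed in this report.

```python
import math

shape = (40000, 400, 200)  # shape used in the reproducer
itemsize = 4               # bytes per float32 element

n_elements = math.prod(shape)
n_bytes = n_elements * itemsize
int32_max = 2**31 - 1

print(f"elements: {n_elements:,}")
print(f"bytes:    {n_bytes:,}")
print(f"element count exceeds int32 max: {n_elements > int32_max}")
```

The input has 3.2e9 elements (12.8 GB), which is more than a signed 32-bit index can address.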

Installation

Built from GitHub source

(Also tried pip install cupy-cuda12x in a fresh venv.)

Environment

OS                           : Linux-4.18.0-553.87.1.el8_10.x86_64-x86_64-with-glibc2.28                                
Python Version               : 3.12.3
CuPy Version                 : 13.6.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 2.4.2
SciPy Version                : 1.17.1
Cython Build Version         : 3.1.8
Cython Runtime Version       : 3.0.12
CUDA Root                    : /apps/software/2024a/software/CUDA/12.6.0
nvcc PATH                    : /apps/software/2024a/software/CUDA/12.6.0/bin/nvcc
CUDA Build Version           : 12060
CUDA Driver Version          : 12080
CUDA Runtime Version         : 12060 (linked to CuPy) / 12060 (locally installed)
CUDA Extra Include Dirs      : []
cuBLAS Version               : 120600
cuFFT Version                : 11206
cuRAND Version               : 10307
cuSOLVER Version             : (11, 6, 4)
cuSPARSE Version             : 12502
NVRTC Version                : (12, 6)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         : 1a0ca0e
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : 22602
NCCL Runtime Version         : 22602
cuTENSOR Version             : 20500
cuSPARSELt Build Version     : 801
Device 0 Name                : NVIDIA H200
Device 0 Compute Capability  : 90
Device 0 PCI Bus ID          : 0000:CB:00.0

Additional Information

Build process:

export CUPY_NVCC_GENERATE_CODE="arch=compute_90,code=sm_90"
python -m build --wheel --no-isolation

With CUPY_ACCELERATORS=cutensor,cub, cupy will use cutensor rather than cub, and this bug doesn't occur.
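
A minimal sketch of applying that workaround from Python, assuming the reproducer is saved as err2.py (the name in the traceback). `env_with_accelerators` is a hypothetical helper, not part of CuPy. It only builds an environment in which CUPY_ACCELERATORS is already set when the child interpreter imports cupy, which is the conservative way to make sure the setting is picked up.

```python
import os


def env_with_accelerators(order=("cutensor", "cub"), base=None):
    """Return a copy of the environment with CUPY_ACCELERATORS set.

    Hypothetical helper (not a CuPy API). `order` lists the backends in
    the same order as the value quoted above.
    """
    env = dict(os.environ if base is None else base)
    env["CUPY_ACCELERATORS"] = ",".join(order)
    return env


# Usage (not executed here; err2.py is the reproducer script):
#   import subprocess, sys
#   subprocess.run([sys.executable, "err2.py"],
#                  env=env_with_accelerators(), check=True)
```

This only sidesteps the CUB code path on builds where cuTENSOR is available; it does not fix the underlying bug.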
