Description
Hi there,
I've been trying out the new kvikio.zarr.open_cupy_array feature (https://docs.rapids.ai/api/kvikio/23.10/zarr, added in #267) and have been running into issues with LZ4 decoding. I'm not sure whether this belongs here in KvikIO or in numcodecs, but I'll post it here first.
Minimal example using the CPU LZ4 decoder:
import cupy as cp
import zarr
import kvikio.zarr
za = zarr.create(
shape=(10,), fill_value=1, meta_array=cp.empty(shape=()), compressor=kvikio.zarr.LZ4()
)
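# no compressor is passed to zarr.save, so it writes with zarr's default codec (the Blosc printed below)
zarr.save("rand10.zarr", cp.asnumpy(za[:]), zarr_version=2)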
zg = kvikio.zarr.open_cupy_array(
store="rand10.zarr", compressor=kvikio.zarr.CompatCompressor.lz4(), mode="r", path="/"
)
print(zg.compressor) # Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
zg[:]
yields this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 zg[:]
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:844, in Array.__getitem__(self, selection)
842 result = self.get_orthogonal_selection(pure_selection, fields=fields)
843 else:
--> 844 result = self.get_basic_selection(pure_selection, fields=fields)
845 return result
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:970, in Array.get_basic_selection(self, selection, out, fields)
968 return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
969 else:
--> 970 return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:1012, in Array._get_basic_selection_nd(self, selection, out, fields)
1006 def _get_basic_selection_nd(self, selection, out=None, fields=None):
1007 # implementation of basic selection for array with at least one dimension
1008
1009 # setup indexer
1010 indexer = BasicIndexer(selection, self)
-> 1012 return self._get_selection(indexer=indexer, out=out, fields=fields)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:1388, in Array._get_selection(self, indexer, out, fields)
1385 if math.prod(out_shape) > 0:
1386 # allow storage to get multiple items at once
1387 lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1388 self._chunk_getitems(
1389 lchunk_coords,
1390 lchunk_selection,
1391 out,
1392 lout_selection,
1393 drop_axes=indexer.drop_axes,
1394 fields=fields,
1395 )
1396 if out.shape:
1397 return out
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:2228, in Array._chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
2226 for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
2227 if ckey in cdatas:
-> 2228 self._process_chunk(
2229 out,
2230 cdatas[ckey],
2231 chunk_select,
2232 drop_axes,
2233 out_is_ndarray,
2234 fields,
2235 out_select,
2236 partial_read_decode=partial_read_decode,
2237 )
2238 else:
2239 # check exception type
2240 if self._fill_value is not None:
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:2098, in Array._process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
2096 if isinstance(cdata, PartialReadBuffer):
2097 cdata = cdata.read_full()
-> 2098 self._compressor.decode(cdata, dest)
2099 else:
2100 if isinstance(cdata, UncompressedPartialReadBufferV3):
File numcodecs/blosc.pyx:563, in numcodecs.blosc.Blosc.decode()
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/numcodecs/compat.py:154, in ensure_contiguous_ndarray(buf, max_buffer_size, flatten)
126 def ensure_contiguous_ndarray(buf, max_buffer_size=None, flatten=True) -> np.array:
127 """Convenience function to coerce `buf` to a numpy array, if it is not already a
128 numpy array. Also ensures that the returned value exports fully contiguous memory,
129 and supports the new-style buffer interface. If the optional max_buffer_size is
(...)
151 return a view on memory exported by `buf`.
152 """
--> 154 return ensure_ndarray(
155 ensure_contiguous_ndarray_like(
156 buf, max_buffer_size=max_buffer_size, flatten=flatten
157 )
158 )
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/numcodecs/compat.py:67, in ensure_ndarray(buf)
48 def ensure_ndarray(buf) -> np.ndarray:
49 """Convenience function to coerce `buf` to a numpy array, if it is not already a
50 numpy array.
51
(...)
65 return a view on memory exported by `buf`.
66 """
---> 67 return np.array(ensure_ndarray_like(buf), copy=False)
File cupy/_core/core.pyx:1475, in cupy._core.core._ndarray_base.__array__()
TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
If I instead try to force decoding with nvCOMP's GPU-based LZ4 decompressor like so:
zg._compressor = kvikio.zarr.LZ4()
zg[:]
it also yields an error:
---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
Cell In[11], line 1
----> 1 zg[:]
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:844, in Array.__getitem__(self, selection)
842 result = self.get_orthogonal_selection(pure_selection, fields=fields)
843 else:
--> 844 result = self.get_basic_selection(pure_selection, fields=fields)
845 return result
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:970, in Array.get_basic_selection(self, selection, out, fields)
968 return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
969 else:
--> 970 return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:1012, in Array._get_basic_selection_nd(self, selection, out, fields)
1006 def _get_basic_selection_nd(self, selection, out=None, fields=None):
1007 # implementation of basic selection for array with at least one dimension
1008
1009 # setup indexer
1010 indexer = BasicIndexer(selection, self)
-> 1012 return self._get_selection(indexer=indexer, out=out, fields=fields)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:1388, in Array._get_selection(self, indexer, out, fields)
1385 if math.prod(out_shape) > 0:
1386 # allow storage to get multiple items at once
1387 lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1388 self._chunk_getitems(
1389 lchunk_coords,
1390 lchunk_selection,
1391 out,
1392 lout_selection,
1393 drop_axes=indexer.drop_axes,
1394 fields=fields,
1395 )
1396 if out.shape:
1397 return out
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:2228, in Array._chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
2226 for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
2227 if ckey in cdatas:
-> 2228 self._process_chunk(
2229 out,
2230 cdatas[ckey],
2231 chunk_select,
2232 drop_axes,
2233 out_is_ndarray,
2234 fields,
2235 out_select,
2236 partial_read_decode=partial_read_decode,
2237 )
2238 else:
2239 # check exception type
2240 if self._fill_value is not None:
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/zarr/core.py:2098, in Array._process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
2096 if isinstance(cdata, PartialReadBuffer):
2097 cdata = cdata.read_full()
-> 2098 self._compressor.decode(cdata, dest)
2099 else:
2100 if isinstance(cdata, UncompressedPartialReadBufferV3):
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/kvikio/zarr.py:240, in NVCompCompressor.decode(self, buf, out)
237 if is_host_buffer:
238 buf = cupy.asarray(buf)
--> 240 ret = self.get_nvcomp_manager().decompress(buf)
242 if is_host_buffer:
243 ret = cupy.asnumpy(ret)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/kvikio/nvcomp.py:148, in nvCompManager.decompress(self, data)
133 """Decompress a GPU buffer.
134
135 Parameters
(...)
143 An array of `self.dtype` produced after decompressing the input argument.
144 """
145 self.decompression_config = (
146 self._manager.configure_decompression_with_compressed_buffer(asarray(data))
147 )
--> 148 decomp_buffer = cp.empty(
149 self.decompression_config["decomp_data_size"], dtype="uint8"
150 )
151 self._manager.decompress(asarray(decomp_buffer), asarray(data))
152 return decomp_buffer.view(self.input_type)
File ~/mambaforge/envs/kvikio-env/lib/python3.10/site-packages/cupy/_creation/basic.py:22, in empty(shape, dtype, order)
7 def empty(shape, dtype=float, order='C'):
8 """Returns an array without initializing the elements.
9
10 Args:
(...)
20
21 """
---> 22 return cupy.ndarray(shape, dtype, order=order)
File cupy/_core/core.pyx:132, in cupy._core.core.ndarray.__new__()
File cupy/_core/core.pyx:220, in cupy._core.core._ndarray_base._init()
File cupy/cuda/memory.pyx:740, in cupy.cuda.memory.alloc()
File cupy/cuda/memory.pyx:1426, in cupy.cuda.memory.MemoryPool.malloc()
File cupy/cuda/memory.pyx:1447, in cupy.cuda.memory.MemoryPool.malloc()
File cupy/cuda/memory.pyx:1118, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc()
File cupy/cuda/memory.pyx:1139, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc()
File cupy/cuda/memory.pyx:1384, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc()
File cupy/cuda/memory.pyx:1387, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc()
OutOfMemoryError: Out of memory allocating 4,607,182,418,800,017,408 bytes (allocated so far: 3,584 bytes).
The OutOfMemoryError is a bit concerning, because it is trying to allocate about 4.6 exabytes of memory on the GPU 😅
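The requested size does not look random: 4,607,182,418,800,017,408 is 0x3FF0000000000000, which is exactly the bit pattern of the float64 value 1.0, and the example array is filled with 1.0 (fill_value=1, presumably with zarr's default float64 dtype). A quick stdlib check of that arithmetic:

```python
import struct

# Byte count reported by CuPy's OutOfMemoryError above.
requested = 4_607_182_418_800_017_408

# Reinterpret the same 64 bits as a little-endian IEEE-754 double.
as_double = struct.unpack("<d", struct.pack("<Q", requested))[0]

print(hex(requested))  # 0x3ff0000000000000
print(as_double)       # 1.0
```

This is consistent with (but does not prove) the nvCOMP LZ4 decoder reading part of the Blosc-written chunk's payload as if it were its own size field, rather than a header that nvCOMP produced itself.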
Version information (from mamba list | grep cuda, plus numcodecs):
cuda-python 11.8.2 py310h01a121a_0 conda-forge
cuda-version 11.8 h70ddcb2_2 conda-forge
cudatoolkit 11.8.0 h4ba93d1_12 conda-forge
kvikio 23.10.00a cuda11_py310_231004_gcec84d8_26 rapidsai-nightly
libkvikio 23.10.00a cuda11_231003_gcec84d8_26 rapidsai-nightly
librmm 23.10.00a cuda11_231003_g9f105431_27 rapidsai-nightly
numcodecs 0.11.0 py310heca2aa9_1 conda-forge
pytorch 2.0.0 cuda112py310he0931da_302 conda-forge
rmm 23.10.00a cuda11_py310_231004_g9f105431_27 rapidsai-nightly
I'm wondering whether there's an extra flag I'm missing, or whether this is a bug that needs fixing. I'm mainly interested in nvCOMP's GPU-based decompression rather than the numcodecs CPU-based one, but it's probably best to sort out both.
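Before picking a decoder, it might help to confirm which codec the chunks on disk were actually written with. Below is a minimal stdlib-only sketch, assuming a Zarr v2 directory store like rand10.zarr above; `stored_compressor` is a hypothetical helper, not part of zarr or KvikIO.

```python
import json
from pathlib import Path


def stored_compressor(store_path, array_path=""):
    """Return the compressor config recorded in a Zarr v2 array's .zarray file.

    Assumes a local directory store in Zarr format v2. Leading/trailing
    slashes are stripped from array_path so that path="/" (as passed to
    open_cupy_array above) maps to the store root instead of the filesystem root.
    """
    meta_file = Path(store_path, array_path.strip("/"), ".zarray")
    meta = json.loads(meta_file.read_text())
    # "compressor" is null (None) when the chunks are stored uncompressed.
    return meta.get("compressor")
```

For example, `stored_compressor("rand10.zarr", "/")` should return the JSON form of the codec printed by `zg.compressor`. If its `"id"` is `"blosc"`, the chunks are Blosc-framed, and presumably neither the nvCOMP LZ4 decoder nor a bare LZ4 codec can read them directly. Exercising the GPU path would then mean rewriting the store with an explicit codec, e.g. via `zarr.save_array(..., compressor=...)` (plain `zarr.save` treats extra keyword arguments as additional arrays to store in a group); which CPU/GPU codec pairs KvikIO can decode is worth checking against the KvikIO Zarr docs.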