Closed
Labels
dataset read, dataset write, enhancement, performance
Description
Currently (v1.11.1.0), the treatment of HDF5 chunking is a bit inadequate:
- Chunking can only be set on a per-Data object basis
- Chunking can only be defined by explicitly setting the chunks shape on each axis
- Chunking is ignored in an output file unless native compression is on
- Chunks from an input file are not stored
A more comprehensive and flexible API is needed:
- cfdm.write should chunk by default, and have a keyword argument (hdf5_chunks) to configure the default chunking.
- cfdm.read should, by default, store HDF5 chunking on the returned data, so that it will be used when writing out to a new netCDF4 file.
- Setting an HDF5 chunking strategy should be more intuitive. E.g. it should be easy to "chunk the time axis by 12 elements, leaving all other axes unchunked":
  f.nc_set_hdf_chunksizes({'T': 12})
- Setting HDF5 chunk sizes should follow the Dask API for defining its computational chunk sizes. E.g.
  f.nc_set_hdf_chunksizes("8 MiB")
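To make the two proposed styles concrete, here is a rough sketch of how a per-axis dictionary and a byte-size string might each be turned into a chunk shape. It is not cfdm's implementation: the helpers `parse_bytes`, `chunks_from_dict` and `chunks_from_limit`, the axis names, and the halving heuristic are all assumptions made for illustration.

```python
import math

# Binary byte units accepted by this sketch (hypothetical, not cfdm's parser)
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_bytes(text):
    """Convert a string such as "8 MiB" into a number of bytes."""
    number, unit = text.split()
    return int(float(number) * _UNITS[unit])


def chunks_from_dict(axes, shape, sizes):
    """Chunk only the named axes; every other axis keeps its full length."""
    return tuple(min(sizes.get(name, n), n) for name, n in zip(axes, shape))


def chunks_from_limit(shape, itemsize, limit):
    """Halve the longest chunk axis until one chunk fits within `limit` bytes."""
    chunks = list(shape)
    while math.prod(chunks) * itemsize > limit and max(chunks) > 1:
        axis = chunks.index(max(chunks))
        chunks[axis] = math.ceil(chunks[axis] / 2)
    return tuple(chunks)


# Illustrative field: 120 time steps on a 180 x 360 grid of 8-byte floats
axes = ("T", "Y", "X")
shape = (120, 180, 360)

by_axis = chunks_from_dict(axes, shape, {"T": 12})
by_size = chunks_from_limit(shape, 8, parse_bytes("8 MiB"))

print(by_axis)  # (12, 180, 360)
print(by_size)  # (120, 90, 90)
```

The dictionary form leaves unnamed axes unchunked, matching the "chunk the time axis by 12 elements" example, while the byte-size form only bounds the size of a single chunk; a real implementation would likely choose a better-balanced shape than repeated halving.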
PR to follow.