Extension to the HDF5 chunks API

Currently (`v1.11.1.0`), the treatment of HDF5 chunking is a bit inadequate:

- Chunking can only be set on a per-`Data`-object basis
- Chunking can only be defined by explicitly setting the chunk shape along each axis
- Chunking is ignored in an output file unless native compression is on
- Chunks from an input file are not stored

A more comprehensive and flexible API is needed:

- `cfdm.write` should chunk by default, and have a keyword argument (`hdf5_chunks`) to configure the default chunking.
- `cfdm.read` should, by default, store the HDF5 chunking on the returned data, so that it will be used when writing out to a new netCDF4 file.
- Setting an HDF5 chunking strategy should be more intuitive. E.g. it should be easy to "chunk the time axis by 12 elements, leaving all other axes unchunked": `f.nc_set_hdf_chunksizes({'T': 12})`
- Setting HDF5 chunksizes should follow the Dask API for defining its computational chunk sizes. E.g. `f.nc_set_hdf_chunksizes("8 MiB")`
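
To make the last two points concrete, here is a minimal sketch of how a chunking specification might be resolved to an explicit chunk shape. It is not cfdm code: `parse_bytes`, `resolve_chunk_shape`, the `axes` labels and the `itemsize` default are all hypothetical names chosen for illustration, and the halving loop is a simple stand-in heuristic, not Dask's actual chunk-normalisation algorithm.

```python
import math
import re

# Hypothetical helpers for illustration only; none of this is cfdm API.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_bytes(text):
    """Convert a size string such as "8 MiB" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]iB|B)\s*", text)
    if match is None:
        raise ValueError(f"Unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def resolve_chunk_shape(shape, axes, spec, itemsize=8):
    """Return a chunk shape for an array with the given shape.

    spec is either a dict mapping axis labels to chunk lengths
    (axes not mentioned are left unchunked), or a size string
    giving an upper bound on the bytes in one chunk.
    """
    if isinstance(spec, dict):
        unknown = set(spec) - set(axes)
        if unknown:
            raise ValueError(f"Unknown axes: {sorted(unknown)}")
        # A chunk can never be longer than the axis itself
        return tuple(
            min(spec.get(axis, size), size)
            for axis, size in zip(axes, shape)
        )

    limit = parse_bytes(spec)
    chunk = list(shape)
    # Assumed heuristic: halve the longest axis until the chunk fits
    while math.prod(chunk) * itemsize > limit and max(chunk) > 1:
        i = chunk.index(max(chunk))
        chunk[i] = math.ceil(chunk[i] / 2)
    return tuple(chunk)
```

For a `(120, 73, 96)` array of 8-byte values with axes `("T", "Y", "X")`, the spec `{"T": 12}` resolves to `(12, 73, 96)`, and `"1 MiB"` resolves to `(60, 37, 48)`. With `"8 MiB"` the whole array already fits in one chunk, so the shape is returned unchanged.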

PR to follow.


Extension to the HDF5 chunks API #309
