NetCDF4 data can be saved as chunks on disk, which has several benefits, including efficient reads when using a compatible chunk shape. This is particularly important for files with chunk-based compression (i.e. all compressed nc4 files) or on HPC and parallel file systems, where IO is typically dominated by the number of reads and chunks read from disk are often cached. Caches are also common in network data backends such as Thredds OPeNDAP, in which case using disk-compatible chunks will reduce cache pressure as well as latency.
Xarray can use chunks, of course, but as of v0.9 the chunk size has to be specified manually, and the easiest way to discover it is to open the file and look at the _Chunksizes attribute for each variable (the current workaround is sketched below). I propose that xr.open_dataset (and open_dataarray, and open_mfdataset) change their default behaviour.
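For reference, this is roughly what the manual workaround looks like today: inspect the on-disk chunking with netCDF4-python and pass matching chunks to xr.open_dataset. A minimal sketch, assuming a local file `data.nc` (placeholder name) and that variables sharing a dimension have compatible chunking:

```python
import netCDF4
import xarray as xr

path = "data.nc"  # hypothetical file

with netCDF4.Dataset(path) as nc:
    chunks = {}
    for name, var in nc.variables.items():
        sizes = var.chunking()  # 'contiguous' or a list of per-dimension chunk sizes
        if sizes != "contiguous":
            for dim, size in zip(var.dimensions, sizes):
                # keep the smallest chunk size seen per dimension so the dict
                # stays compatible across variables that share a dimension
                chunks[dim] = min(size, chunks.get(dim, size))

ds = xr.open_dataset(path, chunks=chunks)
```

Having to write this boilerplate for every file is exactly the friction the proposal below removes.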
If Dask is available and chunks=None (the default), chunks should be taken from the file on disk. This may lead to a chunked or unchunked dataset. To force an unchunked load, users can specify chunks={}, or simply .load() the dataset after opening it.
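For illustration, here is how the proposed defaults would read in user code. This is a sketch of the proposal, not current xarray behaviour, and the file name is a placeholder:

```python
import xarray as xr

# chunks=None (default): pick up the on-disk chunking automatically,
# yielding a Dask-backed dataset if the file is chunked
ds = xr.open_dataset("data.nc")

# chunks={}: opt out and force an unchunked (eager) load
ds_eager = xr.open_dataset("data.nc", chunks={})

# equivalently, load eagerly after opening
ds_eager = xr.open_dataset("data.nc").load()
```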