from_hdf5 function that uses filenames rather than files

The current approach to loading data from HDF5 is to open an h5py or netCDF4 `Dataset` object and pass it to `from_array`.  This is efficient, but it also embeds the open file handle in the dask graph.  That fails whenever the graph needs to be serialized, as is necessary in distributed computing.

It might be wise to instead create an explicit `da.from_hdf5` function that stores only the filename, datapath, and slice information within the graph.
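To make the idea concrete, here is a rough sketch of one way this could look. It is not an existing dask API: `from_hdf5` and the `LazyHDF5Dataset` helper are hypothetical names, and the sketch assumes h5py as the backend. The helper keeps only the filename and datapath, reopens the file on every read, and relies on `da.from_array` to put the per-chunk slices into the graph.

```python
import dask.array as da
import h5py


class LazyHDF5Dataset:
    """Hypothetical array-like that stores a filename and datapath, not a file handle."""

    def __init__(self, filename, datapath):
        self.filename = filename
        self.datapath = datapath
        # Read the metadata once, then close the file again.
        with h5py.File(filename, "r") as f:
            dset = f[datapath]
            self.shape = dset.shape
            self.dtype = dset.dtype
            self.ndim = len(dset.shape)

    def __getitem__(self, index):
        # Open the file only for the duration of this read, so no handle
        # is ever held by the object (or by the graph that contains it).
        with h5py.File(self.filename, "r") as f:
            return f[self.datapath][index]


def from_hdf5(filename, datapath, chunks):
    """Sketch of the proposed function; name and signature are assumptions."""
    return da.from_array(LazyHDF5Dataset(filename, datapath), chunks=chunks)
```

Usage would be something like `x = from_hdf5("myfile.h5", "/x", chunks=(1000, 1000))`. The trade-off in this sketch is that the file is reopened for every chunk, which adds some overhead compared to sharing one open handle.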

It would also be nice if this API were exposed up to `xarray` (cc @shoyer).

cc @rabernat @pwolfram


from_hdf5 function that uses filenames rather than files #922
