from_hdf5 function that uses filenames rather than files #922

@mrocklin

The current approach to loading data from HDF5 is to open an h5py or netCDF4 Dataset object and pass it to from_array. This is efficient, but it also embeds the open file handle in the dask graph. That fails whenever the graph has to be serialized, as is necessary in distributed computing.

It might be wise to instead create an explicit da.from_hdf5 function that stores only the filename, datapath, and slice information within the graph.
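
To make the idea concrete, here is a rough sketch of what such a graph could look like. It is not an existing dask API: `load_chunk` and `hdf5_graph` are hypothetical names, and passing `shape` and `chunks` in explicitly is an assumption of the sketch (a real implementation would probably read them from the file). Each task carries only the filename, the datapath, and a tuple of slices, and the file is opened inside the task when it runs.

```python
import itertools
import pickle


def load_chunk(filename, datapath, slices):
    """Open the file, read one block, and close the file again."""
    import h5py  # imported lazily, so it is only needed where tasks run

    with h5py.File(filename, "r") as f:
        return f[datapath][slices]


def hdf5_graph(filename, datapath, shape, chunks, name="from-hdf5"):
    """Build a dask-style graph dict that holds no open file handle.

    `shape` and `chunks` are supplied by the caller here (an assumption
    of this sketch, so that building the graph never touches the file).
    """
    # For each axis, the list of slices that tile it in blocks of `chunks`.
    axes = [
        [slice(start, min(start + step, size)) for start in range(0, size, step)]
        for size, step in zip(shape, chunks)
    ]
    graph = {}
    for index in itertools.product(*(range(len(ax)) for ax in axes)):
        slices = tuple(ax[i] for ax, i in zip(axes, index))
        # Each task is (function, plain-data arguments...).
        graph[(name,) + index] = (load_chunk, filename, datapath, slices)
    return graph


# "data.h5" and "/x" are placeholder names; nothing is opened at this point.
graph = hdf5_graph("data.h5", "/x", shape=(4, 6), chunks=(2, 3))

# The task arguments are plain data, so they round-trip through pickle.
args = graph[("from-hdf5", 0, 0)][1:]
assert pickle.loads(pickle.dumps(args)) == args
```

Reopening the file in every task costs more than sharing one handle, so some caching of open files per worker might be worth considering. That trade-off is not addressed in this sketch.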

It would also be nice if this API were exposed up to xarray (cc @shoyer).

cc @rabernat @pwolfram
