da.store for hdf file or to_hdf fails with distributed scheduler #2488

@martindurant

h5py File objects cannot be serialized, so they can only be shared between tasks when using the threaded scheduler. da.to_hdf creates an h5py File and then calls store, which is why it fails with the distributed scheduler.
Naturally, writing from distributed workers only makes sense if the workers are on the same machine or otherwise have access to the same network file system. Producing many HDF files from one dataset does not necessarily make much sense for arrays, as opposed to dataframes.

The dataframe version of to_hdf is already fixed in this respect, so presumably some of that logic could be reused here.
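
To make the idea concrete, here is a rough sketch of one possible direction, not a description of what the dataframe code actually does. Instead of handing store an open h5py dataset, it hands over a small object that carries only the file path and dataset name and opens the file inside each write. The class name `LazyHDF5Target`, the path `out.h5` and the dataset name `/x` are all made up for illustration, and the sketch assumes the dataset has already been created with the right shape and dtype.

```python
class LazyHDF5Target:
    """Hypothetical store target that holds a path instead of an open h5py File.

    Its state is two plain strings, so it can be shipped to workers,
    unlike an open h5py File handle.
    """

    def __init__(self, path, dataset):
        self.path = path        # must be reachable from every worker
        self.dataset = dataset  # assumed to exist already in the file

    def __setitem__(self, key, value):
        # Import and open lazily, inside the task, so that no file handle
        # ever needs to be serialized.
        import h5py

        with h5py.File(self.path, "r+") as f:
            f[self.dataset][key] = value


# Illustrative names only; nothing is opened or written at this point.
target = LazyHDF5Target("out.h5", "/x")
```

Such an object could then be passed as the target, e.g. `da.store(x, target, lock=...)`, since store writes each chunk through item assignment. Writes from several processes to one HDF5 file still need to be serialized, so the lock would have to work across workers (something like `dask.distributed.Lock` might do, but I have not tested that here), and the shared file-system caveat above still applies.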
