Right now the only information on storing data in NumPy arrays on disk is the bare-bones function listing in https://numpy.org/devdocs/reference/routines.io.html. All it mentions is the functions available in NumPy itself.
It would be very useful to have a How-To guide (see NEP 44 for what a How-To guide is) on data storage covering:
- the options: NumPy built-in functionality (mainly the .npy/.npz format), Zarr, HDF5 & co., Bloscpack, pickling (anything else?)
- a short summary of each storage model (e.g. Zarr: chunked, compressed n-D arrays, optionally in groups; HDF5: filesystem-like structure in a single file)
- performance (I/O speed, size on disk)
- portability
- dependencies
- maturity
- recommendations for what to use when
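As a starting point for the "NumPy built-in functionality" option, a minimal sketch of the .npy/.npz round trip could look like the following (file names are illustrative, and a temporary directory stands in for real storage):

```python
import os
import tempfile

import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    # A single array goes to a .npy file: a self-describing binary
    # format that stores shape, dtype, and byte order.
    path = os.path.join(tmp, "a.npy")
    np.save(path, a)
    b = np.load(path)
    assert np.array_equal(a, b)

    # Several arrays go to a .npz file (a zip archive of .npy files),
    # optionally compressed via savez_compressed.
    path = os.path.join(tmp, "arrays.npz")
    np.savez_compressed(path, a=a, doubled=2 * a)
    with np.load(path) as data:
        assert np.array_equal(data["doubled"], 2 * a)
```

Something along these lines would cover the "really simple cases" mentioned below, before the guide moves on to Zarr/HDF5 for chunking and compression at scale.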
My impression is that we should use .npy for the really simple cases, and direct people to Zarr for pretty much anything else (related: zarr-developers/community#28).
Data source(s) to use:
- preferably use real-world data (we plan to produce a datasets package that we can rely on for the NumPy docs).
- should use a regular dtype (e.g. float64), but can also show a structured dtype and an object dtype (solutions may be different for those).
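For the structured- and object-dtype cases, a hedged sketch of how the solutions differ (paths and sample data are made up for illustration): structured dtypes round-trip through .npy like any regular dtype, while object dtypes silently fall back to pickling and require allow_pickle=True on load.

```python
import os
import tempfile

import numpy as np

# Structured dtype: handled natively by the .npy format.
rec = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("val", "f8")])

# Object dtype: np.save falls back to pickle, so the file is neither
# portable across Python versions nor safe to load from untrusted sources.
obj = np.array([{"a": 1}, [1, 2, 3]], dtype=object)

with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "rec.npy")
    np.save(p1, rec)
    rec2 = np.load(p1)  # no pickle involved for structured dtypes
    assert rec2["val"][1] == 3.5

    p2 = os.path.join(tmp, "obj.npy")
    np.save(p2, obj)  # allow_pickle defaults to True on save
    obj2 = np.load(p2, allow_pickle=True)  # but must be opted into on load
    assert obj2[0] == {"a": 1}
```

This distinction is probably worth calling out explicitly in the guide, since the security and portability caveats only apply to the object-dtype path.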
TBD: how much example code should the guide include, and if we add example code using Zarr and perhaps also PyTables, should we run that code in CI so we can be sure it keeps working?
My tentative answers: yes, let's add example code; yes, let's run any code we add in CI; TBD whether we can do this sensibly with Sphinx, or whether this is a good time to start using notebooks.