I want to begin a discussion about standardizing
the .zmetadata format for consolidated metadata.
Suppose we have this Zarr container.
    .zgroup -- of the root group
    var1
        .zarray -- for var1
    subgroup1
        .zgroup
        var2
            .zarray -- for var2
            .zattrs -- for var2
This structure needs to be encoded as JSON in the .zmetadata object.
I can see two obvious encodings:
- nested encoding
{
    ".zgroup": {<contents of the .zgroup>},
    "var1": {
        ".zarray": {<contents of .zarray>}
    },
    "subgroup1": {
        ".zgroup": {<contents of the .zgroup>},
        "var2": {
            ".zarray": {<contents of .zarray>},
            ".zattrs": {<contents of .zattrs>}
        }
    }
}
- flat-key encoding
{
    "/.zgroup": {<contents of the .zgroup>},
    "/var1/.zarray": {<contents of .zarray>},
    "/subgroup1/.zgroup": {<contents of the .zgroup>},
    "/subgroup1/var2/.zarray": {<contents of .zarray>},
    "/subgroup1/var2/.zattrs": {<contents of .zattrs>}
}
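The two encodings carry the same information, so converting between them is mechanical. Here is a minimal Python sketch, assuming that metadata object names always start with "." (as .zgroup, .zarray, and .zattrs do) and that no group or array name does. The helper names `flatten` and `nest` are hypothetical, not part of zarr-python or netcdf-c, and the metadata contents are placeholders.

```python
def flatten(nested, prefix=""):
    """Convert the nested encoding to the flat-key encoding."""
    flat = {}
    for name, value in nested.items():
        path = f"{prefix}/{name}"
        if name.startswith("."):
            # Assumption: a leading "." marks a metadata object
            # (.zgroup, .zarray, .zattrs), so its value is stored as-is.
            flat[path] = value
        else:
            # Anything else is a group or array name; recurse into it.
            flat.update(flatten(value, path))
    return flat


def nest(flat):
    """Convert the flat-key encoding back to the nested encoding."""
    nested = {}
    for path, value in flat.items():
        parts = path.strip("/").split("/")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Placeholder contents standing in for the real metadata objects.
nested = {
    ".zgroup": {"zarr_format": 2},
    "var1": {".zarray": {"shape": [10]}},
    "subgroup1": {
        ".zgroup": {"zarr_format": 2},
        "var2": {".zarray": {"shape": [5]}, ".zattrs": {"units": "m"}},
    },
}

flat = flatten(nested)
for key in flat:
    print(key)
```

Because `nest` strips the leading "/" before splitting, it should accept flat keys with or without the prefix, whichever convention is eventually chosen.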
My observations:
- The flat-key encoding should, as a rule, be slightly smaller than the nested encoding.
- The nested encoding would be easier to process into internal data structures, but that would depend on the implementation. It would be faster for netcdf-c, but might not be for zarr-python.
- Note that I have prefixed each key with "/", but that is just my choice; a decision is needed about that.
- The one example I have seen in the wild uses flat-key encoding.
- The flat-key encoding has no entries for non-content-bearing objects. So, for example, there is no "/subgroup1" key nor a "/subgroup1/var2" key. This seems reasonable since such keys would not add any useful information.
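To illustrate the last point, the omitted paths can be recovered from the flat keys alone. This is a rough Python sketch, assuming the "/"-prefixed key convention used above; `implied_paths` is a hypothetical helper name, not an existing API.

```python
def implied_paths(flat_keys):
    """Return the group/array paths implied by a set of flat keys."""
    paths = set()
    for key in flat_keys:
        # Drop the final component, which is the metadata object name.
        parts = key.strip("/").split("/")[:-1]
        # Every prefix of the remaining path is an implied object.
        for i in range(1, len(parts) + 1):
            paths.add("/" + "/".join(parts[:i]))
    return sorted(paths)


keys = [
    "/.zgroup",
    "/var1/.zarray",
    "/subgroup1/.zgroup",
    "/subgroup1/var2/.zarray",
    "/subgroup1/var2/.zattrs",
]
print(implied_paths(keys))
# -> ['/subgroup1', '/subgroup1/var2', '/var1']
```

The root group itself is not listed, since it has no name component of its own.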