v2: Standardizing .zmetadata #113

@DennisHeimbigner

I want to begin a discussion about standardizing
the .zmetadata format for consolidated metadata.

Suppose we have this Zarr container.

.zgroup -- of the root group
var1
    .zarray -- for var1
subgroup1
    .zgroup
    var2
        .zarray -- for var2
        .zattrs  -- for var2    

This structure needs to be encoded as JSON in the .zmetadata object.
I can see two obvious encodings:

  1. nested encoding
{
".zgroup": {<contents of the .zgroup>},
"var1": {
    ".zarray": {<contents of .zarray>}
    },
"subgroup1": {
    ".zgroup": {<contents of the .zgroup>},
    "var2": {
        ".zarray": {<contents of .zarray>},
        ".zattrs": {<contents of .zattrs>}
        }
    }
}
  2. flat-key encoding
{
"/.zgroup": {<contents of the .zgroup>},
"/var1/.zarray": {<contents of .zarray>},
"/subgroup1/.zgroup": {<contents of the .zgroup>},
"/subgroup1/var2/.zarray": {<contents of .zarray>},
"/subgroup1/var2/.zattrs": {<contents of .zattrs>}
}
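
To make the relationship between the two encodings concrete, here is a minimal Python sketch that converts one into the other. It is only an illustration, not how zarr-python or netcdf-c do it: the function names flatten and unflatten are hypothetical, the metadata contents are placeholders, and it assumes no group or array is itself named .zgroup, .zarray, or .zattrs.

```python
# Names that hold metadata documents rather than child groups or arrays.
METADATA_NAMES = {".zgroup", ".zarray", ".zattrs"}


def flatten(nested, prefix=""):
    """Convert a nested encoding into a flat-key encoding (hypothetical helper)."""
    flat = {}
    for name, value in nested.items():
        path = f"{prefix}/{name}"
        if name in METADATA_NAMES:
            # Metadata document: store it under its full path.
            flat[path] = value
        else:
            # Child group or array: recurse with the extended path.
            flat.update(flatten(value, path))
    return flat


def unflatten(flat):
    """Convert a flat-key encoding back into a nested encoding (hypothetical helper)."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.strip("/").split("/")
        node = nested
        for part in parents:
            # Create intermediate groups/arrays on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Placeholder metadata contents, standing in for the real documents.
nested = {
    ".zgroup": {"zarr_format": 2},
    "var1": {".zarray": {"shape": [10]}},
    "subgroup1": {
        ".zgroup": {"zarr_format": 2},
        "var2": {".zarray": {"shape": [5]}, ".zattrs": {"units": "m"}},
    },
}

flat = flatten(nested)
for key in sorted(flat):
    print(key)
```

Since the conversion round-trips for this container, the two encodings appear to carry the same information, so the choice mostly comes down to size and ease of processing.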

My observations:

  • The flat-key encoding should, as a rule, be slightly smaller than the
    nested encoding.
  • The nested encoding would be easier to process into internal data structures,
    but that would depend on the implementation. It would be faster for netcdf-c,
    but might not be for zarr-python.
  • Note that I have prefixed each key with "/", but that is just my choice; a decision is needed about that.
  • The one example I have seen in the wild uses flat-key encoding.
  • The flat-key encoding has no entries for non-content-bearing objects. So, for example, there is no "/subgroup1" key and no "/subgroup1/var2" key. This seems reasonable, since such keys would not add any useful information.
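
Two of the observations above can be checked with a short Python sketch: the relative size of the encodings, and whether the missing "/subgroup1" style keys lose any information. The helper names encoded_size and implied_paths are hypothetical, and the metadata contents are left empty so that only the structural overhead is measured. Which encoding comes out smaller probably depends on nesting depth and on whitespace, because flat keys repeat their parent path, so measuring a real container may be more reliable than a rule of thumb.

```python
import json

# Both encodings of the example container, with empty placeholder contents.
flat = {
    "/.zgroup": {},
    "/var1/.zarray": {},
    "/subgroup1/.zgroup": {},
    "/subgroup1/var2/.zarray": {},
    "/subgroup1/var2/.zattrs": {},
}
nested = {
    ".zgroup": {},
    "var1": {".zarray": {}},
    "subgroup1": {
        ".zgroup": {},
        "var2": {".zarray": {}, ".zattrs": {}},
    },
}


def encoded_size(doc, indent=None):
    """Size in bytes of the JSON text, compact by default (hypothetical helper)."""
    separators = (",", ":") if indent is None else None
    text = json.dumps(doc, indent=indent, separators=separators)
    return len(text.encode("utf-8"))


def implied_paths(flat_doc):
    """Group and array paths recoverable from the flat keys alone."""
    paths = set()
    for key in flat_doc:
        parent = key.rsplit("/", 1)[0]
        paths.add(parent or "/")  # an empty parent means the root group
    return paths


print("compact:", encoded_size(flat), "vs", encoded_size(nested))
print("indented:", encoded_size(flat, indent=2), "vs", encoded_size(nested, indent=2))
print(sorted(implied_paths(flat)))
```

The implied paths include "/subgroup1" and "/subgroup1/var2" even though neither appears as a key, which supports leaving them out of the flat-key encoding.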
