Is your feature request related to a problem? Please describe.
We need to be able to read node data from NumPy files and from Zarr stores in the anemoi-datasets style, and to build graphs from them in the PyTorch Geometric HeteroData format.
The resulting graphs should cover at least an encoder, a processor, and a decoder, while leaving room for more flexible architectures to be enabled later through configs.
Edge construction should be flexible and class-based, with a k-nearest-neighbours (KNN) builder that supports different distance metrics and a cutoff builder.
Describe the solution you'd like
Separate classes to build the nodes:
- Zarr Node Builder
- Numpy Node Builder
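
As a rough illustration of what the node builders could look like, here is a minimal sketch of a shared interface with a NumPy implementation. The class names (`NodeBuilder`, `NumpyNodeBuilder`), the `coords()` method, and the assumed file layout (one row per node, latitude and longitude in degrees) are all hypothetical and not an existing anemoi API. A Zarr builder would implement the same interface; it is left out here because it depends on how anemoi-datasets exposes coordinates.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

import numpy as np


class NodeBuilder(ABC):
    """Hypothetical base class for anything that can supply node coordinates."""

    @abstractmethod
    def coords(self) -> np.ndarray:
        """Return a (num_nodes, 2) array of [lat, lon] in radians."""


class NumpyNodeBuilder(NodeBuilder):
    """Reads node coordinates from a .npy file.

    Assumed layout (not specified in this issue): one row per node,
    with latitude and longitude columns given in degrees.
    """

    def __init__(self, path, lat_col=0, lon_col=1):
        self.path = Path(path)
        self.lat_col = lat_col
        self.lon_col = lon_col

    def coords(self) -> np.ndarray:
        grid = np.load(self.path)
        # Select [lat, lon] and convert to radians for spherical distance metrics.
        return np.deg2rad(grid[:, [self.lat_col, self.lon_col]])


# Usage: write a tiny grid to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    grid_file = Path(tmp) / "grid.npy"
    np.save(grid_file, np.array([[0.0, 0.0], [0.0, 90.0], [45.0, 180.0]]))
    node_coords = NumpyNodeBuilder(grid_file).coords()

print(node_coords.shape)
```

Returning plain coordinate arrays keeps the builders independent of the graph container, so the same output could be stored as a node attribute on a HeteroData object or handed to an edge builder.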
Edge connection builders for:
- KNN (using scikit-learn or similar)
- Cutoff (with a max radius)
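
The two edge builders could be sketched roughly as follows, using scikit-learn's `NearestNeighbors`. The class names, the `edge_index()` method, the kilometre unit for the radius, and the Earth radius constant are assumptions made for illustration. Coordinates are expected as `[lat, lon]` in radians, which is what scikit-learn's `haversine` metric requires. The returned array uses the `(2, num_edges)` layout that PyTorch Geometric expects for `edge_index`, with source indices in row 0 and target indices in row 1.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, used to convert km to radians


class KNNEdgeBuilder:
    """Hypothetical builder: connect each target node to its k nearest source nodes."""

    def __init__(self, num_neighbours, metric="haversine"):
        self.num_neighbours = num_neighbours
        self.metric = metric  # any metric name accepted by scikit-learn

    def edge_index(self, src, dst):
        nn = NearestNeighbors(n_neighbors=self.num_neighbours, metric=self.metric).fit(src)
        sources = nn.kneighbors(dst, return_distance=False)  # (num_dst, k)
        targets = np.repeat(np.arange(len(dst)), self.num_neighbours)
        return np.stack([sources.ravel(), targets])


class CutoffEdgeBuilder:
    """Hypothetical builder: connect each target node to all source nodes within a radius."""

    def __init__(self, max_radius_km):
        self.max_radius_km = max_radius_km

    def edge_index(self, src, dst):
        radius = self.max_radius_km / EARTH_RADIUS_KM  # great-circle angle in radians
        nn = NearestNeighbors(radius=radius, metric="haversine").fit(src)
        neighbours = nn.radius_neighbors(dst, return_distance=False)  # one index array per target
        targets = np.repeat(np.arange(len(dst)), [len(n) for n in neighbours])
        sources = np.concatenate(list(neighbours))
        return np.stack([sources, targets])


# Usage with made-up coordinates: four source (data) nodes, two target (hidden) nodes.
src_nodes = np.deg2rad([[0.0, 0.0], [0.0, 1.0], [0.0, 10.0], [50.0, 0.0]])
dst_nodes = np.deg2rad([[0.0, 0.4], [50.0, 0.1]])

knn_edges = KNNEdgeBuilder(num_neighbours=2).edge_index(src_nodes, dst_nodes)
cutoff_edges = CutoffEdgeBuilder(max_radius_km=100.0).edge_index(src_nodes, dst_nodes)
print(knn_edges.shape, cutoff_edges.shape)
```

Both builders share the same `edge_index(src, dst)` signature, so a config could select either one for the encoder, processor, or decoder edges. The resulting array would still need converting to a tensor before being assigned to a HeteroData edge type.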
Describe alternatives you've considered
Dictionaries or custom dataclasses. HeteroData is more maintainable and provides additional functionality that is useful in other parts of training and graph refinement.
scikit-learn is a well-tested, flexible framework that we can easily depend on to build the edges in a preparation step before training, without introducing additional dependencies in anemoi-models for inference.
Additional context
The solution should be flexible enough to build a graph that can be passed on to other functions and classes to transform the graph object further.
Organisation
ECMWF