Add classmethod to Hazard for reading raster-like data from NetCDF file #487

@peanutfun

The Hazard class offers several options for instantiating it from data files, e.g. from_raster and from_excel. The classmethod from_raster, in particular, uses rasterio to open datasets and read their metadata, coordinates, and data. In this issue, I want to discuss whether a general-purpose classmethod for reading data from a NetCDF file into a Hazard object would be useful, and what such a method could look like. A method that implements this functionality to some extent can be found at climada_petals/blob/feature/wildfire/climada_petals/hazard/wildfire.py#L2247.

What the method should do

Load data from a single NetCDF file into a consistent instance of Hazard. Any data that is missing from the file will be set to a sensible default.

The minimal (i.e., essential) data supplied as variables in the file should be:

  • hazard intensity data (2D or 3D dataset)
  • coordinates (1D dataset each)
  • time (1D dataset, if applicable)
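A minimal sketch of how these essentials could be checked, using plain NumPy arrays in place of dataset variables. The helper name `normalize_intensity` is hypothetical, and both the (latitude, longitude) axis order and the treatment of a 2D intensity array as a single event are assumptions that this proposal does not settle.

```python
import numpy as np


def normalize_intensity(intensity, latitude, longitude, time=None):
    """Return intensity as a 3D array with shape (event, latitude, longitude).

    Hypothetical helper: assumes the spatial axes are already ordered
    (latitude, longitude) and that 2D input describes a single event.
    """
    intensity = np.asarray(intensity)
    latitude = np.asarray(latitude)
    longitude = np.asarray(longitude)

    # Coordinates must be one-dimensional
    for name, coord in (("latitude", latitude), ("longitude", longitude)):
        if coord.ndim != 1:
            raise ValueError(f"{name} must be 1D, got {coord.ndim}D")

    if intensity.ndim == 2:
        # No time axis: add a leading event axis of length one
        intensity = intensity[np.newaxis, ...]
    elif intensity.ndim != 3:
        raise ValueError(f"intensity must be 2D or 3D, got {intensity.ndim}D")

    # The spatial shape must match the coordinate lengths
    expected = (latitude.size, longitude.size)
    if intensity.shape[1:] != expected:
        raise ValueError(f"spatial shape {intensity.shape[1:]} != {expected}")

    # If time is given, it needs one entry per event
    if time is not None and np.asarray(time).shape != (intensity.shape[0],):
        raise ValueError("time must be 1D with one entry per event")

    return intensity
```

With this sketch, a 2D field on a 2 x 3 grid comes back with shape (1, 2, 3), while mismatched coordinate lengths raise a ValueError.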

Optional data could include:

  • hazard fraction data (same dimensions as intensity)
  • event frequency (1D)
  • event names (1D)
  • event IDs (1D)
  • coordinate system information (attributes/metadata)
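When optional data is absent, placeholder values could be derived from the intensity matrix alone. The sketch below is illustrative only: the function name `default_event_data` is hypothetical, and every default in it (fraction of 1 wherever intensity is stored, frequency of 1 per event, consecutive IDs starting at 1, names equal to the IDs) is an assumption rather than an agreed choice.

```python
import numpy as np
from scipy.sparse import csr_matrix


def default_event_data(intensity):
    """Build placeholder values for optional hazard data.

    `intensity` is a scipy CSR matrix of shape (n_events, n_centroids).
    All defaults here are assumptions for illustration.
    """
    n_events = intensity.shape[0]

    # Fraction: same sparsity pattern as intensity, every stored value set to 1
    fraction = intensity.copy()
    fraction.data[:] = 1.0

    event_id = np.arange(1, n_events + 1)
    return {
        "fraction": fraction,
        "frequency": np.ones(n_events),
        "event_id": event_id,
        "event_name": [str(i) for i in event_id],
    }


# Usage with a tiny 3-event, 2-centroid intensity matrix
intensity = csr_matrix(np.array([[0.0, 2.5], [1.0, 0.0], [0.0, 0.0]]))
defaults = default_event_data(intensity)
```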

Method signature

from_netcdf should take the following arguments:

  • data (path-like or xarray.Dataset, required): The dataset. If a path is given, the file is opened.
  • intensity_var (string, required): The name of the hazard intensity variable in the dataset
  • fraction_var (string, optional): The name of the hazard fraction variable in the dataset
  • coordinate_vars (dict, optional): A mapping from default coordinate names to the variables used as coords in the dataset, e.g. dict(longitude="lon", latitude="y")
  • tbd
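One way the coordinate_vars argument could be resolved is to merge the user's mapping into a table of default names. This is a rough sketch, not a settled design: `resolve_coordinate_vars` and `DEFAULT_COORDINATE_VARS` are hypothetical names, and the default names "time", "latitude", and "longitude" are assumed from the example dimensions used in this issue.

```python
# Assumed default names; the proposal does not fix these yet
DEFAULT_COORDINATE_VARS = {
    "time": "time",
    "latitude": "latitude",
    "longitude": "longitude",
}


def resolve_coordinate_vars(coordinate_vars=None):
    """Map each default coordinate name to the variable name in the dataset."""
    overrides = dict(coordinate_vars or {})

    # Reject keys that are not known default coordinate names
    unknown = set(overrides) - set(DEFAULT_COORDINATE_VARS)
    if unknown:
        raise ValueError(f"Unknown coordinate names: {sorted(unknown)}")

    # Start from a copy of the defaults so the module-level table is not mutated
    resolved = dict(DEFAULT_COORDINATE_VARS)
    resolved.update(overrides)
    return resolved


# Usage with the mapping from the argument description
coords = resolve_coordinate_vars(dict(longitude="lon", latitude="y"))
```

Here `coords` keeps "time" at its default and maps longitude to "lon" and latitude to "y". Rejecting unknown keys early is a design choice that would surface typos such as "longitud" before any data is read.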

Method outline

Suppose a NetCDF file contains the following data:

  • intensity: 3D dataset (dims: "time", "longitude", "latitude")
  • 1D coordinate dataset for each dimension

Then the following code creates a consistent Hazard instance from this data:

import numpy as np
import xarray as xr
from scipy.sparse import csr_matrix
from climada.hazard import Hazard
from climada.hazard.centroids.centr import Centroids

data = xr.open_dataset("...")
hazard = Hazard()

# Transpose the data so we flatten it with longitude running "fastest"
intensity = data["intensity"].transpose("time", "latitude", "longitude")
hazard.intensity = csr_matrix(intensity.values.reshape((data.sizes["time"], -1)))
hazard.intensity.eliminate_zeros()

# Build centroids
lat, lon = np.meshgrid(data["latitude"].values, data["longitude"].values, indexing="ij")
hazard.centroids = Centroids.from_lat_lon(lat.flatten(), lon.flatten())
hazard.centroids.set_lat_lon_to_meta()

# Consistent Hazard also needs
# hazard.fraction, hazard.event_id, hazard.event_name, hazard.frequency, hazard.date
# but these can be defaulted, e.g.
hazard.event_id = np.arange(1, data.sizes["time"] + 1)
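The transpose in the outline matters because column k of the flattened intensity matrix has to line up with centroid k. The following NumPy-only check illustrates this with made-up coordinates; the grid values and the encoding `1000 * lat + lon` are arbitrary choices for the demonstration and do not come from any real dataset.

```python
import numpy as np

lat = np.array([10.0, 20.0])           # 2 latitudes
lon = np.array([100.0, 110.0, 120.0])  # 3 longitudes
n_time = 2

# Data stored with dims (time, longitude, latitude), as in the example file.
# Each cell's value encodes its own coordinates so the order can be verified.
lon_grid, lat_grid = np.meshgrid(lon, lat, indexing="ij")    # shape (lon, lat)
field = 1000 * lat_grid + lon_grid
stored = np.stack([field + 1e5 * t for t in range(n_time)])  # (time, lon, lat)

# Transpose to (time, latitude, longitude), then flatten the spatial axes
flat = stored.transpose(0, 2, 1).reshape(n_time, -1)

# Centroid coordinates built the same way as in the outline
lat_c, lon_c = np.meshgrid(lat, lon, indexing="ij")
lat_c, lon_c = lat_c.flatten(), lon_c.flatten()
expected = 1000 * lat_c + lon_c

# With the transpose, columns and centroids agree
matches_with_transpose = np.array_equal(flat[0], expected)

# Without it, the columns end up in a different order than the centroids
matches_without_transpose = np.array_equal(stored.reshape(n_time, -1)[0], expected)
```

In this setup `matches_with_transpose` is True and `matches_without_transpose` is False, and `lon_c` cycles through all longitudes before the latitude changes, which is what "longitude running fastest" means.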
