Introducing a CF domain variable #301

@davidhassell

Moderator

@dblodgett-usgs

Moderator Status Review [last updated: 2020-10-15]

The proposal has been submitted and preliminarily reviewed by the moderator. Attention should be called to the potential for this proposal to subtly but fundamentally alter how CF-NetCDF data fields and domains are treated. Review from authors of CF-NetCDF client software is necessary here.

  1. By and large, the community is supportive of the proposal.
  2. There has been discussion of how to identify a domain variable: by a cf_role: domain attribute or by the presence of a dimensions: "X Y Z ..." attribute. Presence of a dimensions attribute has won out for its lack of redundancy.
  3. The title of section 5 will now be: "Coordinate systems and domain"
  4. There is some nuanced discussion of domain constructs for scalar (single-valued dimensionless / degenerate) coordinate variables. No major issues have been noted.
  5. There is discussion of multiple domain variables for a single domain. No major issues have been noted.
  6. The idea of adding a domain: domain_variable attribute on a data variable was suggested. Adding it would introduce redundancy and seems to be the wrong path.

As of 2020-10-15, discussion is slow but ongoing. I will check back in around the beginning of November.

Requirement Summary

The concept of a domain that describes data locations and cell properties is not currently mentioned in the CF conventions, because it does not correspond to any single entity in the netCDF file. Instead, the domain is stored implicitly in a number of other variables and attributes that are linked to the data variable in various ways defined by the conventions.

The domain is, however, well defined in the CF data model as an abstract concept (as opposed to a data model construct) that provides the linkage between the field construct and the metadata constructs that describe the relevant data locations and cell properties. There is currently no "domain construct" in the data model, since there is no corresponding CF-netCDF entity.

There is a need to be able to describe a domain independently of any data variables, which is currently not possible. Use cases include:

  • Curated data streaming services for which it is impractical to send very large domain descriptions with every file.

  • Storing time-dependent coordinates from remote sensing applications.

  • Storing geometries without any timeseries data.

For such use cases, it is not satisfactory to try to locate an appropriate multidimensional data variable that describes the required domain, nor to create, purely for this purpose, a dummy data variable that has no physical meaning.

Therefore, the inclusion of CF-netCDF domain variables that can encode a domain independently of any data, and a corresponding data model domain construct, will enhance CF by meeting these use cases.

Technical Proposal Summary

NetCDF encoding

A new "domain variable" will be introduced that is of arbitrary type since it contains no data. This variable will act as a container to bind together other variables that collectively define a domain, in a similar manner to how a data variable performs the same task.

It will support the same CF attributes as are allowed on the data variable for describing a domain, with exactly the same meanings and syntaxes: cell_measures, coordinates, geometry, and grid_mapping. These will be indicated as domain variable attributes by the additional "Do" indicator (short for Domain) in the "Use" column of Appendix A: Attributes.

Any future CF attributes that a data variable may use to describe its domain will be similarly transferred to the domain variable, meaning that keeping the domain variable up to date with other enhancements will be a well defined and easy task.
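As an illustration of how the domain-describing attributes carry over unchanged, here is a minimal Python sketch that models variables as plain dictionaries rather than using a netCDF library. The function name `domain_variable_attributes` and the example variable names are hypothetical, and the blank-separated form of the `dimensions` attribute is an assumption based on the `dimensions: "X Y Z ..."` form mentioned in the moderator review.

```python
# Attributes that the proposal lists as describing a domain, on either a
# data variable or a domain variable.
DOMAIN_ATTRIBUTES = ("cell_measures", "coordinates", "geometry", "grid_mapping")


def domain_variable_attributes(data_var_dims, data_var_attrs):
    """Build the attributes of a (hypothetical) domain variable that describes
    the same domain as an existing data variable.

    data_var_dims: tuple of netCDF dimension names spanned by the data variable.
    data_var_attrs: dict of the data variable's attributes.
    """
    # A domain variable holds no data, so it names its dimensions in an
    # attribute (assumed here to be a blank-separated list of names).
    attrs = {"dimensions": " ".join(data_var_dims)}
    # Copy the domain-describing attributes unchanged; attributes that
    # describe the data itself (e.g. units, standard_name) are left behind.
    for name in DOMAIN_ATTRIBUTES:
        if name in data_var_attrs:
            attrs[name] = data_var_attrs[name]
    return attrs


# Example data variable attributes (illustrative values only).
temperature_attrs = {
    "standard_name": "air_temperature",
    "units": "K",
    "coordinates": "lat lon",
    "grid_mapping": "rotated_pole",
}
domain_attrs = domain_variable_attributes(("rlat", "rlon"), temperature_attrs)
```

Because the copy is driven by a single tuple of attribute names, supporting a future domain-describing attribute in this sketch would only mean adding its name to `DOMAIN_ATTRIBUTES`.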

There is no mechanism for referencing a domain variable from a data variable, i.e. a data variable must still encode its domain in the current, implicit manner. This preserves backwards compatibility with all existing software libraries that understand the current structure of a data variable, and avoids the redundancy or incompatibility issues that could arise if a data variable both encoded its own domain and referenced a domain variable.

A domain variable may exist in a file with or without other data variables.
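The following sketch shows how client software might pick out domain variables, assuming the identification rule reported in the moderator review (presence of a `dimensions` attribute). The dataset is modelled as a dictionary mapping variable names to attribute dictionaries; the helper names and example variables are hypothetical and not part of any CF library.

```python
def is_domain_variable(attrs):
    """Return True if a variable's attributes mark it as a domain variable.

    Assumption: a domain variable is recognised solely by the presence of
    a `dimensions` attribute.
    """
    return "dimensions" in attrs


def find_domain_variables(variables):
    """Return the sorted names of all domain variables in a dataset.

    variables: dict mapping variable name -> dict of attributes.
    """
    return sorted(
        name for name, attrs in variables.items() if is_domain_variable(attrs)
    )


# A file holding a domain variable alongside ordinary variables.
mixed_dataset = {
    "domain": {"dimensions": "time lat lon", "coordinates": "height"},
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "lat": {"standard_name": "latitude", "units": "degrees_north"},
}

# A file holding only a domain variable and no data variables.
domain_only_dataset = {
    "domain": {"dimensions": "time lat lon"},
}

found_mixed = find_domain_variables(mixed_dataset)
found_domain_only = find_domain_variables(domain_only_dataset)
```

Both datasets yield the same result, reflecting that a domain variable does not depend on the presence of any data variable.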

Data model

The domain in the data model will be transformed from an abstract concept into a "top-level" construct, i.e. one that can exist in the absence of any other constructs. Currently, the field construct (corresponding to a CF-netCDF data variable) is the only top-level construct.

The new domain construct will replace the current domain concept, replicating it in every way except that it will be related to the field construct via an aggregation relationship, rather than via the current composition relationship of the abstract domain concept. This makes it clear that the domain construct can exist independently of the field construct.

It is of no consequence to the data model that a CF-netCDF data variable will not be able to explicitly reference a CF-netCDF domain variable. That is an encoding choice that does not affect the logical structure.
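The aggregation relationship can be pictured with a small Python sketch. The `Domain` and `Field` classes below are hypothetical stand-ins for the data model constructs, not the API of any CF implementation; they only illustrate that a domain can be created on its own and shared by several fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Domain:
    """Top-level construct: can exist without any Field."""
    axes: Tuple[str, ...]
    coordinates: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class Field:
    """Top-level construct holding data plus a reference to a Domain.

    Aggregation: the Field refers to a Domain object but does not own it,
    so the Domain outlives the Field.
    """
    data: List[float]
    domain: Domain
    properties: Dict[str, str] = field(default_factory=dict)


# A domain created independently of any data.
grid = Domain(axes=("lat",), coordinates={"lat": [-45.0, 0.0, 45.0]})

# Two fields aggregating the same domain.
tas = Field([270.0, 300.0, 272.0], grid, {"standard_name": "air_temperature"})
pr = Field([1e-5, 4e-5, 2e-5], grid, {"standard_name": "precipitation_flux"})

# Discarding one field leaves the domain, and the other field, intact.
del tas
```

Under the earlier composition relationship, by contrast, the domain would have no existence apart from the field that contains it.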

Location in the conventions document

  • The domain variable will be described in a new section: 5.8 Domain Variables

  • The following appendices will be updated:

  • Appendix A: Attributes

  • Appendix I: The CF data model

  • CF Conformance Requirements and Recommendations

Benefits

All those whose needs match the use cases described in the Requirement Summary will benefit from the new domain variable.

Status Quo

At present, a domain can only be encoded implicitly via a data variable, leading to ambiguities when retrieving a domain from a dataset.

Associated pull request

#302

Detailed Proposal

Conventions text has been proposed for chapter 5, appendices A and I, and the conformance document in pull request #302.

Metadata

Labels

enhancement: Proposals to add new capabilities, improve existing ones in the conventions, improve style or format
