This repository was archived by the owner on Dec 30, 2025. It is now read-only.

Including resource-level metadata in the syntax #14

@eemeli

The result of formatting a message depends on the environment in which it is formatted. With MF2, at least the following are relevant:

  • The locale of the message
  • The functions that are available as annotations
  • The meanings of any private-use annotations
  • Later, the MF2 version if any reserved annotations are defined

As these attributes are likely to be common to all messages in a single resource, it would probably make sense to include syntax or conventions for declaring them. The formatting runtime might not need these declarations, since their values would be implicit there, but they would at least prove invaluable to translators and to automated tools that process messages.

I'm aware of at least the following prior art that may be relevant to consider here:

  • A gettext header entry may include fields such as Language and Plural-Forms that apply to the entire resource. The header entry uses msgid "" to identify itself.
  • The browser extension messages.json files rely on being placed in a well-defined directory structure to identify the locale for their contents.
  • Java ResourceBundle file names encode the locale, so the de-CH variant of a base Resource.properties is named Resource_de_CH.properties.
  • An XLIFF 1.2 <file> element includes at least the source-language attribute, and its other attributes and <header> element may provide significantly more context about the resource.
  • YAML supports the %YAML directive, which defines the YAML version that's used by the document, e.g. %YAML 1.2.
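
To make the gettext case concrete, here is a minimal sketch of how a tool might read resource-level fields from a header entry. It assumes the header's msgstr has already been unescaped into plain "Name: value" lines; the function name `parse_gettext_header` and the sample values are made up for illustration.

```python
from typing import Dict


def parse_gettext_header(msgstr: str) -> Dict[str, str]:
    """Split an unescaped gettext header msgstr into its 'Name: value' fields."""
    fields: Dict[str, str] = {}
    for line in msgstr.splitlines():
        # Only the first colon separates the field name from its value.
        name, sep, value = line.partition(":")
        if sep:  # lines without a colon are not header fields
            fields[name.strip()] = value.strip()
    return fields


# Illustrative header content, not taken from a real catalog.
header = (
    "Language: de_CH\n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "Content-Type: text/plain; charset=UTF-8\n"
)
fields = parse_gettext_header(header)
print(fields["Language"])
```

The point of interest is that both the locale and the plural rules travel inside the file itself, which is the property the later proposal relies on.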

Of the above, browser extensions and ResourceBundles stand out by incorporating their locale information within their file or directory name. I don't think this approach would work well for our purposes, given that not all of the relevant information is easily expressible via a locale identifier.
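
A small sketch of the file-name approach shows the limitation. The helper `locale_from_bundle_name` is hypothetical and handles only the simple `Base_lang_REGION.properties` pattern from the example above, not the full ResourceBundle lookup rules.

```python
from pathlib import PurePath
from typing import Optional


def locale_from_bundle_name(filename: str, base: str) -> Optional[str]:
    """Return the locale encoded in a bundle file name, or None for the base bundle."""
    stem = PurePath(filename).stem  # e.g. "Resource_de_CH"
    if stem == base:
        return None  # the base bundle carries no locale suffix
    prefix = base + "_"
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not belong to bundle {base!r}")
    # Java uses underscores where BCP 47 uses hyphens.
    return stem[len(prefix):].replace("_", "-")


print(locale_from_bundle_name("Resource_de_CH.properties", "Resource"))
```

Everything recoverable from the name is a locale identifier; there is no obvious place in it for a schema reference or a spec version.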

I think we should instead do something similar to the other formats, and incorporate metadata into the file using a syntax that's easy to parse (in particular for runtimes that don't care about the metadata), sufficiently expressive, but also extensible for later use cases that are not yet identifiable.

Of the fields I list above, I think the available functions and private-use annotations could be identified together via some "schema", for which we could use an identifier that references an external definition. With that we're left with key-value pairs that ought to each fit into a single line:

  • locale, obviously using a BCP 47 identifier
  • schema, defined via URL or some other structured string identifier
  • version, with a numerical string like '2.0' identifying the spec version

This leaves a couple of open questions that ought to be answered:

  1. How should the information be encoded? Structured data within comments, messages using some predefined keys, or using some new syntax?
  2. Where and how are the schemas identified?
  3. Are there additional fields not yet under consideration that would not fit a simple key-value string shape?
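
To illustrate just one of the options in the first question (structured data within comments), here is a rough sketch of a reader for single-line key-value metadata. Everything about the syntax is an assumption made for the example: the `# @key: value` comment form, the `greeting = ...` message line, the schema URL, and the name `read_resource_metadata` are all hypothetical and not part of any agreed format.

```python
import re
from typing import Dict

# Hypothetical convention: a metadata line looks like "# @key: value".
META_LINE = re.compile(r"^#\s*@(?P<key>[A-Za-z][\w-]*)\s*:\s*(?P<value>.*\S)\s*$")


def read_resource_metadata(text: str) -> Dict[str, str]:
    """Collect '# @key: value' lines from the leading comment block of a resource."""
    meta: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the leading block
        if not stripped.startswith("#"):
            break  # first non-comment line: metadata block is over
        match = META_LINE.match(stripped)
        if match:  # ordinary comments are simply skipped
            meta[match["key"]] = match["value"]
    return meta


# Made-up resource content for illustration only.
sample = """\
# @locale: de-CH
# @schema: https://example.com/schemas/app-functions
# @version: 2.0
# An ordinary comment, ignored by the reader.

greeting = Hallo {$name}
"""
metadata = read_resource_metadata(sample)
print(metadata)
```

A runtime that does not care about the metadata could treat these lines as plain comments and skip them, which is the "easy to parse" property asked for above.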

Sidenote: One interesting possibility would be to use something like format instead of version, and to incorporate the content format in the value, via e.g. 'messageformat-2.0'. This would potentially allow for the resource format to also support other message formats.
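
A combined value like this stays easy to take apart. The sketch below assumes the value always has the shape `<name>-<version>` with the version after the last hyphen; the names `ContentFormat` and `parse_format` are invented for the example.

```python
from typing import NamedTuple


class ContentFormat(NamedTuple):
    name: str
    version: str


def parse_format(value: str) -> ContentFormat:
    """Split a value such as 'messageformat-2.0' at its last hyphen."""
    name, sep, version = value.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"expected '<name>-<version>', got {value!r}")
    return ContentFormat(name, version)


parsed = parse_format("messageformat-2.0")
print(parsed.name, parsed.version)
```

Splitting at the last hyphen leaves room for format names that themselves contain hyphens, at the cost of ruling out hyphens in the version part.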
