Including resource-level metadata in the syntax #14
Description
The result of formatting a message depends on the environment in which that's done. With MF2, at least the following are relevant:
- The locale of the message
- The functions that are available as annotations
- The meanings of any private-use annotations
- Later, the MF2 version if any reserved annotations are defined
As these attributes are likely to be common to all messages in a single resource, it would probably make sense to include syntax or conventions for their declaration. These might not necessarily be used during the formatting runtime as then their values would be implicit, but would at least prove invaluable to translators and automated tools processing messages.
I'm aware of at least the following prior art that may be relevant to consider here:
- A gettext header entry may include fields such as `Language` and `Plural-Forms` that apply to the entire resource. The header entry uses `msgid ""` to identify itself.
- The browser extension `messages.json` files rely on being placed in a well-defined directory structure to identify the locale for their contents.
- Java ResourceBundle file names encode the locale, so a base `Resource.properties` would use `Resource_de_CH.properties` for its `de-CH` locale.
- An XLIFF 1.2 `<file>` element includes at least the `source-language` attribute, and its other attributes and `<header>` element may provide significantly more context about the resource.
- YAML supports the `%YAML` directive, which defines the YAML version that's used by the document, e.g. `%YAML 1.2`.
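To make the gettext case concrete, here is a minimal Python sketch of how a tool might read resource-level fields out of a header entry. The function name `parse_gettext_header` and the sample header text are made up for illustration; the sketch only assumes the usual `Name: value` line shape of the header's `msgstr`.

```python
def parse_gettext_header(msgstr: str) -> dict[str, str]:
    """Parse the msgstr of a gettext header entry into a field dictionary."""
    fields: dict[str, str] = {}
    for line in msgstr.splitlines():
        # Split on the first colon only, so values may contain colons themselves.
        name, sep, value = line.partition(":")
        if sep:  # skip lines that do not have a "Name: value" shape
            fields[name.strip()] = value.strip()
    return fields


# Illustrative header content (the msgstr of the entry with msgid "").
header = (
    "Language: de_CH\n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "Content-Type: text/plain; charset=UTF-8\n"
)
fields = parse_gettext_header(header)
print(fields["Language"])
print(fields["Plural-Forms"])
```

A runtime that does not care about the metadata can skip this entry entirely, which is the property the in-file approaches above have in common.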
Of the above, browser extensions and ResourceBundles stand out by incorporating their locale information within their file or directory name. I don't think this approach would work well for our purposes, given that not all of the relevant information is easily expressible via a locale identifier.
I think we should instead do something similar to the other formats, and incorporate metadata into the file using a syntax that's easy to parse (in particular for runtimes that don't care about the metadata), sufficiently expressive, but also extensible for later use cases that are not yet identifiable.
Of the fields I list above, I think the available functions and private-use annotations could be identified together via some "schema", for which we could use an identifier that references an external definition. With that we're left with key-value pairs that ought to each fit into a single line:
- `locale`, obviously using a BCP47 identifier
- `schema`, defined via URL or some other structured string identifier
- `version`, with a numerical string like `'2.0'` identifying the spec version
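As a rough illustration of how cheap such single-line key-value pairs could be to parse, the following Python sketch reads them from structured comments at the top of a resource. Everything about the syntax here is an assumption made for the example: the `# @key: value` comment shape, the `read_resource_metadata` name, the `example.com` schema URL, and the placeholder message line. None of it is a proposal for the actual encoding, which is still an open question.

```python
import re

# Hypothetical encoding: one "# @key: value" comment per metadata field.
METADATA_LINE = re.compile(r"^#\s*@(?P<key>[a-z]+):\s*(?P<value>\S.*?)\s*$")


def read_resource_metadata(text: str) -> dict[str, str]:
    """Collect leading metadata comments, stopping at the first other line."""
    metadata: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines between metadata lines are ignored
        match = METADATA_LINE.match(line)
        if match is None:
            break  # first non-metadata line ends the header
        metadata[match["key"]] = match["value"]
    return metadata


# Illustrative resource; the message line is a placeholder, not real syntax.
resource = """\
# @locale: en-US
# @schema: https://example.com/schemas/app-functions
# @version: 2.0

greeting = ...
"""
metadata = read_resource_metadata(resource)
print(metadata)
```

Because the fields live in comments in this sketch, a formatting runtime that treats their values as implicit would not need to know about them at all.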
This leaves a couple of open questions that ought to be answered:
- How should the information be encoded? Structured data within comments, messages using some predefined keys, or using some new syntax?
- Where and how are the schemas identified?
- Are there additional fields not yet under consideration that would not fit a simple key-value string shape?
Sidenote: One interesting possibility would be to use something like `format` instead of `version`, and to incorporate the content format in the value, via e.g. `'messageformat-2.0'`. This would potentially allow for the resource format to also support other message formats.
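A combined value like this stays easy to take apart. The Python sketch below assumes a `name-major.minor` shape, which is only one possible reading of the example value; the `split_format` helper is hypothetical.

```python
def split_format(value: str) -> tuple[str, tuple[int, ...]]:
    """Split a value such as 'messageformat-2.0' into a name and a numeric version."""
    # Split on the last hyphen so that format names may contain hyphens.
    name, sep, version = value.rpartition("-")
    if not sep or not name:
        raise ValueError(f"expected '<name>-<version>', got {value!r}")
    # int() raises ValueError if a version component is not numeric.
    return name, tuple(int(part) for part in version.split("."))


name, version = split_format("messageformat-2.0")
print(name, version)
```

Comparing the version as a tuple of integers rather than as a string avoids ordering surprises such as `'10.0'` sorting before `'2.0'`.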