Skip to content

Change witx unions to be tagged#220

Merged
pchickey merged 17 commits intomasterfrom
pch/tagged_unions
Feb 18, 2020
Merged

Change witx unions to be tagged#220
pchickey merged 17 commits intomasterfrom
pch/tagged_unions

Conversation

@pchickey
Copy link
Copy Markdown
Contributor

@pchickey pchickey commented Feb 5, 2020

Prior to this PR, witx's union was exactly like a C union - described a set of variants that occupy the same location in memory, but gave no information about which variant was present/valid. All uses of union in the WASI spec were found in structs directly following an enum that tagged the union (indicated which variant was present/valid). Doc comments described the mapping of enum tags to union variants.

This implicit specification of tagged unions is hard for our growing suite of automated code-generation tools to deal with automatically. Many programming languages do not have a concept of an untagged union, and even in those that do, its typically safer and easier to use a tagged union concept. Additionally, the concept of an untagged union is not compatible with the interface types proposal, where we expect to have ADTs available (a generalized form of tagged unions) but not untagged unions.

This PR changes the way witx union is defined to require a tag (enum) type identifier to be specified as the first argument:

(typename $t (enum $a $b))
(typename $u (union $t (field $a u8) (field $b f32)))

Each member of the tag enum must be given a corresponding field in the union. A new empty field syntax (empty $c) is provided for tags that do not have data associated with them.

This change to the witx syntax and semantics allows us to replace all of the implicit tagged union definitions in the WASI witx specs to be explicit tagged unions. All of the uses in the WASI specs are completely ABI-compatible with the new form. In other words, the changes in this PR do not change the functional specification of WASI, they just change the way in which we specify it. Old implementations of WASI snapshot 0 or 1 should continue to work against new implementations that use the new syntax.

This PR makes updates to all of the witx specs describing WASI phases in this repository, so that the existing $event_u, $subscription_u, and $prestat_u unions are annotated with the enums that tag them. In event_u and subscription_u, I had to make a separate variant (with the same type) for fd_read and fd_write rather than use the same fd_readwrite variant for each. This is a semantic difference that code generators can present to the programmer (i.e. the C union generated by wasi-headers will be slightly different) but at the ABI level, all fd_read and fd_write variants are represented in-memory exactly as before.

The use sites of unions were simple to translate. The prestat_u type was able to be renamed to prestat, and the existing prestat struct erased, because it was just a pair of tag and union. For event and subscription, I was able to erase the tag member from the struct and leave just the union. The memory layout specified for tagged unions in src/layout.rs is equivelant to the tag, followed by the untagged union, as two struct members in sequence.

Because ABI compatibility for existing WASI snapshots is critical to accepting this PR, I'm going to stage PRs on all of the witx-consuming tools in the ecosystem presently (wasi-headers in https://github.com/cranestation/wasi-libc, https://github.com/bytecodealliance/wasi, wig in https://github.com/bytecodealliance/wasmtime- please ping me if I'm missing any) that update them to use these changes, and test all of the new implementations to make sure they are indeed ABI compatible with the existing WASI ecosystem.

I will add comments and links to this PR as I complete work on those tools, and wait to merge this PR until all feedback has been addressed and everything is ready to go.

@pchickey
Copy link
Copy Markdown
Contributor Author

pchickey commented Feb 8, 2020

Unfortunately I discovered that I had overlooked the way the struct layout of event would change with this scheme. I don't believe there is any way to rescue this approach without making all the downstream consumers special-case C layout rules (which are complicated enough without special cases!) just for the sake of this one struct working across witx versions.

Instead, I'm going to rework this to make tagged unions a new sort of datatype, use those in ephemeral & snapshot 2 onwards, and set us on a path to deprecate untagged unions in witx once snapshot 1 is no longer in use.

@pchickey pchickey closed this Feb 8, 2020
@pchickey pchickey reopened this Feb 12, 2020
@pchickey
Copy link
Copy Markdown
Contributor Author

I found a way to maintain ABI compatibility and move forward with this approach:

The event struct originally contained a union event_u with just one variant fd_readwrite. This means we can eliminate the event_u union, replacing its use with the type of the fd_readwrite variant, and leave a note to users to ignore this field for the clock variants.

We can upgrade to using a tagged union for event_u in ephemeral, because we expect to extend event types in the future. We don't have to worry about ABI stability until that becomes a snapshot, so its fine to change the layout with this PR.

prior to this PR, event_u was a union with only one variant
fd_readwrite. the change into tagged union ended up changing the
memory layout of the event struct. so, rather than use tagged
unions in the event struct, we just use a struct that you ignore
in the case of the clock event, which has the same ABI as previously.

in ephemeral, we use a tagged union for events, because we dont have
any ABI stability guarantees to maintain there.
this is much more convenient for consumers, and it reflects the
requirement of the syntax to provide an Id
@pchickey
Copy link
Copy Markdown
Contributor Author

I now believe this work is complete. The PR comment on WebAssembly/wasi-libc#165 has been updated with a comparison of the layout of each union-related type before and after this change. The static assertions indicate that clang agrees with the layout calculations made in this PR, and manual inspection indicates that the layout is the same as prior to this PR.

@pchickey pchickey requested a review from sunfishcode February 14, 2020 01:21
Copy link
Copy Markdown
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@pchickey pchickey merged commit 85df508 into master Feb 18, 2020
@pchickey pchickey deleted the pch/tagged_unions branch February 18, 2020 20:31
@pchickey pchickey mentioned this pull request Dec 1, 2020
yoshuawuyts pushed a commit to yoshuawuyts/WASI that referenced this pull request Nov 25, 2025
* unions are now tagged by an enum

* union fields may be empty. TODO: write tests

* unions: add more tests. compute layout.

* snapshot 0: translate untagged union syntax to tagged union syntax

* snapshot: update from untagged union to tagged union syntax

* snapshot: update docs

* snapshot 0: update docs

* ephemeral: translate untagged unions to tagged unions

* render: fix union output

* bump witx to 0.8.0 - this is definitely semver breaking

* snapshot: I forgot to delete the tag field `subscription.type`

* regen docs

* unionlayout: derive the usual suspects

* snapshot 1 and 0: put tag back in event struct, flatten event_u

prior to this PR, event_u was a union with only one variant
fd_readwrite. the change into tagged union ended up changing the
memory layout of the event struct. so, rather than use tagged
unions in the event struct, we just use a struct that you ignore
in the case of the clock event, which has the same ABI as previously.

in ephemeral, we use a tagged union for events, because we dont have
any ABI stability guarantees to maintain there.

* layout: refactor so we have impls for NamedType and Type too

* ast: change tag of UnionDatatype to be a NamedType

this is much more convenient for consumers, and it reflects the
requirement of the syntax to provide an Id

* fixup docs
yoshuawuyts pushed a commit to yoshuawuyts/WASI that referenced this pull request Nov 25, 2025
* unions are now tagged by an enum

* union fields may be empty. TODO: write tests

* unions: add more tests. compute layout.

* snapshot 0: translate untagged union syntax to tagged union syntax

* snapshot: update from untagged union to tagged union syntax

* snapshot: update docs

* snapshot 0: update docs

* ephemeral: translate untagged unions to tagged unions

* render: fix union output

* bump witx to 0.8.0 - this is definitely semver breaking

* snapshot: I forgot to delete the tag field `subscription.type`

* regen docs

* unionlayout: derive the usual suspects

* snapshot 1 and 0: put tag back in event struct, flatten event_u

prior to this PR, event_u was a union with only one variant
fd_readwrite. the change into tagged union ended up changing the
memory layout of the event struct. so, rather than use tagged
unions in the event struct, we just use a struct that you ignore
in the case of the clock event, which has the same ABI as previously.

in ephemeral, we use a tagged union for events, because we dont have
any ABI stability guarantees to maintain there.

* layout: refactor so we have impls for NamedType and Type too

* ast: change tag of UnionDatatype to be a NamedType

this is much more convenient for consumers, and it reflects the
requirement of the syntax to provide an Id

* fixup docs
yoshuawuyts pushed a commit to yoshuawuyts/WASI that referenced this pull request Nov 25, 2025
* unions are now tagged by an enum

* union fields may be empty. TODO: write tests

* unions: add more tests. compute layout.

* snapshot 0: translate untagged union syntax to tagged union syntax

* snapshot: update from untagged union to tagged union syntax

* snapshot: update docs

* snapshot 0: update docs

* ephemeral: translate untagged unions to tagged unions

* render: fix union output

* bump witx to 0.8.0 - this is definitely semver breaking

* snapshot: I forgot to delete the tag field `subscription.type`

* regen docs

* unionlayout: derive the usual suspects

* snapshot 1 and 0: put tag back in event struct, flatten event_u

prior to this PR, event_u was a union with only one variant
fd_readwrite. the change into tagged union ended up changing the
memory layout of the event struct. so, rather than use tagged
unions in the event struct, we just use a struct that you ignore
in the case of the clock event, which has the same ABI as previously.

in ephemeral, we use a tagged union for events, because we dont have
any ABI stability guarantees to maintain there.

* layout: refactor so we have impls for NamedType and Type too

* ast: change tag of UnionDatatype to be a NamedType

this is much more convenient for consumers, and it reflects the
requirement of the syntax to provide an Id

* fixup docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants