# Reserved prefixes stabilization report
## Links
* [RFC](https://github.com/rust-lang/rfcs/pull/3101)
* [Tracking issue](https://github.com/rust-lang/rust/issues/84978)
* Rust reference PR -- missing?
* [Implementation PR](https://github.com/rust-lang/rust/issues/84599)
## Summary
- `any_identifier#`, `any_identifier"..."`, and `any_identifier'...'` are now reserved
syntax, and no longer tokenize.
- This is mostly relevant to macros. E.g. `quote!{ #a#b }` is no longer accepted.
- It doesn't treat keywords specially, so e.g. `match"..." {}` is no longer accepted.
- Insert whitespace between the identifier and the subsequent `#`, `"`, or `'`
to avoid errors.
- Edition migrations will help you insert whitespace in such cases.
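As a small illustration of the whitespace workaround, the sketch below counts the tokens a `macro_rules!` macro receives. The `count_tts!` macro and the `spaced_*` functions are hypothetical names chosen for this example; they are not part of the RFC or its implementation. Every invocation keeps whitespace between the identifier and the following `#` or `"`, so the code should be accepted both with and without the reservation.

```rust
/// Counts the token trees passed to the macro (hypothetical helper).
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// `hello "world"`: an identifier and a string literal, two tokens.
pub fn spaced_string_prefix() -> usize {
    count_tts!(hello "world")
}

/// `# a # b`: written without spaces as `#a#b`, the `a#` part is a reserved prefix.
pub fn spaced_pound_sequence() -> usize {
    count_tts!(# a # b)
}

/// Keywords get no special treatment, so `match "..."` also needs the space.
pub fn spaced_keyword_prefix() -> usize {
    count_tts!(match "...")
}
```

Removing the whitespace in any of these invocations (for example `hello"world"`) is what the reservation turns into an error.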
## Details
To make space for new syntax in the future, we've decided to reserve syntax for prefixed identifiers and literals: `prefix#identifier`, `prefix"string"`, `prefix'c'`, and `prefix#123`, where `prefix` can be any identifier. The exceptions are prefixes that already have a meaning, such as `b'...'` (byte literals) and `r"..."` (raw strings).
This provides syntax we can expand into in the future without requiring an edition boundary. We may use this for temporary syntax until the next edition, or for permanent syntax if appropriate.
Without an edition, this would be a breaking change, since macros can currently accept syntax such as `hello"world"`, which they will see as two separate tokens: `hello` and `"world"`. The (automatic) fix is simple, though: insert a space, giving `hello "world"`. Likewise, `prefix#ident` should become `prefix #ident`. Edition migrations will help with this fix.
Other than turning these into a tokenization error, [the RFC][10] does not attach a meaning to any prefix yet. Assigning meaning to specific prefixes is left to future proposals, which, thanks to reserving these prefixes, will no longer be breaking changes.
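The reservation rule can be summarized as a small predicate. This is a simplified model written for this report, not the code in rustc's lexer: the `Edition` enum and `is_reserved` function are hypothetical names, the caller is assumed to pass a valid identifier as `prefix`, and the exception list only covers prefixes that already had a meaning when the RFC was written. It also assumes the reservation applies from the 2021 edition onward, which is the edition RFC 3101 targets.

```rust
/// Hypothetical edition type; declaration order gives E2015 < E2018 < E2021.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Edition {
    E2015,
    E2018,
    E2021,
}

/// Simplified model: is `prefix` immediately followed by `next` reserved?
/// `prefix` is assumed to be an identifier (keywords included).
pub fn is_reserved(edition: Edition, prefix: &str, next: char) -> bool {
    if edition < Edition::E2021 {
        // Older editions keep lexing `prefix` and what follows as separate tokens.
        return false;
    }
    match next {
        // `b'x'` is a byte literal.
        '\'' => prefix != "b",
        // `b"..."`, `r"..."`, and `br"..."` are existing string literal forms.
        '"' => !matches!(prefix, "b" | "r" | "br"),
        // `r#ident`, `r#"..."#`, and `br#"..."#` already have a meaning.
        '#' => !matches!(prefix, "r" | "br"),
        _ => false,
    }
}
```

Under this model, `is_reserved(Edition::E2021, "match", '"')` is true because keywords are not treated specially, while the same query for `Edition::E2018` is false.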
Some new prefixes you might see in the future (though we haven't
committed to any of them yet):
- `k#keyword` to allow writing keywords that don't exist yet in the current edition. For example, while `async` is not a keyword in edition 2015, this prefix would've allowed us to accept `k#async` in edition 2015 without having to wait for edition 2018 to reserve `async` as a keyword.
- `f""` as a short-hand for a format string. For example, `f"hello {name}"` as a short-hand for the equivalent `format!()` invocation.
- `s""` for `String` literals.
- `c""` or `z""` for null-terminated C strings.
[10]: https://github.com/rust-lang/rfcs/pull/3101
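None of the prefixes listed above exist yet, so they cannot be demonstrated directly. As a rough point of comparison, here is how the same values can be written with today's standard library. The function names are hypothetical, and the mapping from each prefix to an API is only one plausible reading of the list, not a committed design.

```rust
use std::ffi::CStr;

/// What a hypothetical `f"hello {name}"` might be short-hand for.
pub fn greeting(name: &str) -> String {
    format!("hello {name}")
}

/// What a hypothetical `s"hello"` might be short-hand for.
pub fn owned_hello() -> String {
    String::from("hello")
}

/// What a hypothetical `c"hello"` or `z"hello"` might be short-hand for:
/// a null-terminated C string built from a byte string with an explicit `\0`.
pub fn c_hello() -> &'static CStr {
    CStr::from_bytes_with_nul(b"hello\0").expect("exactly one trailing NUL byte")
}
```

In each case the prefix form would mainly save the explicit macro or constructor call.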
## How unresolved questions were resolved and other interesting developments
### Where and how to enforce prefixes
The biggest question was where to enforce the prefixes and emit errors. **We ultimately opted to emit errors in the lexer, which meant that the lexer had to become aware of the current edition.** There was an alternative of using "jointness" and enforcing the conditions in the parser. The idea was to leverage the fact that Rust tokens (at least some subset of them) record not only their content but whether they are separated by whitespace from the next token. This was intended to enable compound operators like `<<` to be parsed as two `<` tokens in some parts of the parser (types) and as a single token elsewhere (expressions), without the lexer having to know what state the parser was in. This same approach could conceptually be used so that the lexer doesn't have to know the *edition*.
As [described in detail in this writeup](https://hackmd.io/YLe7viGLTu2PfE5sQO4v0w), however, the jointness approach had several downsides. For example, it meant that [lexing of literals was independent of prefix](https://hackmd.io/YLe7viGLTu2PfE5sQO4v0w?view#JOINTNESS-would-require-lexing-to-be-independent-of-prefix): we might like `f"{foo("bar")}"` to be lexed as a string, but that is not possible unless the lexer knows that an `f` string can contain embedded expressions. Similarly, which escape codes the lexer accepts depends on the prefix (e.g. `\x` for `b""`). (This is especially relevant for raw strings: whether `fr"\"` is accepted or not depends on what meaning we assign to `fr`.) Jointness also had [forwards compatibility hazards with macro arm ordering](https://hackmd.io/YLe7viGLTu2PfE5sQO4v0w?view#Forwards-compat-hazard-with-JOINTNESS-due-to-macro-rules-arm-ordering). Finally, the lexer-based approach can be converted to a jointness-based approach later, as it currently gives errors much earlier in the process.
There were also advantages to jointness: it would allow more procedural macro prototyping, and it would mean that the lexer remains independent of edition.
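To make the jointness alternative more concrete, here is a toy model of it. This is not how rustc represents tokens: `Kind`, `Token`, `lex`, and `first_reserved_prefix` are hypothetical names, the lexer handles only identifiers, unescaped double-quoted strings, and single-character punctuation, and the check covers only `"` and `#`. The point is the division of labor: the lexer knows nothing about editions and only records whether a token touches the next one, while a later parser-side pass decides whether a joint identifier is an error.

```rust
/// Toy token kinds; far simpler than rustc's real token model.
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Ident(String),
    Str(String),
    Punct(char),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: Kind,
    /// True when the next token starts immediately, with no whitespace between.
    pub joint: bool,
}

/// Edition-independent toy lexer: `hello"world"` becomes two tokens,
/// and the first one is marked as joint.
pub fn lex(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
            continue;
        }
        let kind = if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            Kind::Ident(chars[start..i].iter().collect())
        } else if c == '"' {
            // No escape handling: the string ends at the next `"`,
            // regardless of which prefix came before it.
            i += 1;
            let start = i;
            while i < chars.len() && chars[i] != '"' {
                i += 1;
            }
            let body: String = chars[start..i].iter().collect();
            i = (i + 1).min(chars.len()); // skip the closing quote if present
            Kind::Str(body)
        } else {
            i += 1;
            Kind::Punct(c)
        };
        let joint = i < chars.len() && !chars[i].is_whitespace();
        tokens.push(Token { kind, joint });
    }
    tokens
}

/// Parser-side check: index of the first identifier that is jointly
/// followed by a string literal or `#`, if any.
pub fn first_reserved_prefix(tokens: &[Token]) -> Option<usize> {
    tokens.windows(2).position(|pair| {
        pair[0].joint
            && matches!(pair[0].kind, Kind::Ident(_))
            && matches!(pair[1].kind, Kind::Str(_) | Kind::Punct('#'))
    })
}
```

The toy lexer has to decide where a string ends without consulting the prefix, which is the kind of limitation the writeup linked above describes for the jointness approach.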
### Edition used for procedural macro APIs
There are some procedural macro APIs that lex tokens from strings. Those APIs have not traditionally taken a span or other information from which an edition can be derived. Those APIs will be documented with the edition that they use for lexing. In the future we may wish to add new APIs that take a `Span` or other parameter and use that to derive the edition.