Token split points #152398

@ehuss

This issue is a survey of the different places where multi-character punctuation tokens can be split into multiple smaller tokens. Some of these were implemented long ago, some are not yet supported, and some are questionable. It is my understanding that rustc is intended to split in most of these cases.

This does not include unstable syntax. This is based on analysis of the Reference grammar.

There are some places where splitting is explicitly forbidden, and I am not considering them as potential split points. For example, a float literal 9. is not allowed to be followed by a . (or _ or XID_Start for that matter), which rules out many possible conflicts with ..., .., and ..=.
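As a small illustration of that rule, the sketch below relies on `9..12` lexing as the integer `9`, the `..` token, and the integer `12`, rather than starting with the float literal `9.`. The function name `count_range` is made up for this example.

```rust
/// Hypothetical helper: `9..12` is an integer range, not the float `9.`
/// followed by more tokens.
fn count_range() -> usize {
    (9..12).count()
}
```

Calling `count_range()` walks the three integers 9, 10 and 11.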

This survey is primarily focused on places where token splitting would actually be useful in some way. There are some situations where token splitting would not help with any valid syntax. For example, when parsing a . for field access or similar, it could split a .. token to obtain the first dot. But then the second dot would be nonsensical. I do not list those here.
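To make the idea of a split concrete, here is a minimal string-level model. It is only a sketch and not how rustc implements splitting; `split_first` is a hypothetical name. It peels the first character off a punctuation token and hands back the remainder, which is the piece the parser would then have to make sense of.

```rust
/// Hypothetical model of a token split: peel the first character off a
/// multi-character punctuation token and return (first, rest).
/// Returns None for single-character (or empty) tokens, which cannot be split.
fn split_first(token: &str) -> Option<(&str, &str)> {
    let mut indices = token.char_indices();
    indices.next()?; // empty input: nothing to split
    let (second_start, _) = indices.next()?; // one character only: nothing to split
    Some(token.split_at(second_start))
}
```

For example, `split_first(">>=")` gives `(">", ">=")`, and `split_first("..")` gives `(".", ".")`, where the leftover `.` is the part that would be nonsensical after a field access.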

I have included MacroRepOp, but I think it is questionable. It depends on how macro_rules parsing is modeled, and whether or not it is reasonable to expect it to split.

See also:

TODO

  • Investigate break_up_float

Tokens that could potentially cause splits

Token   Can be split from
.       ..., ..=, ..
..      ...
<       <<=, <-, <<, <=
<<      <<=
>       >>=, >=, >>
>>      >>=
!       !=
%       %=
&       &=, &&
*       *=
+       +=
-       -=, ->
/       /=
:       ::
=       ==, =>
^       ^=
|       |=, ||

...

... is unusual because in expression position it is more or less an immediate error. Thus token splits aren't really possible (and that seems fine, to avoid confusion). That does mean there are expressions where whitespace or parentheses are required (like x.. .. or (x..).start).

In pattern position, because of the restrictions on where .. can appear and because the bounds of range patterns are heavily restricted, there are no splits either.
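The parenthesized form mentioned for expression position can be checked with a tiny sketch. `range_start` is a hypothetical name; the parentheses keep the `..` of the range and the `.` of the field access from sitting next to each other.

```rust
/// Hypothetical helper: `(x..)` is a `RangeFrom`, and `.start` reads its field.
/// Without the parentheses the three dots would be adjacent.
fn range_start(x: u32) -> u32 {
    (x..).start
}
```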

..

.. does not have any splits. Nothing ends with a single dot or starts with a single dot.

..=

..= does not have any splits.

<<=

<<= does not have any splits.

!=

%=

%= does not have any splits.

&&

  • BorrowExpression

    &&x;
  • ReferencePattern

    let &&x: &&i32; // OK
  • ReferenceType

    let &&x: &&i32; // OK
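A runnable version of the three cases above, as a sketch with a made-up function name (`through_double_ref`). Each `&&` is lexed as one token and then treated as two nested `&`.

```rust
/// Hypothetical helper exercising all three `&&` split points.
fn through_double_ref(x: i32) -> i32 {
    let r: &&i32 = &&x; // ReferenceType on the left, BorrowExpression on the right
    let &&y = r; // ReferencePattern: `&&` is read as two nested `&` patterns
    y
}
```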

&=

&= does not have any splits.

*=

  • MacroRepOp
    macro_rules! m {
        ($(x)*=) => {}; // ERROR, expected * + ?
    }
    
    macro_rules! m {
        ($(x)* =) => {}; // OK
    }
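For reference, the spaced form can be exercised like this. The macro and function names are hypothetical; the matcher is the same shape as the OK case above.

```rust
// Hypothetical macro: zero or more `x` tokens followed by a literal `=`.
// The space between `*` and `=` keeps them from being lexed as one `*=` token.
macro_rules! xs_then_eq {
    ($(x)* =) => {
        true
    };
}

fn matches_xs_then_eq() -> bool {
    xs_then_eq!(x x =)
}
```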

+=

  • TypeParamBounds

    // TypeParam
    struct Err<T: Clone += ()> { t: T } // OK
    
    // ConstParam
    // This only parses, otherwise invalid.
    struct S<const N: dyn Send += 1> {} // OK
    
    // ConstantItem
    // This only parses, otherwise invalid.
    const FOO: dyn Send += todo!(); // OK
    
    // StaticItem
    // This only parses, otherwise invalid.
    static S: dyn Send += todo!(); // OK
    
    // LetStatement
    // This only parses, otherwise invalid.
    let _: dyn Sync += todo!(); // OK
    
    // TypeAlias
    trait T {
        type O: Clone += i32; // OK
    }
  • MacroRepOp

    macro_rules! m {
        ($(x)+=) => {}; // ERROR, expected * + ?
    }
    
    macro_rules! m {
        ($(x)+ =) => {}; // OK
    }
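The TypeParam case is the one that is both parseable and otherwise valid, so it can be turned into a small working sketch. `Wrapper` and `default_wrapper` are names made up here. The `+=` is read as a trailing `+` in the bounds followed by the `=` that introduces the default type.

```rust
// `+=` after `Clone` is read as a trailing `+` in the bounds,
// then `=` introducing the default type `()`.
struct Wrapper<T: Clone += ()> {
    t: T,
}

/// Hypothetical helper: `Wrapper` with no arguments uses the default `T = ()`.
fn default_wrapper() -> Wrapper {
    Wrapper { t: () }
}
```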

-=

-= does not have any splits.

->

-> does not have any splits.

/=

/= does not have any splits.

::

  • ConstParam

    struct S<const N:::TypePath>{} // ERROR: Expected :
    struct S<const N: ::TypePath>{} // OK
  • FunctionParamPattern

    fn f([]:::TypePath) {} // ERROR: Expected : or |
    fn g([]: ::TypePath) {} // OK
  • ClosureParam

    |[]:::TypePath| {}; // ERROR: Expected , : or |
    |[]: ::TypePath| {}; // OK
  • ConstantItem

    This has a strange error due to a check in the unstable generic const items feature.

    // Strange error
    const C:::TypePath = todo!(); // ERROR invalid path separator in function definition
    const C: ::TypePath = todo!(); // OK
  • LetStatement

    let []:::TypePath; // ERROR: expected : ; = or |
    let []: ::TypePath; // OK
  • StructField

    struct S {x:::std::primitive::i32}; // ERROR expected :
    struct S {x: ::std::primitive::i32}; // OK
  • TypedSelf

    self as a self param does not allow a :: to follow it, so this gets parsed as a PathPattern, which is then a parse error.

    // Remarking on this, as it is a little unusual.
    // This is parsing a PathPattern instead of a self parameter.
    fn f(self:::std::primitive::i32) {} // ERROR: Path separator must be a double colon
    fn g(self: ::std::primitive::i32) {} // OK
  • StaticItem

    static C:::TypePath = todo!(); // ERROR expected : ; or =
    static C: ::TypePath = todo!(); // OK
  • MaybeNamedParam (BareFunctionType)

    // Strange error
    type T1 = fn(_:::TypePath); // ERROR Expected identifier, found `:`
    type T2 = fn(_: ::TypePath); // OK
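Since none of the `:::` forms are accepted today, the whitespace form is what works in practice. A minimal sketch (the names `Point` and `bump` are hypothetical) covering the StructField, function parameter, and LetStatement positions with a real global path:

```rust
// A space separates the `:` of the annotation from the leading `::` of the path.
struct Point {
    x: ::std::primitive::i32,
}

fn bump(n: ::std::primitive::i32) -> ::std::primitive::i32 {
    let p: ::std::primitive::i32 = Point { x: n }.x;
    p + 1
}
```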

<-

  • GenericArgs

    S::<-9>; // OK
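A self-contained sketch of this case, assuming a const-generic `S` (the associated constant `VALUE` and the function `negative_nine` are made up for illustration). The `<-` has to be read as `<` followed by the `-` of the negative literal.

```rust
struct S<const N: i32>;

impl<const N: i32> S<N> {
    const VALUE: i32 = N;
}

/// Hypothetical helper: `<-` is split into `<` and the `-` of the literal `-9`.
fn negative_nine() -> i32 {
    S::<-9>::VALUE
}
```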

<<

  • GenericArgs

    trait Container {
        type Item;
    }
    impl<T> Container for Vec<T> {
        type Item = T;
    }
    // GenericArgs with QualifiedPathType inside
    let _: Option<<Vec<i32> as Container>::Item> = todo!(); // OK
  • QualifiedPathType

    trait Inner {
        type InnerType;
    }
    
    trait Outer {
        type OuterType;
    }
    
    impl<T> Inner for Vec<T> {
        type InnerType = T;
    }
    
    impl Outer for String {
        type OuterType = usize;
    }
    
    // QualifiedPathType inside a QualifiedPathType
    fn constrained_function<T>() -> <<Vec<T> as Inner>::InnerType as Outer>::OuterType
    where
        T: Outer,
        <Vec<T> as Inner>::InnerType: Outer,
        <<Vec<T> as Inner>::InnerType as Outer>::OuterType: Default,
    {
        Default::default()
    }

<=

<= does not have any splits.

==

== does not have any splits.

=>

=> does not have any splits.

>=

  • ConstParam

    // GenericArgs inside ConstParam
    struct S<const N: x::<>=1>; // OK
    struct S<const N: x::<> = 1>;
  • DualDirSpecExpression (asm)

    After >= is split into > and =, the leftover = cannot be re-glued with the following > to form =>.

    let mut x = 1;
    let mut y = 1;
    unsafe { core::arch::asm!("inc {}", inout(reg) x::<>=>y); } // ERROR, expected expression, found >
    unsafe { core::arch::asm!("inc {}", inout(reg) x::<> => y); } // OK
  • MatchArms

    After >= is split into > and =, the leftover = cannot be re-glued with the following > to form =>.

    // PathPattern with GenericArgs followed by =>
    match todo!() {
        a::<>=> 0, // ERROR, expected one of ! ( ... ..= .. :: => if { or |
        a::<> => 0, // OK
        _ => 1
    };
  • LetChainCondition

    if let a::<>=0 {} // OK
    if let a::<> = 0 {}
  • ConstantItem

    // ConstantItem with GenericArgs followed by =
    const C: Type::<>=0; // OK
    const C: Type::<> = 0;
  • StaticItem

    // StaticItem with GenericArgs followed by =
    static C: Type::<>=0; // OK
    static C: Type::<> = 0;
  • LetStatement

    // LetStatement with GenericArgs followed by =
    let x: t::<>= val; // OK
    let x: t::<> = val;
  • ComparisonExpression

    IIUC, what happens here is that >= is split into > and =. The = is then treated as the assignment operator, which sees =1 on the RHS, and that does not parse as an expression. The issue is that splitting a token does not allow the remainder to be re-glued with whatever follows (in this case, into ==).

    // ComparisonExpression with GenericArgs followed by ==
    x::<>==1; // ERROR: expected expression, found =
    x::<> == 1; // OK
  • AssignmentExpression

    // AssignmentExpression with GenericArgs followed by =
    x::<>=1; // OK
    x::<> = 1;
  • GenericArgsBinding

    // GenericArgs in GenericArgsBinding
    fn f(iter: impl Iterator<Item::<>=u32>) {} // OK
    fn f(iter: impl Iterator<Item::<> = u32>) {}
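The OK cases above use placeholder paths. As a sketch with real types (ordinary generic arguments rather than the `x::<>` form, and with names invented here), the same `>=` split shows up whenever the closing `>` of a type is written directly against the `=`:

```rust
// In each item the `>` that closes the generic arguments is directly followed
// by `=`, so the two are lexed as `>=` and the parser has to split the token.
const EMPTY: Option<i32>=None;
static LIMIT: Option<u8>=Some(3);

fn values() -> Vec<i32> {
    let v: Vec<i32>=vec![1, 2];
    v
}
```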

>>=

  • GenericArgs

    // Generic args followed by >=
    x::<>>= y  // OK
    x::<> >= y
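A concrete instance of this shape, as a sketch: `None::<i32>` is a value path that ends in generic args, so writing `>=` directly after it produces a `>>=` token. The function name is hypothetical.

```rust
/// Hypothetical helper: `>>=` is split into the `>` closing the generic args
/// and the `>=` comparison operator.
fn none_ge_none() -> bool {
    None::<i32>>= None
}
```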

>>

  • GenericArgs and Gt

    // Generic args followed by >
    x::<>>3; // OK
    x::<> > 3;
  • GenericArgs and Right Shift

    Same problem with re-gluing as described in ComparisonExpression.

    x::<>>>3; // ERROR, expected expression, found `>`
    x::<> >> 3; // OK
  • QualifiedPathType

    <S as x::<T1>>::f(); // OK
    <S as x::<T1> >::f();
  • CompoundAssignmentExpression

    Same problem with re-gluing as described in ComparisonExpression.

    let mut x = 2;
    // tokenizes as x :: < >> >= 1
    x::<>>>=1;  // ERROR, expected expression, found >=
    x::<> >>=1; // OK
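The two cases in this section that do work can be written with real items. This is only a sketch; `Make`, `S`, `made`, and `none_gt_some` are names invented for the example.

```rust
trait Make<T> {
    fn make() -> T;
}

struct S;

impl Make<u8> for S {
    fn make() -> u8 {
        7
    }
}

/// QualifiedPathType: `>>` closes `Make::<u8>` and then the `<S as ...>` path.
fn made() -> u8 {
    <S as Make::<u8>>::make()
}

/// GenericArgs and Gt: `>>` is the `>` closing the generic args, then `>`.
fn none_gt_some() -> bool {
    None::<i32>>Some(3)
}
```

`None` orders before any `Some`, so the comparison in `none_gt_some` is false.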

^=

^= does not have any splits.

|=

|= does not have any splits.

||

  • ClosureExpression (depending on whether you consider || to be a distinct syntax or two | tokens joined).
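Both spellings of an empty parameter list can be compared side by side. A minimal sketch, with a made-up function name:

```rust
/// Hypothetical helper calling two closures that take no parameters.
fn call_both() -> i32 {
    let glued = || 40; // a single `||` token
    let spaced = | | 2; // two separate `|` tokens with nothing between them
    glued() + spaced()
}
```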

Labels: A-grammar, A-parser, I-lang-radar, T-compiler, T-lang, T-spec