This issue is a survey of the different places where multi-character punctuation tokens can be split into multiple smaller tokens. Some of these were implemented long ago, some are not yet supported, and some are questionable. It is my understanding that rustc is intended to split in most of these cases.
This does not include unstable syntax. This is based on analysis of the Reference grammar.
There are some places where splitting is explicitly forbidden, and I am not considering them as potential split points. For example, a float literal 9. is not allowed to be followed by a . (or _ or XID_Start for that matter), which rules out many possible conflicts with ..., .., and ..=.
This survey is primarily focused on places where token splitting would actually be useful in some way. There are some situations where token splitting would not help with any valid syntax. For example, when parsing a . for field access or similar, it could split a .. token to obtain the first dot. But then the second dot would be nonsensical. I do not list those here.
I have included MacroRepOp, but I think it is questionable. It depends on how macro_rules parsing is modeled, and whether or not it is reasonable to expect it to split.
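To make "splitting" concrete, here is a minimal sketch of the operation this survey is about: peeling the first character off a multi-character punctuation token and leaving the remainder as its own token. The function name `split_first` is hypothetical and this is not rustc's actual implementation; it only models the idea on plain strings.

```rust
/// Hypothetical helper (not rustc's API): split a multi-character
/// punctuation token into its first character and the remainder.
/// Returns `None` for single-character tokens, which cannot be split.
fn split_first(token: &str) -> Option<(&str, &str)> {
    let mut chars = token.char_indices();
    chars.next()?; // the first character
    let (idx, _) = chars.next()?; // byte offset of the second character
    Some((&token[..idx], &token[idx..]))
}
```

For example, `split_first(">>=")` yields `(">", ">=")`, which is the shape a parser needs when it is looking for the closing `>` of a generic argument list.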
... is unusual because in expression position it is more or less an immediate error. Thus token splits aren't really possible (and that seems fine to avoid confusion). That does mean there are expressions where whitespace or parentheses are required (like x.. .. or (x..).start).
In pattern position, because of the restrictions on where .. can appear and the heavy restrictions on the bounds of range patterns, there are no splits either.
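As a small illustration of the expression case, the parenthesized form mentioned above can be exercised directly. The function name is made up for this sketch.

```rust
/// Parentheses keep `x..` and `.start` apart, so the parser never has to
/// split a `..` or `...` token to read the field access.
fn range_from_start(x: i32) -> i32 {
    (x..).start
}
```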
..
.. does not have any splits. Nothing ends with a single dot or starts with a single dot.
*=
MacroRepOp
    macro_rules! m {
        ($(x)*=) => {}; // ERROR, expected * + ?
    }
    macro_rules! m {
        ($(x)* =) => {}; // OK
    }
+=
TypeParamBounds
    // TypeParam
    struct Err<T: Clone += ()> { t: T } // OK
    // ConstParam
    // This only parses, otherwise invalid.
    struct S<const N: dyn Send += 1> {} // OK
    // ConstantItem
    // This only parses, otherwise invalid.
    const FOO: dyn Send += todo!(); // OK
    // StaticItem
    // This only parses, otherwise invalid.
    static S: dyn Send += todo!(); // OK
    // LetStatement
    // This only parses, otherwise invalid.
    let _: dyn Sync += todo!(); // OK
    // TypeAlias
    trait T {
        type O: Clone += i32; // OK
    }
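The TypeParam case is the only one above that is fully valid rather than merely parseable, so it can be shown as a complete sketch. The names `Wrapper` and `default_wrapper` are invented for illustration; the assumption is that the compiler in use splits `+=` here, as reported above.

```rust
// `+=` arrives as one token; the parser splits it into the trailing `+`
// of the bound list and the `=` that introduces the default type.
struct Wrapper<T: Clone += ()> {
    t: T,
}

/// Builds a `Wrapper` that relies on the default type parameter `()`.
fn default_wrapper() -> Wrapper {
    Wrapper { t: () }
}
```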
MacroRepOp
    macro_rules! m {
        ($(x)+=) => {}; // ERROR, expected * + ?
    }
    macro_rules! m {
        ($(x)+ =) => {}; // OK
    }
-=
-= does not have any splits.
->
-> does not have any splits.
/=
/= does not have any splits.
::
ConstParam
    struct S<const N:::TypePath> {} // ERROR: Expected :
    struct S<const N: ::TypePath> {} // OK
FunctionParamPattern
    fn f([]:::TypePath) {} // ERROR: Expected : or |
    fn g([]: ::TypePath) {} // OK
ClosureParam
    |[]:::TypePath| {}; // ERROR: Expected , : or |
    |[]: ::TypePath| {}; // OK
ConstantItem
This has a strange error due to this check in the unstable generic const items.
    // Strange error
    const C:::TypePath = todo!(); // ERROR invalid path separator in function definition
    const C: ::TypePath = todo!(); // OK
LetStatement
    let []:::TypePath; // ERROR: Expected : ; = or |
    let []: ::TypePath; // OK
StructField
    struct S { x:::std::primitive::i32 }; // ERROR expected :
    struct S { x: ::std::primitive::i32 }; // OK
TypedSelf
self as a self param does not allow a :: to follow it, so this gets parsed as a PathPattern, which is then a parse error.
    // Remarking on this, as it is a little unusual.
    // This is parsing a PathPattern instead of a self parameter.
    fn f(self:::std::primitive::i32) {} // ERROR: Path separator must be a double colon
    fn g(self: ::std::primitive::i32) {} // OK
StaticItem
    static C:::TypePath = todo!(); // ERROR expected : ; or =
    static C: ::TypePath = todo!(); // OK
MaybeNamedParam (BareFunctionType)
    // Strange error
    type T1 = fn(_:::TypePath); // ERROR Expected identifier, found `:`
    type T2 = fn(_: ::TypePath); // OK
<-
GenericArgs
    S::<-9>; // OK
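The snippet above does not show a definition for S, so here is a self-contained sketch that assumes S has a single const generic parameter of type i32. The `value` method and the `negative_nine` function are hypothetical additions so the parsed argument can be observed.

```rust
/// Hypothetical stand-in for `S`: a unit struct with one const parameter.
struct S<const N: i32>;

impl<const N: i32> S<N> {
    /// Returns the const argument so the result of parsing is visible.
    fn value(&self) -> i32 {
        N
    }
}

fn negative_nine() -> i32 {
    // `<-` has to be read as `<` followed by `-` for this to parse.
    let s = S::<-9>;
    s.value()
}
```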
<<
GenericArgs
    trait Container {
        type Item;
    }
    impl<T> Container for Vec<T> {
        type Item = T;
    }
    // GenericArgs with QualifiedPathType inside
    let _: Option<<Vec<i32> as Container>::Item> = todo!(); // OK
>=
DualDirSpecExpression (asm)
Splitting >= into > and = leaves no way to re-glue the = with the following > (to form =>).
    let mut x = 1;
    let mut y = 1;
    unsafe {
        core::arch::asm!("inc {}", inout(reg) x::<>=>y);
    } // ERROR, expected expression, found >
    unsafe {
        core::arch::asm!("inc {}", inout(reg) x::<> => y);
    } // OK
MatchArms
Splitting >= into > and = leaves no way to re-glue the = with the following > (to form =>).
    // PathPattern with GenericArgs followed by =>
    match todo!() {
        a::<>=> 0, // ERROR, expected one of ! ( ... ..= .. :: => if { or | }
        a::<> => 0, // OK
        _ => 1
    };
LetChainCondition
    if let a::<>=0 {} // OK
    if let a::<> = 0 {}
ConstantItem
    // ConstantItem with GenericArgs followed by =
    const C: Type::<>=0; // OK
    const C: Type::<> = 0;
StaticItem
    // StaticItem with GenericArgs followed by =
    static C: Type::<>=0; // OK
    static C: Type::<> = 0;
LetStatement
    // LetStatement with GenericArgs followed by =
    let x: t::<>= val; // OK
    let x: t::<> = val;
ComparisonExpression
IIUC, what happens here is that >= is split into > and =, and then assignment sees =1 on the RHS, which does not parse as an expression. The issue is that splitting a token does not allow re-gluing the remainder with whatever follows (in this case into ==).
    // ComparisonExpression with GenericArgs followed by ==
    x::<>==1; // ERROR: expected expression, found =
    x::<> == 1; // OK
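The re-gluing problem can be modeled on a plain list of token strings. This is only a sketch under that simplification: `eat_gt` and `reglue` are hypothetical names, and `reglue` is the step that, per the description above, the parser does not perform.

```rust
/// Hypothetical sketch: consume a leading `>` from `tokens`, splitting a
/// glued token such as `>=` or `>>` when necessary.
fn eat_gt(tokens: &mut Vec<&'static str>) -> bool {
    let first = tokens.first().copied();
    match first {
        Some(">") => {
            tokens.remove(0);
            true
        }
        Some(tok) if tok.len() > 1 && tok.starts_with('>') => {
            tokens[0] = &tok[1..]; // the remainder stays behind as its own token
            true
        }
        _ => false,
    }
}

/// Hypothetical re-gluing step: merge the first two tokens when together
/// they form a longer operator.
fn reglue(tokens: &mut Vec<&'static str>) {
    if tokens.len() < 2 {
        return;
    }
    let glued = match (tokens[0], tokens[1]) {
        ("=", "=") => "==",
        ("=", ">") => "=>",
        (">", ">") => ">>",
        (">", ">=") => ">>=",
        _ => return,
    };
    tokens[0] = glued;
    tokens.remove(1);
}
```

Starting from `>=`, `=`, `1` (the tail of `x::<>==1`), `eat_gt` leaves `=`, `=`, `1`. Without something like `reglue`, the parser then sees an assignment whose right-hand side starts with `=`.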
AssignmentExpression
    // AssignmentExpression with GenericArgs followed by =
    x::<>=1; // OK
    x::<> = 1;
GenericArgsBinding
    // GenericArgs in GenericArgsBinding
    fn f(iter: impl Iterator<Item::<>=u32>) {} // OK
    fn f(iter: impl Iterator<Item::<> = u32>) {}
>>=
GenericArgs
    // Generic args followed by >=
    x::<>>= y // OK
    x::<> >= y
>>
GenericArgs and Gt
    // Generic args followed by >
    x::<>>3; // OK
    x::<> > 3;
GenericArgs and Right Shift
Same problem with re-gluing as described in ComparisonExpression.
    x::<>>>3; // ERROR, expected expression, found `>`
    x::<> >> 3; // OK
QualifiedPathType
    <S as x::<T1>>::f(); // OK
    <S as x::<T1> >::f();
CompoundAssignmentExpression
Same problem with re-gluing as described in ComparisonExpression.
    let mut x = 2;
    // tokenizes as x :: < >> >= 1
    x::<>>>=1 // ERROR, expected expression, found >=
    x::<> >>=1; // OK
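The "tokenizes as" comment above follows from longest-match (maximal munch) tokenization. Here is a simplified sketch that reproduces it. It is not rustc's lexer: the `PUNCT` table and `tokenize` function are made up for illustration and cover only the punctuation used in these examples.

```rust
/// Punctuation known to this sketch, longest first, so the first match
/// found is also the longest one (maximal munch).
const PUNCT: &[&str] = &[
    "<<=", ">>=", "::", "<<", ">>", "<=", ">=", "==", "=>", "<-",
    "<", ">", "=", ":",
];

/// Simplified tokenizer: identifiers and integer literals are runs of
/// alphanumerics or `_`; everything else is matched against `PUNCT`.
fn tokenize(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if c.is_whitespace() {
            rest = &rest[c.len_utf8()..];
        } else if c.is_alphanumeric() || c == '_' {
            let end = rest
                .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
                .unwrap_or(rest.len());
            out.push(rest[..end].to_string());
            rest = &rest[end..];
        } else {
            // Take the longest known punctuation token; fall back to a
            // single unknown character.
            let len = PUNCT
                .iter()
                .find(|p| rest.starts_with(**p))
                .map_or(c.len_utf8(), |p| p.len());
            out.push(rest[..len].to_string());
            rest = &rest[len..];
        }
    }
    out
}
```

With this model, `tokenize("x::<>>>=1")` produces `x`, `::`, `<`, `>>`, `>=`, `1`. Closing the generic argument list consumes only the first `>` of `>>`, leaving `>` and `>=` as separate tokens where `>>=` was intended.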
^=
^= does not have any splits.
|=
|= does not have any splits.
||
ClosureExpression (depending on whether you consider || to be a distinct syntax or two | tokens joined).
See also:
TODO
break_up_float

Tokens that could potentially cause splits
....,..=,.......<<<=,<-,<<,<=<<<<=>>>=,>=,>>>>>>=!!=%%=&&=**=++=--=,->//=:::===,=>^^=||=,||

..=
..= does not have any splits.
<<=
<<= does not have any splits.
!=
NeverType: Opportunistically split != to successfully parse never type #145536
%=
%= does not have any splits.
&&
BorrowExpression
ReferencePattern
ReferenceType
&=
&= does not have any splits.
*=
MacroRepOp
<<
QualifiedPathType
<=
<= does not have any splits.
==
== does not have any splits.
=>
=> does not have any splits.
>=
ConstParam
DualDirSpecExpression (asm)