-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Improve template tokenizing #13919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve template tokenizing #13919
Conversation
|
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/50065/ |
defe5fa to
96712cd
Compare
|
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. Latest deployment of this branch, based on commit ce1440f:
|
| const elemStart = start + 1; | ||
| const elem = this.startNodeAt( | ||
| elemStart, | ||
| createPositionFromPosition(this.state.startLoc, 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it would be more clear if this function was called something like createPositionAfterOffset or getIncrementedPosition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe createPositionWithColumnOffset? I should add a note that this function does not consider whitespaces, so it does not work for any offsets for sure, and thus it should be only used when we create AST node inside the token boundaries, which should be rare scenario.
For this reason, I am also tempted to expand TemplateElement to match the actual token range in Babel 8. It can possibly further get rid of
babel/eslint/babel-eslint-parser/src/convert/convertAST.cjs
Lines 79 to 106 in 531db5d
| if (node.type === "TemplateLiteral") { | |
| for (let i = 0; i < node.quasis.length; i++) { | |
| const q = node.quasis[i]; | |
| q.range[0] -= 1; | |
| if (q.tail) { | |
| q.range[1] += 1; | |
| } else { | |
| q.range[1] += 2; | |
| } | |
| q.loc.start.column -= 1; | |
| if (q.tail) { | |
| q.loc.end.column += 1; | |
| } else { | |
| q.loc.end.column += 2; | |
| } | |
| if (ESLINT_VERSION >= 8) { | |
| q.start -= 1; | |
| if (q.tail) { | |
| q.end += 1; | |
| } else { | |
| q.end += 2; | |
| } | |
| } | |
| } | |
| } | |
| }, | |
| }; |
2f02b9a to
ce1440f
Compare
nicolo-ribaudo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly left some comments about naming (because I found that they made it harder to understand the code), but apart from that this PR looks good!
| backQuote: createToken("`", { startsExpr }), | ||
| dollarBraceL: createToken("${", { beforeExpr, startsExpr }), | ||
| templateTail: createToken("...`", { startsExpr }), | ||
| templateMiddle: createToken("...${", { beforeExpr, startsExpr }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reviewing, I found it quite confusing that templateMiddle can also be at the start of a template (I expected templateStart ŧemplateMiddle+ templateTail). I'd prefer if it was just called templateNonTail.
| if (!noCalls && this.eat(tt.doubleColon)) { | ||
| return this.parseBind(base, startPos, startLoc, noCalls, state); | ||
| } else if (this.match(tt.backQuote)) { | ||
| } else if (this.match(tt.templateMiddle) || this.match(tt.templateTail)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since there are different places where we check this.match(tt.templateMiddle) || this.match(tt.templateTail), we could add a tokenIsTemplate function.
| } | ||
| } | ||
| if (type === tt.templateMiddle || type === tt.templateTail) { | ||
| if (!process.env.BABEL_8_BREAKING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can swap these two ifs, since the outer one would be empty with Babel 8.
|
|
||
| // Reads template string tokens. | ||
|
|
||
| readTemplateMiddle(): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can return both a templateMiddle or templateTail? Maybe readTemplateAfterSubsitution, or readTemplateContinuation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly prefer readTemplateContinuation because it is shorter. Will also rename readTmplToken to readTemplateToken for consistency.
ce1440f to
1859669
Compare
nicolo-ribaudo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linting failure looks related.
|
Oh the lint error should be related. I almost forgot this PR, I will fix after rebase. |
660c848 to
ad60c64
Compare
BABEL_8_BREAKINGflagThis PR includes commits from #13891.
Let's start by an example. Currently Babel tokenizes a string template
`before${x}middle${y}after`into a token sequences
in order to check whether a backQuote
`starts a template and whether braceR}starts a template span, we have to introduce a context stack for backQuote and braceR. Unlike backQuote, a braceR can pair with more than one tokens: namely braceL{, dollarBraceL${and braceHashL#{. Since we lost track of what the braceR is paring with when tokenzing, we have to push new context for braceL and braceHashL even if they have no special semantics in string template. Given that braceL{is a common token in JavaScript, this approach introduced overhead that could have been avoided.In this PR we merge braceR and dollarBraceL with the template token. Two new template token types are introduced in order to differentiate between template middle
...${(ends with${) and template tail...`(ends with`). So now the example above is tokenized intoThe new token layout is terser, requires no context tracking and can still work with the parser. So we can remove
updateContextin the base parser and also simplify the JSX element which still depends onupdateContext.Another bonus is that the new token layout is almost identical to the
espreeone. So we can also remove some works in@babel/eslint-parser: In Babel 8 we don't need to merge multiple tokens into a template token.Benchmark Results
many-template-elements (15% faster)
many-nested-template-elements (25% faster)