perf(parser): introduce ParserConfig #19637
Merging this PR will improve performance by 18.18%
Pull request overview
This PR introduces a ParserConfig trait that controls whether the parser collects tokens, with the choice made either at compile time or at runtime. It addresses a performance regression from #19497. When the choice is made at compile time, token collection becomes a zero-cost abstraction for callers that do not need it.
Changes:
- Introduced `ParserConfig` trait with three implementations: `NoTokensParserConfig` (default), `TokensParserConfig`, and `RuntimeParserConfig`
- Removed `collect_tokens` field from `ParseOptions` and replaced it with the config system
- Updated all parser and lexer implementations to be generic over the config type
- Migrated byte handler dispatch from a static array to per-config static arrays to enable better optimization
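The last point can be illustrated with a toy lexer. This is only a sketch and not the code in `byte_handlers.rs`: `LexerConfig::TOKENS`, `ByteHandlers`, `Kind`, `digit`, `other` and `lex` are all hypothetical names. It also swaps in an associated const on a generic struct rather than real `static` items, because a Rust `static` cannot be generic over a type parameter; the PR itself may generate its per-config statics differently.

```rust
use std::marker::PhantomData;

// Hypothetical config trait: whether tokens are collected is a compile-time const.
trait LexerConfig {
    const TOKENS: bool;
}

struct NoTokensLexerConfig;
struct TokensLexerConfig;

impl LexerConfig for NoTokensLexerConfig {
    const TOKENS: bool = false;
}
impl LexerConfig for TokensLexerConfig {
    const TOKENS: bool = true;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Digit,
    Other,
}

// A byte handler receives the byte's position and the token list.
type ByteHandler = fn(usize, &mut Vec<(usize, Kind)>) -> Kind;

fn digit<C: LexerConfig>(pos: usize, tokens: &mut Vec<(usize, Kind)>) -> Kind {
    // `C::TOKENS` is a constant, so this branch can be removed per config.
    if C::TOKENS {
        tokens.push((pos, Kind::Digit));
    }
    Kind::Digit
}

fn other<C: LexerConfig>(pos: usize, tokens: &mut Vec<(usize, Kind)>) -> Kind {
    if C::TOKENS {
        tokens.push((pos, Kind::Other));
    }
    Kind::Other
}

// One dispatch table per config type, built at compile time.
struct ByteHandlers<C>(PhantomData<C>);

impl<C: LexerConfig> ByteHandlers<C> {
    const TABLE: [ByteHandler; 256] = {
        let mut table = [other::<C> as ByteHandler; 256];
        let mut byte = b'0';
        while byte <= b'9' {
            table[byte as usize] = digit::<C>;
            byte += 1;
        }
        table
    };
}

// Dispatch every byte through the table belonging to config `C`.
// Returns the number of digit bytes and the collected tokens (if any).
fn lex<C: LexerConfig>(source: &str) -> (usize, Vec<(usize, Kind)>) {
    let mut tokens = Vec::new();
    let mut digits = 0;
    for (pos, &byte) in source.as_bytes().iter().enumerate() {
        let handler = ByteHandlers::<C>::TABLE[byte as usize];
        if handler(pos, &mut tokens) == Kind::Digit {
            digits += 1;
        }
    }
    (digits, tokens)
}
```

Because each table only holds handlers monomorphized for one config, the token-collecting branch inside a handler is a constant condition rather than a runtime check.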
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/oxc_parser/src/config.rs | New module defining ParserConfig and LexerConfig traits with three concrete implementations |
| crates/oxc_parser/src/lib.rs | Updated Parser struct to be generic over ParserConfig, added with_config method, removed collect_tokens from ParseOptions |
| crates/oxc_parser/src/lexer/mod.rs | Updated Lexer to be generic over LexerConfig, changed config field from bool to generic type |
| crates/oxc_parser/src/lexer/byte_handlers.rs | Converted static BYTE_HANDLERS array to per-config static arrays in byte_handler_tables module |
| crates/oxc_parser/src/js/*.rs | Added generic Config parameter to all ParserImpl implementations in JS parsing modules |
| crates/oxc_parser/src/ts/*.rs | Added generic Config parameter to all ParserImpl implementations in TS parsing modules |
| crates/oxc_parser/src/jsx/mod.rs | Added generic Config parameter to ParserImpl implementation for JSX |
| crates/oxc_parser/src/lexer/*.rs | Added generic Config parameter to all Lexer implementations in lexer submodules |
| tasks/coverage/src/tools.rs | Updated to use RuntimeParserConfig for token collection in coverage tests |
| tasks/benchmark/benches/lexer.rs | Updated to use NoTokensLexerConfig for benchmarks |
| napi/playground/src/lib.rs | Removed collect_tokens field from ParseOptions struct initialization |
| crates/oxc_formatter/src/service/mod.rs | Removed collect_tokens field from ParseOptions struct initialization |
@overlookmotel I think we should move this below #19497 so we can monitor the perf change more clearly?
Yes, I agree that'd be preferable. I tried, but it was a bit of a nightmare because the 2 PRs touch all the same code. I've checked the numbers on CodSpeed and they're exactly back to where they were before the preceding PR.
### 🚀 Features - 733d6dc parser: Report error on `infer` outside conditional type (#19879) (camc314) - c2a42f6 allocator: Add `Vec::into_bump_slice_mut` (#19895) (overlookmotel) - ee4982b parser: Add `VARIANTS` const to `Kind` via `fieldless_enum!` macro (#19877) (overlookmotel) - b3dceae data_structures: Add `fieldless_enum!` macro (#19876) (overlookmotel) - 12b841e parser: Make all `Kind::is_*` methods `const` (#19874) (overlookmotel) - 25c2e25 estree/tokens: Add function to update tokens in place (#19856) (overlookmotel) - f78e6df parser: Add `mutate_tokens` Cargo feature (#19853) (overlookmotel) - 5036bb6 parser: Report error on `for await` in static blocks (#19844) (camc314) - 42bd431 parser: Report error for missing initializer in using decl (#19824) (camc314) - a2f58e5 parser: Report error for `implements` clause in non-ts files (#19820) (Cameron) - b25228a estree: Add `IS_COMPACT` const to `Formatter` trait (#19787) (overlookmotel) - e2a1b79 estree: Expose buffer and formatter of serializers (#19773) (overlookmotel) - 4699498 data_structures: Add `CodeBuffer::print_strs_array` (#19760) (overlookmotel) - 233f947 estree: `oxc_estree` crate export config and formatter types (#19724) (overlookmotel) - 5937a32 semantic: Introduce `symbol_declarations` method (#19609) (camc314) - ea6b796 parser: Add `LexerConfig::TOKENS_METHOD_IS_STATIC` const (#19683) (overlookmotel) - 655c38f semantic: Add "did you mean?" 
suggestions to undefined name errors (#19102) (copilot-swe-agent) - 9e11dc6 parser,estree,coverage: Collect tokens in parser and convert to ESTree format (#19497) (camc314) - c4a3677 parser: Report error for initializer in ambient context (#19187) (camc314) ### 🐛 Bug Fixes - abc7e19 codegen: Improve parenthesised checks when printing types (#19880) (camc314) - 017de5d parser: Update error code for type annotation in `for...in` statement (#19882) (camc314) - 7682e5a linter/plugins: Decode escapes in identifier tokens (#19838) (overlookmotel) - 06767ed estree/tokens: Convert `this` tokens in `TSTypeName` (#19815) (overlookmotel) - ef798af parser: Use TS8037 for satisfies expression in JS files diagnostic (#19819) (camc314) - 98ea5c5 parser: Use TS8016 for type assertions in JS files diagnostic (#19818) (camc314) - 1710f56 codegen: Remove double indentation for enum inside namespace (#19775) (Dunqing) - 9e4995c codegen: Print type annotation on `CatchParameter` (#19790) (camc314) - 297b2bb codegen: Wrap `TSConditionalType` in parens when necessary (#19788) (camc314) - cec7878 codegen: Print `definite` property on AccessorProperty (#19786) (camc314) - 6f395cf codegen: Print `definite` property on PropertyDefinition (#19785) (camc314) - b749373 codegen: Correctly parenthesise TSArrayType (#19784) (camc314) - 876dc1b codegen: Print object property `this` param (#19783) (camc314) - 93bb861 formatter: Trim trailing whitespace before breaking line (#19740) (leaysgur) - ed17bbf codegen: Print `override` keyword for method and property definitions (#19753) (Dunqing) - 6a59a76 parser: Improve error recovery for private identifiers in property names (#19710) (Boshen) - 3b96f41 codegen: Print comments in JSX expression containers and spread attributes (#19701) (Boshen) - f5694ce estree/tokens: Reverse field order of `regex` object in tokens (#19679) (overlookmotel) - b2b7a55 estree/tokens: Generate tokens for files with BOM (#19535) (overlookmotel) - 50a7514 estree: Fix tokens 
for JSX (#19524) (overlookmotel) - a35063e minifier: Preserve side effects for meta property url reads (#19668) (Boshen) - 8ad3430 semantic/jsdoc: Handle even-numbered backtick sequences in JSDoc parsing (#19664) (Boshen) ### ⚡ Performance - 05ccf9f linter/plugins: Transfer tokens via raw transfer (#19893) (overlookmotel) - c1bfdcf estree/tokens: Preallocate sufficient space for tokens JSON (#19851) (overlookmotel) - 4b0611a estree/tokens: Introduce `ESTreeTokenConfig` trait (#19842) (overlookmotel) - 81bab90 estree/tokens: Do not JSON-encode keyword, punctuator, etc tokens (#19814) (overlookmotel) - 6260ddd estree/tokens: Do not JSON-encode `this` identifiers (#19813) (overlookmotel) - b378f4a estree/tokens: Do not JSON-encode JSX identifiers (#19812) (overlookmotel) - 5016d92 estree/tokens: Handle regex tokens separately (#19796) (overlookmotel) - 780a68e estree/tokens: Use strings from AST for identifier tokens (#19744) (overlookmotel) - dc9c2e3 estree: Use `CodeBuffer::print_strs_array` to reduce bounds checks (#19766) (overlookmotel) - 845da35 estree: Use `CodeBuffer::print_indent` (#19727) (overlookmotel) - ec88f6a estree/tokens: Serialize tokens while visiting AST (#19726) (overlookmotel) - bc6507f estree/tokens: Serialize with `ESTree` not `serde` (#19725) (overlookmotel) - ec24859 estree/tokens: Do not branch on presence of override twice (#19721) (overlookmotel) - dac14be estree/tokens: Replace hash map with `Vec` (#19718) (overlookmotel) - b9d2443 estree/tokens: Replace multiple hash sets into a single hash map (#19716) (overlookmotel) - 7233548 parser: Remove branches from `finish_next_inner` (#19695) (overlookmotel) - b5d9845 parser: Remove const generic param from `finish_next_inner` (#19684) (overlookmotel) - 8940f66 estree/tokens: Serialize tokens to compact JSON (#19572) (overlookmotel) - 136e39b parser/tokens: Pre-allocate capacity for tokens (#19543) (overlookmotel) - 6a6513c linter/plugins: Use Oxc tokens in plugins (#19498) (camc314) - b3b2d30 
parser: Introduce `ParserConfig` (#19637) (overlookmotel) ### 📚 Documentation - b2b7a64 estree/tokens: Correct comment (#19873) (overlookmotel) - 0399311 estree/tokens: Improve comments (#19836) (overlookmotel) - 1b392de minifier: Add `Function.prototype.toString` assumption (#19758) (sapphi-red) - 75c9cd8 parser: Improve doc comments for `ParserConfig` and `LexerConfig` (#19682) (overlookmotel) - 2fa936f README.md: Map npm package links to npmx.dev (#19666) (Boshen) Co-authored-by: Boshen <[email protected]>
# Oxlint ### 🚀 Features - 2e0e1d0 linter/no-unused-vars: Add experimental fix mode controls (off|suggestion|fix) (#19774) (camc314) - f34f6fa linter: Introduce typeCheck config option (#19764) (camc314) - 694be7d linter: Introduce typeAware as config options (#19614) (camc314) - 655c38f semantic: Add "did you mean?" suggestions to undefined name errors (#19102) (copilot-swe-agent) - e97a57e linter/id-length: Use serde to deserialize rule options (#19636) (camc314) - c4a3677 parser: Report error for initializer in ambient context (#19187) (camc314) - 346045a linter/id-length: Add `checkGeneric` option (#19634) (camc314) ### 🐛 Bug Fixes - 1b7a937 linter: Correct double-comparisons fix with swapped operands (#19846) (camc314) - c308857 linter/consistent_type_imports: Add missing help and notes to diagnostics (#19827) (Daniel Osmond) - 7682e5a linter/plugins: Decode escapes in identifier tokens (#19838) (overlookmotel) - f368fcd linter/consistent_type_assertions: Add missing with_help and with_note to diagnostics (#19826) (Daniel Osmond) - 04e6223 npm: Add `preferUnplugged` for Yarn PnP compatibility (#19829) (Boshen) - 86d5037 linter: Add help text to no-extend-native, no-useless-backreference (#19733) (Anthony Amaro) - 50e8eff linter: Add .with_help() to operator-assignment, no-nonoctal-decimal-escape (#19732) (Anthony Amaro) - 1417bdc linter/no-wrapper-object-types: Add help messages to missing diagnostics (#19771) (Daniel Osmond) - 0838477 linter/ban_ts_comment: Add help and notes to missing diagnostics (#19781) (Daniel Osmond) - e8c77cf linter/adjacent_overload_signatures: Add missing diagnostics (#19780) (Daniel Osmond) - 28834ac linter/ban_types: Add missing help and note to diagnostics (#19782) (Daniel Osmond) - fd938d3 linter/prefer-enum-initializers: Add help messages to missing diagnostics (#19772) (Daniel Osmond) - eb928ee linter/no-dynamic-delete: Add help messages to missing diagnostics (#19768) (Daniel Osmond) - a985666 linter/no-empty-interface: Add 
help messages to missing diagnostics (#19769) (Daniel Osmond) - 2dc0ceb linter/no-extra-non-null-assertion: Add help messages to missing diagnostics (#19770) (Daniel Osmond) - 95d5d66 linter/no-dupe-keys: Handle `__proto__` proto setters in (#19762) (camc314) - 24ff0db linter/exhaustive-deps: False positive for member expressions in IIFEs (#19751) (Dennis Chen) - 7243a58 linter/no-use-before-define: Honor `ignoreTypeReferences` when value and type name collisions (#19747) (Dimava) - eefd818 linter/explicit-module-boundary-types: Add help messages to missing diagnostics (#19736) (Daniel Osmond) - 0440e9a linter: Add help text to no_control_regex, no_fallthrough, no_param_reassign (#19655) (Anthony Amaro) - e84cb2f react/display-name: Handle merged type+value context symbols (#19608) (camc314) - ce7e253 linter/prefer-object-from-entries: Require exact path match in unicorn helper (#19687) (camc314) - f5694ce estree/tokens: Reverse field order of `regex` object in tokens (#19679) (overlookmotel) - b2b7a55 estree/tokens: Generate tokens for files with BOM (#19535) (overlookmotel) - 0722721 linter/jsx-curly-brace-presence: False positive with prop & mixed quotes (#19674) (camc314) - 3496acd linter: Enhance diagnostic help messages for eslint rules (#19653) (Anthony Amaro) - e384e94 linter: Enhance help diagnostic messages for more eslint rules (#19658) (Anthony Amaro) - a4d5b34 linter: Avoid non-promise catch false positives (#19574) (camc314) - 5706f38 linter: `unicorn/no-array-callback-reference` skip `Effect.*` array-like methods name. 
(#19633) (Said Atrahouch) ### ⚡ Performance - 05ccf9f linter/plugins: Transfer tokens via raw transfer (#19893) (overlookmotel) - 4b0611a estree/tokens: Introduce `ESTreeTokenConfig` trait (#19842) (overlookmotel) - ec88f6a estree/tokens: Serialize tokens while visiting AST (#19726) (overlookmotel) - d4dcf26 linter/plugins: Remove `typescript` from bundle (#19531) (overlookmotel) - 6a6513c linter/plugins: Use Oxc tokens in plugins (#19498) (camc314) ### 📚 Documentation - d86f59e linter: Improve docs for no-useless-concat, mark as pending fixer. (#19859) (connorshea) - caa091d linter/plugins: Correct doc comments for `initTokens` (#19530) (overlookmotel) - 2fa936f README.md: Map npm package links to npmx.dev (#19666) (Boshen) - dc0ff73 linter/no-useless-constructor: Warn for parameter properties as well (#19638) (Ole Asteo) # Oxfmt ### 🚀 Features - 5141bc2 formatter: Support trailing ignore comments (#19304) (Andreas Lubbe) - 4888a99 oxfmt/lsp: Support other schemes beside `file://` and `untitled://` (#19872) (Sysix) - 14a0181 oxfmt: Support `graphql()` variant for gql-in-js (#19703) (leaysgur) - ca68ea6 oxfmt: Support gql-in-js substitution (#19670) (leaysgur) - 035933c formatter,oxfmt: Support js-in-vue (partially) (#19514) (leaysgur) - 9e11dc6 parser,estree,coverage: Collect tokens in parser and convert to ESTree format (#19497) (camc314) ### 🐛 Bug Fixes - 8e3842d oxfmt: Avoid embedded TSFN crash by returning errors as data (take2) (#19806) (Yuji Sugiura) - 04e6223 npm: Add `preferUnplugged` for Yarn PnP compatibility (#19829) (Boshen) - e540585 oxfmt: Support tailwind sort for CSS/LESS/SCSS (#19803) (leaysgur) - 93bb861 formatter: Trim trailing whitespace before breaking line (#19740) (leaysgur) - b85f97b formatter: Drop blank line between terminal call and first chain member (#19659) (Dunqing) ### ⚡ Performance - b3b2d30 parser: Introduce `ParserConfig` (#19637) (overlookmotel) ### 📚 Documentation - 2fa936f README.md: Map npm package links to npmx.dev (#19666) 
(Boshen) Co-authored-by: Boshen <[email protected]>

What this PR does
Introduce `ParserConfig` trait (another try at #16785).
The aim is to remove the large performance regression in the parser that #19497 created, by making whether the parser generates tokens or not a compile-time option.
The `ParserConfig::tokens` method replaces the `ParseOptions::collect_tokens` property. The former can be const-folded at compile time, where the latter couldn't.
3 options
This PR also introduces 3 different config types that users can pass to the parser:
- `NoTokensParserConfig` (default)
- `TokensParserConfig`
- `RuntimeParserConfig`
The first 2 set whether tokens are collected or not at compile time. The last sets it at runtime.
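A minimal sketch of how the 3 configs could fit together. The 3 type names, the `tokens` method and `with_config` come from this PR; the exact signatures, `RuntimeParserConfig::new` and the `ToyParser` type are assumptions made up for illustration, and the real definitions in `crates/oxc_parser/src/config.rs` may differ.

```rust
// Assumed shape of the trait: a single method saying whether to collect tokens.
trait ParserConfig {
    fn tokens(&self) -> bool;
}

// Tokens off, known at compile time.
#[derive(Default, Clone, Copy)]
struct NoTokensParserConfig;

impl ParserConfig for NoTokensParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        false
    }
}

// Tokens on, known at compile time.
#[derive(Default, Clone, Copy)]
struct TokensParserConfig;

impl ParserConfig for TokensParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        true
    }
}

// Tokens on or off, decided at runtime.
#[derive(Clone, Copy)]
struct RuntimeParserConfig {
    tokens: bool,
}

impl RuntimeParserConfig {
    // Hypothetical constructor.
    fn new(tokens: bool) -> Self {
        Self { tokens }
    }
}

impl ParserConfig for RuntimeParserConfig {
    #[inline(always)]
    fn tokens(&self) -> bool {
        self.tokens
    }
}

// Toy stand-in for the parser: "parses" whitespace-separated words.
struct ToyParser<'a, C: ParserConfig = NoTokensParserConfig> {
    source: &'a str,
    config: C,
}

impl<'a> ToyParser<'a> {
    fn new(source: &'a str) -> Self {
        Self { source, config: NoTokensParserConfig }
    }
}

impl<'a, C: ParserConfig> ToyParser<'a, C> {
    // Swap the config, changing the parser's type.
    fn with_config<C2: ParserConfig>(self, config: C2) -> ToyParser<'a, C2> {
        ToyParser { source: self.source, config }
    }

    // Returns the word count, plus the tokens if the config asks for them.
    fn parse(&self) -> (usize, Vec<&'a str>) {
        let mut count = 0;
        let mut tokens = Vec::new();
        for word in self.source.split_whitespace() {
            count += 1;
            // A constant for the first 2 configs, a field read for the runtime one.
            if self.config.tokens() {
                tokens.push(word);
            }
        }
        (count, tokens)
    }
}
```

With this shape, `ToyParser::new(src).parse()` never touches the token list, `.with_config(TokensParserConfig)` always fills it, and `.with_config(RuntimeParserConfig::new(flag))` keeps a single compiled copy of `parse` that checks `flag` on every word.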
All 3 implement `ParserConfig`.
`NoTokensParserConfig` is the default, and is what's used in the compiler pipeline. It switches tokens off in the parser, and makes all the tokens-related code dead code, which the compiler eliminates. This makes the ability of the parser to generate tokens zero cost when it's not used (in the compiler pipeline).
`TokensParserConfig` is the one to use where you always want tokens. This is probably the config that the linter will use.
`RuntimeParserConfig` is the one to use when an application decides whether to generate tokens or not at runtime. This config avoids compiling the parser twice, at the cost of runtime checks. This is what the NAPI parser package will use.
Future extension
Supporting additional features
In future we intend to build the UTF-8 to UTF-16 offsets conversion table in the parser. This will be more performant than searching through the source text for Unicode characters in a 2nd pass later on. But this feature is only required for uses of the parser where we're interacting with the JS side (NAPI parser package, linter with JS plugins).
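For readers unfamiliar with what such a table does, here is a rough standalone sketch. It is not the planned implementation: `Utf8ToUtf16`, its `entries` layout and `convert` are hypothetical, and it builds the table in a separate pass over the source, which is exactly the extra pass that building it inside the parser would avoid.

```rust
// Hypothetical conversion table. One entry per non-ASCII char:
// (UTF-8 byte offset just past the char, cumulative bytes saved in UTF-16 so far).
struct Utf8ToUtf16 {
    entries: Vec<(u32, u32)>,
}

impl Utf8ToUtf16 {
    fn new(source: &str) -> Self {
        let mut entries = Vec::new();
        let mut diff = 0u32;
        for (offset, ch) in source.char_indices() {
            if !ch.is_ascii() {
                // Each non-ASCII char takes fewer UTF-16 code units than UTF-8 bytes.
                diff += (ch.len_utf8() - ch.len_utf16()) as u32;
                entries.push(((offset + ch.len_utf8()) as u32, diff));
            }
        }
        Self { entries }
    }

    // Convert a UTF-8 byte offset (on a char boundary) to a UTF-16 code unit offset.
    fn convert(&self, utf8_offset: u32) -> u32 {
        // Count the non-ASCII chars that end at or before this offset.
        let idx = self.entries.partition_point(|&(end, _)| end <= utf8_offset);
        let diff = if idx == 0 { 0 } else { self.entries[idx - 1].1 };
        utf8_offset - diff
    }
}
```

For pure-ASCII source text the table stays empty and `convert` returns its input unchanged, which is why the feature only matters when offsets are handed to the JS side.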
`ParserConfig` can be extended to toggle this feature on/off at compile time or runtime, in the same way as you toggle on/off tokens.
Options and configs
This PR introduces `ParserConfig` but leaves `ParseOptions` as it is. So we now have 2 sets of options, passed to `Parser` with `with_options(...)` and `with_config(...)`. This is confusing.
We could merge the 2 by making `ParseOptions` implement `ParserConfig`, so then you'd define all options with one `with_options` call.
This would have the side effect of making all other parser options (e.g. `preserve_parens`) able to be set at either runtime or compile time, depending on the use case.
For users consuming `oxc_parser` as a library, with specific needs, they could also configure `Parser` to their needs e.g. create a parser which only handles plain JS code with all the code paths for JSX and TS shaken out as dead code. This would likely improve parsing speed significantly for these use cases.
Implementation details
Why a trait instead of a cargo feature?
IMO a trait is preferable for the following reasons:
`#[cfg_attr(feature = "whatever", expect(clippy::unused_async))]` etc.
The introduction of a trait does not seem to significantly affect compile time:
Measured on Mac Mini M4 Pro, `cargo clean` run before each. The difference appears to be mostly within the noise threshold.