fix(regular_expression): preserve UnicodeEscape CharacterKind in string literals #18998
Pull request overview
This PR fixes a bug where unicode escape sequences in RegExp string literals (e.g., RegExp("[A\\u0301]")) were incorrectly identified as CharacterKind::Symbol instead of CharacterKind::UnicodeEscape. The fix adds escape kind tracking through the parsing pipeline to preserve information about how characters were written in source code.
Changes:
- Added `EscapeKind` enum to track unicode (`\uXXXX`, `\u{XXXX}`) and hexadecimal (`\xXX`) escapes through parsing
- Modified string literal and template literal parsers to track escape kinds
- Updated pattern parser to use escape kind information when creating Character AST nodes
- Added tests to verify the fix
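
The escape tracking described in the list above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the names `EscapeKind`, `CodePoint`, and `escape_kind` come from the description, but the variant names, the `decode_string_body` function, and the `parse_hex` helper are assumptions made for illustration, and escapes other than `\uXXXX`, `\u{XXXX}`, and `\xXX` are simplified.

```rust
/// How a code point was written in the source text.
/// Variant names are assumptions for this sketch; the PR only states that
/// unicode and hexadecimal escapes are tracked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscapeKind {
    /// Written literally, or via an escape this sketch does not track.
    None,
    /// Written as `\uXXXX` or `\u{XXXX}`.
    Unicode,
    /// Written as `\xXX`.
    Hexadecimal,
}

/// A decoded code point plus a record of how it appeared in source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CodePoint {
    pub value: u32,
    pub escape_kind: EscapeKind,
}

/// Hypothetical helper: parse a non-empty run of hex digits.
fn parse_hex(digits: &[char]) -> Option<u32> {
    if digits.is_empty() || !digits.iter().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let text: String = digits.iter().collect();
    u32::from_str_radix(&text, 16).ok()
}

/// Hypothetical decoder for the body of a string literal (quotes removed).
/// Returns `None` on a malformed `\u` or `\x` escape.
pub fn decode_string_body(body: &str) -> Option<Vec<CodePoint>> {
    let chars: Vec<char> = body.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '\\' {
            out.push(CodePoint { value: chars[i] as u32, escape_kind: EscapeKind::None });
            i += 1;
            continue;
        }
        // A backslash must be followed by at least one character.
        let next = *chars.get(i + 1)?;
        match next {
            // `\u{XXXX}`: hex digits up to the closing brace.
            'u' if chars.get(i + 2) == Some(&'{') => {
                let close = (i + 3..chars.len()).find(|&j| chars[j] == '}')?;
                let value = parse_hex(&chars[i + 3..close])?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Unicode });
                i = close + 1;
            }
            // `\uXXXX`: exactly four hex digits.
            'u' => {
                let value = parse_hex(chars.get(i + 2..i + 6)?)?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Unicode });
                i += 6;
            }
            // `\xXX`: exactly two hex digits.
            'x' => {
                let value = parse_hex(chars.get(i + 2..i + 4)?)?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Hexadecimal });
                i += 4;
            }
            // Simplification: every other escape yields the escaped character itself.
            other => {
                out.push(CodePoint { value: other as u32, escape_kind: EscapeKind::None });
                i += 2;
            }
        }
    }
    Some(out)
}
```

Under these assumptions, `decode_string_body(r"[A\u0301]")` yields four code points, and the third carries `value: 769` together with `EscapeKind::Unicode`, which is the information a later stage needs in order to avoid falling back to `Symbol`.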
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `crates/oxc_regular_expression/src/parser/reader/ast.rs` | Adds `EscapeKind` enum and `escape_kind` field to `CodePoint` struct |
| `crates/oxc_regular_expression/src/parser/reader/string_literal_parser/parser_impl.rs` | Updates string literal parser to track unicode and hex escapes via `EscapeKind` |
| `crates/oxc_regular_expression/src/parser/reader/template_literal_parser/parser_impl.rs` | Updates template literal parser to track unicode and hex escapes via `EscapeKind` |
| `crates/oxc_regular_expression/src/parser/reader/reader_impl.rs` | Adds `peek_escape_kind()` method to access escape kind information |
| `crates/oxc_regular_expression/src/parser/pattern_parser/pattern_parser_impl.rs` | Adds conversion helper and updates all non-escaped character creation sites to use escape kind |
| `crates/oxc_regular_expression/src/parser/reader/mod.rs` | Adds test for escape kind tracking behavior |
| `crates/oxc_regular_expression/src/parser/mod.rs` | Adds integration test verifying the fix for issue #13660 |
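
To make the reader and pattern-parser rows above more concrete, here is a hedged sketch of how a `peek_escape_kind()` accessor and a conversion helper might fit together. Only the names `EscapeKind`, `CodePoint`, `CharacterKind::Symbol`, `CharacterKind::UnicodeEscape`, and `peek_escape_kind()` appear in this PR's description; the `Reader` layout, the `HexadecimalEscape` variant, and the `character_kind_from` and `read_characters` functions are assumptions for illustration and may not match the real implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EscapeKind {
    None,
    Unicode,
    Hexadecimal,
}

/// Subset of character kinds used by this sketch.
/// `HexadecimalEscape` is assumed here; the PR text names only
/// `Symbol` and `UnicodeEscape`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CharacterKind {
    Symbol,
    UnicodeEscape,
    HexadecimalEscape,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CodePoint {
    pub value: u32,
    pub escape_kind: EscapeKind,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Character {
    pub value: u32,
    pub kind: CharacterKind,
}

/// Hypothetical conversion helper: map the recorded escape kind to the
/// kind stored on the AST node.
pub fn character_kind_from(escape_kind: EscapeKind) -> CharacterKind {
    match escape_kind {
        EscapeKind::None => CharacterKind::Symbol,
        EscapeKind::Unicode => CharacterKind::UnicodeEscape,
        EscapeKind::Hexadecimal => CharacterKind::HexadecimalEscape,
    }
}

/// Hypothetical reader over already-decoded code points.
pub struct Reader {
    units: Vec<CodePoint>,
    index: usize,
}

impl Reader {
    pub fn new(units: Vec<CodePoint>) -> Self {
        Self { units, index: 0 }
    }

    /// Value of the current code point, if any.
    pub fn peek(&self) -> Option<u32> {
        self.units.get(self.index).map(|cp| cp.value)
    }

    /// Escape kind of the current code point; `None` at end of input.
    pub fn peek_escape_kind(&self) -> EscapeKind {
        self.units.get(self.index).map_or(EscapeKind::None, |cp| cp.escape_kind)
    }

    pub fn advance(&mut self) {
        self.index += 1;
    }
}

/// Consume every remaining code point as a plain (non-regex-escaped)
/// character, choosing its kind from the recorded escape kind.
pub fn read_characters(reader: &mut Reader) -> Vec<Character> {
    let mut out = Vec::new();
    while let Some(value) = reader.peek() {
        let kind = character_kind_from(reader.peek_escape_kind());
        out.push(Character { value, kind });
        reader.advance();
    }
    out
}
```

The design point this sketch tries to capture is that the kind is read before advancing, so each character creation site can consult the same accessor instead of hard-coding `Symbol`.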
CodSpeed Performance Report: Merging this PR will not alter performance.
81e896f to 66881ba
…ng literals (#18998)

## Summary

When parsing regex patterns from string literals (e.g., `RegExp("[A\\u0301]")`), unicode escape sequences were incorrectly identified as `CharacterKind::Symbol` instead of `CharacterKind::UnicodeEscape`.

**Before:**

```
RegExp("[A\u0301]")
// Character { value: 769, kind: Symbol }  // Wrong!
```

**After:**

```
RegExp("[A\u0301]")
// Character { value: 769, kind: UnicodeEscape }  // Correct!
```

The fix adds escape kind tracking through the parsing pipeline:

- Added `EscapeKind` enum to `CodePoint` to track how characters were written in source
- `StringLiteralParser` and `TemplateLiteralParser` now track unicode (`\uXXXX`, `\u{XXXX}`) and hex (`\xXX`) escapes
- `PatternParser` uses this information when assigning `CharacterKind`

Closes #13660

🤖 Generated with [Claude Code](https://claude.ai/code)
66881ba to e3609e3
After #18998, the tests are now passing.
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for modules and commonjs (#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to unambigious (#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse (#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (#19047) (Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (#19027) (sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false` (#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in `.mts`/`.cts` (#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type inference (#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers (#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type (#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to `NaN` (#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings conflicting with value declarations (#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var a` in module mode (#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test fixtures (#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class methods (#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (#19024) (Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in email addresses and npm scopes (#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing (#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in string literals (#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash (#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (#18889) (Boshen)
- 1663184 parser: Allow conditional types in function type parameters (#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation expression (#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument contexts (#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap lookups (#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse to VisitMut (#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table (#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (#19023) (sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and inlining (#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (#19028) (sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing element/fragments (#18789) (camchenry)

Co-authored-by: camc314 <[email protected]>
…project#19005) After oxc-project#18998, the tests are now passing.
