fix(regular_expression): preserve UnicodeEscape CharacterKind in string literals #18998

Merged
graphite-app[bot] merged 1 commit into main from fix/regexp-unicode-escape-kind
Feb 6, 2026

Boshen (Member) commented Feb 5, 2026

Summary

When parsing regex patterns from string literals (e.g., RegExp("[A\u0301]")), unicode escape sequences written in the string literal were incorrectly identified as CharacterKind::Symbol instead of CharacterKind::UnicodeEscape.

Before:

RegExp("[A\u0301]")
// Character { value: 769, kind: Symbol }  // Wrong!

After:

RegExp("[A\u0301]")
// Character { value: 769, kind: UnicodeEscape }  // Correct!

The fix adds escape kind tracking through the parsing pipeline:

  • Added EscapeKind enum to CodePoint to track how characters were written in source
  • StringLiteralParser and TemplateLiteralParser now track unicode (\uXXXX, \u{XXXX}) and hex (\xXX) escapes
  • PatternParser uses this information when assigning CharacterKind
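
The first two bullets above can be sketched as follows. This is a minimal, simplified illustration and not the code from this PR: the names EscapeKind and CodePoint come from the description, while parse_string_body, parse_hex, the Hexadecimal variant name, and the exact field layout are assumptions made for the example. Escapes other than \u and \x are not decoded here; the sketch only shows how the "how it was written" information can travel with each code point.

```rust
/// How a code point was written in the string literal source.
/// (Hypothetical shape; variant names are assumptions.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EscapeKind {
    None,
    Unicode,     // \uXXXX or \u{XXXX}
    Hexadecimal, // \xXX
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodePoint {
    value: u32,
    escape_kind: EscapeKind,
}

/// Parse a non-empty run of hex digits into a number.
fn parse_hex(digits: &[char]) -> Option<u32> {
    if digits.is_empty() || !digits.iter().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(&digits.iter().collect::<String>(), 16).ok()
}

/// Decode the body of a string literal (quotes already stripped) into code
/// points, remembering which ones came from `\u` or `\x` escapes.
/// Simplification: any other escape keeps the escaped character as-is,
/// and malformed `\u` / `\x` escapes make the whole parse return `None`.
fn parse_string_body(body: &str) -> Option<Vec<CodePoint>> {
    let chars: Vec<char> = body.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '\\' {
            out.push(CodePoint { value: chars[i] as u32, escape_kind: EscapeKind::None });
            i += 1;
            continue;
        }
        // chars[i] is a backslash; inspect the character after it.
        match *chars.get(i + 1)? {
            'u' if chars.get(i + 2) == Some(&'{') => {
                let close = (i + 3..chars.len()).find(|&j| chars[j] == '}')?;
                let value = parse_hex(&chars[i + 3..close])?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Unicode });
                i = close + 1;
            }
            'u' => {
                let value = parse_hex(chars.get(i + 2..i + 6)?)?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Unicode });
                i += 6;
            }
            'x' => {
                let value = parse_hex(chars.get(i + 2..i + 4)?)?;
                out.push(CodePoint { value, escape_kind: EscapeKind::Hexadecimal });
                i += 4;
            }
            other => {
                out.push(CodePoint { value: other as u32, escape_kind: EscapeKind::None });
                i += 2;
            }
        }
    }
    Some(out)
}
```

Under these assumptions, parsing the body `[A\u0301]` yields four code points, the third of which carries EscapeKind::Unicode, whereas `[A\\u0301]` yields nine plain code points (a backslash followed by the letters `u0301`), leaving the escape for the regex parser to recognize on its own.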

Closes #13660

🤖 Generated with Claude Code

@Boshen Boshen requested a review from leaysgur as a code owner February 5, 2026 15:15
Copilot AI review requested due to automatic review settings February 5, 2026 15:15
@github-actions github-actions bot added the C-bug Category - Bug label Feb 5, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes a bug where unicode escape sequences in RegExp string literals (e.g., RegExp("[A\u0301]")) were incorrectly identified as CharacterKind::Symbol instead of CharacterKind::UnicodeEscape. The fix adds escape kind tracking through the parsing pipeline to preserve information about how characters were written in source code.

Changes:

  • Added EscapeKind enum to track unicode (\uXXXX, \u{XXXX}) and hexadecimal (\xXX) escapes through parsing
  • Modified string literal and template literal parsers to track escape kinds
  • Updated pattern parser to use escape kind information when creating Character AST nodes
  • Added tests to verify the fix

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| crates/oxc_regular_expression/src/parser/reader/ast.rs | Adds EscapeKind enum and escape_kind field to CodePoint struct |
| crates/oxc_regular_expression/src/parser/reader/string_literal_parser/parser_impl.rs | Updates string literal parser to track unicode and hex escapes via EscapeKind |
| crates/oxc_regular_expression/src/parser/reader/template_literal_parser/parser_impl.rs | Updates template literal parser to track unicode and hex escapes via EscapeKind |
| crates/oxc_regular_expression/src/parser/reader/reader_impl.rs | Adds peek_escape_kind() method to access escape kind information |
| crates/oxc_regular_expression/src/parser/pattern_parser/pattern_parser_impl.rs | Adds conversion helper and updates all non-escaped character creation sites to use escape kind |
| crates/oxc_regular_expression/src/parser/reader/mod.rs | Adds test for escape kind tracking behavior |
| crates/oxc_regular_expression/src/parser/mod.rs | Adds integration test verifying the fix for issue #13660 |
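
The reader and pattern-parser changes listed above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: peek_escape_kind() is named in the file summary, but its signature, the Reader layout, the helper names character_kind_from and consume_plain_character, and the HexadecimalEscape variant are all assumed here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EscapeKind {
    None,
    Unicode,
    Hexadecimal,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CharacterKind {
    Symbol,
    UnicodeEscape,
    HexadecimalEscape, // assumed variant name
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodePoint {
    value: u32,
    escape_kind: EscapeKind,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Character {
    value: u32,
    kind: CharacterKind,
}

/// Hypothetical reader over code points produced by a string literal parser.
struct Reader {
    units: Vec<CodePoint>,
    index: usize,
}

impl Reader {
    fn new(units: Vec<CodePoint>) -> Self {
        Self { units, index: 0 }
    }

    /// Value of the current code point, if any.
    fn peek(&self) -> Option<u32> {
        self.units.get(self.index).map(|cp| cp.value)
    }

    /// Escape kind of the current code point; `None` kind at end of input.
    fn peek_escape_kind(&self) -> EscapeKind {
        self.units.get(self.index).map_or(EscapeKind::None, |cp| cp.escape_kind)
    }

    fn advance(&mut self) {
        self.index += 1;
    }
}

/// Conversion helper: map how the character was written in the string
/// literal to the kind recorded on the regex AST node.
fn character_kind_from(escape_kind: EscapeKind) -> CharacterKind {
    match escape_kind {
        EscapeKind::None => CharacterKind::Symbol,
        EscapeKind::Unicode => CharacterKind::UnicodeEscape,
        EscapeKind::Hexadecimal => CharacterKind::HexadecimalEscape,
    }
}

/// Consume one character that the regex grammar itself does not treat as an
/// escape, using the reader's escape kind instead of always emitting `Symbol`.
fn consume_plain_character(reader: &mut Reader) -> Option<Character> {
    let value = reader.peek()?;
    let kind = character_kind_from(reader.peek_escape_kind());
    reader.advance();
    Some(Character { value, kind })
}
```

The point of routing every non-escaped character creation site through one helper is that the escape kind is read before the reader advances, so the information cannot be lost between peeking the value and building the node.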

codspeed-hq bot commented Feb 5, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing fix/regexp-unicode-escape-kind (66881ba) with main (384abae)

Summary

✅ 46 untouched benchmarks
⏩ 3 skipped benchmarks (see footnote 1)

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead.

@Sysix Sysix force-pushed the fix/regexp-unicode-escape-kind branch from 81e896f to 66881ba Compare February 5, 2026 17:45

@leaysgur leaysgur added the 0-merge Merge with Graphite Merge Queue label Feb 6, 2026

leaysgur (Member) commented Feb 6, 2026

Merge activity

@graphite-app graphite-app bot force-pushed the fix/regexp-unicode-escape-kind branch from 66881ba to e3609e3 Compare February 6, 2026 00:37
@graphite-app graphite-app bot merged commit e3609e3 into main Feb 6, 2026
21 checks passed
@graphite-app graphite-app bot deleted the fix/regexp-unicode-escape-kind branch February 6, 2026 00:43
graphite-app bot pushed a commit that referenced this pull request Feb 6, 2026
camc314 added a commit that referenced this pull request Feb 10, 2026
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for
modules and commonjs (#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to
unambiguous (#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse
(#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (#19047)
(Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (#19027)
(sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false`
(#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in
`.mts`/`.cts` (#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type
inference (#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers
(#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type
(#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to
`NaN` (#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings
conflicting with value declarations (#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var
a` in module mode (#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test
fixtures (#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class
methods (#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (#19024)
(Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in
email addresses and npm scopes (#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing
(#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in
string literals (#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash
(#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (#18889)
(Boshen)
- 1663184 parser: Allow conditional types in function type parameters
(#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation
expression (#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument
contexts (#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap
lookups (#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse
to VisitMut (#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table
(#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (#19023)
(sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and
inlining (#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (#19028)
(sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing
element/fragments (#18789) (camchenry)

Co-authored-by: camc314
owjs3901 pushed a commit to owjs3901/oxc that referenced this pull request Feb 11, 2026
OskarLebuda pushed a commit to OskarLebuda/oxc that referenced this pull request Feb 17, 2026
Labels

0-merge Merge with Graphite Merge Queue C-bug Category - Bug

Development

Successfully merging this pull request may close these issues.

regexp: UnicodeEscape Character inside StringLiteral is not correctly detected
