fix(traverse): fix panic when truncating non-ASCII variable names #16265
Pull request overview
This PR fixes a panic that occurred when `get_var_name_from_node` truncated variable names containing multi-byte UTF-8 characters (e.g., Korean, Greek, CJK). The original implementation used byte-based `truncate(20)`, which could split multi-byte characters, causing an `is_char_boundary` assertion failure. The fix changes to character-based truncation using `chars().take(20).collect()` to match Babel's `id.slice(0, 20)` behavior.
Key Changes:
- Changed truncation logic from byte-based (`truncate(20)`) to character-based (`chars().take(20).collect()`)
- Added comprehensive test coverage for UTF-8 scenarios including Greek, Korean, mixed ASCII/multi-byte strings, and edge cases
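To see why byte offset 20 is a problem for the Korean name from the report, here is a minimal sketch built on the standard `str::is_char_boundary` and `str::char_indices`. The helper names `splits_char` and `char_starts` are hypothetical and are not part of oxc.

```rust
/// Hypothetical helper (not part of oxc): reports whether cutting `s` at byte
/// offset `n` would land inside a multi-byte UTF-8 character.
fn splits_char(s: &str, n: usize) -> bool {
    // Offsets at or past the end of the string never split a character.
    n < s.len() && !s.is_char_boundary(n)
}

/// Hypothetical helper: lists the byte offset at which each character starts.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}
```

For `"서비스이용약관_query"`, each Korean character occupies 3 bytes, so '관' starts at byte 18 and ends after byte 20. Offset 20 therefore falls inside it, and `String::truncate` is documented to panic when its argument is not on a character boundary.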
CodSpeed Performance Report: Merging #16265 will not alter performance.
Force-pushed from 8f23bbf to c4efa62.
Thanks for discovering this and for making a fix (and for adding lots of tests).
The `Chars` iterator is pretty slow. I would suggest this instead:
- Check if the 20th byte is ASCII.
- If it is, truncate to 20 bytes.
- If it isn't, find the start byte of the character that byte 20 may be in the middle of, and cut before that byte instead (i.e. maybe truncate to 19, 18, or 17 bytes instead).
This is a slight divergence from Babel, but I think that's fine. The point is not to replicate the exact truncation strategy, but just not to produce massively long variable names.
Also, please add a test for 4-byte UTF-8 characters. That'll complete the set.
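The byte-based approach suggested in this comment could look roughly like the sketch below. This is an illustration of the three steps as written, not the code that was merged. The function name `truncate_at_char_boundary` and the constant `MAX_LEN` are assumptions introduced for the example.

```rust
/// Assumed limit, taken from the `truncate(20)` call discussed in this PR.
const MAX_LEN: usize = 20;

/// Hypothetical helper (not the oxc implementation): shortens `name` to at
/// most `MAX_LEN` bytes without ever cutting inside a UTF-8 character.
fn truncate_at_char_boundary(name: &mut String) {
    if name.len() <= MAX_LEN {
        return; // Already short enough, nothing to do.
    }

    let cut = {
        let bytes = name.as_bytes();
        let last = MAX_LEN - 1; // Index of the 20th byte.
        if bytes[last].is_ascii() {
            // An ASCII byte is a complete character, so offset 20 is a boundary.
            MAX_LEN
        } else {
            // Walk back over continuation bytes (bit pattern 10xxxxxx) to the
            // lead byte of the character that contains the 20th byte. Valid
            // UTF-8 guarantees a lead byte precedes any continuation byte.
            let mut start = last;
            while (bytes[start] & 0xC0) == 0x80 {
                start -= 1;
            }
            // Cut before that character. Following the description literally,
            // this also drops a character that happens to end exactly at byte 20.
            start
        }
    };

    name.truncate(cut);
}
```

With this sketch, `"서비스이용약관_query"` is cut to the 18 bytes of `"서비스이용약"`, and a name made of 4-byte characters is cut to 16 bytes, so the result is always at most 20 bytes but may be slightly shorter than what Babel would produce.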
Force-pushed from c4efa62 to 99bc668.
`get_var_name_from_node` was using byte-based `truncate(20)` which could cut in the middle of multi-byte UTF-8 characters (e.g., Korean, CJK), causing a panic on `is_char_boundary` assertion. Changed to character-based truncation using `chars().take(20)` to match Babel's `id.slice(0, 20)` behavior.
Force-pushed from 99bc668 to 7650483.
### 💥 BREAKING CHANGES

- 74cf572 ast: [**BREAKING**] Make `source` field of `TSImportType` a `StringLiteral` (#16114) (copilot-swe-agent)
- 43156ae ast: [**BREAKING**] Rename `TSImportType` `argument` field to `source` (#16110) (overlookmotel)
- 934d873 napi: [**BREAKING**] Drop `armv7-unknown-linux-musleabihf` support (#16105) (Boshen)

### 🚀 Features

- 669afe0 ast: Add `Expression::is_jsx` method (#16154) (Dunqing)
- 17a8caa parser: Add diagnostic for JSX identifiers with hyphens (#16133) (camchenry)
- 0549ae5 parser: Add diagnostic for expected ident after optional chain (#16132) (camchenry)
- db839ae parser: Improve diagnostic for unexpected optional declarations (#16131) (camchenry)
- bab4bc8 napi/parser: Add type annotations to parse-raw-worker test (#15998) (camc314)

### 🐛 Bug Fixes

- 35ed36c traverse: Fix panic when truncating non-ASCII variable names (#16265) (peter)
- 9149a26 linter/plugins, napi/parser: Deep freeze visitor keys (#16293) (overlookmotel)
- 6b54dab minifier: Incorrect non-null object condition simplification with `&&` and `||` (#16161) (sapphi-red)
- 9cc20a1 minifier: Avoid merging side effectful expressions to next assignment statement if the side effect may change the left hand side reference (#16165) (sapphi-red)
- 91eb3f2 ast/estree: Convert `TSImportType` `argument` field to `Literal` (#16109) (overlookmotel)
- 1199cee parser: Reject invalid modifiers on parameter properties with binding patterns (#16083) (camc314)
- f376325 traverse: Remove `console.log` from build script (#16049) (overlookmotel)

### ⚡ Performance

- 82d784f lexer: Reduce bounds checks in `Lexer::get_string` (#16317) (overlookmotel)
- cc2f352 span: Add `#[inline]` to `Atom` methods (#16311) (overlookmotel)
- ffca070 span: Add `#[repr(transparent)]` to `Atom` (#16310) (overlookmotel)
- 02bdf90 linter/plugins, napi/parser: Reuse arrays in visitor keys (#16294) (overlookmotel)

### 📚 Documentation

- 891e0b4 parser: Add note about falling back to parse TSType in TSImportType (#16119) (camc314)

Co-authored-by: Boshen <[email protected]>
…c-project#16265)

## Summary

Fix panic when `get_var_name_from_node` processes non-ASCII variable names (Korean, CJK, Greek, etc.).

## Problem

When transpiling code with non-ASCII variable names like Korean:

```typescript
const [서비스이용약관_query, 개인정보수집동의_query] = useQueries({...});
```

The following panic occurs:

`thread 'tokio-runtime-worker' panicked at oxc_traverse/src/ast_operations/gather_node_parts.rs:26:14: assertion failed: self.is_char_boundary(new_len)`

## Cause

`get_var_name_from_node` was using byte-based `truncate(20)`:

`name.truncate(20); // Cuts at byte 20, not character 20`

Korean characters are 3 bytes in UTF-8, so "서비스이용약관" (7 chars = 21 bytes) gets cut in the middle of '관', causing the panic.

## Solution

Changed to character-based truncation to match https://github.com/babel/babel/blob/419644f27c5c59deb19e71aaabd417a3bc5483ca/packages/babel-traverse/src/scope/index.ts#L210:

```rust
if name.len() > 20 {
    name = name.chars().take(20).collect();
}
```

The `len() > 20` check avoids an unnecessary allocation when the string is already short enough (a string of at most 20 bytes has at most 20 chars).

## Test Plan

Added tests for:

- 2-byte UTF-8 (Greek letters)
- 3-byte UTF-8 (Korean characters)
- 4-byte UTF-8
- Mixed ASCII + multi-byte

Co-authored-by: peter <[email protected]>
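As a self-contained illustration of the character-based truncation described in this commit message, the snippet can be wrapped in a small function. The name `truncate_chars` and its signature are assumptions for this sketch. The real change lives inside `get_var_name_from_node`.

```rust
/// Hypothetical wrapper (not the oxc function): keeps at most `max_chars`
/// Unicode scalar values of `name`.
fn truncate_chars(mut name: String, max_chars: usize) -> String {
    // A string of at most `max_chars` bytes cannot contain more than
    // `max_chars` characters, so the byte length works as a cheap guard
    // that skips the re-allocation for short names.
    if name.len() > max_chars {
        name = name.chars().take(max_chars).collect();
    }
    name
}
```

Because the limit counts characters rather than bytes, the 27-byte name `"서비스이용약관_query"` (13 characters) passes through unchanged, while a name made of 25 four-byte characters is cut to 20 characters (80 bytes). The guard can still trigger a copy for names that are over 20 bytes but under 20 characters. That costs an allocation but does not affect correctness.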