fix(traverse): fix panic when truncating non-ASCII variable names #16265

Merged
overlookmotel merged 1 commit into oxc-project:main from tbvjaos510:fix/traverse-var-name-truncation
Nov 30, 2025

@tbvjaos510 tbvjaos510 commented Nov 29, 2025

Summary

Fix panic when get_var_name_from_node processes non-ASCII variable names (Korean, CJK, Greek, etc.).

Problem

When transpiling code with non-ASCII variable names like Korean:

const [서비스이용약관_query, 개인정보수집동의_query] = useQueries({...});

The following panic occurs:

thread 'tokio-runtime-worker' panicked at oxc_traverse/src/ast_operations/gather_node_parts.rs:26:14: assertion failed: self.is_char_boundary(new_len)

Cause

get_var_name_from_node was using byte-based truncate(20):

name.truncate(20); // Cuts at byte 20, not character 20

Korean characters are 3 bytes in UTF-8, so "서비스이용약관" (7 chars = 21 bytes) gets cut in the middle of '관', causing the panic.
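The byte arithmetic above can be checked with a small sketch. The helper name `truncate_would_panic` is hypothetical and is not part of oxc; it only restates the documented precondition of `String::truncate`, which asserts that the new length lies on a char boundary whenever it actually shortens the string.

```rust
/// Hypothetical helper (not from oxc): returns true if
/// `String::truncate(new_len)` would panic for a string with contents `s`.
fn truncate_would_panic(s: &str, new_len: usize) -> bool {
    // `truncate` is a no-op when `new_len >= s.len()`; otherwise it
    // requires `new_len` to fall on a UTF-8 char boundary.
    new_len < s.len() && !s.is_char_boundary(new_len)
}
```

With the name from the report, `"서비스이용약관_query"`, byte offsets 18, 19 and 20 all belong to '관', so cutting at 20 is rejected while cutting at 18 or 21 would be accepted.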

Solution

Changed to character-based truncation to match https://github.com/babel/babel/blob/419644f27c5c59deb19e71aaabd417a3bc5483ca/packages/babel-traverse/src/scope/index.ts#L210:

if name.len() > 20 {
    name = name.chars().take(20).collect();
}

The len() > 20 check avoids an unnecessary allocation when the string is already short enough: every char takes at least one byte, so a string of at most 20 bytes has at most 20 chars.
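For reference, the character-based approach described above could be wrapped as a standalone function along these lines. This is a minimal sketch, not the code from the PR: the name `truncate_to_chars` and the `max_chars` parameter are assumptions made for illustration.

```rust
/// Hypothetical helper: keep at most `max_chars` chars of `name`.
fn truncate_to_chars(name: &mut String, max_chars: usize) {
    // A string of at most `max_chars` bytes cannot contain more than
    // `max_chars` chars, so it can be left alone without allocating.
    if name.len() > max_chars {
        // Rebuild the string from its first `max_chars` chars.
        // This never splits a multi-byte character.
        *name = name.chars().take(max_chars).collect();
    }
}
```

Note that the length guard is conservative: a name such as `"서비스이용약관_query"` (27 bytes, 13 chars) passes the guard and is rebuilt even though no characters are dropped.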

Test Plan

Added tests for:

  • 2-byte UTF-8 (Greek letters)
  • 3-byte UTF-8 (Korean characters)
  • 4-byte UTF-8
  • Mixed ASCII + multi-byte
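To illustrate why these four classes cover the space, the sketch below reports the UTF-8 width of each character in a string. The helper `utf8_widths` and the sample characters are chosen here for illustration and are not taken from the PR's test suite.

```rust
/// Hypothetical helper: UTF-8 byte width of each char in `s`.
fn utf8_widths(s: &str) -> Vec<usize> {
    // `char::len_utf8` returns 1 to 4 depending on the code point.
    s.chars().map(char::len_utf8).collect()
}
```

For example, `utf8_widths("aα관𠀀")` covers one character of each width: ASCII, Greek, Korean, and a CJK Extension B ideograph.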

Copilot AI review requested due to automatic review settings November 29, 2025 06:00

Copilot AI left a comment

Pull request overview

This PR fixes a panic that occurred when get_var_name_from_node truncated variable names containing multi-byte UTF-8 characters (e.g., Korean, Greek, CJK). The original implementation used byte-based truncate(20) which could split multi-byte characters, causing an is_char_boundary assertion failure. The fix changes to character-based truncation using chars().take(20).collect() to match Babel's id.slice(0, 20) behavior.

Key Changes:

  • Changed truncation logic from byte-based (truncate(20)) to character-based (chars().take(20).collect())
  • Added comprehensive test coverage for UTF-8 scenarios including Greek, Korean, mixed ASCII/multi-byte strings, and edge cases

codspeed-hq bot commented Nov 29, 2025

CodSpeed Performance Report

Merging #16265 will not alter performance

Comparing tbvjaos510:fix/traverse-var-name-truncation (7650483) with main (6a641f9) [1]

Summary

✅ 38 untouched
⏩ 7 skipped [2]

Footnotes

  [1] No successful run was found on main (41129ab) during the generation of this report, so 6a641f9 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  [2] 7 benchmarks were skipped, so the baseline results were used instead.

@tbvjaos510 tbvjaos510 force-pushed the fix/traverse-var-name-truncation branch from 8f23bbf to c4efa62 Compare November 29, 2025 06:12

@overlookmotel overlookmotel left a comment

Thanks for discovering this and for making a fix (and for adding lots of tests).

The Chars iterator is pretty slow. I would suggest this instead:

  • Check if the 20th byte is ASCII.
  • If it is, truncate to 20 bytes.
  • If it isn't, find the start byte of the character that byte 20 may be in the middle of, and cut before that byte instead (i.e. maybe truncate to 19, 18, or 17 bytes).
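The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code that was merged: the name `truncate_to_char_boundary` is hypothetical, and the sketch uses `str::is_char_boundary` to walk back to a safe cut point instead of an explicit ASCII check on the 20th byte.

```rust
/// Hypothetical helper: shorten `name` to at most `max_bytes` bytes
/// without splitting a multi-byte UTF-8 character.
fn truncate_to_char_boundary(name: &mut String, max_bytes: usize) {
    if name.len() <= max_bytes {
        return; // Already short enough, nothing to do.
    }
    // Walk back from `max_bytes` to the nearest char boundary.
    // A UTF-8 char is at most 4 bytes long, so this steps back at
    // most 3 times; offset 0 is always a boundary.
    let mut end = max_bytes;
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    name.truncate(end);
}
```

Applied with a limit of 20 to `"서비스이용약관_query"`, this cuts at byte 18 and leaves `"서비스이용약"`. No allocation or full character iteration is needed, which is the motivation given in the review.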

This is a slight divergence from Babel, but I think that's fine. The point is not to replicate the exact truncation strategy, but just not to produce massively long variable names.
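To make the divergence concrete, the sketch below puts a character-based cut next to a byte-based cut. Both function names (`keep_chars`, `keep_bytes`) are hypothetical and exist only for this comparison.

```rust
/// Hypothetical: keep the first `max` chars of `name`.
fn keep_chars(name: &str, max: usize) -> &str {
    // `nth(max)` yields the byte offset where char number `max + 1`
    // starts; slicing up to it keeps exactly `max` chars.
    match name.char_indices().nth(max) {
        Some((idx, _)) => &name[..idx],
        None => name, // Fewer than `max + 1` chars: keep everything.
    }
}

/// Hypothetical: keep at most `max` bytes, cutting on a char boundary.
fn keep_bytes(name: &str, max: usize) -> &str {
    if name.len() <= max {
        return name;
    }
    let mut end = max;
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    &name[..end]
}
```

For `"개인정보수집동의_query"` (14 chars, 30 bytes) and a limit of 20, `keep_chars` returns the whole name while `keep_bytes` returns `"개인정보수집"` (6 chars, 18 bytes). Both results are short and valid UTF-8, which is the property the review comment cares about.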


Also, please add a test for 4-byte UTF-8 characters. That'll complete the set.

@tbvjaos510 tbvjaos510 force-pushed the fix/traverse-var-name-truncation branch from c4efa62 to 99bc668 Compare November 29, 2025 15:19
@tbvjaos510 (author) commented

@overlookmotel
Thanks for the suggestion! Updated to use byte-based truncation instead of Chars iterator. Also added a test for 4-byte UTF-8 characters (CJK Extension B).

@overlookmotel overlookmotel left a comment

Thank you!

`get_var_name_from_node` was using byte-based `truncate(20)` which could
cut in the middle of multi-byte UTF-8 characters (e.g., Korean, CJK),
causing a panic on `is_char_boundary` assertion.

Changed to character-based truncation using `chars().take(20)` to match
Babel's `id.slice(0, 20)` behavior.
@overlookmotel overlookmotel force-pushed the fix/traverse-var-name-truncation branch from 99bc668 to 7650483 Compare November 30, 2025 14:03
@overlookmotel overlookmotel merged commit 35ed36c into oxc-project:main Nov 30, 2025
28 checks passed
overlookmotel pushed a commit that referenced this pull request Dec 1, 2025
### 💥 BREAKING CHANGES

- 74cf572 ast: [**BREAKING**] Make `source` field of `TSImportType` a
`StringLiteral` (#16114) (copilot-swe-agent)
- 43156ae ast: [**BREAKING**] Rename `TSImportType` `argument` field to
`source` (#16110) (overlookmotel)
- 934d873 napi: [**BREAKING**] Drop `armv7-unknown-linux-musleabihf`
support (#16105) (Boshen)

### 🚀 Features

- 669afe0 ast: Add `Expression::is_jsx` method (#16154) (Dunqing)
- 17a8caa parser: Add diagnostic for JSX identifiers with hyphens
(#16133) (camchenry)
- 0549ae5 parser: Add diagnostic for expected ident after optional chain
(#16132) (camchenry)
- db839ae parser: Improve diagnostic for unexpected optional
declarations (#16131) (camchenry)
- bab4bc8 napi/parser: Add type annotations to parse-raw-worker test
(#15998) (camc314)

### 🐛 Bug Fixes

- 35ed36c traverse: Fix panic when truncating non-ASCII variable names
(#16265) (peter)
- 9149a26 linter/plugins, napi/parser: Deep freeze visitor keys (#16293)
(overlookmotel)
- 6b54dab minifier: Incorrect non-null object condition simplification
with `&&` and `||` (#16161) (sapphi-red)
- 9cc20a1 minifier: Avoid merging side effectful expressions to next
assignment statement if the side effect may change the left hand side
reference (#16165) (sapphi-red)
- 91eb3f2 ast/estree: Convert `TSImportType` `argument` field to
`Literal` (#16109) (overlookmotel)
- 1199cee parser: Reject invalid modifiers on parameter properties with
binding patterns (#16083) (camc314)
- f376325 traverse: Remove `console.log` from build script (#16049)
(overlookmotel)

### ⚡ Performance

- 82d784f lexer: Reduce bounds checks in `Lexer::get_string` (#16317)
(overlookmotel)
- cc2f352 span: Add `#[inline]` to `Atom` methods (#16311)
(overlookmotel)
- ffca070 span: Add `#[repr(transparent)]` to `Atom` (#16310)
(overlookmotel)
- 02bdf90 linter/plugins, napi/parser: Reuse arrays in visitor keys
(#16294) (overlookmotel)

### 📚 Documentation

- 891e0b4 parser: Add note about falling back to parse TSType in
TSImportType (#16119) (camc314)

Co-authored-by: Boshen <[email protected]>
taearls pushed a commit to taearls/oxc that referenced this pull request Dec 11, 2025
taearls pushed a commit to taearls/oxc that referenced this pull request Dec 11, 2025
Labels

C-bug Category - Bug
