Skip to content

feat(parser): detect binary files with TS1490 error#19047

Merged
graphite-app[bot] merged 1 commit intomainfrom
feat/binary-file-detection
Feb 6, 2026
Merged

feat(parser): detect binary files with TS1490 error#19047
graphite-app[bot] merged 1 commit intomainfrom
feat/binary-file-detection

Conversation

@Boshen
Copy link
Member

@Boshen Boshen commented Feb 6, 2026

Summary

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 6, 2026 09:46
@github-actions github-actions bot added A-parser Area - Parser C-enhancement Category - New feature or request labels Feb 6, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports TypeScript's binary file detection feature (TS1490 error) to the parser. When the scanner encounters U+FFFD (Unicode replacement character) as a standalone token, it emits a "File appears to be binary" error and stops parsing. The U+FFFD character typically appears when binary files are incorrectly decoded as UTF-8. The implementation correctly ensures that U+FFFD characters inside strings, comments, and template literals are not affected by this detection.

Changes:

  • Added TS1490 diagnostic for binary file detection
  • Modified lexer's Unicode character handler to detect standalone U+FFFD characters
  • Added test coverage for binary file detection and U+FFFD in strings
  • Updated snapshot to reflect new error message

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File Description
crates/oxc_parser/src/diagnostics.rs Adds file_appears_to_be_binary() diagnostic function that creates a TS1490 error
crates/oxc_parser/src/lexer/unicode.rs Adds binary file detection in unicode_char_handler() when U+FFFD appears as standalone token
crates/oxc_parser/src/lib.rs Adds test cases for binary file detection and U+FFFD in strings
tasks/coverage/snapshots/parser_typescript.snap Updates error message from "Invalid Character" to "File appears to be binary" for corrupted.ts test case

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 6, 2026

Merging this PR will not alter performance

✅ 47 untouched benchmarks
⏩ 3 skipped benchmarks1


Comparing feat/binary-file-detection (54b409b) with main (3aeba7a)2

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (776eb19) during the generation of this report, so 3aeba7a was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Feb 6, 2026
Copy link
Member Author

Boshen commented Feb 6, 2026

Merge activity

  • Feb 6, 10:02 AM UTC: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Feb 6, 10:02 AM UTC: This pull request can not be added to the Graphite merge queue. Please try rebasing and resubmitting to merge when ready.
  • Feb 6, 10:02 AM UTC: Graphite disabled "merge when ready" on this PR due to: a merge conflict with the target branch; resolve the conflict and try again..
  • Feb 6, 10:02 AM UTC: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Feb 6, 10:26 AM UTC: Boshen added this pull request to the Graphite merge queue.
  • Feb 6, 10:33 AM UTC: Merged by the Graphite merge queue.

@Boshen Boshen force-pushed the feat/binary-file-detection branch from 02ec82e to 4c320a1 Compare February 6, 2026 10:19
## Summary

- Port TypeScript's binary file detection (`TS1490: File appears to be binary.`)
- When the scanner encounters U+FFFD (replacement character) as a standalone token, emit the error and stop parsing
- U+FFFD inside strings, comments, and templates is unaffected
- Reference: https://github.com/microsoft/TypeScript/blob/main/src/compiler/scanner.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@graphite-app graphite-app bot force-pushed the feat/binary-file-detection branch from 4c320a1 to 142a1be Compare February 6, 2026 10:26
@graphite-app graphite-app bot merged commit 142a1be into main Feb 6, 2026
22 checks passed
@graphite-app graphite-app bot deleted the feat/binary-file-detection branch February 6, 2026 10:33
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Feb 6, 2026
camc314 added a commit that referenced this pull request Feb 10, 2026
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for
modules and commonjs (#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to
unambigious (#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse
(#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (#19047)
(Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (#19027)
(sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false`
(#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in
`.mts`/`.cts` (#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type
inference (#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers
(#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type
(#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to
`NaN` (#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings
conflicting with value declarations (#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var
a` in module mode (#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test
fixtures (#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class
methods (#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (#19024)
(Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in
email addresses and npm scopes (#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing
(#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in
string literals (#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash
(#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (#18889)
(Boshen)
- 1663184 parser: Allow conditional types in function type parameters
(#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation
expression (#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument
contexts (#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap
lookups (#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse
to VisitMut (#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table
(#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (#19023)
(sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and
inlining (#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (#19028)
(sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing
element/fragments (#18789) (camchenry)

Co-authored-by: camc314 <[email protected]>
owjs3901 pushed a commit to owjs3901/oxc that referenced this pull request Feb 11, 2026
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for
modules and commonjs (oxc-project#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to
unambigious (oxc-project#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse
(oxc-project#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (oxc-project#19047)
(Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (oxc-project#19027)
(sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false`
(oxc-project#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in
`.mts`/`.cts` (oxc-project#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (oxc-project#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type
inference (oxc-project#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers
(oxc-project#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type
(oxc-project#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to
`NaN` (oxc-project#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings
conflicting with value declarations (oxc-project#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var
a` in module mode (oxc-project#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test
fixtures (oxc-project#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class
methods (oxc-project#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (oxc-project#19024)
(Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in
email addresses and npm scopes (oxc-project#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing
(oxc-project#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in
string literals (oxc-project#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (oxc-project#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash
(oxc-project#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (oxc-project#18889)
(Boshen)
- 1663184 parser: Allow conditional types in function type parameters
(oxc-project#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation
expression (oxc-project#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument
contexts (oxc-project#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap
lookups (oxc-project#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse
to VisitMut (oxc-project#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table
(oxc-project#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (oxc-project#19023)
(sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and
inlining (oxc-project#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (oxc-project#19028)
(sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing
element/fragments (oxc-project#18789) (camchenry)

Co-authored-by: camc314 <[email protected]>
OskarLebuda pushed a commit to OskarLebuda/oxc that referenced this pull request Feb 17, 2026
## Summary

- Port TypeScript's binary file detection (`TS1490: File appears to be binary.`)
- When the scanner encounters U+FFFD (replacement character) as a standalone token, emit the error and stop parsing
- U+FFFD inside strings, comments, and templates is unaffected
- Reference: https://github.com/microsoft/TypeScript/blob/main/src/compiler/scanner.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)
OskarLebuda pushed a commit to OskarLebuda/oxc that referenced this pull request Feb 17, 2026
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for
modules and commonjs (oxc-project#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to
unambigious (oxc-project#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse
(oxc-project#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (oxc-project#19047)
(Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (oxc-project#19027)
(sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false`
(oxc-project#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in
`.mts`/`.cts` (oxc-project#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (oxc-project#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type
inference (oxc-project#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers
(oxc-project#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type
(oxc-project#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to
`NaN` (oxc-project#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings
conflicting with value declarations (oxc-project#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var
a` in module mode (oxc-project#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test
fixtures (oxc-project#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class
methods (oxc-project#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (oxc-project#19024)
(Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in
email addresses and npm scopes (oxc-project#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing
(oxc-project#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in
string literals (oxc-project#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (oxc-project#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash
(oxc-project#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (oxc-project#18889)
(Boshen)
- 1663184 parser: Allow conditional types in function type parameters
(oxc-project#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation
expression (oxc-project#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument
contexts (oxc-project#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap
lookups (oxc-project#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse
to VisitMut (oxc-project#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table
(oxc-project#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (oxc-project#19023)
(sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and
inlining (oxc-project#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (oxc-project#19028)
(sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing
element/fragments (oxc-project#18789) (camchenry)

Co-authored-by: camc314 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-parser Area - Parser C-enhancement Category - New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments