feat(parser): detect binary files with TS1490 error #19047
graphite-app[bot] merged 1 commit into main
Pull request overview
This PR ports TypeScript's binary file detection feature (TS1490 error) to the parser. When the scanner encounters U+FFFD (Unicode replacement character) as a standalone token, it emits a "File appears to be binary" error and stops parsing. The U+FFFD character typically appears when binary files are incorrectly decoded as UTF-8. The implementation correctly ensures that U+FFFD characters inside strings, comments, and template literals are not affected by this detection.
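To illustrate why U+FFFD is a useful signal, here is a minimal sketch of how the character ends up in source text when binary bytes are decoded lossily. It uses only the standard library's `String::from_utf8_lossy`; the helper name `decode_lossy` and the constant `PNG_SIGNATURE` are hypothetical and are not part of this PR or of oxc.

```rust
/// Decode bytes as UTF-8, substituting U+FFFD for every invalid sequence.
/// Hypothetical helper for illustration only.
pub fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// The first eight bytes of a PNG file. 0x89 is a continuation byte,
/// so it can never start a valid UTF-8 sequence.
pub const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
```

A strict decode of `PNG_SIGNATURE` with `std::str::from_utf8` fails, while `decode_lossy(&PNG_SIGNATURE)` yields text that begins with U+FFFD. That leading character is what a parser sees if the caller decoded the file lossily before handing it over.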
Changes:
- Added TS1490 diagnostic for binary file detection
- Modified lexer's Unicode character handler to detect standalone U+FFFD characters
- Added test coverage for binary file detection and U+FFFD in strings
- Updated snapshot to reflect new error message
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/oxc_parser/src/diagnostics.rs | Adds file_appears_to_be_binary() diagnostic function that creates a TS1490 error |
| crates/oxc_parser/src/lexer/unicode.rs | Adds binary file detection in unicode_char_handler() when U+FFFD appears as standalone token |
| crates/oxc_parser/src/lib.rs | Adds test cases for binary file detection and U+FFFD in strings |
| tasks/coverage/snapshots/parser_typescript.snap | Updates error message from "Invalid Character" to "File appears to be binary" for corrupted.ts test case |
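The behaviour described in the overview (flag U+FFFD only when it appears as a standalone token) can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in `unicode_char_handler()`: the names `Scan` and `scan_for_binary` are hypothetical, and the sketch assumes a toy scanner that skips quoted strings, template literals (without handling `${}` nesting), and comments, and ignores regex literals entirely.

```rust
/// Outcome of scanning a source text in this simplified sketch.
#[derive(Debug, PartialEq)]
pub enum Scan {
    /// No standalone U+FFFD was found.
    Ok,
    /// Byte offset of the first standalone U+FFFD ("File appears to be binary").
    AppearsBinary(usize),
}

/// Hypothetical toy scanner: reports U+FFFD only outside strings,
/// templates, and comments, and stops at the first occurrence.
pub fn scan_for_binary(source: &str) -> Scan {
    let mut chars = source.char_indices().peekable();
    while let Some((offset, c)) = chars.next() {
        match c {
            // String or template literal: skip to the matching quote,
            // stepping over backslash escapes.
            '"' | '\'' | '`' => {
                while let Some((_, d)) = chars.next() {
                    if d == '\\' {
                        chars.next();
                    } else if d == c {
                        break;
                    }
                }
            }
            '/' => match chars.peek().map(|&(_, n)| n) {
                // Line comment: skip to the end of the line.
                Some('/') => {
                    for (_, d) in chars.by_ref() {
                        if d == '\n' {
                            break;
                        }
                    }
                }
                // Block comment: skip to the closing `*/`.
                Some('*') => {
                    chars.next();
                    let mut prev = '\0';
                    for (_, d) in chars.by_ref() {
                        if prev == '*' && d == '/' {
                            break;
                        }
                        prev = d;
                    }
                }
                _ => {}
            },
            // Standalone replacement character: report and stop scanning.
            '\u{FFFD}' => return Scan::AppearsBinary(offset),
            _ => {}
        }
    }
    Scan::Ok
}
```

With this sketch, `scan_for_binary("\u{FFFD}PNG")` returns `Scan::AppearsBinary(0)`, while a U+FFFD inside a string, comment, or template returns `Scan::Ok`, mirroring the two kinds of test case listed for `lib.rs`.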
Merging this PR will not alter performance
02ec82e to 4c320a1
## Summary

- Port TypeScript's binary file detection (`TS1490: File appears to be binary.`)
- When the scanner encounters U+FFFD (replacement character) as a standalone token, emit the error and stop parsing
- U+FFFD inside strings, comments, and templates is unaffected
- Reference: https://github.com/microsoft/TypeScript/blob/main/src/compiler/scanner.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)
4c320a1 to 142a1be
### 💥 BREAKING CHANGES

- 2bf7293 mangler: [**BREAKING**] Enable `top_level` by default for modules and commonjs (#18278) (sapphi-red)
- 48b0542 span: [**BREAKING**] SourceType::ts should set module to unambigious (#18873) (Boshen)

### 🚀 Features

- 500d071 minifier: Local traverse ctx and generated minifier traverse (#19106) (Boshen)
- 142a1be parser: Detect binary files with TS1490 error (#19047) (Boshen)
- e316857 allocator/bitset: Add `Ones` iterator to `BitSet` (#19027) (sapphi-red)
- 742ad3f minifier: Default `invalid_import_side_effects` to `false` (#18916) (sapphi-red)
- 0eff6be parser: Error JSX-like type assertions and generics in `.mts`/`.cts` (#18910) (Boshen)
- 18320c6 span: Store file extension in `SourceType` (#18893) (Boshen)

### 🐛 Bug Fixes

- a7514e4 isolated-declarations: Preserve const context in literal type inference (#19178) (camc314)
- 312e756 isolated-declarations: Preserve readonly literal initializers (#19177) (camc314)
- d0ca8d0 isolated-declarations: Skip parenthesis when inferring type (#19176) (camc314)
- 110c300 oxc_ecmascript: `+[false]` and `+[true]` should evaluate to `NaN` (#19174) (copilot-swe-agent)
- f32ea19 semantic: Report redeclaration error for import bindings conflicting with value declarations (#19068) (Boshen)
- 3aeba7a semantic: Report redeclaration error for `function a() {} var a` in module mode (#19041) (Boshen)
- 35e32c6 coverage: Match Babel's options.json inheritance for test fixtures (#19002) (Boshen)
- 463d60d semantic: Skip TS2391 for standalone computed-name class methods (#19025) (Boshen)
- 56c086b parser: Add modifier ordering validation (TS1029) (#19024) (Boshen)
- 6067a49 linter/jsdoc: False positive in `check-tag-names` for `@` in email addresses and npm scopes (#19021) (Boshen)
- b13bb70 semantic/jsdoc: Inline tags like `{@link}` break jsdoc parsing (#19019) (Boshen)
- e3609e3 regular_expression: Preserve UnicodeEscape CharacterKind in string literals (#18998) (Boshen)
- 57917ee parser: Parse decorators on rest parameters (#18938) (Boshen)
- 487601b napi: Disable mimalloc on Windows to fix worker_threads crash (#18923) (Boshen)
- 1f6b193 parser: Validate TypeScript import type options (#18889) (Boshen)
- 1663184 parser: Allow conditional types in function type parameters (#18886) (Boshen)
- 5758046 parser: Error on property access after instantiation expression (#18887) (Boshen)
- 5eb4a94 parser: Handle `<<` as two `<` tokens in type argument contexts (#18885) (Boshen)

### ⚡ Performance

- ed8c054 oxc_str: Add precomputed hash to Ident for fast HashMap lookups (#19143) (Boshen)
- d4a0867 transformer_plugins: Switch ReplaceGlobalDefines from Traverse to VisitMut (#19146) (Boshen)
- 9eb16b3 syntax: Pack ASCII identifier tables into single bitflag table (#19088) (Boshen)
- e7595d1 mangler: Use BitSet for exported symbols set (#19023) (sapphi-red)
- 2537924 semantic: Optimize scope resolution with fast paths and inlining (#19029) (Boshen)
- 69a8d85 mangler: Use BitSet for keep_names symbols set (#19028) (sapphi-red)
- f78c525 parser: Try hybrid parsing for jsx children and closing element/fragments (#18789) (camchenry)

Co-authored-by: camc314