-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Output minimal strings by default #11384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Reading #4477, it seems that it was done not for compatibility with old engines but for non-utf8 encodings. Is it safe to output non-ASCII characters by default? cc @mathiasbynens (Sorry, you are my Unicode expert 😛) |
|
I think there's a bit of compatibility risk for lone surrogates, which aren't transformed into
In strings, yes. The risk is that users will have improperly setup servers (not serving with Also, we still retain the ability to output ASCII only, you just need to configure it. The default should probably match what the user typed. And for foreign language programmers, they're probably typing stings containing non-ASCII chars. Being able to actually read the output is probably going to much nicer trying to understand whatever |
|
Yep, @jridgewell nailed it. Note that folks who cannot alter their server configuration to add |
|
I think what you really want is to output non-ASCII symbols as-is unless they're lone surrogates or unless they're whitespace or non-printable. |
Is there a |
|
No, but we could add one :) |
|
Or, we could tweak the existing |
| jsescOption: { | ||
| quotes: "double", | ||
| wrap: true, | ||
| minimal: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that sometimes user may prefer to preserve the escaped form, especially for the space characters. i.e.
"\u{00a0}" // NBSP
"\u{2005}" // 1/4 em
I suggest we plug in the minimal option only when there is no escape sequence inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would be better to implement this in jsesc (#11384 (comment) / #11384 (comment)), since it would also benefit non-Babel users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're only able to detect what the user actually typed via the raw value, which is handled before printing the cooked value. If we only have the cooked value, we can only guess what the user intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the whitespaces, I think we can add them to the whitelist of the minimal option.
Since it changes the default print behaviour, I consider this as a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The raw value already captures this appropriately, though. We only run the cooked value through jsesc when there's not a raw value. At that point, we have no idea what the user actually wrote in the source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I do think we want to always escape things like NBSP though.
Maybe we should special-case all non-ASCII entries in require('unicode-13.0.0/General_Category/Space_Separator/symbols.js'); in jsesc’s minimal option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ '0x20',
'0xa0',
'0x1680',
'0x2000',
'0x2001',
'0x2002',
'0x2003',
'0x2004',
'0x2005',
'0x2006',
'0x2007',
'0x2008',
'0x2009',
'0x200a',
'0x202f',
'0x205f',
'0x3000' ]
I think we should continue to print regular space. I could understand escaping the rest, though. @mathiasbynens, are you open to doing that in jsesc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed we wouldn't want to escape any ASCII spaces. We should take non-ASCII entries only.
Definitely open to a patch that does this in jsesc's minimal option. We could use src/data.js to export a list of special-cased symbols, based on reading require('unicode-13.0.0/General_Category/Space_Separator/symbols.js'); (or /code-points.js) and filtering out all ASCII entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to v3.0.1, which now escapes non-ASCII whitespace.
|
|
||
| /*#__PURE__*/ | ||
| React.createElement("div", null, "w \xA0 w"); | ||
| React.createElement("div", null, "w w"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output became less useful in this case, IMHO.
|
|
||
| /*#__PURE__*/ | ||
| React.createElement("div", null, "this should parse as nbsp: \xA0 "); | ||
| React.createElement("div", null, "this should parse as nbsp: "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
| jsescOption: { | ||
| quotes: "double", | ||
| wrap: true, | ||
| minimal: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I do think we want to always escape things like NBSP though.
Maybe we should special-case all non-ASCII entries in require('unicode-13.0.0/General_Category/Space_Separator/symbols.js'); in jsesc’s minimal option.
|
|
7292b10 to
b1d27af
Compare
JLHwung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is a breaking change, can you rebase onto next-8-dev?
Though technically bumping jsesc to version 3 is also a breaking change. I think it is okay to split this change into a dedicated PR so we can merge it in next minor. AFAIK the non-minimal (defaults in babel v7) behaviour is unchanged between 2 and 3, isn't it?
Yes. But it might just be easier to land everything on next-8-dev. If we were to upgrade the v3 and a user were using |
This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output.
Co-Authored-By: Huáng Jùnliàng <[email protected]>
b1d27af to
f18297f
Compare
f18297f to
2433cc7
Compare
| recordAndTupleSyntaxType: opts.recordAndTupleSyntaxType, | ||
| }; | ||
|
|
||
| if (opts.jsonCompatibleStrings) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option has been removed: #9958
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
* Output minimal strings by default This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output. * Update packages/babel-generator/src/index.js Co-Authored-By: Huáng Jùnliàng <[email protected]> * Escape lone surrogates until jsesc does it for us * Use regex's unicode mode * Upgrade jsesc to v3 * Update to jsesc 3.0.1 * Remove logic for removed option Co-authored-by: Huáng Jùnliàng <[email protected]> Co-authored-by: Nicolò Ribaudo <[email protected]>
This changes our behavior from always outputting ASCII-safe strings to outputting Unicode. It's smaller, supported since at least ES5, and maybe foreign language programmers will be less surprised with the output.
Maybe we want to add a simple
asciioption to toggle the behavior back?