More String escape sequence improvements (F#) #13257
Merged
cartermp merged 2 commits into dotnet:master on Jul 5, 2019
"Literals" page
In my previous update I seem to have added an extra `0` to the `0010FFFF` at the end of the first paragraph in the **Remarks** section.
"Strings" page
1. Added 5 missing sequences: `\a`, `\f`, `\v`, `\x`, and `\DDD`. The following example code was run in LINQPad 5 (Language = "F# Program"):
```fsharp
printfn "Decimal (NOT Octal) \\DDD requires 3 digits: TAB\9TAB\09TAB\009TAB";
printfn "\\DDD notation is ISO-8859-1 (U+0000 - U+00FF): {\128-\129-\144-\152-\160-\161}";
printfn "CHAR for \\DDD = (DDD %% 256); Max = \\999 (U+00E7): {\365-\621-\6210-\176-\100-\999-\1000}";
printfn "---------------------";
printfn "\\x only works with two hex digits: TAB\x9TAB\x090TAB";
printfn "\\x is ISO-8859-1: 0x80 = \x80, 0x81 = \x81, 0x90 = \x90, 0x9A = \x9A, 0x9F = \x9F";
printfn "\\x is _not_ creating UTF-8: \xE0\xBC\x82"; // UTF-8 bytes for U+0F02
printfn "---------------------";
printfn "Test \\a: \a";
printfn "Test \\f: \f";
printfn "Test \\v: \v";
```
They can also be found in the source code on GitHub:
* Defined here:
  * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lex.fsl#L209
* Processed here:
  * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L142
  * https://github.com/dotnet/fsharp/blob/master/src/fsharp/lexhelp.fs#L179
2. Broke `\u` and `\U` sequences into separate entries.
3. Provided range and an example for each Unicode character sequence.
4. Added "Important" note regarding `\DDD` being decimal, not octal, notation.
5. Added note regarding `\DDD` and `\xx` effectively being ISO-8859-1 (which is the first 256 Unicode code points), including a link to the Wikipedia article for ISO-8859-1.
6. **NOTE:** I changed the `X`s into `H`s for the `\u` and `\U` sequences. With the `\x` sequence added, I did not want to have `\xXX`, which I feel is less readable, and I wanted all of the sequences to be consistent about what represents a hex digit ("H" meaning hex also helps distinguish it from the `D`s used for the newly added `\DDD` sequence). If anyone feels strongly that it should remain as `X`, then it can be changed back.
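Please see [Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)](https://sqlquantumleap.com/2019/06/26/unicode-escape-sequences-across-various-languages-and-platforms-including-supplementary-characters/#fsharp) for more details.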
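As a rough illustration of points 4 and 5, here is a minimal sketch (not taken from the docs change itself) comparing the numeric escape forms. The helper `trigraphChar` is hypothetical: it only mirrors the `DDD % 256` rule observed in the LINQPad output above and is not a compiler function. The string literals deliberately stay within `\000` to `\255`, since how out-of-range trigraphs are treated may differ between compiler versions.

```fsharp
// Hypothetical helper (not part of the F# compiler or docs): mirrors the
// reported rule that the character produced by \DDD is (DDD % 256).
let trigraphChar (ddd: int) : char = char (ddd % 256)

// \DDD is decimal: "\065" is code point 65 ('A'), not octal 065 (decimal 53, '5').
let decimalNotOctal = ("\065" = "A")

// Four numeric spellings of the same character, U+0041.
let allEqual =
    [ "\065"; "\x41"; "\u0041"; "\U00000041" ]
    |> List.forall (fun s -> s = "A")

// Each \xHH sequence yields its own character (U+0000 - U+00FF), so the
// UTF-8 bytes for U+0F02 become three characters rather than one.
let threeChars = "\xE0\xBC\x82".Length
let oneChar = "\u0F02".Length

printfn "decimal, not octal: %b" decimalNotOctal
printfn "all forms equal: %b" allEqual
printfn "lengths: %d vs %d" threeChars oneChar
printfn "365 %% 256 -> U+%04X" (int (trigraphChar 365))
```

To write a character above U+00FF, use `\uHHHH` or `\UHHHHHHHH` rather than chaining `\xHH` bytes.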