System.Text.Rune ref APIs and unit tests#33395
System.Text.Rune ref APIs and unit tests#33395GrabYourPitchforks merged 3 commits intodotnet:masterfrom
Conversation
| public static bool TryCreate(uint value, out Rune result) { throw null; } | ||
| public bool TryEncode(Span<char> destination, out int charsWritten) { throw null; } | ||
| public static bool TryGetRuneAt(string input, int index, out Rune value) { throw null; } | ||
| public static double GetNumericValue(Rune value) { throw null; } |
There was a problem hiding this comment.
Does returning double enable any specific use-cases that e.g. long does not?
There was a problem hiding this comment.
Unicode defines fractional values, such as U+00BD (½) for which a non-integral value is returned.
There was a problem hiding this comment.
We pull the data from https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedNumericValues.txt.
Our test file has a specific test for the U+00BD case:
yield return new object[] { new UnicodeInfoTestData { ScalarValue = (Rune)(0xBD), UnicodeCategory = UnicodeCategory.OtherNumber, NumericValue = 0.5, IsControl = false, IsDigit = false, IsLetter = false, IsLetterOrDigit = false, IsLower = false, IsNumber = true, IsPunctuation = false, IsSeparator = false, IsSymbol = false, IsUpper = false, IsWhiteSpace = false } };There was a problem hiding this comment.
It gets even more fun, since some scalars can correspond to negative numeric values! :D
For example, U+0F33 TIBETAN DIGIT HALF ZERO ('༳') has a numeric value of -0.5.
There was a problem hiding this comment.
Interesting. Is this API as generally useful as to be exposed directly on the Rune type?
There was a problem hiding this comment.
To be honest I don't personally have a valid use case for it. It's there for parity with System.Char and System.Text.CharUnicodeInfo.
| FormKC = 5, | ||
| FormKD = 6, | ||
| } | ||
| public readonly struct Rune : IComparable<Rune>, IEquatable<Rune> |
There was a problem hiding this comment.
See if you can use the ref generation tool to auto-gen the reference so it has correct sort ordering and fully qualified type names (like System.IEquatable<Rune>).
msbuild /t:GenerateReferenceSource
However, for System.Runtime, there are some warts with the auto-gen tool, so you may have to undo some of the changes that it makes (to other types).
There was a problem hiding this comment.
There's no documentation for how to run that tool or what parameters to give it. If somebody creates docs I can run it. Otherwise I'll stick with doing this by hand because at least I know by hand works. :)
There was a problem hiding this comment.
There's no documentation for how to run that tool or what parameters to give it.
Agreed. Here is how I tend to do it (for System.Runtime):
- Build coreclr release
- Build corefx release with coreclr bits
- Run
msbuild /t:GenerateReferenceSource /p:ConfigurationGroup=Releasefrom the System.Runtime/ref directory - Filter out all unrelated changes and extract the changes you care about (ignore certain attributes being removed).
Updated the ref here: #33512
| public static explicit operator Rune(int value) { throw null; } | ||
| public bool IsAscii { get { throw null; } } | ||
| public bool IsBmp { get { throw null; } } | ||
| public int Plane { get { throw null; } } |
There was a problem hiding this comment.
Should this be more descriptive: UnicodePlane?
There was a problem hiding this comment.
I think current version should be fine considering the whole class is related to the Unicode code point
| public bool IsAscii { get { throw null; } } | ||
| public bool IsBmp { get { throw null; } } | ||
| public int Plane { get { throw null; } } | ||
| public static Rune ReplacementChar { get { throw null; } } |
There was a problem hiding this comment.
U+FFFD REPLACEMENT CHARACTER is the official Unicode name for this value. We can expand "Char" -> "Character", but we shouldn't use the name Rune here.
|
try to build all configurations to ensure building the packages as I expect the api compat will fail for .Net Native as the APIs wouldn't should up there yet and you'll need to add some entries to the baseline file. |
904718e to
415cd7c
Compare
|
The latest commit is just in case we change the name |
|
My 2c: @GrabYourPitchforks I understood the |
|
@krwq Unicode Scalar is right term here. it is different than code point. The code point is: while Unicode scalar is: This type is about Unicode Scalar |
|
Thanks @tarekgh I can see in fact tests confirming the spec definition. I concur with the name in this case :-) |
|
The coreclr implementation is now checked in. I'll rebase this PR when the train makes its way to corefx, and hopefully everything will just pass. :) (fingers crossed) |
|
@dotnet-bot test this please |
|
The train from coreclr into corefx is delayed to a build infrastructure issue. The issue appears to be resolved now but there's a small backlog. I'm hoping the train will arrive in a few hours, and then I can rebase on top of it and get this to pass. |
be02a24 to
c7646c4
Compare
Commit migrated from dotnet/corefx@7626a26
Reference assembly APIs and unit tests for dotnet/coreclr#20935
This PR will not pass CI until the underlying coreclr change makes its way over.