-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add ReadOnlySpan string-like ToLower/ToUpper API with globalization support #16379
Conversation
| public static class Span | ||
| { | ||
| // s_invariantMode is defined for the perf reason as accessing the instance field is faster than access the static property GlobalizationMode.Invariant | ||
| private static readonly bool s_invariantMode = GlobalizationMode.Invariant; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cache does not make sense. GlobalizationMode.Invariant property should have the exact same perf as you local cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was borrowed from CompareInfo:
| // _invariantMode is defined for the perf reason as accessing the instance field is faster than access the static property GlobalizationMode.Invariant |
Should it be removed from there too?
Would calling the Invariant property result in this function call every time?
| internal static bool GetInvariantSwitchValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would calling the Invariant property result in this function call every time?
No, the syntax:
internal static bool Invariant { get; } = GetGlobalizationInvariantMode();is equivalent to:
private readonly static bool backingFieldInvariant = GetGlobalizationInvariantMode();
internal static bool Invariant => backingFieldInvariant;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was borrowed from CompareInfo:
The one in CompareInfo is instance field. This one is static field. The performance characteristics of instance and static fields are quite different (instance fields are faster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be removed from there too?
No, in CompareInfo it is an instance field and not static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance characteristics of instance and static fields are quite different (instance fields are faster).
Good to know. I missed the detail of it being an instance field vs. a static field here.
No, the syntax... is equivalent to...
Ah I see. Thanks for clarifying.
|
|
||
| if (s_invariantMode) | ||
| { | ||
| CultureInfo.CurrentCulture.TextInfo.ToLowerAsciiInvariant(source, destination); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use the passed in culture; not CurrentCulture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same in all other places.)
| if (s_invariantMode) | ||
| { | ||
| CultureInfo.CurrentCulture.TextInfo.ToLowerAsciiInvariant(source, destination); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to return here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same in all other places.)
| { | ||
| fixed (char* pResult = &MemoryMarshal.GetReference(destination)) | ||
| { | ||
| #if CORECLR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This CORECLR ifdef is not necessary.
| else | ||
| #endif | ||
| { | ||
| ChangeCase(pSource, source.Length, pResult, destination.Length, toUpper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a conclusion on whether it is safe to depend on 1:1 mapping between lower and upper case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that the current String APIs make this assumption, but that assumption does not affect their public shape - if needed, we can fix them without changing their public signature.
It is not true for the Span APIs. If there is no 1:1 mapping, the Span APIs need to start returning number of characters copied to output buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a reasonable suggestion for span (and maybe in the future we can support it for strings too).
@ahsonkhan could you discuss this with the design committee to update such APIs? I know we mentioned we are not supporting this today but this doesn't prevent us to start supporting it with the spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. If we change our API to return the copied characters, though redundant atm, it will be flexible to possible future implementation changes.
To be clear, is the new proposal for the APIs as follows (including the Invariant versions)?
public static int ToLower(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination);
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo culture);
public static int ToUpperInvariant(this ReadOnlySpan<char> source, Span<char> destination);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, is the new proposal for the APIs as follows (including the Invariant versions)?
Yes, Invariant is a culture :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question, what we'll do if the destination span is not enough? we throw? or return -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we throw just like we do atm for destination.Length < source.Length since the assumption is that the length required can be statically known at compile time prior to calling the method.
However, if the 1:1 mapping does not hold in the future, then my assumption breaks and throwing may not be the best solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we throw only because of the length? if so, I then prefer not throw and return -1 in case not enough buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these APIs, should we enable partial processing (and ability to restart), like we do with some of our Base64 APIs, and return OperationStatus with charactersConsumed and charactersWritten out parameters? Or is that too much complexity for such a low level API, especially since there is a 1:1 mapping atm?
For example:
public static OperationStatus ToLower(this ReadOnlySpan<char>, Span<char> destination, CultureInfo culture, out char charactersConsumed, out char charactersWritten);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may have a simple one which just returns -1 and we can have the overload that takes more parameters if we find a demand for that.
| destination[i] = (char)(source[i] | 0x20); | ||
| i++; | ||
|
|
||
| while (i < source.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string-based implementation of this has two separate loops like this because it wants to optimize the path where nothing needs to change and thus nothing needs to allocate. But here with span, we always need to look at every character and copy every character, potentially modified, to the output destination. Can we just consolidate it all to a single loop then? e.g.
internal void ToLowerAsciiInvariant(ReadOnlySpan<char> source, Span<char> destination)
{
Debug.Assert(destination.Length >= source.Length);
for (int i = 0; i < source.Length; i++)
{
destination[i] = ToLowerAsciiInvariant(source[i]);
}
}or is there some reason that's not as good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the optimization which avoids having to do the or by 0x20 operation where not necessary (i.e when ((uint)(source[i] - 'A') > ('Z' - 'A'))?
I suspect the extra CopyTo probably hurts performance more, so I will stick to the simpler implementation. I will measure the perf and compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the optimization
That's in the char method as well:
https://github.com/dotnet/coreclr/pull/16379/files/59903ebba9bd23fa952f834e41c87bfe3b726b85#diff-64100f763acd6161cac733b7534a9301R381
We'd just want to make sure that method gets inlined.
| { | ||
| // Assuming that changing case does not affect length | ||
| if (destination.Length < source.Length) | ||
| return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in a hypothetical future, will invariant ever result in changing case affecting length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible. The sorting and casing can change for the Invariant :-(
| if (GlobalizationMode.Invariant) | ||
| CultureInfo.InvariantCulture.TextInfo.ToLowerAsciiInvariant(source, destination); | ||
| else | ||
| CultureInfo.InvariantCulture.TextInfo.ChangeCase(source, destination, toUpper: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarekgh, is there a difference between these two operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is a difference. ToLowerAsciiInvariant is just trying to be optimized for the Ascii but ChangeCase is not. for example, if you have a Turkish text would be handled wrong if you do it with ToLowerAsciiInvariant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this naming confusing then? Turning on the "Invariant" mode potentially produces different results than using the InvariantCulture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, Invariant mode means we'll use the Invariant culture data all the time (without going to call the underlying OS) this will include doing all sorting/casing operations as ordinal. I believe whoever introduced ToLowerAsciiInvariant meant to have this operation mainly for Ascii strings and at the same time not using a culture like Turkish to avoid the Turkish 'I' problem. but in general, you are right for the statement, the invariant mode can produce different results than invariant culture in the sorting and casing area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thanks.
| char* b = bp; | ||
|
|
||
| while (length != 0 && (*a < 0x80) && (*b < 0x80) && (!HighCharTable[*a]) && (!HighCharTable[*b])) | ||
| while (length != 0 && (*a < 0x80) && (*b < 0x80) && (!s_highCharTable[*a]) && (!s_highCharTable[*b])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the JIT successfully avoid bounds checks on s_highCharTable on each iteration of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like it is (tested on windows). We have two compares to 80h.
while (length != 0 && (*a < 0x80) && (*b < 0x80) && (!s_highCharTable[*a]) && (!s_highCharTable[*b]))
00007FF993402A25 test edx,edx
00007FF993402A27 je 00007FF993402A7C
00007FF993402A29 movzx r10d,word ptr [rcx]
00007FF993402A2D cmp r10d,80h
00007FF993402A34 jge 00007FF993402A7C
00007FF993402A36 movzx r11d,word ptr [rax]
00007FF993402A3A cmp r11d,80h
00007FF993402A41 jge 00007FF993402A7C
00007FF993402A43 mov rsi,23530CE28B0h
00007FF993402A4D mov rsi,qword ptr [rsi]
00007FF993402A50 mov rdi,rsi
00007FF993402A53 mov ebx,r10d
00007FF993402A56 mov ebp,dword ptr [rdi+8]
00007FF993402A59 cmp ebx,ebp
00007FF993402A5B jae 00007FF993402AC0
00007FF993402A5D movsxd rbx,ebx
00007FF993402A60 cmp byte ptr [rdi+rbx+10h],0
00007FF993402A65 jne 00007FF993402A7C
00007FF993402A67 mov edi,r11d
00007FF993402A6A cmp edi,ebp
00007FF993402A6C jae 00007FF993402AC0
00007FF993402A6E movsxd rdi,edi
00007FF993402A71 cmp byte ptr [rsi+rdi+10h],0
00007FF993402A76 je 00007FF9934029DD
cc @AndyAyersMS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the code would result in better codegen if s_highCharTable were stored into a local before the loop...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't get rid of the bounds checks.
We save a couple of instructions at the prologue/epilogue with pushing to/popping off the stack.
|
|
||
| for (int i = 0; i < source.Length; i++) | ||
| { | ||
| destination[i] = ToLowerAsciiInvariant(source[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to confirm that ToLower/UpperAsciiInvariant gets inlined? Seems like it should, but it'd be good to double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does.
for (int i = 0; i < source.Length; i++)
00007FF9934127D0 sub rsp,28h
00007FF9934127D4 mov rax,qword ptr [rcx]
00007FF9934127D7 mov ecx,dword ptr [rcx+8]
00007FF9934127DA xor r8d,r8d
00007FF9934127DD test ecx,ecx
00007FF9934127DF jle 00007FF993412814
{
destination[i] = ToLowerAsciiInvariant(source[i]);
00007FF9934127E1 cmp r8d,dword ptr [rdx+8]
00007FF9934127E5 jae 00007FF993412819
00007FF9934127E7 mov r9,qword ptr [rdx]
00007FF9934127EA movsxd r10,r8d
00007FF9934127ED lea r9,[r9+r10*2]
00007FF9934127F1 movzx r10d,word ptr [rax+r10*2]
00007FF9934127F6 lea r11d,[r10-41h]
00007FF9934127FA cmp r11d,19h
00007FF9934127FE ja 00007FF993412808
00007FF993412800 or r10d,20h
00007FF993412804 movzx r10d,r10w
00007FF993412808 mov word ptr [r9],r10w
for (int i = 0; i < source.Length; i++)
00007FF99341280C inc r8d
00007FF99341280F cmp r8d,ecx
00007FF993412812 jl 00007FF9934127E1
00007FF993412814 add rsp,28h
00007FF993412818 ret
00007FF993412819 call 00007FF9F3069860
00007FF99341281E int 3 for (int i = 0; i < source.Length; i++)
00007FF993412A50 sub rsp,28h
00007FF993412A54 mov rax,qword ptr [rcx]
00007FF993412A57 mov ecx,dword ptr [rcx+8]
00007FF993412A5A xor r8d,r8d
00007FF993412A5D test ecx,ecx
00007FF993412A5F jle 00007FF993412A94
{
destination[i] = ToUpperAsciiInvariant(source[i]);
00007FF993412A61 cmp r8d,dword ptr [rdx+8]
00007FF993412A65 jae 00007FF993412A99
00007FF993412A67 mov r9,qword ptr [rdx]
00007FF993412A6A movsxd r10,r8d
00007FF993412A6D lea r9,[r9+r10*2]
00007FF993412A71 movzx r10d,word ptr [rax+r10*2]
00007FF993412A76 lea r11d,[r10-61h]
00007FF993412A7A cmp r11d,19h
00007FF993412A7E ja 00007FF993412A88
00007FF993412A80 and r10d,0FFFFFFDFh
00007FF993412A84 movzx r10d,r10w
00007FF993412A88 mov word ptr [r9],r10w
for (int i = 0; i < source.Length; i++)
00007FF993412A8C inc r8d
00007FF993412A8F cmp r8d,ecx
00007FF993412A92 jl 00007FF993412A61
00007FF993412A94 add rsp,28h
00007FF993412A98 ret
00007FF993412A99 call 00007FF9F3069860
00007FF993412A9E int 3
stephentoub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks, @ahsonkhan.
| /// Copies the characters from the source span into the destination, converting each character to lowercase | ||
| /// using the casing rules of the invariant culture. | ||
| /// </summary> | ||
| public static int ToLowerInvariant(this ReadOnlySpan<char> source, Span<char> destination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this call the ToLower with culture = CultureInfo.InvariantCulture passed in? Same with ToUpper.
It avoids code duplication at the cost of an extra null check.
Part of https://github.com/dotnet/corefx/issues/21395#issuecomment-359906138
TODO:
Add the allocating counterpart for slow span in corefx and add tests there.Related corefx PR: dotnet/corefx#27193
cc @jkotas, @stephentoub, @KrzysztofCwalina, @tarekgh, @JeremyKuhne