-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add internal Array.Clear(Array), optimize some Array callers + codegen #51548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @tannergooding Issue DetailsOptimized some calls to This is marked as draft because I'm soliciting feedback on whether to pursue this. It introduces some complexity to the The codegen for 00007ffa`321721a0 4883ec28 sub rsp, 28h
00007ffa`321721a4 4885c9 test rcx, rcx ; arr != null check
00007ffa`321721a7 745d je System_Private_CoreLib!System.Array.Clear(System.Array)+0xffffffff`a0fc77c6 (00007ffa`32172206)
00007ffa`321721a9 488b11 mov rdx, qword ptr [rcx] ; rdx := pMethodTable
00007ffa`321721ac 4c8bc2 mov r8, rdx
00007ffa`321721af 410fb700 movzx eax, word ptr [r8] ; rax := MethodTable::ComponentSize
00007ffa`321721b3 480faf4108 imul rax, qword ptr [rcx+8] ; rax := TotalByteLength (ComponentSize * NumElements)
00007ffa`321721b8 8b5204 mov edx, dword ptr [rdx+4] ; rdx := MethodTable::BaseSize
00007ffa`321721bb 4883c108 add rcx, 8 ; ideally would be rolled into line immediately below, but eh, whatever
00007ffa`321721bf 488d4c11f0 lea rcx, [rcx+rdx-10h] ; rcx := &Array.FirstElement
00007ffa`321721c4 41f70000000001 test dword ptr [r8], 1000000h ; <does the method table contain gc-tracked types?>
; <snip>jit-diff output for CoreLib below. Not sure what's going on with some methods being deleted + reintroduced. Maybe something weird in my build environment? Codegen for methods which now call Top method regressions (bytes):
260 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|128_0(Array,Array,int,int) (0 base, 1 diff methods)
151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|82_0(Array,int,int,Object):int (0 base, 1 diff methods)
151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|107_0(Array,Object,int,int):int (0 base, 1 diff methods)
151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|113_0(Array,Object,int,int):int (0 base, 1 diff methods)
120 ( ∞ of base) : System.Private.CoreLib.dasm - SpanHelpers:ClearWithReferences(byref,long) (0 base, 1 diff methods)
117 ( ∞ of base) : System.Private.CoreLib.dasm - Array:Clear(Array) (0 base, 1 diff methods)
5 ( ∞ of base) : System.Private.CoreLib.dasm - Array:get_NativeLength():long:this (0 base, 1 diff methods)
1 (25.00% of base) : System.Private.CoreLib.dasm - Array:get_LongLength():long:this
Top method improvements (bytes):
-260 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|125_0(Array,Array,int,int) (1 base, 0 diff methods)
-151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|79_0(Array,int,int,Object):int (1 base, 0 diff methods)
-151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|104_0(Array,Object,int,int):int (1 base, 0 diff methods)
-151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|110_0(Array,Object,int,int):int (1 base, 0 diff methods)
-62 (-29.81% of base) : System.Private.CoreLib.dasm - String:Ctor(ushort,int):String:this
-22 (-2.41% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Return(ref,bool):this
-13 (-6.02% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int)
-10 (-1.34% of base) : System.Private.CoreLib.dasm - Buffer:BlockCopy(Array,int,Array,int,int)
-9 (-1.01% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int,bool)
-6 (-2.70% of base) : System.Private.CoreLib.dasm - Array:Clear(Array,int,int)
-6 (-0.59% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Trim():bool:this
-5 (-1.62% of base) : System.Private.CoreLib.dasm - ConfigurableArrayPool`1:Return(ref,bool):this
-4 (-1.71% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,Array,int)
-2 (-1.72% of base) : System.Private.CoreLib.dasm - Array:UnsafeArrayAsSpan(Array,int,int):Span`1
-2 (-1.13% of base) : System.Private.CoreLib.dasm - Buffer:ByteLength(Array):int
-2 (-2.99% of base) : System.Private.CoreLib.dasm - Buffer:GetByte(Array,int):ubyte
-2 (-2.82% of base) : System.Private.CoreLib.dasm - Buffer:SetByte(Array,int,ubyte)
|
|
Note to reviewers - test failures suggest that there are scenarios where the padding inside the array object is non-zero. However, browsing through the sources, I can't find any examples of where this might be the case. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the improvement here. Is this just a micro-optimization to compensate for missing JIT optimization? I would much preffer to fix the JIT instead of this.
Whatever it is, I do not think we should be introducing code that depends on this padding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the NativeLength property to make these casts go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of this was trying to reduce codegen and save a register allocation by having a qword comparison directly against the NativeLength field rather than first requiring zero-extension from a dword field into a qword register. Might have to give up that optimization if we can't assume the padding is all-zero, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is worth it to complicate the code to save a few bytes of codegen. It is very x86/64 specific code-size micro-optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just have NativeLength be => Unsafe.As<RawArrayData>(this).Length to force the zero-extension to nuint.
That would make it easier to switch to properly be NativeLength if we ever decide to make the switch in the future and would avoid the potential problems here while keeping things as is (not requiring more complex changes to actually track nuint in the Array metadata today).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't actually overflow, can it? arr.Length <= Array.MaxLength < int.MaxValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mono isn't bound by this same constraint. It also seems fragile to depend on the runtime never allowing this value to increase in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is out of date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we open an issue to expose Array.Clear(Array) publicly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled out into #51581.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (index < lowerBound || offset < 0 || length < 0 || (uint)(offset + length) > array.NativeLength) | |
| if (index < lowerBound || (offset | length) < 0 || (uint)(offset + length) > array.NativeLength) |
Ideally the JIT could do this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happily, already in progress! :)
#13573
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be more convenient to just turn this into no-op if the array is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the typical matchcount.Length? Is this going to be an improvement in typical case?
|
Latest commit explicitly zeroes out the padding in |
7008bdd to
9b65c32
Compare
|
Force-push because I undid most of the work in this PR. The PR is now greatly slimmed:
|
jkotas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Optimized some calls to
Array.Clear, including adding an internal non-genericArray.Clear(Array)method for common use cases, plus redirecting some existing calls toArray.ClearorSpan<T>.Clear(see #51534) orSpan<T>.Fill.This is marked as draft because I'm soliciting feedback on whether to pursue this. It introduces some complexity to the
RawArrayDataandRawDatatypes (using FieldOffset) in order to knock down the codegen. Whether complicating the C# definitions of these types in order to save some bytes of codegen is worth it I'll leave up to smarter people. :)The codegen for
Array.Clear(Array)is pretty well optimized now:jit-diff output for CoreLib below. Not sure what's going on with some methods being deleted + reintroduced. Maybe something weird in my build environment? Codegen for methods which now call
Array.Clear(Array)instead ofArray.Clear(Array, int, int)is smaller, as expected. Methods likeArray.Copywhich now rely on the internalNativeLengthproperty instead ofLongLengthalso saw codegen reductions.Top method regressions (bytes): 260 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|128_0(Array,Array,int,int) (0 base, 1 diff methods) 151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|82_0(Array,int,int,Object):int (0 base, 1 diff methods) 151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|107_0(Array,Object,int,int):int (0 base, 1 diff methods) 151 ( ∞ of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|113_0(Array,Object,int,int):int (0 base, 1 diff methods) 120 ( ∞ of base) : System.Private.CoreLib.dasm - SpanHelpers:ClearWithReferences(byref,long) (0 base, 1 diff methods) 117 ( ∞ of base) : System.Private.CoreLib.dasm - Array:Clear(Array) (0 base, 1 diff methods) 5 ( ∞ of base) : System.Private.CoreLib.dasm - Array:get_NativeLength():long:this (0 base, 1 diff methods) 1 (25.00% of base) : System.Private.CoreLib.dasm - Array:get_LongLength():long:this Top method improvements (bytes): -260 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<Sort>g__GenericSort|125_0(Array,Array,int,int) (1 base, 0 diff methods) -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<BinarySearch>g__GenericBinarySearch|79_0(Array,int,int,Object):int (1 base, 0 diff methods) -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<IndexOf>g__GenericIndexOf|104_0(Array,Object,int,int):int (1 base, 0 diff methods) -151 (-100.00% of base) : System.Private.CoreLib.dasm - Array:<LastIndexOf>g__GenericLastIndexOf|110_0(Array,Object,int,int):int (1 base, 0 diff methods) -62 (-29.81% of base) : System.Private.CoreLib.dasm - String:Ctor(ushort,int):String:this -22 (-2.41% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Return(ref,bool):this -13 (-6.02% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int) -10 (-1.34% of base) : System.Private.CoreLib.dasm - Buffer:BlockCopy(Array,int,Array,int,int) -9 (-1.01% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,int,Array,int,int,bool) -6 (-2.70% of base) : System.Private.CoreLib.dasm - Array:Clear(Array,int,int) -6 (-0.59% of base) : System.Private.CoreLib.dasm - TlsOverPerCoreLockedStacksArrayPool`1:Trim():bool:this -5 (-1.62% of base) : System.Private.CoreLib.dasm - ConfigurableArrayPool`1:Return(ref,bool):this -4 (-1.71% of base) : System.Private.CoreLib.dasm - Array:Copy(Array,Array,int) -2 (-1.72% of base) : System.Private.CoreLib.dasm - Array:UnsafeArrayAsSpan(Array,int,int):Span`1 -2 (-1.13% of base) : System.Private.CoreLib.dasm - Buffer:ByteLength(Array):int -2 (-2.99% of base) : System.Private.CoreLib.dasm - Buffer:GetByte(Array,int):ubyte -2 (-2.82% of base) : System.Private.CoreLib.dasm - Buffer:SetByte(Array,int,ubyte)