Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Optimize StackTrace.ToString() generation#20448

Merged
jkotas merged 1 commit intodotnet:masterfrom
NickCraver:craver/stacktrace-optimization
Oct 17, 2018
Merged

Optimize StackTrace.ToString() generation#20448
jkotas merged 1 commit intodotnet:masterfrom
NickCraver:craver/stacktrace-optimization

Conversation

@NickCraver
Copy link
Member

This is optimizing StackTrace.ToString() in general. The newer version of StackTrace makes async stack traces far more usable, but also introduces overhead due to System.Reflection being...less than awesome.

The code in question is trying to take what is an async stack frame and find the "user" method that goes with it. This happens by checking if the type we're on is decorated with [CompilerGenerated] (via .IsDefined()) and if so, looping through all methods in the parent class decorated with [AsyncStateMachine], looping at their Type property, and seeing which one we belong with.

With the above info, we know we need not look at parents in the reflection path. This introduces additional overhead in both .IsDefined() and .GetCustomAttributes<T>() in pretty much every call to them. Since this resolution runs for every stack frame in every exception, so the multiplicative impact is quite large.

Details

For .GetCustomAttributes<T>():
This eventually calls down to CustomAttribute.GetCustomAttributes(). By looking at the short circuit path in:

if (!inherit || (caType.IsSealed && !CustomAttribute.GetAttributeUsage(caType).Inherited))
{
object[] attributes = GetCustomAttributes(method.GetRuntimeModule(), method.MetadataToken, pcaCount, caType);
if (pcaCount > 0) Array.Copy(pca, 0, attributes, attributes.Length - pcaCount, pcaCount);
return attributes;
}

We can avoid the list allocation, the array from .ToArray(), and the resulting combined array by not looking for inherited attributed members that cannot exist for our purposes of creating a stack trace.

For .IsDefined():
This eventually calls down to CustomAttribute.IsDefined(). We can also short circuit overhead there.

Impact

In a test deploy of some .NET Core code for websockets at Stack Overflow, we were throwing a few thousand exceptions per minute (not that many) and racked up millions of arrays from these code paths. Here's a quick summary of that memory dump:
image

Note that Gen 0 is not being cleared correctly (that's a separate issue), but we needn't be allocating these arrays in the first place.

I'll open another PR tomorrow (thanks to @benaadams for digging with me!) where we have a plan to utilize Array.Empty<T>() in the reflection miss (no attributes found) path that's another optimization on top of this with quick wins for all reflection consumers.

cc @benaadams @mgravell @deanward81

When looking for compiler generated async state machine types and methods, we need not look at parents in the reflection path. This introduces additional overhead in both .IsDefined() and .GetCustomAttributes<T>() in every path.

See the short circuit path in: https://github.com/dotnet/coreclr/blob/57f8358221a3c4ad7f1608f625bc3c5936618505/src/System.Private.CoreLib/src/System/Reflection/CustomAttribute.cs#L1315-L1320

We can avoid the list allocation, the array from it, and the resulting combined array by not looking for inherited attributed members that cannot exist for our purposes of creating a stack trace.
@stephentoub
Copy link
Member

cc: @noahfalk

@jkotas jkotas merged commit 34327d3 into dotnet:master Oct 17, 2018
NickCraver added a commit to NickCraver/coreclr that referenced this pull request Nov 3, 2018
This is a follow-up to many allocations recognized in dotnet#20448.

A lot of methods ultimately call through CustomAttributes.GetCustomAttributes(). In the inherited search path (default for most searches above), the inheritance path of a class is traversed, resulting in an array allocation per crawled type, a copy to the overall List<object> and then after that - in current code - an array allocation from that list and a typed array allocation it's copied to.

However, in the common miss case as simple as:
typeof(T).GetCustomAttributes(typeof(TAttr), true);
...and many other overloads, all the same path underneath...

We can avoid the last 2 arrays in the miss case. We have the List<object> to go off of. If that's a zero-entry list, we can return an Array.Empty<TAttr>(). That's effectively what this change does.

While converting the entire attribute pipeline to generics is problematic and has issue since some object[] return abstracts aren't sealed, we can at least somewhat trivially cache an array per attribute type (only one static, ultimately from Array.Empty<T> underneath) and return that for the miss case.

There are far more wins to be had here, but they require more changes.
jkotas pushed a commit that referenced this pull request Nov 4, 2018
…'s inherited path (#20779)

* Optimization: avoid 2 array allocations in inherited attribute misses

This is a follow-up to many allocations recognized in #20448.

A lot of methods ultimately call through CustomAttributes.GetCustomAttributes(). In the inherited search path (default for most searches above), the inheritance path of a class is traversed, resulting in an array allocation per crawled type, a copy to the overall List<object> and then after that - in current code - an array allocation from that list and a typed array allocation it's copied to.

However, in the common miss case as simple as:
typeof(T).GetCustomAttributes(typeof(TAttr), true);
...and many other overloads, all the same path underneath...

We can avoid the last 2 arrays in the miss case. We have the List<object> to go off of. If that's a zero-entry list, we can return an Array.Empty<TAttr>(). That's effectively what this change does.

While converting the entire attribute pipeline to generics is problematic and has issue since some object[] return abstracts aren't sealed, we can at least somewhat trivially cache an array per attribute type (only one static, ultimately from Array.Empty<T> underneath) and return that for the miss case.

There are far more wins to be had here, but they require more changes.

* Move RuntimeType empty array cache generation to Attribute.CreateAttributeArrayHelper

This exposes Attribute.CreateAttributeArrayHelper to internal and uses it directly on the RuntimeType caching for empty arrays. Though this allocated 1 additional array overall, it's simpler, faster to init, and still is an infinite win over the per-call allocations before this overall changesets.

* CustomAttributes: remove needless array copy in the inherited hit case

This removes a .ToArray() for the sake of Array.Copy() where a simple for loop suffices and removes the allocation. Reversing the "empty" result checks is also just a bit cleaner here.

This also expands the same fix to the MemberInfo path.

Note: should DRY these up too (longstanding issue) - but let's do that in a separate commit for clarity.

* GetCusomAttributes: use ListBuilder<object> for inheritance crawls

This exposes RuntimeType.ListBuilder<T> for internal usage and replaces the List<T> allocation in GetCustomAttributes() paths to reduce allocations and increase performance in the inherited crawl paths (which is the default for many optional-parameter methods in layers above).

Note: there is a subtle behavior depending on previous-null (not possible with a struct now) in AttributeUsageCheck() that I believe still behaves correctly, but could use another set of eyes and a full test suite run to confirm. object[] attributes was removed there simply because it wasn't used before - only cleaning up.

* Attribute caching: use Array.CreateInstance() directly on RuntimeType

This also reverts the CreateAttributeArrayHelper => internal change, since it's no longer needed.

* Ref passing for RuntimeType.ListBuilder<object> & CustomAttribute simplification

This fixes the struct passing duplication and tweaks how we're creating arrays a bit, centralizing the zero-element checks to cover all cases as well as simplify the per-method code to rely on the fact this is happening underneath.
A-And pushed a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
When looking for compiler generated async state machine types and methods, we need not look at parents in the reflection path. This introduces additional overhead in both .IsDefined() and .GetCustomAttributes<T>() in every path.

See the short circuit path in: https://github.com/dotnet/coreclr/blob/57f8358221a3c4ad7f1608f625bc3c5936618505/src/System.Private.CoreLib/src/System/Reflection/CustomAttribute.cs#L1315-L1320

We can avoid the list allocation, the array from it, and the resulting combined array by not looking for inherited attributed members that cannot exist for our purposes of creating a stack trace.
A-And pushed a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
…'s inherited path (dotnet#20779)

* Optimization: avoid 2 array allocations in inherited attribute misses

This is a follow-up to many allocations recognized in dotnet#20448.

A lot of methods ultimately call through CustomAttributes.GetCustomAttributes(). In the inherited search path (default for most searches above), the inheritance path of a class is traversed, resulting in an array allocation per crawled type, a copy to the overall List<object> and then after that - in current code - an array allocation from that list and a typed array allocation it's copied to.

However, in the common miss case as simple as:
typeof(T).GetCustomAttributes(typeof(TAttr), true);
...and many other overloads, all the same path underneath...

We can avoid the last 2 arrays in the miss case. We have the List<object> to go off of. If that's a zero-entry list, we can return an Array.Empty<TAttr>(). That's effectively what this change does.

While converting the entire attribute pipeline to generics is problematic and has issue since some object[] return abstracts aren't sealed, we can at least somewhat trivially cache an array per attribute type (only one static, ultimately from Array.Empty<T> underneath) and return that for the miss case.

There are far more wins to be had here, but they require more changes.

* Move RuntimeType empty array cache generation to Attribute.CreateAttributeArrayHelper

This exposes Attribute.CreateAttributeArrayHelper to internal and uses it directly on the RuntimeType caching for empty arrays. Though this allocated 1 additional array overall, it's simpler, faster to init, and still is an infinite win over the per-call allocations before this overall changesets.

* CustomAttributes: remove needless array copy in the inherited hit case

This removes a .ToArray() for the sake of Array.Copy() where a simple for loop suffices and removes the allocation. Reversing the "empty" result checks is also just a bit cleaner here.

This also expands the same fix to the MemberInfo path.

Note: should DRY these up too (longstanding issue) - but let's do that in a separate commit for clarity.

* GetCusomAttributes: use ListBuilder<object> for inheritance crawls

This exposes RuntimeType.ListBuilder<T> for internal usage and replaces the List<T> allocation in GetCustomAttributes() paths to reduce allocations and increase performance in the inherited crawl paths (which is the default for many optional-parameter methods in layers above).

Note: there is a subtle behavior depending on previous-null (not possible with a struct now) in AttributeUsageCheck() that I believe still behaves correctly, but could use another set of eyes and a full test suite run to confirm. object[] attributes was removed there simply because it wasn't used before - only cleaning up.

* Attribute caching: use Array.CreateInstance() directly on RuntimeType

This also reverts the CreateAttributeArrayHelper => internal change, since it's no longer needed.

* Ref passing for RuntimeType.ListBuilder<object> & CustomAttribute simplification

This fixes the struct passing duplication and tweaks how we're creating arrays a bit, centralizing the zero-element checks to cover all cases as well as simplify the per-method code to rely on the fact this is happening underneath.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…'s inherited path (dotnet/coreclr#20779)

* Optimization: avoid 2 array allocations in inherited attribute misses

This is a follow-up to many allocations recognized in dotnet/coreclr#20448.

A lot of methods ultimately call through CustomAttributes.GetCustomAttributes(). In the inherited search path (default for most searches above), the inheritance path of a class is traversed, resulting in an array allocation per crawled type, a copy to the overall List<object> and then after that - in current code - an array allocation from that list and a typed array allocation it's copied to.

However, in the common miss case as simple as:
typeof(T).GetCustomAttributes(typeof(TAttr), true);
...and many other overloads, all the same path underneath...

We can avoid the last 2 arrays in the miss case. We have the List<object> to go off of. If that's a zero-entry list, we can return an Array.Empty<TAttr>(). That's effectively what this change does.

While converting the entire attribute pipeline to generics is problematic and has issue since some object[] return abstracts aren't sealed, we can at least somewhat trivially cache an array per attribute type (only one static, ultimately from Array.Empty<T> underneath) and return that for the miss case.

There are far more wins to be had here, but they require more changes.

* Move RuntimeType empty array cache generation to Attribute.CreateAttributeArrayHelper

This exposes Attribute.CreateAttributeArrayHelper to internal and uses it directly on the RuntimeType caching for empty arrays. Though this allocated 1 additional array overall, it's simpler, faster to init, and still is an infinite win over the per-call allocations before this overall changesets.

* CustomAttributes: remove needless array copy in the inherited hit case

This removes a .ToArray() for the sake of Array.Copy() where a simple for loop suffices and removes the allocation. Reversing the "empty" result checks is also just a bit cleaner here.

This also expands the same fix to the MemberInfo path.

Note: should DRY these up too (longstanding issue) - but let's do that in a separate commit for clarity.

* GetCusomAttributes: use ListBuilder<object> for inheritance crawls

This exposes RuntimeType.ListBuilder<T> for internal usage and replaces the List<T> allocation in GetCustomAttributes() paths to reduce allocations and increase performance in the inherited crawl paths (which is the default for many optional-parameter methods in layers above).

Note: there is a subtle behavior depending on previous-null (not possible with a struct now) in AttributeUsageCheck() that I believe still behaves correctly, but could use another set of eyes and a full test suite run to confirm. object[] attributes was removed there simply because it wasn't used before - only cleaning up.

* Attribute caching: use Array.CreateInstance() directly on RuntimeType

This also reverts the CreateAttributeArrayHelper => internal change, since it's no longer needed.

* Ref passing for RuntimeType.ListBuilder<object> & CustomAttribute simplification

This fixes the struct passing duplication and tweaks how we're creating arrays a bit, centralizing the zero-element checks to cover all cases as well as simplify the per-method code to rely on the fact this is happening underneath.


Commit migrated from dotnet/coreclr@6d9fc60
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants