@stephentoub
Member

We previously special-cased up to three active AsyncLocals in a given async flow, but it seems four is also very common. Special-casing four as well results in the four-AsyncLocal case using ~20% less allocation and ~10% less CPU overhead in a simple mutation test.

private AsyncLocal<int> asyncLocal1 = new AsyncLocal<int>();
private AsyncLocal<int> asyncLocal2 = new AsyncLocal<int>();
private AsyncLocal<int> asyncLocal3 = new AsyncLocal<int>();
private AsyncLocal<int> asyncLocal4 = new AsyncLocal<int>();

[Benchmark(OperationsPerInvoke = 4000)]
public void Update()
{
    for (int i = 0; i < 1000; i++)
    {
        asyncLocal1.Value++;
        asyncLocal2.Value++;
        asyncLocal3.Value++;
        asyncLocal4.Value++;
    }
}

| Method | Toolchain | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|---|
| Update | \main\corerun.exe | 68.71 ns | 1.00 | 176 B | 1.00 |
| Update | \pr\corerun.exe | 60.96 ns | 0.87 | 144 B | 0.82 |
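
For readers unfamiliar with what "special-casing" means here, the following is a minimal sketch of the general idea, not the runtime's actual code. It assumes an immutable, copy-on-write map in which small element counts get a dedicated class with fixed fields, while larger counts fall back to an array-backed class. All type names (`IValueMap`, `EmptyMap`, `OneElementMap`, `ArrayMap`) are hypothetical and chosen for illustration; the sketch shows only one fixed-size case, whereas this PR concerns adding a fourth.

```csharp
#nullable enable
using System;

// Hypothetical, simplified types for illustration; not the runtime's implementation.
public interface IValueMap
{
    bool TryGetValue(object key, out object? value);

    // Copy-on-write: returns a new map and never mutates the receiver.
    IValueMap Set(object key, object? value);
}

public sealed class EmptyMap : IValueMap
{
    public bool TryGetValue(object key, out object? value) { value = null; return false; }
    public IValueMap Set(object key, object? value) => new OneElementMap(key, value);
}

// Special case: the single entry lives in fields, so an update allocates one object.
public sealed class OneElementMap : IValueMap
{
    private readonly object _key1;
    private readonly object? _value1;

    public OneElementMap(object key1, object? value1) { _key1 = key1; _value1 = value1; }

    public bool TryGetValue(object key, out object? value)
    {
        if (ReferenceEquals(key, _key1)) { value = _value1; return true; }
        value = null;
        return false;
    }

    public IValueMap Set(object key, object? value) =>
        ReferenceEquals(key, _key1)
            ? new OneElementMap(key, value)
            : new ArrayMap(new[] { (_key1, _value1), (key, value) });
}

// General fallback: an update allocates a new array plus a new wrapper object.
public sealed class ArrayMap : IValueMap
{
    private readonly (object Key, object? Value)[] _entries;

    public ArrayMap((object Key, object? Value)[] entries) => _entries = entries;

    public bool TryGetValue(object key, out object? value)
    {
        foreach (var (k, v) in _entries)
        {
            if (ReferenceEquals(k, key)) { value = v; return true; }
        }
        value = null;
        return false;
    }

    public IValueMap Set(object key, object? value)
    {
        int index = Array.FindIndex(_entries, e => ReferenceEquals(e.Key, key));
        int length = index >= 0 ? _entries.Length : _entries.Length + 1;
        var copy = new (object Key, object? Value)[length];
        Array.Copy(_entries, copy, _entries.Length);
        copy[index >= 0 ? index : _entries.Length] = (key, value);
        return new ArrayMap(copy);
    }
}

public static class Program
{
    public static void Main()
    {
        object a = new object();
        object b = new object();

        IValueMap map = new EmptyMap();
        map = map.Set(a, 1);
        map = map.Set(a, 2);                    // same key: stays in the fixed-field class
        Console.WriteLine(map.GetType().Name);  // prints "OneElementMap"

        map = map.Set(b, 3);                    // second key: falls back to the array-backed class
        Console.WriteLine(map.GetType().Name);  // prints "ArrayMap"
    }
}
```

In this sketch, every update through a fixed-field class costs a single allocation, while the array-backed fallback costs two. Under that assumption, extending the fixed-field classes to cover one more element count moves a common workload off the more expensive path, which is the kind of saving the benchmark above measures.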

@stephentoub stephentoub added this to the 7.0.0 milestone May 2, 2022
@ghost ghost assigned stephentoub May 2, 2022
@ghost

ghost commented May 2, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels:

area-System.Threading, tenet-performance

Milestone: 7.0.0

@stephentoub stephentoub requested a review from kouvel May 4, 2022 02:51
@Silvenga
Contributor

Silvenga commented May 4, 2022

Coincidentally, we recently looked into AsyncLocal performance, since AsyncLocal was becoming a bottleneck for us. I wrote this statement a couple of days ago, after our investigation:

In real world profiling of ASP.NET Core applications, at least 4 items are generally stored in the async locals. This suggests that the MultiElementAsyncLocalValueMap type is most commonly used.

This is awesome!

@stephentoub stephentoub merged commit 16b6369 into dotnet:main May 5, 2022
@stephentoub
Member Author

Thanks for reviewing, @kouvel.

@stephentoub stephentoub deleted the asynclocal4 branch May 5, 2022 01:11
@davidfowl
Member

Activity, HttpContextAccessor, Logging scope, what's the 4th 😄. I need to go look...

@ghost ghost locked as resolved and limited conversation to collaborators Jun 30, 2022