This repository was archived by the owner on Nov 1, 2020. It is now read-only.

Conversation

@MichalStrehovsky
Member

When we unified the implementations of these across all of our runtimes, we lost all size optimizations people have been doing on the Project N side over the past six years.

This restores a bit of the loss. For one sample app with lots of async usage, this removes 2.1 MB of generic instantiations.

There is more we can do, but I can't spend time on that right now. These two things jumped out at me when I was looking at it back in December, and they were an easy fix I'd wanted to make for a while.

@jkotas
Member

jkotas commented Jan 29, 2019

BTW: A few callbacks were converted from explicit methods to lambdas in dotnet/coreclr#22172. Hopefully this cleanup is fine, since it is in a non-generic type.

@jkotas
Member

jkotas commented Jan 29, 2019

@jaredpar Roslyn creates dummy singleton helper objects for "static" lambdas. I assume that it is a micro-optimization to avoid the overhead of shuffle thunks. Do you happen to know the PR or piece of code that introduced this optimization?

We have been wondering whether the Roslyn policy on when to produce these dummy static singletons should be revisited since it adds significant static overhead to frequently instantiated generics.
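For context, here is a rough sketch of the lowering being discussed. All names are hypothetical; the real compiler-generated type is the mangled `<>c` class. The lambda body becomes an *instance* method on a dummy singleton, so the cached delegate has a real target object and invoking it avoids the shuffle thunk a static-method-backed delegate would need:

```csharp
using System;

// Hypothetical simplification of Roslyn's lowering of a non-capturing lambda.
// The body becomes an instance method on a dummy singleton display class.
sealed class DisplayClass              // the real generated type is named "<>c"
{
    public static readonly DisplayClass Instance = new DisplayClass();
    public static Func<int, int> CachedAddOne;   // delegate allocated at most once

    internal int AddOneBody(int x) => x + 1;     // the lambda body
}

static class LambdaDemo
{
    // "x => x + 1" with no captures lowers to roughly this:
    public static Func<int, int> GetAddOne()
        => DisplayClass.CachedAddOne ??= DisplayClass.Instance.AddOneBody;
}
```

The static overhead mentioned above comes from the fact that inside a generic type this display class is itself generic, so every instantiation carries its own singleton, cache field, and method bodies.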

      /// <summary>Delegate used to invoke on an ExecutionContext when passed an instance of this box type.</summary>
-     private static readonly ContextCallback s_callback = s =>
+     private static readonly ContextCallback s_callback = ExecutionContextCallback;
Contributor

Have you tried moving the lambda into the body? This forces a static ctor, and it looks like this is not a common path here.

Member

This forces a static ctor, and it looks like this is not a common path here.

This callback is the common path. Unless you've explicitly suppressed ExecutionContext flow with ExecutionContext.SuppressFlow(), this callback is used every time the async state machine resumes after an await.
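A minimal sketch of what "suppressed flow" means here, using standard System.Threading APIs: when flow is suppressed, the ambient ExecutionContext (observable via AsyncLocal) does not follow work to another thread, which is the only case where this callback is skipped:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class FlowDemo
{
    static readonly AsyncLocal<int> s_local = new AsyncLocal<int>();

    static void Main()
    {
        s_local.Value = 42;

        // Default: ExecutionContext flows, so the AsyncLocal value is visible.
        Task.Run(() => Console.WriteLine(s_local.Value)).Wait();     // 42

        // Suppressed: the queued work runs on the default context.
        using (ExecutionContext.SuppressFlow())
        {
            Task.Run(() => Console.WriteLine(s_local.Value)).Wait(); // 0
        }
    }
}
```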

@jkotas jkotas Jan 29, 2019
Member

Also, before-field-init static ctors are executed lazily in CoreCLR, so they pose no problem there. IIRC, it is different in Mono: before-field-init static ctors run eagerly there, which makes using before-field-init as a startup and static-footprint optimization ineffective.
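For reference, a sketch of what controls the flag (illustrative names): a type whose only static initialization comes from field initializers is marked `beforefieldinit`, while an explicit static constructor removes the flag and pins initialization to precise, first-use semantics:

```csharp
using System;
using System.Reflection;

// Only a field initializer: the C# compiler marks the type 'beforefieldinit',
// which lets CoreCLR run the initializer lazily (and, per the note above,
// historically made Mono run it eagerly).
class LazyInit
{
    public static readonly int Value = Compute();
    static int Compute() => 42;
}

// An explicit static constructor removes 'beforefieldinit'.
class PreciseInit
{
    public static readonly int Value;
    static PreciseInit() { Value = 42; }
}

class Check
{
    static void Main()
    {
        Console.WriteLine((typeof(LazyInit).Attributes & TypeAttributes.BeforeFieldInit) != 0);    // True
        Console.WriteLine((typeof(PreciseInit).Attributes & TypeAttributes.BeforeFieldInit) != 0); // False
    }
}
```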

@jkotas jkotas merged commit 325f6b9 into dotnet:master Jan 29, 2019
@MichalStrehovsky MichalStrehovsky deleted the taskSize branch January 29, 2019 21:58
@jaredpar
Member

@jkotas

I assume that it is a micro-optimization to avoid overhead of shuffle thunks. Do you happen to know the PR or piece of code that introduced this optimization?

This change was done long before Roslyn went OSS. Hence you're unlikely to find it in our history anywhere.

We have been wondering whether the Roslyn policy on when to produce these dummy static singletons should be revisited since it adds significant static overhead to frequently instantiated generics.

@VSadov originally did this change, hence he may have more feedback here. But my understanding is that the savings were pretty significant for Roslyn + VS when the change was made.

@VSadov
Member

VSadov commented Jan 30, 2019

Yes, this is a very old change by now; it predates Roslyn v1.0.
As I remember, it achieved two goals:

  • It improves the throughput of lambdas at the cost of some space (metadata, instantiations).
    Basically a space-vs-speed tradeoff. As I remember, the speed gain was significant, especially on small lambdas, which are common in LINQ and similar frameworks.
  • It enables caching of generic lambdas. Prior to that, we would allocate a new delegate every time for those, which was expensive. People were doing unnatural tricks to avoid mixing generics and lambdas on hot paths.
    As I remember, @stephentoub simplified more than one place where this was happening in FX after the change went in.
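The generic-lambda caching in the second bullet can be sketched like this (hypothetical names; Roslyn's real generated types are mangled): because the helper class holding the lambda can itself be generic, there is a place to cache one delegate per instantiation instead of allocating a fresh one on every call:

```csharp
using System;

// Hypothetical simplification of Roslyn's caching for a non-capturing lambda
// used inside a generic method: the cache field lives on a generic helper
// class, so each instantiation allocates its delegate at most once.
static class Cache<T>
{
    internal static Func<T, T> s_identity;
}

static class CacheDemo
{
    // "x => x" inside a generic method lowers to roughly this:
    public static Func<T, T> GetIdentity<T>()
        => Cache<T>.s_identity ??= (x => x);
}
```

Before the change, each call of such a method allocated a new delegate per invocation, which is the cost described above.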

@stephentoub
Member

As I remember, @stephentoub simplified more than one place where this was happening in FX after the change went in.

I didn't remember that the ability to cache generic lambdas was tied to that. We do rely on generic lambda caching in a large number of places; those could be unwound, but it would mean writing code in a fairly ugly, pre-lambda manner.

@VSadov
Member

VSadov commented Jan 30, 2019

I can imagine using a different lowering strategy if we had an -Os switch (as in, for "code size").

There could be other places where codegen could be less generic and/or more compact (possibly at the cost of some throughput): anonymous types, async stack spilling....

@VSadov
Member

VSadov commented Jan 31, 2019

@stephentoub generic lambda caching came naturally from the change, since generic "static" lambdas are no longer backed by static delegates and there is a containing type in which to put the static cache field.
It basically just worked, but it was a welcome change. The primary goal was the throughput improvement.

It might be possible to untangle just the generic lambda caching from the rest of the change, if needed.



6 participants