Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@benaadams
Copy link
Member

@benaadams benaadams commented Apr 20, 2017

Trying again, follow up to #11095

Puts the Threadpool threads on the default context.

Less churn when moving between Default contexts (ExecutionContext.RunDefaultContext); with lighter context switching.

Also allows the opportunity to benefit from Finally Cloning by the Jit in RunDefaultContext.

Resolves https://github.com/dotnet/coreclr/issues/11126

@benaadams
Copy link
Member Author

benaadams commented Apr 20, 2017

Most of the EC calls for Kestrel on libuv take this path

Before (call count underlined)

After (call count underlined)

The 18 of the other 22 are timer callbacks; which work off timer thread or do an Unsafe queue to threadpool and other 4 are thread starts. (didn't want to mess in this area yet)

The Socket version and IO completion doesn't seem to be on the default context (or it would take a different path and doesn't call Run) so there are a higher percentage that are in the regular Run path.

/cc @jkotas @AndyAyersMS @stephentoub @davidfowl

@benaadams
Copy link
Member Author

Socket version

Upstream for the Run

@danmoseley danmoseley requested a review from stephentoub April 21, 2017 01:44
@stephentoub
Copy link
Member

What impact does this have on throughout?

If code calls TP.UQUWI and then changes the EC (e.g. changes an AsyncLocal), that'll poison the TP thread and knock subsequent work items off the fast path? And the slow path is now slower than the fast path today?

@benaadams
Copy link
Member Author

benaadams commented Apr 21, 2017

And the slow path is now slower than the fast path today?

Slow path is slightly faster than current as using the local Default in Restore means one less call to CORINFO_HELP_GETSHARED_GCSTATIC_BASE; but otherwise unchanged.

If code calls TP.UQUWI and then changes the EC (e.g. changes an AsyncLocal), that'll poison the TP thread and knock subsequent work items off the fast path?

Off the faster-path and only for that Dispatch quantum; will still use the previous default context fast path; albeit with one extra call. Slow path is unchanged.

However have fixed.

One speed bump is as there are more explicit default context threads ExecutionContext.Capture will check m_isFlowSuppressed before returning the context; but then it won't call CORINFO_HELP_GETSHARED_GCSTATIC_BASE to get the default, which it would do if it was null; so not sure how much that factors; but probably less than it never inlining.

I'd like to move more threads onto the default context, like the IO and explicitly call RunDefaultContext rather than Run; but I think that needs more caution.

Preliminary local testing is faster; my threadpool/task tests are slower but with CPU time in kernel greatly increased - which paradoxically means its faster shown through higher SpinWait contention - need to improve how they measure 😆

Will do further testing.

@benaadams
Copy link
Member Author

Feel a bit foolish, is easy to test by setting

set COMPlus_ThreadPool_ForceMinWorkerThreads=1
set COMPlus_ThreadPool_ForceMaxWorkerThreads=1

Then all contention measurement drops out

@benaadams
Copy link
Member Author

Full tests using single threadpool thread https://gist.github.com/benaadams/bf56dc4424ca6888aba178074cdae2b6

Highlights

                                                                             Parallelism
                                  Serial          2x         16x         64x        512x

Pre

QUWI No Queues (TP)              8.982 M     9.500 M     9.467 M     9.318 M     9.234 M
- Depth    2                     9.367 M     9.362 M     9.486 M     9.283 M     9.310 M
- Depth   16                     9.699 M     9.767 M     9.698 M     9.664 M     9.682 M
- Depth   64                     9.756 M     9.761 M     9.764 M     9.862 M     9.630 M
- Depth  512                     9.923 M     9.948 M     9.949 M     9.876 M     9.761 M

QUWI No Queues                  13.695 M    10.798 M     3.565 M     5.252 M     4.560 M
- Depth    2                     6.792 M     6.078 M     3.798 M     7.401 M     4.040 M
- Depth   16                     6.443 M     5.297 M     9.365 M     8.035 M     3.546 M
- Depth   64                     7.934 M     9.117 M     9.239 M     8.092 M     3.825 M
- Depth  512                     9.737 M     9.781 M     9.140 M     8.050 M     4.313 M

Post

QUWI No Queues (TP)             10.049 M    10.716 M    10.668 M    10.619 M    10.461 M
- Depth    2                    10.850 M    10.877 M    10.716 M    10.744 M    10.527 M
- Depth   16                    11.165 M    11.151 M    11.261 M    11.181 M    11.063 M
- Depth   64                    11.329 M    11.335 M    11.206 M    11.222 M    11.066 M
- Depth  512                    11.379 M    11.340 M    11.353 M    11.271 M    11.233 M

QUWI No Queues                  15.985 M    12.116 M     3.238 M     6.168 M     4.727 M
- Depth    2                     7.814 M     6.308 M     4.384 M     6.899 M     3.971 M
- Depth   16                     9.124 M     9.009 M     9.242 M     8.667 M     5.019 M
- Depth   64                     9.836 M    10.090 M    10.260 M     8.762 M     4.546 M
- Depth  512                    11.014 M    11.030 M    10.287 M     9.285 M     5.285 M

Pre

SubTask Chain Awaited (TP)       1.070 M     1.067 M     1.071 M     1.072 M     1.066 M
- Depth    2                     1.097 M     1.104 M     1.102 M     1.101 M     1.099 M
- Depth   16                     1.227 M     1.222 M     1.222 M     1.219 M     1.217 M
- Depth   64                     1.227 M     1.227 M     1.227 M     1.223 M     1.221 M
- Depth  512                     1.210 M     1.218 M     1.221 M     1.220 M     1.218 M

SubTask Chain Awaited            1.061 M     1.067 M     1.049 M     1.025 M   926.972 k
- Depth    2                     1.097 M     1.090 M     1.085 M     1.082 M   972.882 k
- Depth   16                     1.226 M     1.220 M     1.215 M     1.194 M     1.052 M
- Depth   64                     1.224 M     1.226 M     1.219 M     1.195 M     1.104 M
- Depth  512                     1.214 M     1.217 M     1.220 M     1.190 M     1.102 M

Post

SubTask Chain Awaited (TP)       1.112 M     1.120 M     1.117 M     1.115 M     1.122 M
- Depth    2                     1.139 M     1.136 M     1.135 M     1.138 M     1.136 M
- Depth   16                     1.242 M     1.243 M     1.245 M     1.246 M     1.245 M
- Depth   64                     1.243 M     1.241 M     1.245 M     1.240 M     1.237 M
- Depth  512                     1.232 M     1.232 M     1.235 M     1.235 M     1.238 M

SubTask Chain Awaited            1.113 M     1.082 M     1.103 M     1.087 M     1.023 M
- Depth    2                     1.138 M     1.138 M     1.130 M     1.110 M   955.189 k
- Depth   16                     1.247 M     1.245 M     1.239 M     1.224 M     1.119 M
- Depth   64                     1.246 M     1.236 M     1.235 M     1.220 M     1.083 M
- Depth  512                     1.236 M     1.233 M     1.231 M     1.213 M     1.080 M

Pre

Continuation Chain (TP)        753.935 k   747.613 k   746.931 k   744.462 k   745.137 k
- Depth    2                     1.226 M     1.223 M     1.215 M     1.214 M     1.225 M
- Depth   16                     2.867 M     2.876 M     2.836 M     2.829 M     2.800 M
- Depth   64                     3.321 M     3.297 M     3.303 M     3.281 M     3.176 M
- Depth  512                     3.456 M     3.473 M     3.480 M     3.436 M     3.439 M

Continuation Chain             749.259 k   748.858 k   737.790 k   732.280 k   683.638 k
- Depth    2                     1.222 M     1.225 M     1.199 M     1.181 M     1.047 M
- Depth   16                     2.821 M     2.842 M     2.708 M     2.631 M     2.121 M
- Depth   64                     3.294 M     3.295 M     3.171 M     3.097 M     2.237 M
- Depth  512                     3.488 M     3.521 M     3.400 M     3.306 M     2.891 M

Post

Continuation Chain (TP)        790.326 k   790.035 k   789.038 k   785.879 k   787.792 k
- Depth    2                     1.299 M     1.291 M     1.287 M     1.282 M     1.292 M
- Depth   16                     2.992 M     2.986 M     2.965 M     2.949 M     2.958 M
- Depth   64                     3.449 M     3.419 M     3.428 M     3.408 M     3.311 M
- Depth  512                     3.605 M     3.587 M     3.578 M     3.573 M     3.553 M

Continuation Chain             787.984 k   787.776 k   780.919 k   770.841 k   677.396 k
- Depth    2                     1.288 M     1.283 M     1.273 M     1.250 M     1.131 M
- Depth   16                     2.954 M     2.953 M     2.830 M     2.768 M     2.127 M
- Depth   64                     3.450 M     3.431 M     3.310 M     3.238 M     2.418 M
- Depth  512                     3.634 M     3.623 M     3.494 M     3.431 M     3.492 M

Pre

Yield Chain Awaited (TP)         1.953 M     1.919 M     1.919 M     1.901 M     1.885 M
- Depth    2                     2.348 M     2.349 M     2.328 M     2.330 M     2.328 M
- Depth   16                     3.027 M     2.999 M     2.990 M     2.982 M     3.006 M
- Depth   64                     3.107 M     3.099 M     3.092 M     3.172 M     3.044 M
- Depth  512                     3.175 M     3.111 M     3.191 M     3.133 M     3.115 M

Yield Chain Awaited              1.984 M     1.975 M     1.877 M     1.827 M     1.494 M
- Depth    2                     2.332 M     2.327 M     2.271 M     2.228 M     1.671 M
- Depth   16                     3.000 M     2.986 M     2.893 M     2.778 M     2.116 M
- Depth   64                     3.200 M     3.083 M     3.023 M     2.957 M     2.335 M
- Depth  512                     3.222 M     3.051 M     3.079 M     3.084 M     2.534 M

Post

Yield Chain Awaited (TP)         2.186 M     2.145 M     2.129 M     2.134 M     2.081 M
- Depth    2                     2.487 M     2.483 M     2.488 M     2.460 M     2.499 M
- Depth   16                     3.049 M     3.053 M     3.025 M     2.986 M     3.031 M
- Depth   64                     3.114 M     3.128 M     3.086 M     3.171 M     3.086 M
- Depth  512                     3.203 M     3.113 M     3.194 M     3.107 M     3.093 M

Yield Chain Awaited              2.209 M     2.203 M     2.099 M     2.023 M     1.665 M
- Depth    2                     2.476 M     2.466 M     2.398 M     2.369 M     1.855 M
- Depth   16                     3.022 M     3.024 M     2.937 M     2.810 M     2.175 M
- Depth   64                     3.156 M     3.093 M     3.030 M     3.023 M     2.555 M
- Depth  512                     3.148 M     3.055 M     3.104 M     3.156 M     2.464 M

Generally looks better

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 corefx_baseline

{
// Current thread passed as parameter to avoid extern current native thread lookup call
Debug.Assert(currentThread == Thread.CurrentThread);
// Default context passed as parameter to avoid static inializer checks in NGen'd code
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: inializer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved


if (currentThread.ExecutionContext != defaultContext)
{
currentThread.ExecutionContext = defaultContext;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call ExecutionContext.RestoreDefault as in UndoToDefault to check for AsyncLocal changes and raise events?

If so, then perhaps UndoToDefault could be renamed to SetDefaults and this overload with no parameters can call the one with parameters.

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only used for after running an Unsafe workitem and Dispatch when no user code is running

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this, the unsafe work item could change async locals and it seems this is the only place where we don't notify when restoring the context. For the Dispatch case where no user code is running, if the EC needs to be initialized to Default, could probably do that directly and even avoid checking the SynchronizationContext which should be null anyway at that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently if Unsafe changed async locals you'd wouldn't get notification at the end of the Unsafe call but you'd get them at the start of every regular threadpool item execution and the values would live in a kinda limbo land.

Also you could never move back onto the DefaultContext as even if you clear the values it won't move you back onto Default #8227 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the poisoning @stephentoub referred to in #11100 (comment) that I resolved with f358e02

Can revert it to allow poisoning by Unsafe which would permanently move it onto slower path but maintain current behaviour?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that is broken. @stephentoub, thoughts on whether we should restore the context, and whether we should notify of async local changes upon restoring here? Not sure how important the Unsafe version is, but the same path applies to suppressed-flow contexts as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that is broken

I was referring to your previous comment:

Currently if Unsafe changed async locals you'd wouldn't get notification at the end of the Unsafe call but you'd get them at the start of every regular threadpool item execution and the values would live in a kinda limbo land.

currentThread.ExecutionContext = defaultContext;

// For the purposes of dealing with context change, null counts as the default EC
if (previous != null)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this would call OnContextChanged when previous == defaultContext. Maybe this should be previous != defaultContext && previous != null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is called by UndoToDefault when previous != default so its already implied that previous isn't default. Probably should add an additional asset at top.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, makes sense

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea an assert would be good, as it is exposed to future calls from elsewhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

internal static bool Dispatch()
{
// Put threadpool thread on default context
ExecutionContextSwitcher.SetDefaults();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Shouldn't each work item restore the context especially after these changes, or is that still missing somewhere?

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threadpool threads don't start on Default context as the VM creates them rather than the managed code; starts as null

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it should only need ever to be called once (on thread create); so maybe there is a better way of doing it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the context be null to start with? Maybe ExecutionContextSwitcher.CurrentIsDefault should also check for null.

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its null then it needs to do a memory barrier assign to Default; so its once per Dispatch or once per loop in Dispatch (as it would get restored to null).

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda agree; but the EC mix with default and flow suppressed is more complicated so it would need a greater set of changes.

e.g. For Capture for CurrentThread.ExecutionContext

EC == null => Default
EC != null && FlowSupressed => null
EC != null =>EC

So null means flow suppressed, unless you recapture when it means Default

Copy link
Member Author

@benaadams benaadams Apr 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. could have a FlowSupressed context with null being Default instead; but that's probably a more dangerous change?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So null means flow suppressed, unless you recapture when it means Default

I didn't follow that. I think Capture could stay as is, if it returns null it means EC != null and FlowSuppressed == true. If Capture returns Default it means EC == null. A current flow-suppressed context means EC != null.

i.e. could have a FlowSupressed context with null being Default instead; but that's probably a more dangerous change?

I didn't follow that either, could you please elaborate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try using null for default internally

@danmoseley
Copy link
Member

Hi @benaadams do you expect to take it further?

@benaadams
Copy link
Member Author

@danmosemsft yes, I need to invert what its doing as per @kouvel's suggestion (i.e. what null represents)

@danmoseley
Copy link
Member

@benaadams interested in continuing?...

@karelz
Copy link
Member

karelz commented Aug 30, 2017

@benaadams if you don't have time to work on it now, let's close the PR, until you have the time again ;)

@benaadams
Copy link
Member Author

Very interested; but swamped. Will close and revisit.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants