-
Notifications
You must be signed in to change notification settings - Fork 6.1k
Note for torn-reads in EventCounters #20984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
TABS instead of spaces.
|
/cc @dotnet/dotnet-diag |
| DisplayRateTimeScale = TimeSpan.FromSeconds(1) | ||
| }; | ||
| ``` | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this might better belong with the code above at line 111.
Volatile.Read might be the better default sample code. As it probably affects weakly ordered architectures as well as 32-bit architectures.
I wonder if _requestCount itself should be marked volatile. Does that work fix the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_requestCountitself should be marked volatile. Does that work fix the issue?
I don't think so, and on 32-bit torn-reads, which we want to avoid, are still a concern.
There's an interesting discusssion in npgsql/npgsql#3223 whether Volatile.Read is sufficient or not in contrast to Interlocked.Read.
the better default sample code
Isn't the default code sample to show "how it's not done"?
Maybe it should be changed so that the example shows a correct reference implementation and then the text points out for what to watch for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly read through the conversation in npgsql/npgsql#3223. Semantically Interlocked.Read seems more explicit, The Volatile anti-tearing seems more like a side-effect. Given policy of non-breaking changes it is likely to stay a permanent side-effect though.
I would have expected an Interlocked<T> C# generic type which mandated interlocked access, so it could be used in member variables.... Perhaps I'm too optimistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any objection or downside to Interlocked.Read? Interlocked would be my preference. The conversation made it sound like "Interlocked is clearly documented to do what we want, Volatile.Read also technically works but relies on more obscure detail." I didn't notice any rationale why we'd prefer using Volatile given the choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any rationale why we'd prefer using Volatile
I think this goes on my hat. I've chosen it because on x64 it produces so nice asm 😉
But now I'm not sure if this choice was a good one and I'm clueless hoping for a conclusion which can be applied on "all" uses for the polling-counter.
As background:
I needed to implement an event counter, looked at the docs and searched for references. The docs lacked that point -- hence this PR -- and in ASP.NET Core the reads were just plain reads, but there was an issue to fix this.
Fast forward, we landed here, and I'd like to have the docs show a good and proper reference implementation which can be copied by folks without further thinking whether it's correct or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I've chosen it because on x64 it produces so nice asm
These event counter callbacks are invoked at most once a second (per counter). I would expect any sub-microsecond optimization will have no measurable impact on app performance and we can prioritize coding style that has clear intent.
I'd like to have the docs show a good and proper reference implementation which can be copied by folks without further thinking whether it's correct or not
I think Interlocked.Read makes a clear and correct default implementation with neglible downsides - that is what I'd recommend we use in the sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to Interlocked.Read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prioritize coding style that has clear intent.
Absolutely, especially in the docs.
no measurable impact on app performance
Agree.
The following is more a theoretical digression than real observation.
It's just about to help me understand how things work or if I have a failure in my mental model so that it can be corrected.
Let's assume a high-perf server with lots of requests (e.g. kestrel) running on x64.
For each request a long field in the event source object will be incremented by Interlocked.Increment, thus lock add qword ptr [rcx], 1 will be executed. The lock-prefix ensures that the cpu has exclusive ownership of the cache line in play for the duration of the operation (MESI-protocol and variants of it), and it turns the instruction into a full memory barrier (mfence).
Now the timer (1s interval) for the event counter triggers, so the field is read by Interlocked.Read, which is lock cmpxchg [rcx], rdx -- it's not only a read, but a read-modify-write. From Intel Developer manual "The processor never produces a locked read without also producing a locked write." and also quote here.
So, same as before, exclusive ownership of the cache line is ensured.
This also means that during this read no other write can happen -- only before or after the instruction -- and so a increment at the same time will be "blocked", also the caller of the increment which will ultimately "block" the processing of that request.
So being on x64 on we don't care if the snapshot taken from the counter is the most to up to date value (we don't need memory barriers to ensure the freshest value), a ordinary load mov rax, [rcx+8] would be the least disruptive operation.
Notes:
- "block" is in quotes, as I really wouldn't say blocking as it's just a mini-fraction of time. But I don't any better word therefore
- I'm aware that this won't show up in RPS or any other measure -- at least it will be within noise, so negligible
- I don't want contrast this with Volatile or put any argument pro or contra a variant disussed here -- it's only for understanding as written above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any specific question. I don't see any gross error in your description.
Coming from the weakly ordered side of the micro-architectural world, I would replace the word "blocked" with "ordered". When the lock cmpxchg executes as a read or a write it is guaranteed to write, so it must take exclusive ownership of the cache line. So its write is ordered with respect to other writes/reads. Effectively the writes/reads are strongly ordered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the confirmation 👍, that's what I was looking for with the previous comment.
And for the better wording with "ordered".
noahfalk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
longfor_requestCountwhich is more realisitc than usingintinta torn read can happen on 32-bit architectures, so added a note thereforeFor background see e.g. dotnet/aspnetcore#26630