Note for torn-reads in EventCounters #20984

gfoidl · 2020-10-08T09:36:07Z

Updated the sample code to use long for _requestCount, which is more realistic than using int.
For long, a torn read can happen on 32-bit architectures, so I added a note about that.

For background see e.g. dotnet/aspnetcore#26630

…sample code

TABS instead of spaces.

sdmaclea · 2020-10-08T18:13:36Z

/cc @dotnet/dotnet-diag

sdmaclea · 2020-10-08T18:20:33Z

docs/core/diagnostics/event-counters.md

+    DisplayRateTimeScale = TimeSpan.FromSeconds(1)
+};
+```
+


I wonder if this might better belong with the code above at line 111.

Volatile.Read might be the better default sample code, as the issue probably affects weakly ordered architectures as well as 32-bit architectures.

I wonder if _requestCount itself should be marked volatile. Would that fix the issue?

> _requestCount itself should be marked volatile. Would that fix the issue?

I don't think so. On 32-bit, torn reads, which are what we want to avoid, would still be a concern.

There's an interesting discussion in npgsql/npgsql#3223 about whether Volatile.Read is sufficient, in contrast to Interlocked.Read.
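To make the torn-read concern concrete, here is a minimal sketch of how one might check for it. The class and method names are hypothetical and not taken from the PR. A writer alternates a 64-bit field between two bit patterns (all zeros and all ones) while a reader verifies that every snapshot is one of the two. Both sides use Interlocked operations, so no mixed pattern should ever be observed. On 64-bit runtimes an aligned plain read would not tear either, so the check is mainly meaningful on 32-bit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class TornReadCheck
{
    private static long _value;

    // Returns the number of snapshots that were neither 0 nor -1,
    // i.e. reads that mixed the halves of two different writes.
    public static long CountTornReads(int iterations)
    {
        long torn = 0;

        var writer = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                // Atomic 64-bit write, alternating between all-zero and all-one bits.
                Interlocked.Exchange(ref _value, (i & 1) == 0 ? 0L : -1L);
            }
        });

        for (int i = 0; i < iterations; i++)
        {
            // Atomic 64-bit read, also on 32-bit platforms.
            long snapshot = Interlocked.Read(ref _value);
            if (snapshot != 0L && snapshot != -1L)
            {
                torn++;
            }
        }

        writer.Wait();
        return torn;
    }
}
```

Replacing the Interlocked calls with plain field accesses and running as a 32-bit process is the variation that could, in principle, produce a non-zero count.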

> the better default sample code

Isn't the default code sample to show "how it's not done"?
Maybe it should be changed so that the example shows a correct reference implementation and then the text points out what to watch for?

I quickly read through the conversation in npgsql/npgsql#3223. Semantically, Interlocked.Read seems more explicit. The Volatile anti-tearing seems more like a side effect. Given the policy of non-breaking changes, it is likely to stay a permanent side effect though.

I would have expected an Interlocked<T> C# generic type which mandated interlocked access, so it could be used in member variables.... Perhaps I'm too optimistic.
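As an illustration of that idea, a hand-written wrapper could look roughly like the sketch below. InterlockedInt64 is a hypothetical type; nothing like it is proposed in this PR, and it is not generic. It simply hides the field so that every access has to go through Interlocked.

```csharp
using System.Threading;

// Hypothetical wrapper: the raw field is private, so callers cannot
// bypass the interlocked operations by accident.
public sealed class InterlockedInt64
{
    private long _value;

    // Atomic 64-bit read, safe on 32-bit platforms as well.
    public long Read() => Interlocked.Read(ref _value);

    // Atomically adds 1 and returns the new value.
    public long Increment() => Interlocked.Increment(ref _value);

    // Atomically adds delta and returns the new value.
    public long Add(long delta) => Interlocked.Add(ref _value, delta);
}
```

It is written as a class rather than a struct so that an accidental copy cannot silently split the counter into two independent values.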

Is there any objection or downside to Interlocked.Read? Interlocked would be my preference. The conversation made it sound like "Interlocked is clearly documented to do what we want, Volatile.Read also technically works but relies on more obscure detail." I didn't notice any rationale why we'd prefer using Volatile given the choice.

> any rationale why we'd prefer using Volatile

I think this one is on me. I chose it because on x64 it produces such nice asm 😉
But now I'm not sure whether that was a good choice, and I'm at a loss, hoping for a conclusion that can be applied to "all" uses of the polling counter.

As background:
I needed to implement an event counter, looked at the docs and searched for references. The docs lacked that point -- hence this PR -- and in ASP.NET Core the reads were just plain reads, but there was an issue to fix this.
Fast forward, we landed here, and I'd like to have the docs show a good and proper reference implementation which can be copied by folks without further thinking whether it's correct or not.

Thanks!

> I chose it because on x64 it produces such nice asm

These event counter callbacks are invoked at most once a second (per counter). I would expect any sub-microsecond optimization will have no measurable impact on app performance and we can prioritize coding style that has clear intent.

> I'd like to have the docs show a good and proper reference implementation which can be copied by folks without further thinking whether it's correct or not

I think Interlocked.Read makes a clear and correct default implementation with negligible downsides. That is what I'd recommend we use in the sample.

Updated to Interlocked.Read.
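For readers following along, the pattern agreed on here can be sketched as below. This is not the exact text of the docs sample: the event source name, the counter name, and the AddRequest and RequestCount members are assumptions made for illustration. Only _requestCount, Interlocked.Read, and the DisplayRateTimeScale setting come from the thread.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

[EventSource(Name = "Sample.RequestCounters")] // hypothetical source name
public sealed class RequestEventSource : EventSource
{
    public static readonly RequestEventSource Log = new RequestEventSource();

    // long rather than int: a 64-bit field can be torn on 32-bit platforms
    // unless every access goes through Interlocked.
    private long _requestCount;
    private IncrementingPollingCounter? _requestRateCounter;

    private RequestEventSource() { }

    // Called once per request (hypothetical helper).
    public void AddRequest() => Interlocked.Increment(ref _requestCount);

    // Atomic snapshot of the total (hypothetical helper).
    public long RequestCount => Interlocked.Read(ref _requestCount);

    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command == EventCommand.Enable)
        {
            // The callback is polled by the counter infrastructure,
            // so the read must be safe against concurrent increments.
            _requestRateCounter ??= new IncrementingPollingCounter(
                "request-rate", this, () => Interlocked.Read(ref _requestCount))
            {
                DisplayName = "Request Rate",
                DisplayRateTimeScale = TimeSpan.FromSeconds(1)
            };
        }
    }

    protected override void Dispose(bool disposing)
    {
        _requestRateCounter?.Dispose();
        _requestRateCounter = null;
        base.Dispose(disposing);
    }
}
```

The counter is created lazily in OnEventCommand so that nothing is allocated until a listener actually enables the source.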

> prioritize coding style that has clear intent.

Absolutely, especially in the docs.

> no measurable impact on app performance

Agree.

The following is more of a theoretical digression than a real observation.
It's just to help me understand how things work, or whether my mental model has a flaw that needs correcting.

Let's assume a high-perf server with lots of requests (e.g. Kestrel) running on x64.

For each request a long field in the event source object will be incremented by Interlocked.Increment, thus lock add qword ptr [rcx], 1 will be executed. The lock prefix ensures that the CPU has exclusive ownership of the cache line in play for the duration of the operation (MESI protocol and variants of it), and it makes the instruction act as a full memory barrier (comparable to mfence).

Now the timer (1s interval) for the event counter triggers, so the field is read by Interlocked.Read, which is lock cmpxchg [rcx], rdx. It's not only a read, but a read-modify-write. From the Intel developer manual: "The processor never produces a locked read without also producing a locked write."
So, same as before, exclusive ownership of the cache line is ensured.
This also means that during this read no other write can happen, only before or after the instruction. So an increment at the same time will be "blocked", and with it the caller of the increment, which ultimately "blocks" the processing of that request.
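The read-modify-write nature of such a read can be sketched in C# as a compare-exchange that swaps 0 for 0. ReadViaCompareExchange is a hypothetical helper written for illustration; whether a given runtime version implements Interlocked.Read exactly this way is an assumption here.

```csharp
using System.Threading;

// Hypothetical helper, for illustration only.
public static class ReadViaCompareExchange
{
    // If the current value is 0, it is "replaced" by 0; otherwise nothing
    // is stored. Either way the original value is returned, so the stored
    // value never changes, but the operation is still a locked
    // read-modify-write on x64 (lock cmpxchg).
    public static long Read(ref long location) =>
        Interlocked.CompareExchange(ref location, 0, 0);
}
```

This is why the read needs exclusive ownership of the cache line even though the stored value never changes.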

So on x64, where we don't care whether the snapshot taken from the counter is the most up-to-date value (we don't need memory barriers to ensure the freshest value), an ordinary load mov rax, [rcx+8] would be the least disruptive operation.

Notes:

- "block" is in quotes, as I really wouldn't call it blocking since it's just a tiny fraction of time. But I don't know a better word for it.
- I'm aware that this won't show up in RPS or any other measure. At most it will be within noise, so negligible.
- I don't want to contrast this with Volatile or make any argument for or against a variant discussed here. It's only for understanding, as written above.

I don't see any specific question. I don't see any gross error in your description.

Coming from the weakly ordered side of the micro-architectural world, I would replace the word "blocked" with "ordered". When the lock cmpxchg executes as a read or a write it is guaranteed to write, so it must take exclusive ownership of the cache line. So its write is ordered with respect to other writes/reads. Effectively the writes/reads are strongly ordered.

Thanks for the confirmation 👍, that's what I was looking for with the previous comment.
And thanks for the better wording with "ordered".

noahfalk

LGTM, thanks!

Added note for potential torn-reads to EventCounters and updated the …

b26ca23

…sample code

gfoidl requested review from a team and sdmaclea as code owners October 8, 2020 09:36

dotnet-bot added this to the October 2020 milestone Oct 8, 2020

dotnet-bot added the 📚 Area - .NET Core Guide label Oct 8, 2020

Fixed copy & paste error from snippet

e809e84

TABS instead of spaces.

sdmaclea reviewed Oct 8, 2020

This was referenced Oct 8, 2020

Fixed potential torn reads in EventCounters npgsql/npgsql#3223

Merged

Fixed potential torn reads in EventCounter glennc/BlazingMicroPizzas#2

Closed

Fixed potential torn reads in EventCounters apache/pulsar-dotpulsar#59

Merged

Use Interlocked.Read instead of Volatile.Read

bf25df6

noahfalk approved these changes Oct 9, 2020

This was referenced Oct 9, 2020

Add perf counters AzureAD/microsoft-identity-web#669

Merged

Fixed potential torn reads in EventCounters dotnet/aspnetcore#26630

Merged

IEvangelist merged commit c77eae3 into dotnet:master Oct 9, 2020

gfoidl deleted the event-counters branch October 9, 2020 19:09

gfoidl mentioned this pull request Oct 9, 2020

Fix potential torn reads by counters grpc/grpc-dotnet#1073

Merged

BillWagner added dotnet-core/prod and removed 📚 Area - .NET Core Guide labels Feb 10, 2021

6 participants