Fixed potential torn reads in EventCounters #3223

gfoidl · 2020-10-08T08:59:08Z

The backing fields for the counters are long, so there was potential torn read on 32-bit platforms.
Volatile.Read ensures that the value is read in a atomic way, so no torn reads can occur.

The backing fields for the counters are `long`, so there was potential torn read on 32-bit platforms. `Volatile.Read` ensures that the value is read in a atomic way, so no torn reads can occur.

YohDeadfall

Good catch! Thank you!

YohDeadfall · 2020-10-08T09:40:04Z

Cherry picked to 4.1.6 via 3db17ea.

roji · 2020-10-08T14:03:28Z

I'm not aware of any atomicity guarantee provided by Volatile.Read - as opposed to Interlocked.Read which very clearly documents the 32-bit torn value case:

The Read method is unnecessary on 64-bit systems, because 64-bit read operations are already atomic. On 32-bit systems, 64-bit read operations are not atomic unless performed using Read.

Do you have any resource explicitly pointing to Volatile.Read as a guard against torn values?

I did do a quick sharplab test to see what assembly gets emitted, and on .NET Framework 32-bit Volatile.Read indeed emits a call to Interlocked.CompareExchange, making it totally safe. Unfortunately, I don't know what happens under .NET Core 32-bit (sharplab doesn't support that). Another interesting point is that Interlocked.Read emits cmpxchgon 64-bit, presumably for 8-byte values crossing cache lines (which could get therefore get torn even on 64-bit systems, thanks @damageboy). This means we'd take a theoretical, slight perf hit, but the reading of these counters happens very infrequently, so this is insignificant.

YohDeadfall · 2020-10-08T14:10:06Z

It depends on the architecture (source):

        [Intrinsic]
        [NonVersionable]
        public static long Read(ref long location) =>
#if TARGET_64BIT
            (long)Unsafe.As<long, VolatileIntPtr>(ref location).Value;
#else
            // On 32-bit machines, we use Interlocked, since an ordinary volatile read would not be atomic.
            Interlocked.CompareExchange(ref location, 0, 0);
#endif

gfoidl · 2020-10-08T14:33:59Z

... and it's marked as [Intrinsic] so the JIT follows these "rules":

Platform	operation
x64	ordinary (volatile) read, as reading 64-bit values (`long`) in 64-bit processes is atomic by nature
x86	`Interlocked.CompareExchange` to make the read of the long atomic
arm	similar to the two above, additionally an memory barrier will be emitted as arm has a weak memory model as opposed to x86/x64

8-byte values crossing cache lines

I assume that the CPU's cache protocol won't allow torn-reads, but I can't say for 100% sure.

Under the assumption it can be torn, quite a lot of vector operations (say SSE 128bit data crossing a cache line or more often AVX 256bit data crossing the cache line boundary) would suffer from torn-reads too, and I'm not aware that this happens. If so, quite a lot of coreclr-code would be broken too.

gfoidl · 2020-10-08T14:59:36Z

The ECMA CLI spec I.12.6.6 (page 128) states

A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic when all the write accesses to a location are the same size.

so a ordinary read of long in a 64-bit process should be safe if the CLI conforms.

This also means that "random 64-bit reads" aren't atomic by this guarantee. It's possible to allocate unmanaged memory and read non-atomically a long from there.
But when reading a long that is layed out in memory by the CLI (as it's the case here with the field of the class) follows the above rule, so it is properly aligned and the read safe.

roji · 2020-10-08T16:55:54Z

I agree that implementation-wise, Volatilre.Read seems to generate Interlocked.CompareExchange in x86 (although I've yet to see/test this actually happening on 32-bit .NET Core). The main question in my mind, is whether this behavior is specified or documented anywhere; coming from other languages, volatile does not necessarily have any sort of atomicity guarantees - it's about caching, staleness and lifting to register (which is also a form of caching). This is why I'd rather see Interlocked.Read for atomicity - at the very least, it documents what's being done better.

Re the ECMA CLI spec etc., I'm definitely not worried about 64-bit platforms here. As @damageboy helped me understand, as long as an 8-byte value fits within a cache line, operations on it are indeed guaranteed to be atomic; note, however, that for long values which straddle a cache line boundary, tearing may still occur, in which case Interlocked.Read can ensure atomicity, but, importantly, not Volatile.Read (that alone is a good reason to avoid Volatile.Read for atomicity). This doesn't concern us here because these long values are laid out by the CLI and are guaranteed to be aligned and within the cache line.

To sum it up, when the goal is to guarantee atomicity, I prefer using the API which expressly does that (Interlocked.Read), rather than Volatile.Read - even if implementation-wise it may be implemented atomically.

gfoidl · 2020-10-08T18:00:26Z

Disclaimer: I just want to share my thoughts, so if there's any misconception I'm happy to be corrected so I'll learn something.

volatile does not necessarily have any sort of atomicity guarantees

Yep, that's a different matter. I'd say volatile has nothing to do with atomicity, just as you pointed out.
Here it's about Volatile-class, where the JIT emits volatile (i.e. memory barriers) as needed, and for reads an atomic operation as it's implemented that way. It may be a bit misleading that its name is Volatile though (at least for me).

that for long values which straddle a cache line boundary, tearing may still occur

Following the ECMA spec in a strict sense, a torn-read of a long might also happen if the address isn't aligned to 8-byte boundary even if it doesn't cross a cache line boundary.
This can only ever happen using unsafe code and reading not from CLI created locations. I.e. long tmp = *(long*)anyPointer;
I don't believe there is any CPU out (at least no recent one), on which .NET can run, that might show up with this hypothetical (alignment of long not to 8-byte boundary and being within a cache line) torn-read.

that for long values which straddle a cache line boundary, tearing may still occur

This is consistence and make sense. Notes about this -- in general: reads that cross a cache line boundary -- are stated in the Intel Developer manual, and I assume arm as well other processor won't be any different.
Hence this is definitely a point to keep in mind (so thanks for pointing this out explicitely).

Side note: When looking at usages of *(long*) across GH-repos there are quite a lot, it looks like the majority doesn't respect cache lines, but it's OK as the memory location from which is read isn't accessed concurrently, so torn reads won't occur even if the reading thread will be interrupted.

roji · 2020-10-08T18:12:34Z

Yep, that's a different matter. I'd say volatile has nothing to do with atomicity, just as you pointed out.
Here it's about Volatile-class, where the JIT emits volatile (i.e. memory barriers) as needed, and for reads an atomic operation as it's implemented that way. It may be a bit misleading that its name is Volatile though (at least for me).

I totally agree. I originally thought Volatile.Read really doesn't help with tearing at all, and was surprised to see it emit CompareExchange in 32-bit. This seems like a .NET particularity.

Following the ECMA spec in a strict sense, a torn-read of a long might also happen if the address isn't aligned to 8-byte boundary even if it doesn't cross a cache line boundary. [...]

I agree that the words say that, and also that hardware-wise I don't think it's possible to produce tearing when the entire value is contained in a single cache-line. In that sense the ECMA wording may provide weaker guarantees than what the hardware actually provides.

Side note: When looking at usages of (long) across GH-repos there are quite a lot, it looks like the majority doesn't respect cache lines, but it's OK as the memory location from which is read isn't accessed concurrently, so torn reads won't occur even if the reading thread will be interrupted.

Yeah, to hit an actual issue you must both do unprotected concurrency (no locking/interlocked) and read unaligned data straddling cache line boundaries.

Following the conversation above, @gfoidl does it make sense for you to replace the Volatile.Read occurrences introduced by this PR with Interlocked.Exchange (and amend dotnet/efcore#22923)?

gfoidl · 2020-10-08T18:29:18Z

Following the conversation above

I'm "torn" back and forth.

It's a very interesting conversation, and so I'd say to use 'Interlocked.Read` makes it obvious what happens, especially as potential perf-benefits of Volatile won't show up here. So a change would be good.

On the other side dotnet/aspnetcore#26630 was done just before and there it got merged with Volatile.Read and I proposed dotnet/docs#20984 to update the sample-code over there.
So I'd like to wait if the diag-guys from .NET Core have any other insights to this which we didn't consider or if there's a general alignment on way or the other -- at least it would be nice to have all usages of the polling-counters consistent.

roji · 2020-10-08T18:47:25Z

I'm "torn" back and forth.

🤣

It's a very interesting conversation [...]

Agreed, I've also learned a lot from it - so thanks!

To be sure, I agree that this is largely about documentation/obviousness, and has little actual impact. Let's wait a bit as you suggest.

(@YohDeadfall what's your reason for preferring Volatile.Read over Interlocked.Read?)

YohDeadfall · 2020-10-08T19:18:35Z

There is a difference between these two method below for x64, but obviously the first one is faster. Can't provide how much because there is no latency specified by Intel in docs, but to be sure ask @damageboy.

public static void V(ref long x)
{
   Volatile.Read(ref x);
}

public static void I(ref long x)
{
    Interlocked.Read(ref x);
}

; Volatile(Int64 ByRef)
    L0000: mov rax, [rcx]
    L0003: ret

; Interlocked(Int64 ByRef)
    L0000: xor edx, edx
    L0002: xor eax, eax
    L0004: lock cmpxchg [rcx], rdx
    L0009: ret

Plus I'm aware of the implementation for a long time thankfully for Reference Source and IL Spy. Right now .NET is fully open and anyone may take a look at the source in case of uncertainty.

roji · 2020-10-08T21:59:22Z

There is a difference between these two method below for x64, but obviously the first one is faster.

Is that meaningful in any way given that the reading happens when the counter is polled (i.e. once per second)?

roji · 2020-10-08T21:59:49Z

Related PR to fix the docs on Interlocked.Read: dotnet/dotnet-api-docs#4962

YohDeadfall · 2020-10-08T22:17:57Z

Since the .NET team decided to use Interlocked in docs (dotnet/docs#20984 (comment)), there is something with which we can be aligned.

gfoidl · 2020-10-09T08:38:18Z

I'll send (another) PR to change this to Interlocked so we are consistent.

Thanks again for the intersting discussion about this topic!

YohDeadfall · 2020-10-09T13:09:38Z

FYI, the runtime uses Interlocked too (dotnet/runtime#37619), so we're on the right way with #3225.

Fixed potential torn reads in EventCounters

26b4459

The backing fields for the counters are `long`, so there was potential torn read on 32-bit platforms. `Volatile.Read` ensures that the value is read in a atomic way, so no torn reads can occur.

gfoidl requested review from YohDeadfall and roji as code owners October 8, 2020 08:59

YohDeadfall assigned gfoidl Oct 8, 2020

YohDeadfall added the bug label Oct 8, 2020

YohDeadfall added this to the 4.1.6 milestone Oct 8, 2020

YohDeadfall approved these changes Oct 8, 2020

View reviewed changes

YohDeadfall merged commit a570da9 into npgsql:main Oct 8, 2020

gfoidl deleted the patch-1 branch October 8, 2020 09:15

roji mentioned this pull request Oct 8, 2020

Fixed potential torn reads in EventCounters dotnet/efcore#22923

Merged

gfoidl mentioned this pull request Oct 8, 2020

Note for torn-reads in EventCounters dotnet/docs#20984

Merged

This was referenced Oct 9, 2020

Use Interlocked.Read instead of Volatile.Read for PollingCounters #3225

Merged

Fixed potential torn reads in EventCounters dotnet/aspnetcore#26630

Merged

Fixed potential torn reads in EventCounters #3223

Fixed potential torn reads in EventCounters #3223

Uh oh!

Conversation

gfoidl commented Oct 8, 2020

Uh oh!

YohDeadfall left a comment

Choose a reason for hiding this comment

Uh oh!

YohDeadfall commented Oct 8, 2020

Uh oh!

roji commented Oct 8, 2020

Uh oh!

YohDeadfall commented Oct 8, 2020

Uh oh!

gfoidl commented Oct 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gfoidl commented Oct 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji commented Oct 8, 2020

Uh oh!

gfoidl commented Oct 8, 2020

Uh oh!

roji commented Oct 8, 2020

Uh oh!

gfoidl commented Oct 8, 2020

Uh oh!

roji commented Oct 8, 2020

Uh oh!

YohDeadfall commented Oct 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji commented Oct 8, 2020

Uh oh!

roji commented Oct 8, 2020

Uh oh!

YohDeadfall commented Oct 8, 2020

Uh oh!

gfoidl commented Oct 9, 2020

Uh oh!

YohDeadfall commented Oct 9, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gfoidl commented Oct 8, 2020 •

edited

Loading

gfoidl commented Oct 8, 2020 •

edited

Loading

YohDeadfall commented Oct 8, 2020 •

edited

Loading