Skip to content

Conversation

@gfoidl
Copy link
Contributor

@gfoidl gfoidl commented Oct 8, 2020

The backing fields for the counters are long, so there was potential torn read on 32-bit platforms.
Volatile.Read ensures that the value is read in a atomic way, so no torn reads can occur.

The backing fields for the counters are `long`, so there was potential torn read on 32-bit platforms.
`Volatile.Read` ensures that the value is read in a atomic way, so no torn reads can occur.
@YohDeadfall YohDeadfall added the bug label Oct 8, 2020
@YohDeadfall YohDeadfall added this to the 4.1.6 milestone Oct 8, 2020
Copy link
Contributor

@YohDeadfall YohDeadfall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thank you!

@YohDeadfall YohDeadfall merged commit a570da9 into npgsql:main Oct 8, 2020
@gfoidl gfoidl deleted the patch-1 branch October 8, 2020 09:15
@YohDeadfall
Copy link
Contributor

Cherry picked to 4.1.6 via 3db17ea.

@roji
Copy link
Member

roji commented Oct 8, 2020

I'm not aware of any atomicity guarantee provided by Volatile.Read - as opposed to Interlocked.Read which very clearly documents the 32-bit torn value case:

The Read method is unnecessary on 64-bit systems, because 64-bit read operations are already atomic. On 32-bit systems, 64-bit read operations are not atomic unless performed using Read.

Do you have any resource explicitly pointing to Volatile.Read as a guard against torn values?

I did do a quick sharplab test to see what assembly gets emitted, and on .NET Framework 32-bit Volatile.Read indeed emits a call to Interlocked.CompareExchange, making it totally safe. Unfortunately, I don't know what happens under .NET Core 32-bit (sharplab doesn't support that). Another interesting point is that Interlocked.Read emits cmpxchgon 64-bit, presumably for 8-byte values crossing cache lines (which could get therefore get torn even on 64-bit systems, thanks @damageboy). This means we'd take a theoretical, slight perf hit, but the reading of these counters happens very infrequently, so this is insignificant.

@YohDeadfall
Copy link
Contributor

It depends on the architecture (source):

        [Intrinsic]
        [NonVersionable]
        public static long Read(ref long location) =>
#if TARGET_64BIT
            (long)Unsafe.As<long, VolatileIntPtr>(ref location).Value;
#else
            // On 32-bit machines, we use Interlocked, since an ordinary volatile read would not be atomic.
            Interlocked.CompareExchange(ref location, 0, 0);
#endif

@gfoidl
Copy link
Contributor Author

gfoidl commented Oct 8, 2020

... and it's marked as [Intrinsic] so the JIT follows these "rules":

Platform operation
x64 ordinary (volatile) read, as reading 64-bit values (long) in 64-bit processes is atomic by nature
x86 Interlocked.CompareExchange to make the read of the long atomic
arm similar to the two above, additionally an memory barrier will be emitted as arm has a weak memory model as opposed to x86/x64

8-byte values crossing cache lines

I assume that the CPU's cache protocol won't allow torn-reads, but I can't say for 100% sure.

Under the assumption it can be torn, quite a lot of vector operations (say SSE 128bit data crossing a cache line or more often AVX 256bit data crossing the cache line boundary) would suffer from torn-reads too, and I'm not aware that this happens. If so, quite a lot of coreclr-code would be broken too.

@gfoidl
Copy link
Contributor Author

gfoidl commented Oct 8, 2020

The ECMA CLI spec I.12.6.6 (page 128) states

A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic when all the write accesses to a location are the same size.

so a ordinary read of long in a 64-bit process should be safe if the CLI conforms.

This also means that "random 64-bit reads" aren't atomic by this guarantee. It's possible to allocate unmanaged memory and read non-atomically a long from there.
But when reading a long that is layed out in memory by the CLI (as it's the case here with the field of the class) follows the above rule, so it is properly aligned and the read safe.

@roji
Copy link
Member

roji commented Oct 8, 2020

I agree that implementation-wise, Volatilre.Read seems to generate Interlocked.CompareExchange in x86 (although I've yet to see/test this actually happening on 32-bit .NET Core). The main question in my mind, is whether this behavior is specified or documented anywhere; coming from other languages, volatile does not necessarily have any sort of atomicity guarantees - it's about caching, staleness and lifting to register (which is also a form of caching). This is why I'd rather see Interlocked.Read for atomicity - at the very least, it documents what's being done better.

Re the ECMA CLI spec etc., I'm definitely not worried about 64-bit platforms here. As @damageboy helped me understand, as long as an 8-byte value fits within a cache line, operations on it are indeed guaranteed to be atomic; note, however, that for long values which straddle a cache line boundary, tearing may still occur, in which case Interlocked.Read can ensure atomicity, but, importantly, not Volatile.Read (that alone is a good reason to avoid Volatile.Read for atomicity). This doesn't concern us here because these long values are laid out by the CLI and are guaranteed to be aligned and within the cache line.

To sum it up, when the goal is to guarantee atomicity, I prefer using the API which expressly does that (Interlocked.Read), rather than Volatile.Read - even if implementation-wise it may be implemented atomically.

@gfoidl
Copy link
Contributor Author

gfoidl commented Oct 8, 2020

Disclaimer: I just want to share my thoughts, so if there's any misconception I'm happy to be corrected so I'll learn something.


volatile does not necessarily have any sort of atomicity guarantees

Yep, that's a different matter. I'd say volatile has nothing to do with atomicity, just as you pointed out.
Here it's about Volatile-class, where the JIT emits volatile (i.e. memory barriers) as needed, and for reads an atomic operation as it's implemented that way. It may be a bit misleading that its name is Volatile though (at least for me).

that for long values which straddle a cache line boundary, tearing may still occur

Following the ECMA spec in a strict sense, a torn-read of a long might also happen if the address isn't aligned to 8-byte boundary even if it doesn't cross a cache line boundary.
This can only ever happen using unsafe code and reading not from CLI created locations. I.e. long tmp = *(long*)anyPointer;
I don't believe there is any CPU out (at least no recent one), on which .NET can run, that might show up with this hypothetical (alignment of long not to 8-byte boundary and being within a cache line) torn-read.

that for long values which straddle a cache line boundary, tearing may still occur

This is consistence and make sense. Notes about this -- in general: reads that cross a cache line boundary -- are stated in the Intel Developer manual, and I assume arm as well other processor won't be any different.
Hence this is definitely a point to keep in mind (so thanks for pointing this out explicitely).

Side note: When looking at usages of *(long*) across GH-repos there are quite a lot, it looks like the majority doesn't respect cache lines, but it's OK as the memory location from which is read isn't accessed concurrently, so torn reads won't occur even if the reading thread will be interrupted.

@roji
Copy link
Member

roji commented Oct 8, 2020

Yep, that's a different matter. I'd say volatile has nothing to do with atomicity, just as you pointed out.
Here it's about Volatile-class, where the JIT emits volatile (i.e. memory barriers) as needed, and for reads an atomic operation as it's implemented that way. It may be a bit misleading that its name is Volatile though (at least for me).

I totally agree. I originally thought Volatile.Read really doesn't help with tearing at all, and was surprised to see it emit CompareExchange in 32-bit. This seems like a .NET particularity.

Following the ECMA spec in a strict sense, a torn-read of a long might also happen if the address isn't aligned to 8-byte boundary even if it doesn't cross a cache line boundary. [...]

I agree that the words say that, and also that hardware-wise I don't think it's possible to produce tearing when the entire value is contained in a single cache-line. In that sense the ECMA wording may provide weaker guarantees than what the hardware actually provides.

Side note: When looking at usages of (long) across GH-repos there are quite a lot, it looks like the majority doesn't respect cache lines, but it's OK as the memory location from which is read isn't accessed concurrently, so torn reads won't occur even if the reading thread will be interrupted.

Yeah, to hit an actual issue you must both do unprotected concurrency (no locking/interlocked) and read unaligned data straddling cache line boundaries.

Following the conversation above, @gfoidl does it make sense for you to replace the Volatile.Read occurrences introduced by this PR with Interlocked.Exchange (and amend dotnet/efcore#22923)?

@gfoidl
Copy link
Contributor Author

gfoidl commented Oct 8, 2020

Following the conversation above

I'm "torn" back and forth.

It's a very interesting conversation, and so I'd say to use 'Interlocked.Read` makes it obvious what happens, especially as potential perf-benefits of Volatile won't show up here. So a change would be good.

On the other side dotnet/aspnetcore#26630 was done just before and there it got merged with Volatile.Read and I proposed dotnet/docs#20984 to update the sample-code over there.
So I'd like to wait if the diag-guys from .NET Core have any other insights to this which we didn't consider or if there's a general alignment on way or the other -- at least it would be nice to have all usages of the polling-counters consistent.

@roji
Copy link
Member

roji commented Oct 8, 2020

I'm "torn" back and forth.

🤣

It's a very interesting conversation [...]

Agreed, I've also learned a lot from it - so thanks!

To be sure, I agree that this is largely about documentation/obviousness, and has little actual impact. Let's wait a bit as you suggest.

(@YohDeadfall what's your reason for preferring Volatile.Read over Interlocked.Read?)

@YohDeadfall
Copy link
Contributor

YohDeadfall commented Oct 8, 2020

There is a difference between these two method below for x64, but obviously the first one is faster. Can't provide how much because there is no latency specified by Intel in docs, but to be sure ask @damageboy.

public static void V(ref long x)
{
   Volatile.Read(ref x);
}

public static void I(ref long x)
{
    Interlocked.Read(ref x);
}
; Volatile(Int64 ByRef)
    L0000: mov rax, [rcx]
    L0003: ret

; Interlocked(Int64 ByRef)
    L0000: xor edx, edx
    L0002: xor eax, eax
    L0004: lock cmpxchg [rcx], rdx
    L0009: ret

Plus I'm aware of the implementation for a long time thankfully for Reference Source and IL Spy. Right now .NET is fully open and anyone may take a look at the source in case of uncertainty.

@roji
Copy link
Member

roji commented Oct 8, 2020

There is a difference between these two method below for x64, but obviously the first one is faster.

Is that meaningful in any way given that the reading happens when the counter is polled (i.e. once per second)?

@roji
Copy link
Member

roji commented Oct 8, 2020

Related PR to fix the docs on Interlocked.Read: dotnet/dotnet-api-docs#4962

@YohDeadfall
Copy link
Contributor

Since the .NET team decided to use Interlocked in docs (dotnet/docs#20984 (comment)), there is something with which we can be aligned.

@gfoidl
Copy link
Contributor Author

gfoidl commented Oct 9, 2020

I'll send (another) PR to change this to Interlocked so we are consistent.

Thanks again for the intersting discussion about this topic!

@YohDeadfall
Copy link
Contributor

FYI, the runtime uses Interlocked too (dotnet/runtime#37619), so we're on the right way with #3225.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants