Skip to content

Conversation

@damageboy
Copy link

@damageboy damageboy commented Oct 8, 2020

The documentation re. Interlocked.Read for 64-bit reads sized reads on 64-bit systems is simply not correct.
There is no known architecture in existence where a 64-bit value that spans across two cache lines can be read without the use of atomic operations.

For example, consult Intel's Software Development Manual, Volume 3A which clearly states on section 8.1.1:

Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be
atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel
Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split
accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and
should be avoided

Atomic operations are required for such cases, and those cases (split cache-line access) while uncommon in .NET are not impossible.

Summary

Describe your changes here.

Fixes #Issue_Number (if available)

The documentation re. `Interlocked.Read` for 64-bit reads sized reads on 64-bit systems is simply not correct.
There is known architecture in existence where a 64-bit value that spans across two cache lines can be read without the use of atomic operations.

For example, consult [Intel's Software Development Manual, Volume 3A](https://software.intel.com/content/dam/develop/public/us/en/documents/253668-sdm-vol-3a.pdf) which clearly states on section 8.1.1:

 > Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be
atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel
Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split
accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and
should be avoided

Atomic operations are required for such cases, and those cases (split cache-line access) while uncommon in .NET are not impossible.
@opbld32
Copy link

opbld32 commented Oct 8, 2020

Docs Build status updates of commit 6211bfb:

✅ Validation status: passed

File Status Preview URL Details
xml/System.Threading/Interlocked.xml ✅Succeeded View

For more details, please refer to the build report.

Note: Broken links written as relative paths are included in the above build report. For broken links written as absolute paths or external URLs, see the broken link report.

For any questions, please:

@gfoidl
Copy link
Member

gfoidl commented Oct 9, 2020

while uncommon in .NET are not impossible.

E.g. with such a struct?

[StructLayout(LayoutKind.Explicit)]
public struct Bar
{
    [FieldOffset(0)]
    public byte A;

    [FieldOffset(1)]
    public long B;

    [FieldOffset(9)]
    public byte C;

    [FieldOffset(10)]
    public long D;
}

B and D aren't aligned to a 8 byte boundary anymore, the size of the struct is 24 bytes so it may cross cache lines.

@damageboy
Copy link
Author

while uncommon in .NET are not impossible.

E.g. with such a struct?

[StructLayout(LayoutKind.Explicit)]
public struct Bar
{
    [FieldOffset(0)]
    public byte A;

    [FieldOffset(1)]
    public long B;

    [FieldOffset(9)]
    public byte C;

    [FieldOffset(10)]
    public long D;
}

B and D aren't aligned to a 8 byte boundary anymore, the size of the struct is 24 bytes so it may cross cache lines.

Yeap. This used to be the more common way of getting to this sort of situation, but with the advent of .NET Span you have new ways of getting to this sort of situation.

@opbld30
Copy link

opbld30 commented Oct 9, 2020

Docs Build status updates of commit 979e162:

✅ Validation status: passed

File Status Preview URL Details
xml/System.Threading/Interlocked.xml ✅Succeeded View

For more details, please refer to the build report.

Note: Broken links written as relative paths are included in the above build report. For broken links written as absolute paths or external URLs, see the broken link report.

For any questions, please:

@gewarren gewarren requested a review from kouvel October 21, 2020 21:33
@kouvel
Copy link
Contributor

kouvel commented Oct 22, 2020

I believe interlocked operations on 64-bit memory locations have a requirement for the memory location to be aligned to 8 bytes, so they cannot cross a cache line boundary. I don't think there is a way to do atomic operations on unaligned 64-bit memory locations.

@damageboy
Copy link
Author

I believe interlocked operations on 64-bit memory locations have a requirement for the memory location to be aligned to 8 bytes, so they cannot cross a cache line boundary. I don't think there is a way to do atomic operations on unaligned 64-bit memory locations.

None of the above is true for Intel / x86 architectures:

  1. 8 byte and cache-line alignment are unrelated (although I do understand you intention that some non-aligned 8-byte addresses will also be un-aligned to a 64 bytes cache line)
  2. Cross-cache-line atomic are allowed, albeit at a significant performance drop
  3. Intra cache-line misaligned atomics are allowed and carry no performance degradation.

Reference Intel docs / text explaining (1)-(3) was linked in the original comment at the top of this issue.

@kouvel
Copy link
Contributor

kouvel commented Dec 9, 2020

True, but that doesn't seem to be the case for ARM architectures, where load/store-exclusive instructions seem to require alignment in many if not all cases, with perhaps a stricter requirement on the size of the locked region (within which unaligned accesses may be allowed) than the size of a cache line. The interlocked read prevents tearing on 32-bit systems (similarly to Volatile.Read(ref Int64), which would also have the same alignment constraints on 32-bit processors that don't allow interlocked unaligned accesses). The operation offers stronger memory ordering than both a basic aligned read and an aligned Volatile.Read(ref Int64) even on 64-bit systems. Instead of suggesting when usage of the API is not necessary I think it would be more useful to say something like:

  • Some systems may require the memory location to be aligned to 8 bytes, so for compatibility the method should not be used on references that may not be aligned as such.

Maybe some more guidance would be useful, suggestions?

@damageboy
Copy link
Author

That's something different then... 👍

All I'm saying is that this PR, specifically, corrects a false statement:

...method is unnecessary on 64-bit systems, because 64-bit

Into a correct one:

...method is only necessary on 64-bit systems when the location refers...

I think that the issue of alignment is more of a conceptual issue that is probably best addressed in general in some conceptual-level documentation of the Interlocked class...

Maybe something such as a "See Also" link from this page pointing to a dedicated page that discusses alignment, can even (gasp!) link to the actual relevant documentation per supported architecture in .NET (e.g. x86 + arm) and give the proper context...

Makes sense?

@kouvel
Copy link
Contributor

kouvel commented Dec 9, 2020

Yea maybe the general guidance should be in the Interlocked class docs with links to it from method docs and maybe in Volatile class docs as well. For the purpose of correcting the incorrect info, that sounds fine to me, but I suggest changing method is only necessary ... when to method is necessary ... when, as the only seems to imply that that's the only case where it may be necessary.

@jeffhandley
Copy link
Member

Hi @damageboy; I think there's a piece of feedback left to address here. Would you like to update the PR for approval?

Thanks!

Base automatically changed from master to main March 5, 2021 20:52
@kouvel kouvel added the doc-enhancement Improve the current content label Aug 24, 2021
@damageboy damageboy closed this by deleting the head repository May 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Threading doc-enhancement Improve the current content

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants