Skip to content

Conversation

@kunalspathak
Copy link
Contributor

Just to see the TP impact

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 9, 2023
@ghost ghost assigned kunalspathak Nov 9, 2023
@ghost
Copy link

ghost commented Nov 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Just to see the TP impact

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

//
uint32_t BitOperations::PopCount(unsigned __int128 value)
{
return BitOperations::PopCount(static_cast<uint64_t>(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return BitOperations::PopCount(static_cast<uint64_t>(value));
return BitOperations::PopCount(static_cast<uint64_t>(value >> 64)) + BitOperations::PopCount(static_cast<uint64_t>(value));

Shouldn't this check all the bits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should. Currently, since we just use 64-bits, I want to quickly prototype and see the TP impact of changing the regMaskTP data type to 128-bits. I will be cleaning few things here and other places (e.g. BitScanForward) before marking it "ready for review".

@kunalspathak
Copy link
Contributor Author

After spending lot of time investigating the failures, here is what I found:

  • There are no failures when the arm64 binaries are compiled natively on arm64 machine.
  • The failures only repros when cross compiled binary of arm64 i.e. libclrjit_universal_arm64_x64.so are used. We use them during compiling System.Private.CoreLib.dll using crossgen2 in the CI (and hence the failures) or when using libclrjit_universal_arm64_x64.so in tpdiff pipeline. The reason for failures is that __int128 assumes that the memory is 16-byte aligned and hence the compiler generates movaps instruction instead of movups. I tried to objdump libclrjit_universal_arm64_x64.so before / after changes and we can see the difference:
image When we segfault, I see the memory is not 16-byte aligned: image

So it seems that this might not be a possible option, but i will keep digging on alternatives.

On side note, if we decide to just enable something on linux without having a native windows msvc support, we will have to add leg in superpmi-replay (which verifies libclrjit.so and libclrjit_universal_arm64_x64.so). Currently, superpmi-replay pipeline only verifies on binaries cross-compiled by windows.

@jakobbotsch
Copy link
Member

When we segfault, I see the memory is not 16-byte aligned

This sounds like a JIT bug. The JIT's allocator needs to guarantee 16 byte alignment for types with alignof(x) == 16. Sounds like it doesn't do that.

@kunalspathak
Copy link
Contributor Author

kunalspathak commented Nov 17, 2023

When we segfault, I see the memory is not 16-byte aligned

This sounds like a JIT bug. The JIT's allocator needs to guarantee 16 byte alignment for types with alignof(x) == 16. Sounds like it doesn't do that.

yes, we have this today that could change.

    // Ensure that we always allocate in pointer sized increments.
    size = roundUp(size, sizeof(size_t));

But from what I am discovering, this will impact memory consumption a lot. E.g. GenTree has regMaskSmall field and hence its size on arm64 will increase from 88 bytes to 96 bytes and likewise for several other data structures. So I am reconsidering if this is a good solution or not.

@jakobbotsch
Copy link
Member

But from what I am discovering, this will impact memory consumption a lot. E.g. GenTree has regMaskSmall field and hence its size on arm64 will increase from 88 bytes to 96 bytes and likewise for several other data structures. So I am reconsidering if this is a good solution or not

Makes sense... Yeah, it does sound expensive. On a side note having such a mask in GenTree seems like a good place to try to regain some memory if we can move it somewhere else... Sounds like we're paying a lot of memory for something we only rarely use.

@kunalspathak
Copy link
Contributor Author

having such a mask in GenTree seems like a good place to tr

Exactly my thought when I noticed gtRsvdRegs. I will see if I can move it around or replace with something cheaper.

@kunalspathak
Copy link
Contributor Author

I will work on a different idea of splitting regMaskTP in regMaskInt, regMaskFloat, etc.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants