-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix dynamic pgo #47790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dynamic pgo #47790
Conversation
davidwrighton
commented
Feb 3, 2021
- Fix dynamic pgo (my previous change had a typo missing an asterisk.
- Add a smoke test for the dynamic pgo logic which runs a small test with tiered pgo enable, and verifies that the test compiles an interesting method twice, and that the method does not cause the runtime to crash. This test is intended as a checkin test to ensure that basic functionality involving dynamic pgo does not crash the runtime
- Hopefully in the future we can provide an event or something to validate that the actual pgo is working
AndyAyersMS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible the test may fail in specialized outerloop tests, not sure how some of the options we set there (eg JITMinOpts) interact with the ones we set here.
| <CLRTestPriority>0</CLRTestPriority> | ||
| <DebugType>PdbOnly</DebugType> | ||
| <Optimize>True</Optimize> | ||
| <UnloadabilityIncompatible>true</UnloadabilityIncompatible> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| <UnloadabilityIncompatible>true</UnloadabilityIncompatible> |
|
Schema now makes it to Tier1 ok but I am seeing odd looking count data. Will try and debug. |
|
Eg for my test case (similar to yours): Counts should be 1000 - 3000 here. |
|
Need to add in the count offset when copying the count data. Also probably only copy 32 bits at a time, unless you've padded the length to 64 somehow (in my case we're copying 20 bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ignore this bit, see below]
volatile size_t*pSrc = (volatile size_t*)found->header.GetData();
should be something like
volatile uint32_t*pSrc = (volatile uint32_t*)(found->header.GetData() + found->header.countsOffset);
|
Hmm, maybe not - were you planning on the jit compensating for the offset? |
|
More likely you shouldn't subtract off the header length up above when computing |
|
Not subtracting the length seems to do the trick... |
diff --git a/src/coreclr/vm/pgo.cpp b/src/coreclr/vm/pgo.cpp
index fe062ef5b59..6581a5ee3c6 100644
--- a/src/coreclr/vm/pgo.cpp
+++ b/src/coreclr/vm/pgo.cpp
@@ -1061,7 +1061,7 @@ HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE**
if (schemaArray.GetCount() > 0)
{
auto lastSchema = schemaArray[schemaArray.GetCount() - 1];
- instrumentationDataSize = lastSchema.Offset + lastSchema.Count * InstrumentationKindToSize(lastSchema.InstrumentationKind) - found->header.countsOffset;
+ instrumentationDataSize = lastSchema.Offset + lastSchema.Count * InstrumentationKindToSize(lastSchema.InstrumentationKind);
}
*pAllocatedData = new BYTE[schemaDataSize + instrumentationDataSize];
*ppSchema = (ICorJitInfo::PgoInstrumentationSchema*)*pAllocatedData;
@@ -1069,10 +1069,11 @@ HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE**
memcpy(*pAllocatedData, schemaArray.OpenRawBuffer(), schemaDataSize);
schemaArray.CloseRawBuffer();
- size_t* pInstrumentationDataDst = (size_t*)((*pAllocatedData) + schemaDataSize);
- size_t* pInstrumentationDataDstEnd = (size_t*)((*pAllocatedData) + schemaDataSize + instrumentationDataSize);
+ int32_t* pInstrumentationDataDst = (int32_t*)((*pAllocatedData) + schemaDataSize);
+ int32_t* pInstrumentationDataDstEnd = (int32_t*)((*pAllocatedData) + schemaDataSize + instrumentationDataSize);
*pInstrumentationData = (BYTE*)pInstrumentationDataDst;
- volatile size_t*pSrc = (volatile size_t*)found->header.GetData();
+ volatile int32_t*pSrc = (volatile int32_t*)(found->header.GetData());
+
// Use a volatile memcpy to copy the instrumentation data into a temporary buffer
// This allows the instrumentation data to be made stable for reading during the execution of the jit
// and since the copy moves through a volatile pointer, there will be no tearing of individual data elements |
|
It isn't safe to copy 32bits at a time if there might be pointers such as typehandles in there. It introduces a pointer tearing risk, which may cause access violations. Instead I've made everything size_t aligned, and fixed the offset problem by not adding in the extra offset data. I'm also going to remove the smoke test from this PR as it seems to be surprisingly unreliable. I'll work on that separately to try and understand why. |
Makes sense.
Sad to see the test go, but I understand. Wonder if its because the CI machines are so heavily loaded...? |
|
The jit-experimental runs had failures in jitpgo_classes legs for both Windows and Linux, which look like possible corruption in the class profile data: |
|
Also count data looks to be off too -- there is no loop in this method. |
|
A bit more detail on bad count data... not sure where the fault is yet. Looks like the schema entry offsets shift between instrumentation and reconstruction? If so, we need to adjust the copy to start past the header. ;; Tier0 inc dword ptr [(reloc 0x7fff68aa6aa8)] // offset 0x10 ;; Runtime (looks like it copies the header, not the data) PGO: copying 16 bytes from 00007FFF68AA6A98 to 000002B9A84524F8 .. Base Schema[0] offset is 0, counter value at 00007FFF68AA6A98 is 42141214 ;; Tier1 Have profile data: 3 schema records (schema at 000002B9A84524B0, data at 000002B9A84524F8) Reconstructing block counts from sparse edge instrumentation |