Skip to content

Conversation

@davidwrighton
Copy link
Member

  • Fix dynamic pgo (my previous change had a typo missing an asterisk.
  • Add a smoke test for the dynamic pgo logic which runs a small test with tiered pgo enable, and verifies that the test compiles an interesting method twice, and that the method does not cause the runtime to crash. This test is intended as a checkin test to ensure that basic functionality involving dynamic pgo does not crash the runtime

- Hopefully in the future we can provide an event or something to validate that the actual pgo is working
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 3, 2021
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible the test may fail in specialized outerloop tests, not sure how some of the options we set there (eg JITMinOpts) interact with the ones we set here.

<CLRTestPriority>0</CLRTestPriority>
<DebugType>PdbOnly</DebugType>
<Optimize>True</Optimize>
<UnloadabilityIncompatible>true</UnloadabilityIncompatible>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<UnloadabilityIncompatible>true</UnloadabilityIncompatible>

@AndyAyersMS
Copy link
Member

Schema now makes it to Tier1 ok but I am seeing odd looking count data. Will try and debug.

@AndyAyersMS
Copy link
Member

Eg for my test case (similar to yours):

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight          IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                           31746.      31746    [000..003)-> BB03 ( cond )                     IBC 
BB02 [0001]  1                             0             0    [003..00F)                                     rare IBC 
BB03 [0002]  2                           2155174k 2155174400    [00F..014)-> BB05 ( cond )                     IBC 
BB04 [0003]  1                           32767.      32767    [014..020)                                     IBC 
BB05 [0004]  2                           1965993k 1965992960    [020..021)        (return)                     IBC 

Counts should be 1000 - 3000 here.

@AndyAyersMS
Copy link
Member

Need to add in the count offset when copying the count data. Also probably only copy 32 bits at a time, unless you've padded the length to 64 somehow (in my case we're copying 20 bytes).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ignore this bit, see below]

        volatile size_t*pSrc = (volatile size_t*)found->header.GetData();

should be something like

        volatile uint32_t*pSrc = (volatile uint32_t*)(found->header.GetData() + found->header.countsOffset);

@AndyAyersMS
Copy link
Member

Hmm, maybe not - were you planning on the jit compensating for the offset?

@AndyAyersMS
Copy link
Member

More likely you shouldn't subtract off the header length up above when computing instrumentationDataSize.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 3, 2021

Not subtracting the length seems to do the trick...

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd                 weight     IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                           44803. 44803    [000..003)-> BB03 ( cond )                     IBC 
BB02 [0001]  1                             0        0    [003..00F)                                     rare IBC 
BB03 [0002]  2                           44803. 44803    [00F..014)-> BB05 ( cond )                     IBC 
BB04 [0003]  1                           14935. 14935    [014..020)                                     IBC 
BB05 [0004]  2                           44803. 44803    [020..021)        (return)                     IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

@AndyAyersMS
Copy link
Member

diff --git a/src/coreclr/vm/pgo.cpp b/src/coreclr/vm/pgo.cpp
index fe062ef5b59..6581a5ee3c6 100644
--- a/src/coreclr/vm/pgo.cpp
+++ b/src/coreclr/vm/pgo.cpp
@@ -1061,7 +1061,7 @@ HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE**
         if (schemaArray.GetCount() > 0)
         {
             auto lastSchema = schemaArray[schemaArray.GetCount() - 1];
-            instrumentationDataSize = lastSchema.Offset + lastSchema.Count * InstrumentationKindToSize(lastSchema.InstrumentationKind) - found->header.countsOffset;
+            instrumentationDataSize = lastSchema.Offset + lastSchema.Count * InstrumentationKindToSize(lastSchema.InstrumentationKind);
         }
         *pAllocatedData = new BYTE[schemaDataSize + instrumentationDataSize];
         *ppSchema = (ICorJitInfo::PgoInstrumentationSchema*)*pAllocatedData;
@@ -1069,10 +1069,11 @@ HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE**
         memcpy(*pAllocatedData, schemaArray.OpenRawBuffer(), schemaDataSize);
         schemaArray.CloseRawBuffer();

-        size_t* pInstrumentationDataDst = (size_t*)((*pAllocatedData) + schemaDataSize);
-        size_t* pInstrumentationDataDstEnd = (size_t*)((*pAllocatedData) + schemaDataSize + instrumentationDataSize);
+        int32_t* pInstrumentationDataDst = (int32_t*)((*pAllocatedData) + schemaDataSize);
+        int32_t* pInstrumentationDataDstEnd = (int32_t*)((*pAllocatedData) + schemaDataSize + instrumentationDataSize);
         *pInstrumentationData = (BYTE*)pInstrumentationDataDst;
-        volatile size_t*pSrc = (volatile size_t*)found->header.GetData();
+        volatile int32_t*pSrc = (volatile int32_t*)(found->header.GetData());
+
         // Use a volatile memcpy to copy the instrumentation data into a temporary buffer
         // This allows the instrumentation data to be made stable for reading during the execution of the jit
         // and since the copy moves through a volatile pointer, there will be no tearing of individual data elements

@davidwrighton
Copy link
Member Author

It isn't safe to copy 32bits at a time if there might be pointers such as typehandles in there. It introduces a pointer tearing risk, which may cause access violations. Instead I've made everything size_t aligned, and fixed the offset problem by not adding in the extra offset data. I'm also going to remove the smoke test from this PR as it seems to be surprisingly unreliable. I'll work on that separately to try and understand why.

@davidwrighton davidwrighton changed the title Fix dynamic pgo and add smoke test Fix dynamic pgo Feb 3, 2021
@AndyAyersMS
Copy link
Member

It isn't safe to copy 32bits at a time if there might be pointers such as typehandles in there

Makes sense.

I'm also going to remove the smoke test

Sad to see the test go, but I understand. Wonder if its because the CI machines are so heavily loaded...?

@davidwrighton davidwrighton merged commit 40b0915 into dotnet:master Feb 4, 2021
@AndyAyersMS
Copy link
Member

The jit-experimental runs had failures in jitpgo_classes legs for both Windows and Linux, which look like possible corruption in the class profile data:

;; managed\\Compilation\\Compilation\\Compilation.cmd

Assert failure(PID 4428 [0x0000114c], Thread: 10156 [0x27ac]): Consistency check failed: AV in clr at this callstack:
----
CORECLR! TypeHandle::IsRestored_NoLogging + 0xC (0x00007ffd9bd3c504)
CORECLR! TypeHandle::Verify + 0x1D (0x00007ffd9bd3e0a9)
CORECLR! CEEInfo::resolveVirtualMethodHelper + 0x156 (0x00007ffd9bc4f5da)
CORECLR! CEEInfo::resolveVirtualMethod + 0x16C (0x00007ffd9bc4f40c)
CLRJIT! Compiler::impDevirtualizeCall + 0xB6B (0x00007ffdbfdd9aeb)

@AndyAyersMS
Copy link
Member

Also count data looks to be off too -- there is no loop in this method.

BBnum BBid ref try hnd                 weight         IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                           42107k  42106892    [000..003)-> BB03 ( cond )                     IBC 
BB02 [0001]  1                           402786k 402785792    [003..00F)                                     IBC 
BB03 [0002]  2                           402787k 402786816    [00F..014)-> BB05 ( cond )                     IBC 
BB04 [0003]  1                            8372k   8371972    [014..020)                                     IBC 
BB05 [0004]  2                           23512.     23512    [020..021)        (return)                     IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

@AndyAyersMS
Copy link
Member

A bit more detail on bad count data... not sure where the fault is yet. Looks like the schema entry offsets shift between instrumentation and reconstruction? If so, we need to adjust the copy to start past the header.

;; Tier0
Instrumentation data base address is 00007FFF68AA6A98

inc dword ptr [(reloc 0x7fff68aa6aa8)] // offset 0x10
inc dword ptr [(reloc 0x7fff68aa6aac)]
inc dword ptr [(reloc 0x7fff68aa6ab0)]

;; Runtime (looks like it copies the header, not the data)

PGO: copying 16 bytes from 00007FFF68AA6A98 to 000002B9A84524F8 ..

Base Schema[0] offset is 0, counter value at 00007FFF68AA6A98 is 42141214
Base Schema[1] offset is 4, counter value at 00007FFF68AA6A9C is 571612674
Base Schema[2] offset is 8, counter value at 00007FFF68AA6AA0 is 1092096546

;; Tier1

Have profile data: 3 schema records (schema at 000002B9A84524B0, data at 000002B9A84524F8)
Profile summary: 1 runs, 0 block probes, 3 edge probes, 0 class profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB02 -> BB03: weight 42141216.000000
... adding known edge BB04 -> BB05: weight 571612672.000000
... adding known edge BB05 -> BB01: weight 1092096512.000000

@ghost ghost locked as resolved and limited conversation to collaborators Mar 7, 2021
@davidwrighton davidwrighton deleted the dynamic_pgo_fix branch April 20, 2021 17:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants