Stop using LIST nodes for PHI operand lists by mikedn · Pull Request #20266 · dotnet/coreclr

mikedn · 2018-10-04T21:36:53Z

Using GT_LIST nodes to represent PHI's (and pretty much any) variable length operand list is completely unnecessary and wastes memory and CPU cycles. Not to mention that GT_LIST nodes are basically a case of square peg in the round hole as far as IR semantics is concerned. There's no way to assign them a meaningful type, value number or any kind "value" meaning, they're syntax nodes in a world of values.

The "wasted CPU cycles" part is particularly bad in the case of PHIs. Every time a PHI arg is added, SsaBuilder calls gtSetStmtInfo and fgSetStmtSeq to sequence the whole tree, those are relatively expensive and the use of GT_LIST nodes just increases the cost as they're extra nodes and need special handling. PHI costs are always 0, the tree always the same shape so it would be pretty easy to just add each new PHI arg to the linear order and call it a day. But GT_LIST use manages to make that more complicated as list nodes themselves have to be added to the linear order as well (well, that's not strictly true - none of these nodes need to be part of the linear order but that's another story).

So stop using GT_LIST nodes and instead maintain a single linked list of "use" nodes. These aren't GenTree nodes, they're not part of the linear order, they use less memory. An alternative may be to use arrays instead of linked lists but I haven't starred long enough at SsaBuilder to tell if it would be possible to guess the proper array size before adding all the args. Anyway, I consider that to be tuning and right now I'm more interested to just make the IR saner. Using arrays can be tried at a later time, thanks to the use of range-based for loops the amount of changes should be far smaller.

It's worth noting that a lot of (most actually?) changes are required to deal with various IR traversal facilities the JIT has. That stuff got out of control - there are general traversal facilities such as GenTreeVisitor & co., GenTree::Operands(), GenTree::NumChildren/GetChild/gtGetChildPointer and then there a bunch more functions that decided to reinvent the wheel and do their own custom traversal - fgSetTreeSeqHelper, gtSetEvalOrder and many others. Hopefully I'll clean up some of this mess in the future.

This saves 0.17% used memory and 0.52% instructions:
Mem diff: https://gist.github.com/mikedn/75bde8ebf3c1acbbcce85df6f9090dad
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgsArIw8sxENXDXGulA

Contributes to #19876
Contributes to #15108

mikedn · 2019-01-04T07:25:40Z

src/jit/valuenum.cpp


-        GenTreeLclVarCommon* phiArg = phiArgs->Current()->AsLclVarCommon();
-        phiArgs                     = phiArgs->Rest();
+            phiArg->gtVNPair = phiArgVNP;


Old code failed to assign a VN to the first PHI arg in the list. As far as I can tell, this did not have any ill side effects, except for confusing JIT dumps.

mikedn · 2019-01-04T07:30:04Z

No FX diffs.

This is done but it will likely conflict with #15000 and #20045

BruceForstall · 2019-01-08T22:08:58Z

@dotnet/jit-contrib

sandreenko · 2019-01-09T19:05:12Z

src/jit/gentree.h

+        return UseList(gtUses);
+    }
+
+#if DEBUGGABLE_GENTREE


What is the purpose of DEBUGGABLE_GENTREE?

Every GenTree class needs a default constructor to keep GenTree::VtablePtr GenTree::GetVtableForOper(genTreeOps oper) happy. It declares variables of various GenTree types so it can get its hands on a vtable pointer to use when the class of a GenTree node is changed using to ChangeOper. Of course, it's all a rather crazy, non-standard, not-anything hack 😄

From a user perspective, it allows the VS debugger to show you the "real" subtype of a GenTree* variable, which is super useful. Maybe something that is done in a more standard way with RTTI?

I don't know, the standard way would be to use placement new to change the class of an object. But... I need to double check the standard... I think doing that will invalidate the existing object's contents and we don't want that when using ChangeOper.

Considering that this is debug only code I think it's fine as it is. Perhaps we should just add a macro to declare these default constructors on a single line, just to reduce debug noise.

One "problem" with DEBUGGABLE_GENTREE is it isn't available in Release, so not in production crash dumps, but making it available there was considered too expensive, and unnecessary.

but making it available there was considered too expensive, and unnecessary

Yeah, at the end of day, DEBUGGABLE_GENTREE is pure convenience. One can still figure out the class from gtOper, it's just more work to do so in the debugger. I suppose it would be cool if C++'s RTTI was somehow customizable but it isn't, it requires a vtable pointer to work.

Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.

sandreenko · 2019-01-09T19:09:05Z

src/jit/gentree.h

-    int       m_state;
+    union {
+        GenTree*         m_argList;
+        GenTreePhi::Use* m_phiUseList;


From the first look I expected m_phiUseList to be UseList.

More like UseIterator. I went back and forth on this but I think it needs to be a pointer due GenTreeUseEdgeIterator's == operator

return (m_node == other.m_node) && (m_edge == other.m_edge) && (m_argList == other.m_argList) && (m_state == other.m_state);

This won't work correctly unless the 2 union members have the same layout/meaning. And even then it's very close to, if not right into, the undefined behavior land.

I should add that in an ideal world GenTreePhi::gtUses should be private. But it seems that it requires quite a bit more work for not so many benefits, there are are only 3 remaining references to gtUses outside of GenTreePhi. At the end of the day I'm trying to make things reasonably better, not perfect.

This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.

Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

Yep, I need to do something about this, it's a bit odd.

Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

That would work but I'm concerned that this will make operator == even more complex than it is. Unfortunatelly, the C++ iterator model (and by extension range for) doesn't work so great when the state is complex. The repeated comparison with end can become costly.

I went for a more low tech approach - just use a void*. The few places that use this pointer can just cast to the right type.

src/jit/assertionprop.cpp

sandreenko

LGTM, pin data looks nice.
The only concern is that it looks like it will be hard to reuse this code for other cases in #19876, won't it?

mikedn · 2019-01-11T06:53:33Z

@sandreenko Sorry, I didn't notice you already approved this.

I pushed a commit to cleanup that redundant AsPhi.

The only concern is that it looks like it will be hard to reuse this code for other cases in #19876, won't it?

Some of it can be reused but it remains to be seen if it's not more trouble than it's worth. The GT_FIELD_LIST change looks very similar to this one, the difference is that its Use has some extra members:

struct Use
{
    GenTree*  op;
    Use*      next;
    unsigned  offset;
    var_types type;
};

So it has to be a different class but I hope that I can make both use classes inherit from a base GenTreeUse class that has op and next. And make UseList and UseIterator templates.

But for GT_HWINTRINSIC it's more like that I'll use an array of uses rather than a linked list. The reason is that most intrinsics have 1-3 operands and that can fit inside GenTree. Using a linked list would be overkill. Its use class does not need next, only op.

For calls I would expect using a linked list again, similar to PHI and FIELD_LIST. But then calls have 2 arg lists so things will be slightly different.

That said, the main goal is other form of reusal - such as fgSetTreeSeqHelper that can be rewritten to use GenTreeVisitor in like 10 lines of code. The less custom traversal code we have, the better. It's large and error prone. And every time a new special oper is added you have to update it.

CarolEidt

LGTM in general, with some comments.
Would like to see @AndyAyersMS and/or @briansull weigh in though

CarolEidt · 2019-01-10T00:22:01Z

src/jit/gentree.cpp


+        case GT_PHI:
+        {
+            GenTreePhi::UseIterator op1Use    = op1->AsPhi()->Uses().begin();


This could benefit from a short comment stating the nearly-obvious: that two phis are equal only if they have the same number of uses, and the uses are equal.

Might be useful to also mention that the order matters for equality, even though it's otherwise irrelevant. That's a bit peculiar but I don't think this code is ever used.

I moved this code to a GenTreePhi::Equals function, with the appropriate function header. I don't see the point of dumping all this logic in a giant switch. If we'd do this for all GenTree classes then this switch would become trivial and could possibility be generated with macros, like the AsX functions are.

CarolEidt · 2019-01-11T22:38:33Z

src/jit/gentree.h

-    int       m_state;
+    union {
+        GenTree*         m_argList;
+        GenTreePhi::Use* m_phiUseList;


This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.

CarolEidt · 2019-01-11T22:42:16Z

src/jit/gentree.h

+{
+    struct Use
+    {
+        GenTree* op;


I don't think we have a policy, guideline or even consensus around this, but I find node to be more meaningful than 'op', which at least in the JIT is sometimes used to mean an operator and sometimes an operand (and yes, we've got op1 and op2 all over the place, but still ...)

node sounds pretty good to me too, I'll update. Anyway, "use" and "operand" seem to overlap a bit.

Changed. I used the opportunity to also add a Node() functions, the JIT practice of accessing data members directly is rather dubious. I'd make m_node private but unfortunately there are lots of places that take its address. Hopefully the m_ prefix will steer people towards using Node() whenever possible 😄

CarolEidt · 2019-01-11T22:48:02Z

src/jit/gentree.h

    inline GenTree(genTreeOps oper, var_types type DEBUGARG(bool largeNode = false));
 };

+struct GenTreePhi : public GenTree


This needs a header with a general description of the node, as well as the constraints around how these are used, e.g.:

Each of the Uses are GT_LCL_VAR nodes

This node is always the rhs of an assignment, the lhs of which is also a GT_LCL_VAR node.

While I'm at it, I'll mention that I wanted to make op GenTreePhiArg* rather than GenTree*, to make it clear what node types are supported. Unfortunately there are many places that take the address of op and pass it around as GenTree** so this is impossible to do.

Added. While at it I also made the class final, it's good to know that nothing derives from such GenTree classes.

CarolEidt · 2019-01-11T22:50:32Z

src/jit/ssabuilder.cpp

+    // Make sure it isn't already present: we should only add each definition once.
+    for (GenTreePhi::Use& use : phi->Uses())
+    {
+        assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum);


Perhaps also assert that it's not for the same pred?

Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

Speaking of which, do you have data on how large these lists get in bad cases?

Here you go: https://1drv.ms/x/s!Av4baJYSo5pjgsQZfywTf5l3NUXitw

Data from all FX assemblies. Largest PHI has 255 operands. 90% have only 2 operands.

I was actually considering handling the 2 operands case specially - store the 2 uses inside the GenTreePhi node itself. But the code required to handle this special case gets kind of ugly.

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis.

I was curious to see what such quadratic behavior means on today hardware: https://gist.github.com/mikedn/23e708655cdb1d431ed165e300192c9e

It's less than 1μs for N = 10, ~68μs for N = 255, ~1ms for N = 1000 and ~200ms for N = 10000.

I'd say we're on the safe side.

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

Indeed, that seems broken. If one predecessor asserts that i <= 2 and another asserts that i <= 3 the resulting PHI range should have upper bound 3. But if you miss the later predecessor then the upper bound will be only 2... Now, there may be something else in RangeCheck that prevents this from happening, though I starred a lot at RangeCheck in the past and I just don't see that something else.

Perhaps also assert that it's not for the same pred?

Nope, it turns out that this assert isn't valid due to exception handling. It is possible to have something like PHI BB05:V02:u2, BB05:V02:u3, BB05:V02:u4 when the same block in a try region contains more than one definition of the variable that's live in the handler.

It's the BB-SSA_NUM pair that's supposed to be unique in the PHI arg list. But since the JIT already requires SSA_NUM to be unique there's not much we can validate about the BB.

AndyAyersMS

Generally looks good. Left a few notes/questions....

AndyAyersMS · 2019-01-11T23:50:54Z

src/jit/ssabuilder.cpp

+    // Make sure it isn't already present: we should only add each definition once.
+    for (GenTreePhi::Use& use : phi->Uses())
+    {
+        assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum);


Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

AndyAyersMS · 2019-01-12T00:07:09Z

src/jit/gentree.h

-    int       m_state;
+    union {
+        GenTree*         m_argList;
+        GenTreePhi::Use* m_phiUseList;


Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

AndyAyersMS · 2019-01-12T00:16:44Z

src/jit/gentree.h

+        return UseList(gtUses);
+    }
+
+#if DEBUGGABLE_GENTREE


Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.

mikedn · 2019-01-12T11:26:11Z

Made some updates but I'm still going through the feedback.

mikedn · 2019-01-15T18:24:48Z

Ubuntu CoreFX failed due to a test timeout

05:59:08.566 WARNING: System.Collections.Concurrent.Tests.EtwTests.TestEtw is running for 19613s.
05:59:08.566 WARNING: System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed is running for 19613s.
05:59:08.566 WARNING: System.Collections.Concurrent.Tests.ConcurrentStackTests.ToArray_ParallelInvocations_Succeed is running for 19606s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentBagTests.ManyConcurrentAddsTakes_EnsureTrackedCountsMatchResultingCollection(threadsPerProc: 1, seconds: 0.5) is running for 19657s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.EtwTests.TestEtw is running for 19673s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed is running for 19673s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentStackTests.ToArray_ParallelInvocations_Succeed is running for 19666s.
06:00:28.900 Build timed out (after 360 minutes). Marking the build as aborted.

Apparently a few tests ran concurrently and hang. I don't get any diffs in either tests or framework so there's probably something wrong with tests.

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

mikedn · 2019-01-15T18:37:19Z

I think I addressed all review feedback with the exception of adding a use count to PHI. It's easy to do that, there's enough space in GenTreePhi and after this change the only point where PHI args are added is SsaBuilder::AddPhiArg.

But considering just how few PHIs have a large number of arguments adding the count seems premature. And there don't seem to be many places where we could skip large PHIs:

RangeCheck - This could be useful but RangeCheck has its own chase budget that prevents from becoming too costly in extreme circumstances
assertionprop's optAssertionGenPhiDefn could probably skip large PHIs, but it's not that costly
VN might be an interesting case. The VN of a PHI is a chain of VNF_Phi VNs and 255 of them is pretty crazy. Perhaps it would make sense to fallback to a unique VN for large PHIs. But that should be investigated separately. And again, considering that so few PHIs are large...

RussKeldorph · 2019-05-28T13:45:01Z

@dotnet/jit-contrib Take now or post-3.0?

AndyAyersMS · 2019-05-28T16:39:01Z

I think we should hold off on taking an refactoring changes into 3.0.

At this point I think we're down to bug fixes and relatively contained and low-risk CQ improvements.

mikedn · 2019-05-28T16:44:13Z

Also this is part of a series that will finish with the complete removal of GT_LIST so it's probably better and safer to have all or nothing in a release.

briansull · 2019-05-28T16:59:09Z

I think we should hold off on taking refactoring changes into 3.0.

I also agree, we can take all these changes post 3.0

Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same. Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added.

This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code.

CarolEidt

LGTM - thanks!

CarolEidt · 2019-08-27T16:52:59Z

@dotnet/jit-contrib - I think this is ready to merge. However, the "Pri0 Linux_musl x64 release" indicates that 1 test failed, and I can't find a link to the test output. Can someone advise?

mikedn · 2019-08-27T16:54:22Z

The failed test output should be here: https://dev.azure.com/dnceng/public/_build/results?buildId=326410&view=ms.vss-test-web.build-test-results-tab&runId=9520886&resultId=100000&paneView=attachments

mikedn · 2019-08-27T17:05:17Z

Best I can tell the test timed out after 10 minutes. History shows that it's run time is all over the place, from 1-2 minutes to 8-9 minutes. This kind of test to does not belong in Pri0 tests IMO...

CarolEidt · 2019-08-27T17:05:40Z

Thanks @mikedn! Given that it's just a timeout, I'll go ahead and merge unless there are objections from the rest of the team in the next couple of hours. cc @dotnet/jit-contrib

* Stop using LIST nodes for PHI operand lists * Speed up PHI creation Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same. Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added. * Cleanup PHI value numbering This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code. * Remove redundant AsPhi * Remove list pointer union * Update PHI equality check * Add GenTreePhi comment * Rename op to node Commit migrated from dotnet/coreclr@501c1e3

mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from c98f3b8 to 2689a71 Compare October 6, 2018 08:10

mikedn force-pushed the ssa-phi-arg-list branch 4 times, most recently from d6f6ba1 to 3ed3e50 Compare October 13, 2018 09:09

mikedn mentioned this pull request Oct 16, 2018

Extract argInfo building from fgMorphArgs #19658

Merged

mikedn force-pushed the ssa-phi-arg-list branch from 3ed3e50 to 0cdf77e Compare November 3, 2018 09:33

jkotas added the area-CodeGen label Dec 22, 2018

mikedn force-pushed the ssa-phi-arg-list branch from 0cdf77e to 20083cb Compare December 23, 2018 10:17

mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from cbb8fd3 to b321728 Compare January 3, 2019 12:15

mikedn commented Jan 4, 2019

View reviewed changes

mikedn changed the title ~~[WIP] Stop using LIST nodes for PHI operand lists~~ Stop using LIST nodes for PHI operand lists Jan 4, 2019

sandreenko reviewed Jan 9, 2019

View reviewed changes

sandreenko approved these changes Jan 9, 2019

View reviewed changes

CarolEidt reviewed Jan 11, 2019

View reviewed changes

AndyAyersMS reviewed Jan 12, 2019

View reviewed changes

mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from 72c1bf0 to f13025b Compare January 14, 2019 19:00

mikedn added 2 commits August 27, 2019 01:02

Stop using LIST nodes for PHI operand lists

823c32f

mikedn force-pushed the ssa-phi-arg-list branch from f13025b to 24f424e Compare August 26, 2019 22:05

mikedn added 6 commits August 27, 2019 08:10

Cleanup PHI value numbering

e4524de

This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code.

Remove redundant AsPhi

890332c

Remove list pointer union

1a3d1ff

Update PHI equality check

0aeb659

Add GenTreePhi comment

e305b56

Rename op to node

52cdaed

mikedn force-pushed the ssa-phi-arg-list branch from 24f424e to 52cdaed Compare August 27, 2019 07:01

CarolEidt approved these changes Aug 27, 2019

View reviewed changes

mikedn mentioned this pull request Aug 27, 2019

Stop using LIST nodes for CALL arg lists #26392

Merged

CarolEidt merged commit 501c1e3 into dotnet:master Aug 27, 2019

mikedn deleted the ssa-phi-arg-list branch September 28, 2019 19:09

Conversation

mikedn commented Oct 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikedn commented Jan 4, 2019

Uh oh!

BruceForstall commented Jan 8, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sandreenko left a comment

Choose a reason for hiding this comment

Uh oh!

mikedn commented Jan 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CarolEidt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikedn commented Oct 4, 2018 •

edited

Loading

mikedn commented Jan 11, 2019 •

edited

Loading

briansull commented May 28, 2019 •

edited

Loading