Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Stop using LIST nodes for PHI operand lists#20266

Merged
CarolEidt merged 8 commits intodotnet:masterfrom
mikedn:ssa-phi-arg-list
Aug 27, 2019
Merged

Stop using LIST nodes for PHI operand lists#20266
CarolEidt merged 8 commits intodotnet:masterfrom
mikedn:ssa-phi-arg-list

Conversation

@mikedn
Copy link

@mikedn mikedn commented Oct 4, 2018

Using GT_LIST nodes to represent PHI's (and pretty much any) variable length operand list is completely unnecessary and wastes memory and CPU cycles. Not to mention that GT_LIST nodes are basically a case of square peg in the round hole as far as IR semantics is concerned. There's no way to assign them a meaningful type, value number or any kind "value" meaning, they're syntax nodes in a world of values.

The "wasted CPU cycles" part is particularly bad in the case of PHIs. Every time a PHI arg is added, SsaBuilder calls gtSetStmtInfo and fgSetStmtSeq to sequence the whole tree, those are relatively expensive and the use of GT_LIST nodes just increases the cost as they're extra nodes and need special handling. PHI costs are always 0, the tree always the same shape so it would be pretty easy to just add each new PHI arg to the linear order and call it a day. But GT_LIST use manages to make that more complicated as list nodes themselves have to be added to the linear order as well (well, that's not strictly true - none of these nodes need to be part of the linear order but that's another story).

So stop using GT_LIST nodes and instead maintain a single linked list of "use" nodes. These aren't GenTree nodes, they're not part of the linear order, they use less memory. An alternative may be to use arrays instead of linked lists but I haven't starred long enough at SsaBuilder to tell if it would be possible to guess the proper array size before adding all the args. Anyway, I consider that to be tuning and right now I'm more interested to just make the IR saner. Using arrays can be tried at a later time, thanks to the use of range-based for loops the amount of changes should be far smaller.

It's worth noting that a lot of (most actually?) changes are required to deal with various IR traversal facilities the JIT has. That stuff got out of control - there are general traversal facilities such as GenTreeVisitor & co., GenTree::Operands(), GenTree::NumChildren/GetChild/gtGetChildPointer and then there a bunch more functions that decided to reinvent the wheel and do their own custom traversal - fgSetTreeSeqHelper, gtSetEvalOrder and many others. Hopefully I'll clean up some of this mess in the future.

This saves 0.17% used memory and 0.52% instructions:
Mem diff: https://gist.github.com/mikedn/75bde8ebf3c1acbbcce85df6f9090dad
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgsArIw8sxENXDXGulA

Contributes to #19876
Contributes to #15108

@mikedn mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from c98f3b8 to 2689a71 Compare October 6, 2018 08:10
@mikedn mikedn force-pushed the ssa-phi-arg-list branch 4 times, most recently from d6f6ba1 to 3ed3e50 Compare October 13, 2018 09:09
@mikedn mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from cbb8fd3 to b321728 Compare January 3, 2019 12:15

GenTreeLclVarCommon* phiArg = phiArgs->Current()->AsLclVarCommon();
phiArgs = phiArgs->Rest();
phiArg->gtVNPair = phiArgVNP;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code failed to assign a VN to the first PHI arg in the list. As far as I can tell, this did not have any ill side effects, except for confusing JIT dumps.

@mikedn
Copy link
Author

mikedn commented Jan 4, 2019

No FX diffs.

This is done but it will likely conflict with #15000 and #20045

@mikedn mikedn changed the title [WIP] Stop using LIST nodes for PHI operand lists Stop using LIST nodes for PHI operand lists Jan 4, 2019
@BruceForstall
Copy link

@dotnet/jit-contrib

return UseList(gtUses);
}

#if DEBUGGABLE_GENTREE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of DEBUGGABLE_GENTREE?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every GenTree class needs a default constructor to keep GenTree::VtablePtr GenTree::GetVtableForOper(genTreeOps oper) happy. It declares variables of various GenTree types so it can get its hands on a vtable pointer to use when the class of a GenTree node is changed using to ChangeOper. Of course, it's all a rather crazy, non-standard, not-anything hack 😄

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a user perspective, it allows the VS debugger to show you the "real" subtype of a GenTree* variable, which is super useful. Maybe something that is done in a more standard way with RTTI?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, the standard way would be to use placement new to change the class of an object. But... I need to double check the standard... I think doing that will invalidate the existing object's contents and we don't want that when using ChangeOper.

Considering that this is debug only code I think it's fine as it is. Perhaps we should just add a macro to declare these default constructors on a single line, just to reduce debug noise.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One "problem" with DEBUGGABLE_GENTREE is it isn't available in Release, so not in production crash dumps, but making it available there was considered too expensive, and unnecessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but making it available there was considered too expensive, and unnecessary

Yeah, at the end of day, DEBUGGABLE_GENTREE is pure convenience. One can still figure out the class from gtOper, it's just more work to do so in the debugger. I suppose it would be cool if C++'s RTTI was somehow customizable but it isn't, it requires a vtable pointer to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.

int m_state;
union {
GenTree* m_argList;
GenTreePhi::Use* m_phiUseList;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the first look I expected m_phiUseList to be UseList.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More like UseIterator. I went back and forth on this but I think it needs to be a pointer due GenTreeUseEdgeIterator's == operator

        return (m_node == other.m_node) && (m_edge == other.m_edge) && (m_argList == other.m_argList) &&
               (m_state == other.m_state);

This won't work correctly unless the 2 union members have the same layout/meaning. And even then it's very close to, if not right into, the undefined behavior land.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add that in an ideal world GenTreePhi::gtUses should be private. But it seems that it requires quite a bit more work for not so many benefits, there are are only 3 remaining references to gtUses outside of GenTreePhi. At the end of the day I'm trying to make things reasonably better, not perfect.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I need to do something about this, it's a bit odd.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

That would work but I'm concerned that this will make operator == even more complex than it is. Unfortunatelly, the C++ iterator model (and by extension range for) doesn't work so great when the state is complex. The repeated comparison with end can become costly.

I went for a more low tech approach - just use a void*. The few places that use this pointer can just cast to the right type.

Copy link

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pin data looks nice.
The only concern is that it looks like it will be hard to reuse this code for other cases in #19876, won't it?

@mikedn
Copy link
Author

mikedn commented Jan 11, 2019

@sandreenko Sorry, I didn't notice you already approved this.

I pushed a commit to cleanup that redundant AsPhi.

The only concern is that it looks like it will be hard to reuse this code for other cases in #19876, won't it?

Some of it can be reused but it remains to be seen if it's not more trouble than it's worth. The GT_FIELD_LIST change looks very similar to this one, the difference is that its Use has some extra members:

struct Use
{
    GenTree*  op;
    Use*      next;
    unsigned  offset;
    var_types type;
};

So it has to be a different class but I hope that I can make both use classes inherit from a base GenTreeUse class that has op and next. And make UseList and UseIterator templates.

But for GT_HWINTRINSIC it's more like that I'll use an array of uses rather than a linked list. The reason is that most intrinsics have 1-3 operands and that can fit inside GenTree. Using a linked list would be overkill. Its use class does not need next, only op.

For calls I would expect using a linked list again, similar to PHI and FIELD_LIST. But then calls have 2 arg lists so things will be slightly different.

That said, the main goal is other form of reusal - such as fgSetTreeSeqHelper that can be rewritten to use GenTreeVisitor in like 10 lines of code. The less custom traversal code we have, the better. It's large and error prone. And every time a new special oper is added you have to update it.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, with some comments.
Would like to see @AndyAyersMS and/or @briansull weigh in though


case GT_PHI:
{
GenTreePhi::UseIterator op1Use = op1->AsPhi()->Uses().begin();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could benefit from a short comment stating the nearly-obvious: that two phis are equal only if they have the same number of uses, and the uses are equal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to also mention that the order matters for equality, even though it's otherwise irrelevant. That's a bit peculiar but I don't think this code is ever used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to a GenTreePhi::Equals function, with the appropriate function header. I don't see the point of dumping all this logic in a giant switch. If we'd do this for all GenTree classes then this switch would become trivial and could possibility be generated with macros, like the AsX functions are.

int m_state;
union {
GenTree* m_argList;
GenTreePhi::Use* m_phiUseList;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.

{
struct Use
{
GenTree* op;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a policy, guideline or even consensus around this, but I find node to be more meaningful than 'op', which at least in the JIT is sometimes used to mean an operator and sometimes an operand (and yes, we've got op1 and op2 all over the place, but still ...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node sounds pretty good to me too, I'll update. Anyway, "use" and "operand" seem to overlap a bit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. I used the opportunity to also add a Node() functions, the JIT practice of accessing data members directly is rather dubious. I'd make m_node private but unfortunately there are lots of places that take its address. Hopefully the m_ prefix will steer people towards using Node() whenever possible 😄

inline GenTree(genTreeOps oper, var_types type DEBUGARG(bool largeNode = false));
};

struct GenTreePhi : public GenTree

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a header with a general description of the node, as well as the constraints around how these are used, e.g.:

  • Each of the Uses are GT_LCL_VAR nodes
  • This node is always the rhs of an assignment, the lhs of which is also a GT_LCL_VAR node.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm at it, I'll mention that I wanted to make op GenTreePhiArg* rather than GenTree*, to make it clear what node types are supported. Unfortunately there are many places that take the address of op and pass it around as GenTree** so this is impossible to do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. While at it I also made the class final, it's good to know that nothing derives from such GenTree classes.

// Make sure it isn't already present: we should only add each definition once.
for (GenTreePhi::Use& use : phi->Uses())
{
assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also assert that it's not for the same pred?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of which, do you have data on how large these lists get in bad cases?

Here you go: https://1drv.ms/x/s!Av4baJYSo5pjgsQZfywTf5l3NUXitw

Data from all FX assemblies. Largest PHI has 255 operands. 90% have only 2 operands.

I was actually considering handling the 2 operands case specially - store the 2 uses inside the GenTreePhi node itself. But the code required to handle this special case gets kind of ugly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis.

I was curious to see what such quadratic behavior means on today hardware: https://gist.github.com/mikedn/23e708655cdb1d431ed165e300192c9e

It's less than 1μs for N = 10, ~68μs for N = 255, ~1ms for N = 1000 and ~200ms for N = 10000.

I'd say we're on the safe side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

Indeed, that seems broken. If one predecessor asserts that i <= 2 and another asserts that i <= 3 the resulting PHI range should have upper bound 3. But if you miss the later predecessor then the upper bound will be only 2... Now, there may be something else in RangeCheck that prevents this from happening, though I starred a lot at RangeCheck in the past and I just don't see that something else.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also assert that it's not for the same pred?

Nope, it turns out that this assert isn't valid due to exception handling. It is possible to have something like PHI BB05:V02:u2, BB05:V02:u3, BB05:V02:u4 when the same block in a try region contains more than one definition of the variable that's live in the handler.

It's the BB-SSA_NUM pair that's supposed to be unique in the PHI arg list. But since the JIT already requires SSA_NUM to be unique there's not much we can validate about the BB.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Left a few notes/questions....

// Make sure it isn't already present: we should only add each definition once.
for (GenTreePhi::Use& use : phi->Uses())
{
assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).

It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)

And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.

int m_state;
union {
GenTree* m_argList;
GenTreePhi::Use* m_phiUseList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?

return UseList(gtUses);
}

#if DEBUGGABLE_GENTREE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.

@mikedn
Copy link
Author

mikedn commented Jan 12, 2019

Made some updates but I'm still going through the feedback.

@mikedn mikedn force-pushed the ssa-phi-arg-list branch 2 times, most recently from 72c1bf0 to f13025b Compare January 14, 2019 19:00
@mikedn
Copy link
Author

mikedn commented Jan 15, 2019

Ubuntu CoreFX failed due to a test timeout

05:59:08.566 WARNING: System.Collections.Concurrent.Tests.EtwTests.TestEtw is running for 19613s.
05:59:08.566 WARNING: System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed is running for 19613s.
05:59:08.566 WARNING: System.Collections.Concurrent.Tests.ConcurrentStackTests.ToArray_ParallelInvocations_Succeed is running for 19606s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentBagTests.ManyConcurrentAddsTakes_EnsureTrackedCountsMatchResultingCollection(threadsPerProc: 1, seconds: 0.5) is running for 19657s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.EtwTests.TestEtw is running for 19673s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentQueueTests.ToArray_ParallelInvocations_Succeed is running for 19673s.
06:00:08.568 WARNING: System.Collections.Concurrent.Tests.ConcurrentStackTests.ToArray_ParallelInvocations_Succeed is running for 19666s.
06:00:28.900 Build timed out (after 360 minutes). Marking the build as aborted.

Apparently a few tests ran concurrently and hang. I don't get any diffs in either tests or framework so there's probably something wrong with tests.

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@mikedn
Copy link
Author

mikedn commented Jan 15, 2019

I think I addressed all review feedback with the exception of adding a use count to PHI. It's easy to do that, there's enough space in GenTreePhi and after this change the only point where PHI args are added is SsaBuilder::AddPhiArg.

But considering just how few PHIs have a large number of arguments adding the count seems premature. And there don't seem to be many places where we could skip large PHIs:

  • RangeCheck - This could be useful but RangeCheck has its own chase budget that prevents from becoming too costly in extreme circumstances
  • assertionprop's optAssertionGenPhiDefn could probably skip large PHIs, but it's not that costly
  • VN might be an interesting case. The VN of a PHI is a chain of VNF_Phi VNs and 255 of them is pretty crazy. Perhaps it would make sense to fallback to a unique VN for large PHIs. But that should be investigated separately. And again, considering that so few PHIs are large...

@RussKeldorph
Copy link

@dotnet/jit-contrib Take now or post-3.0?

@AndyAyersMS
Copy link
Member

I think we should hold off on taking an refactoring changes into 3.0.

At this point I think we're down to bug fixes and relatively contained and low-risk CQ improvements.

@mikedn
Copy link
Author

mikedn commented May 28, 2019

Also this is part of a series that will finish with the complete removal of GT_LIST so it's probably better and safer to have all or nothing in a release.

@briansull
Copy link

briansull commented May 28, 2019

I think we should hold off on taking refactoring changes into 3.0.

I also agree, we can take all these changes post 3.0

mikedn added 2 commits August 27, 2019 01:02
Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same.

Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added.
mikedn added 6 commits August 27, 2019 08:10
This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code.
Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@CarolEidt
Copy link

@dotnet/jit-contrib - I think this is ready to merge. However, the "Pri0 Linux_musl x64 release" indicates that 1 test failed, and I can't find a link to the test output. Can someone advise?

@mikedn
Copy link
Author

mikedn commented Aug 27, 2019

Best I can tell the test timed out after 10 minutes. History shows that it's run time is all over the place, from 1-2 minutes to 8-9 minutes. This kind of test to does not belong in Pri0 tests IMO...

@CarolEidt
Copy link

Thanks @mikedn! Given that it's just a timeout, I'll go ahead and merge unless there are objections from the rest of the team in the next couple of hours. cc @dotnet/jit-contrib

@CarolEidt CarolEidt merged commit 501c1e3 into dotnet:master Aug 27, 2019
@mikedn mikedn deleted the ssa-phi-arg-list branch September 28, 2019 19:09
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Stop using LIST nodes for PHI operand lists

* Speed up PHI creation

Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same.

Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added.

* Cleanup PHI value numbering

This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code.

* Remove redundant AsPhi

* Remove list pointer union

* Update PHI equality check

* Add GenTreePhi comment

* Rename op to node


Commit migrated from dotnet/coreclr@501c1e3
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants