Stop using LIST nodes for PHI operand lists#20266
Conversation
c98f3b8 to
2689a71
Compare
d6f6ba1 to
3ed3e50
Compare
3ed3e50 to
0cdf77e
Compare
0cdf77e to
20083cb
Compare
cbb8fd3 to
b321728
Compare
|
|
||
| GenTreeLclVarCommon* phiArg = phiArgs->Current()->AsLclVarCommon(); | ||
| phiArgs = phiArgs->Rest(); | ||
| phiArg->gtVNPair = phiArgVNP; |
There was a problem hiding this comment.
Old code failed to assign a VN to the first PHI arg in the list. As far as I can tell, this did not have any ill side effects, except for confusing JIT dumps.
|
@dotnet/jit-contrib |
| return UseList(gtUses); | ||
| } | ||
|
|
||
| #if DEBUGGABLE_GENTREE |
There was a problem hiding this comment.
What is the purpose of DEBUGGABLE_GENTREE?
There was a problem hiding this comment.
Every GenTree class needs a default constructor to keep GenTree::VtablePtr GenTree::GetVtableForOper(genTreeOps oper) happy. It declares variables of various GenTree types so it can get its hands on a vtable pointer to use when the class of a GenTree node is changed using to ChangeOper. Of course, it's all a rather crazy, non-standard, not-anything hack 😄
There was a problem hiding this comment.
From a user perspective, it allows the VS debugger to show you the "real" subtype of a GenTree* variable, which is super useful. Maybe something that is done in a more standard way with RTTI?
There was a problem hiding this comment.
I don't know, the standard way would be to use placement new to change the class of an object. But... I need to double check the standard... I think doing that will invalidate the existing object's contents and we don't want that when using ChangeOper.
Considering that this is debug only code I think it's fine as it is. Perhaps we should just add a macro to declare these default constructors on a single line, just to reduce debug noise.
There was a problem hiding this comment.
One "problem" with DEBUGGABLE_GENTREE is it isn't available in Release, so not in production crash dumps, but making it available there was considered too expensive, and unnecessary.
There was a problem hiding this comment.
but making it available there was considered too expensive, and unnecessary
Yeah, at the end of day, DEBUGGABLE_GENTREE is pure convenience. One can still figure out the class from gtOper, it's just more work to do so in the debugger. I suppose it would be cool if C++'s RTTI was somehow customizable but it isn't, it requires a vtable pointer to work.
There was a problem hiding this comment.
Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.
src/jit/gentree.h
Outdated
| int m_state; | ||
| union { | ||
| GenTree* m_argList; | ||
| GenTreePhi::Use* m_phiUseList; |
There was a problem hiding this comment.
From the first look I expected m_phiUseList to be UseList.
There was a problem hiding this comment.
More like UseIterator. I went back and forth on this but I think it needs to be a pointer due GenTreeUseEdgeIterator's == operator
return (m_node == other.m_node) && (m_edge == other.m_edge) && (m_argList == other.m_argList) &&
(m_state == other.m_state);This won't work correctly unless the 2 union members have the same layout/meaning. And even then it's very close to, if not right into, the undefined behavior land.
There was a problem hiding this comment.
I should add that in an ideal world GenTreePhi::gtUses should be private. But it seems that it requires quite a bit more work for not so many benefits, there are are only 3 remaining references to gtUses outside of GenTreePhi. At the end of the day I'm trying to make things reasonably better, not perfect.
There was a problem hiding this comment.
This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.
There was a problem hiding this comment.
Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?
There was a problem hiding this comment.
Yep, I need to do something about this, it's a bit odd.
There was a problem hiding this comment.
Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?
That would work but I'm concerned that this will make operator == even more complex than it is. Unfortunatelly, the C++ iterator model (and by extension range for) doesn't work so great when the state is complex. The repeated comparison with end can become costly.
I went for a more low tech approach - just use a void*. The few places that use this pointer can just cast to the right type.
sandreenko
left a comment
There was a problem hiding this comment.
LGTM, pin data looks nice.
The only concern is that it looks like it will be hard to reuse this code for other cases in #19876, won't it?
|
@sandreenko Sorry, I didn't notice you already approved this. I pushed a commit to cleanup that redundant
Some of it can be reused but it remains to be seen if it's not more trouble than it's worth. The struct Use
{
GenTree* op;
Use* next;
unsigned offset;
var_types type;
};So it has to be a different class but I hope that I can make both use classes inherit from a base But for For calls I would expect using a linked list again, similar to That said, the main goal is other form of reusal - such as |
CarolEidt
left a comment
There was a problem hiding this comment.
LGTM in general, with some comments.
Would like to see @AndyAyersMS and/or @briansull weigh in though
src/jit/gentree.cpp
Outdated
|
|
||
| case GT_PHI: | ||
| { | ||
| GenTreePhi::UseIterator op1Use = op1->AsPhi()->Uses().begin(); |
There was a problem hiding this comment.
This could benefit from a short comment stating the nearly-obvious: that two phis are equal only if they have the same number of uses, and the uses are equal.
There was a problem hiding this comment.
Might be useful to also mention that the order matters for equality, even though it's otherwise irrelevant. That's a bit peculiar but I don't think this code is ever used.
There was a problem hiding this comment.
I moved this code to a GenTreePhi::Equals function, with the appropriate function header. I don't see the point of dumping all this logic in a giant switch. If we'd do this for all GenTree classes then this switch would become trivial and could possibility be generated with macros, like the AsX functions are.
src/jit/gentree.h
Outdated
| int m_state; | ||
| union { | ||
| GenTree* m_argList; | ||
| GenTreePhi::Use* m_phiUseList; |
There was a problem hiding this comment.
This seems a bit dicey to me - at least it might be good to make this a named union, and at least somewhat abstract the comparison that is done by the == operator, so that it's clear that these must be "compatible" in this way.
src/jit/gentree.h
Outdated
| { | ||
| struct Use | ||
| { | ||
| GenTree* op; |
There was a problem hiding this comment.
I don't think we have a policy, guideline or even consensus around this, but I find node to be more meaningful than 'op', which at least in the JIT is sometimes used to mean an operator and sometimes an operand (and yes, we've got op1 and op2 all over the place, but still ...)
There was a problem hiding this comment.
node sounds pretty good to me too, I'll update. Anyway, "use" and "operand" seem to overlap a bit.
There was a problem hiding this comment.
Changed. I used the opportunity to also add a Node() functions, the JIT practice of accessing data members directly is rather dubious. I'd make m_node private but unfortunately there are lots of places that take its address. Hopefully the m_ prefix will steer people towards using Node() whenever possible 😄
src/jit/gentree.h
Outdated
| inline GenTree(genTreeOps oper, var_types type DEBUGARG(bool largeNode = false)); | ||
| }; | ||
|
|
||
| struct GenTreePhi : public GenTree |
There was a problem hiding this comment.
This needs a header with a general description of the node, as well as the constraints around how these are used, e.g.:
- Each of the
Uses areGT_LCL_VARnodes - This node is always the rhs of an assignment, the lhs of which is also a
GT_LCL_VARnode.
There was a problem hiding this comment.
While I'm at it, I'll mention that I wanted to make op GenTreePhiArg* rather than GenTree*, to make it clear what node types are supported. Unfortunately there are many places that take the address of op and pass it around as GenTree** so this is impossible to do.
There was a problem hiding this comment.
Added. While at it I also made the class final, it's good to know that nothing derives from such GenTree classes.
src/jit/ssabuilder.cpp
Outdated
| // Make sure it isn't already present: we should only add each definition once. | ||
| for (GenTreePhi::Use& use : phi->Uses()) | ||
| { | ||
| assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum); |
There was a problem hiding this comment.
Perhaps also assert that it's not for the same pred?
There was a problem hiding this comment.
Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).
It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)
And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.
There was a problem hiding this comment.
Speaking of which, do you have data on how large these lists get in bad cases?
Here you go: https://1drv.ms/x/s!Av4baJYSo5pjgsQZfywTf5l3NUXitw
Data from all FX assemblies. Largest PHI has 255 operands. 90% have only 2 operands.
I was actually considering handling the 2 operands case specially - store the 2 uses inside the GenTreePhi node itself. But the code required to handle this special case gets kind of ugly.
There was a problem hiding this comment.
It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis.
I was curious to see what such quadratic behavior means on today hardware: https://gist.github.com/mikedn/23e708655cdb1d431ed165e300192c9e
It's less than 1μs for N = 10, ~68μs for N = 255, ~1ms for N = 1000 and ~200ms for N = 10000.
I'd say we're on the safe side.
There was a problem hiding this comment.
And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.
Indeed, that seems broken. If one predecessor asserts that i <= 2 and another asserts that i <= 3 the resulting PHI range should have upper bound 3. But if you miss the later predecessor then the upper bound will be only 2... Now, there may be something else in RangeCheck that prevents this from happening, though I starred a lot at RangeCheck in the past and I just don't see that something else.
There was a problem hiding this comment.
Perhaps also assert that it's not for the same pred?
Nope, it turns out that this assert isn't valid due to exception handling. It is possible to have something like PHI BB05:V02:u2, BB05:V02:u3, BB05:V02:u4 when the same block in a try region contains more than one definition of the variable that's live in the handler.
It's the BB-SSA_NUM pair that's supposed to be unique in the PHI arg list. But since the JIT already requires SSA_NUM to be unique there's not much we can validate about the BB.
AndyAyersMS
left a comment
There was a problem hiding this comment.
Generally looks good. Left a few notes/questions....
src/jit/ssabuilder.cpp
Outdated
| // Make sure it isn't already present: we should only add each definition once. | ||
| for (GenTreePhi::Use& use : phi->Uses()) | ||
| { | ||
| assert(use.op->AsPhiArg()->GetSsaNum() != ssaNum); |
There was a problem hiding this comment.
Exposing my ignorance of JIT SSA here -- I am a little surprised we don't record the phi input from each pred block, rather than doing a search before adding a phi arg to see if a ssa def already is an input to the phi (or, say, keep a block list per phi input).
It is more operands in some cases but perhaps avoids quadratic builder behavior for large phis. (Speaking of which, do you have data on how large these lists get in bad cases?)
And if we just record the "first block" that brings in a value I wonder how RangeCheck::ComputeRange works correctly for PHIs as it seems like it would skip looking at some preds.
src/jit/gentree.h
Outdated
| int m_state; | ||
| union { | ||
| GenTree* m_argList; | ||
| GenTreePhi::Use* m_phiUseList; |
There was a problem hiding this comment.
Since you have the root node couldn't you check for the case where it's a PHI and handle it specially, say comparing the other union fields instead?
| return UseList(gtUses); | ||
| } | ||
|
|
||
| #if DEBUGGABLE_GENTREE |
There was a problem hiding this comment.
Maybe consider adding an operand count? There may be places we'd prefer to avoid traversing large PHI lists.
|
Made some updates but I'm still going through the feedback. |
72c1bf0 to
f13025b
Compare
|
Ubuntu CoreFX failed due to a test timeout Apparently a few tests ran concurrently and hang. I don't get any diffs in either tests or framework so there's probably something wrong with tests. @dotnet-bot test Ubuntu x64 Checked CoreFX Tests |
|
I think I addressed all review feedback with the exception of adding a use count to PHI. It's easy to do that, there's enough space in But considering just how few PHIs have a large number of arguments adding the count seems premature. And there don't seem to be many places where we could skip large PHIs:
|
|
@dotnet/jit-contrib Take now or post-3.0? |
|
I think we should hold off on taking an refactoring changes into 3.0. At this point I think we're down to bug fixes and relatively contained and low-risk CQ improvements. |
|
Also this is part of a series that will finish with the complete removal of |
I also agree, we can take all these changes post 3.0 |
Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same. Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added.
f13025b to
24f424e
Compare
This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code.
24f424e to
52cdaed
Compare
|
@dotnet/jit-contrib - I think this is ready to merge. However, the "Pri0 Linux_musl x64 release" indicates that 1 test failed, and I can't find a link to the test output. Can someone advise? |
|
Best I can tell the test timed out after 10 minutes. History shows that it's run time is all over the place, from 1-2 minutes to 8-9 minutes. This kind of test to does not belong in Pri0 tests IMO... |
|
Thanks @mikedn! Given that it's just a timeout, I'll go ahead and merge unless there are objections from the rest of the team in the next couple of hours. cc @dotnet/jit-contrib |
* Stop using LIST nodes for PHI operand lists * Speed up PHI creation Calling gtSetStmtInfo/fgSetStmtSeq every time an arg is added is rather expensive and not really needed. Costs are always 0 and the shape of the PHI tree is always the same. Removal of GT_LIST nodes makes it easier to update the linear order incrementally when a GT_PHI_ARG node is added. * Cleanup PHI value numbering This replaces the manual traversal of PHI's list of uses with a range-based for loop and cleans up a bunch of convoluted code. * Remove redundant AsPhi * Remove list pointer union * Update PHI equality check * Add GenTreePhi comment * Rename op to node Commit migrated from dotnet/coreclr@501c1e3
Using
GT_LISTnodes to represent PHI's (and pretty much any) variable length operand list is completely unnecessary and wastes memory and CPU cycles. Not to mention thatGT_LISTnodes are basically a case of square peg in the round hole as far as IR semantics is concerned. There's no way to assign them a meaningful type, value number or any kind "value" meaning, they're syntax nodes in a world of values.The "wasted CPU cycles" part is particularly bad in the case of PHIs. Every time a PHI arg is added,
SsaBuildercallsgtSetStmtInfoandfgSetStmtSeqto sequence the whole tree, those are relatively expensive and the use ofGT_LISTnodes just increases the cost as they're extra nodes and need special handling. PHI costs are always 0, the tree always the same shape so it would be pretty easy to just add each new PHI arg to the linear order and call it a day. ButGT_LISTuse manages to make that more complicated as list nodes themselves have to be added to the linear order as well (well, that's not strictly true - none of these nodes need to be part of the linear order but that's another story).So stop using
GT_LISTnodes and instead maintain a single linked list of "use" nodes. These aren'tGenTreenodes, they're not part of the linear order, they use less memory. An alternative may be to use arrays instead of linked lists but I haven't starred long enough atSsaBuilderto tell if it would be possible to guess the proper array size before adding all the args. Anyway, I consider that to be tuning and right now I'm more interested to just make the IR saner. Using arrays can be tried at a later time, thanks to the use of range-based for loops the amount of changes should be far smaller.It's worth noting that a lot of (most actually?) changes are required to deal with various IR traversal facilities the JIT has. That stuff got out of control - there are general traversal facilities such as
GenTreeVisitor& co.,GenTree::Operands(),GenTree::NumChildren/GetChild/gtGetChildPointerand then there a bunch more functions that decided to reinvent the wheel and do their own custom traversal -fgSetTreeSeqHelper,gtSetEvalOrderand many others. Hopefully I'll clean up some of this mess in the future.This saves 0.17% used memory and 0.52% instructions:
Mem diff: https://gist.github.com/mikedn/75bde8ebf3c1acbbcce85df6f9090dad
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgsArIw8sxENXDXGulA
Contributes to #19876
Contributes to #15108