Update fgCanFastTailCall to use new fgArgInfo#20643
Conversation
|
/cc @dotnet/jit-contrib |
|
Note I still have to run diffs. With the caveat that it may be diffs regarding fast tail call sites. |
CarolEidt
left a comment
There was a problem hiding this comment.
This LGTM overall, with some suggestions. I'd strongly recommend additional reviewers on this (which I see you've already requested)!
|
Maybe I'm confused, but I thought the "theoretical optimization" was one that we already did sometimes, by using (early) ref counts as a proxy for liveness. See Lines 5042 to 5056 in c34a0c0 If possible I'd really like to see this change done as a pure refactor with no (or very minimal) diffs. Do things the new way and show that it is cleaner and gives us the same results. Then if there are bugs to fix we can do that subsequently. |
That is the hope for this change. So far I have found no diffs on x64 unix. Note this was using run-pmi-diff.py There are a few caveats to this. I will be adding logic to fix https://github.com/dotnet/coreclr/issues/20726. Theoretically will not see diffs on this because we have no coverage of the scenario.
This optimization should have not been done for fastTailCall cases, as the previous logic would force a no fastTailCall decision if an x64 Windows struct could not be enregistered. |
684fb37 to
b434159
Compare
|
Rebased against master |
src/jit/morph.cpp
Outdated
| // Note that GenericContext and VarargCookie are added by importer while | ||
| // importing the call to gtCallArgs list along with explicit user args. | ||
| size_t calleeArgRegCount = 0; | ||
| size_t callerIntArgRegCount = 0; |
|
I guess my earlier confusion was in parsing So I take it we are still doing this "last use means no copy needed" optimization for normal calls where the arg has just one reference, and we should now be disqualifying calls passed implicit byref args from being tail call candidates. Can we change the code to assert for this? assert(!call->IsTailCall());
if (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop())
{
varDsc->setLvRefCnt(0, RCS_EARLY); |
Correct
This is mostly correct, with the addition that these implicit byref args would have already been marked as cannot be fast call. As they would have been marked as multibyte arguments and reported as cannot fastTailCall hasMultiByteArg. Will definitely add the assert. |
BruceForstall
left a comment
There was a problem hiding this comment.
I haven't had a chance to fully review fgCanFastTailCall yet.
BruceForstall
left a comment
There was a problem hiding this comment.
Are these changes zero diff on Windows/x64, Linux/x64, and arm64?
CarolEidt
left a comment
There was a problem hiding this comment.
Some additional comments/suggestions, but overall LGTM
b434159 to
5aeeb41
Compare
5aeeb41 to
27cf4aa
Compare
27cf4aa to
485a707
Compare
485a707 to
999d1a6
Compare
CarolEidt
left a comment
There was a problem hiding this comment.
LGTM. If possible, it would be great to add the following to the test case:
- Document what 'NotExplicit' means in the structs. It's not clear to me.
- For the cases that indicate that they will fail tailcall, add comments to indicate why, and in particular whether it's a specific platform that has the limitation.
This is required because morphTailCall modifies the argument list. It would be nice in the future to go through each argument on subsequent calls to fgInitArgInfo to verify there have been no changes.
Addresses PR feedback Removes dead code Changes to bail a fast tail call when the caller has a multi slot struct
f87651e to
d4e95d0
Compare
|
Rebased on #25885. Testing failed with OSX, issue seems to be https://github.com/dotnet/core-eng/issues/7155. I have verified the tests pass locally on OSX. |
* Use fgInitArgInfo in fgCanFastTailCall This leverages the work done in dotnet/coreclr#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet/coreclr#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change. * Address pr feedback for FastTailCallCandidates * Update variable name based on rebase * Fix comment headers based on feedback * More feedback addressed * Fix rebase error * Address feedback and correctly bail on byref args * Restrict to only bail on byref struct args * Correctly used computations from fgArgInfo * Correctly re-init arg infor for explicit tailcalls This is required because morphTailCall modifies the argument list. It would be nice in the future to go through each argument on subsequent calls to fgInitArgInfo to verify there have been no changes. * Fix x86 assert * Address PR feedback related to recomputing initArgs * Fix comment * Update fgCanFastTailCall header * Address PR feedback * Apply format patch * Correctly reset arginfo after morphtailcall * Commit does several small things, and fixes a bug Addresses PR feedback Removes dead code Changes to bail a fast tail call when the caller has a multi slot struct * Fix windows build * Address pr feedback * Fix arm and x86 builds * apply format patch and remove dotnet/coreclr#25885 * Rebase on master * Apply format patch Commit migrated from dotnet/coreclr@f8f0c55
Update fgCanFastTailCall to use init fgArgInfo. Note that fgCanFastTailCall will now do all of the argInfo init and fgMorphArgs will now just re-query the ArgInfo for #FEATURE_FAST_TAILCALL && the call is a possible fast tail call candidate.
The change also includes other changes such as:
https://github.com/dotnet/coreclr/pull/20643/files#diff-bd2d23afdb1936d1982f699f760429d2R7070 summarizes the reason to not allow byref types in fast tail calls. Which is that although it is theoretically possible to use the caller's allocated structure and manipulate it by pointer there is more work required to force using the caller's memory when passing the struct to subsequent callees. Take the following example on windows x64: