Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Update fgCanFastTailCall to use new fgArgInfo#20643

Merged
jashook merged 24 commits intodotnet:masterfrom
jashook:update_fgCanFastTailCall_to_use_new_fgArgInfo
Jul 29, 2019
Merged

Update fgCanFastTailCall to use new fgArgInfo#20643
jashook merged 24 commits intodotnet:masterfrom
jashook:update_fgCanFastTailCall_to_use_new_fgArgInfo

Conversation

@jashook
Copy link

@jashook jashook commented Oct 26, 2018

Update fgCanFastTailCall to use init fgArgInfo. Note that fgCanFastTailCall will now do all of the argInfo init and fgMorphArgs will now just re-query the ArgInfo for #FEATURE_FAST_TAILCALL && the call is a possible fast tail call candidate.

The change also includes other changes such as:

  1. Saving the struct's int and float reg count in the fgArgTabEntry for x64 unix.
  2. Removes the multibyte decision for not fastTailCalling. This is now changed to a much less complicated decision in which all calls which have a byref struct argument will be not fast tail called. See logic below.
  3. It changes some of the older logic calling out the problems in LowerFastTailCall to move stack slot arguments > 1 stack slot size.
  4. Simplifies the fgCanFastTailCall logic to rely on the fgArgEntry information instead of recomputing the arguments size and abi information.

https://github.com/dotnet/coreclr/pull/20643/files#diff-bd2d23afdb1936d1982f699f760429d2R7070 summarizes the reason to not allow byref types in fast tail calls. Which is that although it is theoretically possible to use the caller's allocated structure and manipulate it by pointer there is more work required to force using the caller's memory when passing the struct to subsequent callees. Take the following example on windows x64:

public static void callee(int a, int b, int c, int d, 32ByteStruct s) {
    // At this point we would create an outgoing 32byte struct on our stack 
    // however, the caller has no stack allocated to it, this operation is illegal
    // and best case will AV.
    //
    // One could imagine a theoretical option to reuse the existing byref as the
    // copy, as s is dead after before call in a fast tail call.
    callee(1, 2, 3, 4, s);
}

public static void caller(32ByteStruct s) {
    // Outgoing 32byte struct is placed on the stack and passed byref
    callee(1, 2, 3, 4, s);
}

@jashook jashook changed the title Update fgCanFastTailCall to use new fgArgInfo - WIP Update fgCanFastTailCall to use new fgArgInfo Oct 30, 2018
@jashook
Copy link
Author

jashook commented Oct 30, 2018

/cc @dotnet/jit-contrib

@jashook
Copy link
Author

jashook commented Oct 30, 2018

Note I still have to run diffs. With the caveat that it may be diffs regarding fast tail call sites.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM overall, with some suggestions. I'd strongly recommend additional reviewers on this (which I see you've already requested)!

@AndyAyersMS
Copy link
Member

Maybe I'm confused, but I thought the "theoretical optimization" was one that we already did sometimes, by using (early) ref counts as a proxy for liveness. See fgMakeOutgoingStructArgCopy:

coreclr/src/jit/morph.cpp

Lines 5042 to 5056 in c34a0c0

if (lvaIsImplicitByRefLocal(varNum))
{
LclVarDsc* varDsc = &lvaTable[varNum];
// JIT_TailCall helper has an implicit assumption that all tail call arguments live
// on the caller's frame. If an argument lives on the caller caller's frame, it may get
// overwritten if that frame is reused for the tail call. Therefore, we should always copy
// struct parameters if they are passed as arguments to a tail call.
if (!call->IsTailCallViaHelper() && (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop())
{
varDsc->setLvRefCnt(0, RCS_EARLY);
args->gtOp.gtOp1 = lcl;
argEntry->node = lcl;
JITDUMP("did not have to make outgoing copy for V%2d", varNum);
return;

If possible I'd really like to see this change done as a pure refactor with no (or very minimal) diffs. Do things the new way and show that it is cleaner and gives us the same results.

Then if there are bugs to fix we can do that subsequently.

@jashook
Copy link
Author

jashook commented Nov 1, 2018

If possible I'd really like to see this change done as a pure refactor with no (or very minimal) diffs. Do things the new way and show that it is cleaner and gives us the same results. Then if there are bugs to fix we can do that subsequently.

That is the hope for this change. So far I have found no diffs on x64 unix. Note this was using run-pmi-diff.py

There are a few caveats to this. I will be adding logic to fix https://github.com/dotnet/coreclr/issues/20726. Theoretically will not see diffs on this because we have no coverage of the scenario.

Maybe I'm confused, but I thought the "theoretical optimization" was one that we already did sometimes, by using (early) ref counts as a proxy for liveness. See fgMakeOutgoingStructArgCopy:

This optimization should have not been done for fastTailCall cases, as the previous logic would force a no fastTailCall decision if an x64 Windows struct could not be enregistered.

@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from 684fb37 to b434159 Compare November 1, 2018 22:12
@jashook
Copy link
Author

jashook commented Nov 1, 2018

Rebased against master

// Note that GenericContext and VarargCookie are added by importer while
// importing the call to gtCallArgs list along with explicit user args.
size_t calleeArgRegCount = 0;
size_t callerIntArgRegCount = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting error

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndyAyersMS
Copy link
Member

I guess my earlier confusion was in parsing !call->IsTailCallViaHelper() to mean the call was a fast tail call. That's too restrictive. It may also mean the call is not a tail call at all.

So I take it we are still doing this "last use means no copy needed" optimization for normal calls where the arg has just one reference, and we should now be disqualifying calls passed implicit byref args from being tail call candidates. Can we change the code to assert for this?

            assert(!call->IsTailCall());
            if (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop())
            {
                varDsc->setLvRefCnt(0, RCS_EARLY);

@jashook
Copy link
Author

jashook commented Nov 5, 2018

I guess my earlier confusion was in parsing !call->IsTailCallViaHelper() to mean the call was a fast tail call. That's too restrictive. It may also mean the call is not a tail call at all.

Correct

So I take it we are still doing this "last use means no copy needed" optimization for normal calls where the arg has just one reference, and we should now be disqualifying calls passed implicit byref args from being tail call candidates. Can we change the code to assert for this?

This is mostly correct, with the addition that these implicit byref args would have already been marked as cannot be fast call. As they would have been marked as multibyte arguments and reported as cannot fastTailCall hasMultiByteArg.

Will definitely add the assert.

Copy link

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to fully review fgCanFastTailCall yet.

Copy link

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes zero diff on Windows/x64, Linux/x64, and arm64?

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments/suggestions, but overall LGTM

@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from b434159 to 5aeeb41 Compare November 21, 2018 22:43
@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from 5aeeb41 to 27cf4aa Compare January 25, 2019 21:30
@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from 27cf4aa to 485a707 Compare February 21, 2019 18:32
@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from 485a707 to 999d1a6 Compare July 15, 2019 17:42
Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If possible, it would be great to add the following to the test case:

  • Document what 'NotExplicit' means in the structs. It's not clear to me.
  • For the cases that indicate that they will fail tailcall, add comments to indicate why, and in particular whether it's a specific platform that has the limitation.

@jashook jashook force-pushed the update_fgCanFastTailCall_to_use_new_fgArgInfo branch from f87651e to d4e95d0 Compare July 27, 2019 00:00
@jashook
Copy link
Author

jashook commented Jul 29, 2019

Rebased on #25885. Testing failed with OSX, issue seems to be https://github.com/dotnet/core-eng/issues/7155. I have verified the tests pass locally on OSX.

@jashook jashook merged commit f8f0c55 into dotnet:master Jul 29, 2019
@jashook jashook deleted the update_fgCanFastTailCall_to_use_new_fgArgInfo branch July 29, 2019 16:06
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Use fgInitArgInfo in fgCanFastTailCall

This leverages the work done in dotnet/coreclr#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet/coreclr#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall.

Note:

Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.

* Address pr feedback for FastTailCallCandidates

* Update variable name based on rebase

* Fix comment headers based on feedback

* More feedback addressed

* Fix rebase error

* Address feedback and correctly bail on byref args

* Restrict to only bail on byref struct args

* Correctly used computations from fgArgInfo

* Correctly re-init arg infor for explicit tailcalls

This is required because morphTailCall modifies the argument list.

It would be nice in the future to go through each argument on subsequent calls to fgInitArgInfo to verify there have been no changes.

* Fix x86 assert

* Address PR feedback related to recomputing initArgs

* Fix comment

* Update fgCanFastTailCall header

* Address PR feedback

* Apply format patch

* Correctly reset arginfo after morphtailcall

* Commit does several small things, and fixes a bug

Addresses PR feedback
Removes dead code
Changes to bail a fast tail call when the caller has a multi slot struct

* Fix windows build

* Address pr feedback

* Fix arm and x86 builds

* apply format patch and remove dotnet/coreclr#25885

* Rebase on master

* Apply format patch


Commit migrated from dotnet/coreclr@f8f0c55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants