Extract argInfo building from fgMorphArgs#19658
Conversation
|
I'm not yet sure what this change does exactly but any change around |
|
@mikedn - this is mainly a refactoring to separate the determination of how the arguments are passed from the work of modifying the argument trees for passing. Theoretically nothing changed except a small change to x64/ux, and I had zero pmi diffs for x86 and x64 windows, as well as for the arm and arm64 altjits. This should lay the groundwork for 1) reducing the cases where we needlessly make copies or mark args as address-taken, and 2) extracting the parts of |
9139296 to
ed9a8ac
Compare
95e366c to
c7014cf
Compare
|
@dotnet/jit-contrib PTAL |
|
@dotnet-bot help |
|
Welcome to the dotnet/coreclr Perf help The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
|
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
|
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
|
@dotnet-bot test Windows_NT x64 Checked jitstress1 |
|
@dotnet-bot test Windows_NT x86 Checked jitstress1 |
|
@dotnet-bot test Ubuntu arm Cross Checked jitstress1 Build and Test |
|
@dotnet-bot test Ubuntu arm Cross Checked minopts Build and Test |
|
@dotnet/jit-contrib PTAL - I've removed the "WIP". The x86 failures are #19683. The arm failures all appear to be infrastructure/process issues. |
|
@dotnet-bot test Ubuntu x64 Checked jitstress1 |
|
Is it possible to separate this PR on several commits? |
| assert(regIndex == regArgInx); | ||
| assert(regArgInx == curArgTabEntry->lateArgInx); | ||
| assert(listInx == lateArgInx); | ||
| assert(lateArgInx == curArgTabEntry->lateArgInx); |
There was a problem hiding this comment.
Is lateArgInx assigned into curArgTabEntry if it is -1 originally? Seems like this assert may compare -1 and lateArgInx
There was a problem hiding this comment.
I did some simplification in this method, and it looks like more is needed - that is, I think that the lateArgInx != -1 should always be true if isLateArg. I'll take a deeper look at this.
src/jit/morph.cpp
Outdated
|
|
||
| if (curArgTabEntry->node != argx) | ||
| { | ||
| curArgTabEntry->node = argx; | ||
| } | ||
| } | ||
| else | ||
| #endif |
There was a problem hiding this comment.
Nit: can you add a comment for the endif name eg #endif // FEATURE_FIXED_OUT_ARGS
src/jit/morph.cpp
Outdated
| #ifdef FEATURE_HFA | ||
| if (reMorphing) | ||
| hfaType = GetHfaType(argx); | ||
| if (varTypeIsFloating(hfaType) && !callIsVararg) |
There was a problem hiding this comment.
if (varTypeIsFloating(hfaType) && !callIsVararg)
This is not strictly true, this is target dependent. Only on Windows arm/arm64 this is a true statement. The existing code had this under an ifdef correctly. Specifically, unix arm/arm64 will still need to classify HFA types for native varargs methods.
There was a problem hiding this comment.
@jashook - If I understand correctly, even arm64 has to always classify hfa types, because they are passed by value even for varargs, though by size they might be too large. The existing code was separately tracking that. Are you saying that the setting of compFloatingPointUsed needs to be done on non-windows varargs?
Note that this code unconditionally gets the HFA type, and then the isHfaArg boolean tracks whether it's actually passed in floating point registers.
Could you look at the comment near line 3650 in the new code and see if it is correct?
There was a problem hiding this comment.
// will still be passed by value, so we need to record their hfaType.
// On ARM, we don't treat HFA types differently if they are not passed in registers,
// but in both cases we still keep track of whether we have an HFA.
Is correct for everything but arm64 windows varargs. The HFA type will be dropped/ignored for varargs arm64 Windows, we will classify the struct as if they were not HFAs. They will instead follow the calling convention for passing an non-HFA struct, and will force the floating point values to be passed in integer registers for <16 byte structs. However, unix arm64 does not have this restriction, and will treat a vararg call with no difference from the regular HFA classification, therefore pass in floating point registers.
There was a problem hiding this comment.
I thought I'd parsed through the logic to determine that (for a varargs call) HFAs larger than 16 bytes would be passed by-value, even if they were not passed in floating point regs. But I looked at the Windows ARM64 ABI doc, and it appears that they should, as you say, be treated as a non-HFA composite (i.e. passed by reference).
| #pragma warning(disable : 21000) // Suppress PREFast warning about overly large function | ||
| #endif | ||
| GenTreeCall* Compiler::fgMorphArgs(GenTreeCall* call) | ||
| void Compiler::fgInitArgInfo(GenTreeCall* call) |
There was a problem hiding this comment.
This is truly a huge step forward. That being said I have a quick question. There was an expectation to call this several time? Upon the second call it would return as a no-op correct?
This would enable a call to compiler::fgInitArgInfo inside compiler::canFastTailCall
There was a problem hiding this comment.
Yes, that's the idea, though I confess that I wouldn't be at all surprised if getting that to work correctly required additional changes. Below, it checks whether the arg info has already been created (if (call->fgArgInfo != nullptr)), and simply returns.
| CLANG_FORMAT_COMMENT_ANCHOR; | ||
|
|
||
| #if defined(_TARGET_X86_) || defined(_TARGET_ARM_) | ||
| #if defined(_TARGET_X86_) || defined(_TARGET_ARM_) || defined(UNIX_AMD64_ABI) |
There was a problem hiding this comment.
Is this change a part of this refactoring? Should not it cause unix amd64 asm diffs?
There was a problem hiding this comment.
This change has zero diffs. Some of the changes involved deleting x64/ux-specific code and making minor modifications to existing methods to achieve the same result. Future changes will change the arg passing code, but not this one.
I actually tried to make this as minimal as possible - but the code in |
|
In order to not regress arm64 varargs it is worth manually running |
c7014cf to
e0d1a9c
Compare
|
@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test |
|
@dotnet-bot test Windows_NT x64 Checked jitstress1 |
|
@dotnet-bot test Windows_NT x64 Checked jitstress1 |
|
@dotnet-bot help |
|
Welcome to the dotnet/coreclr Repository The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
|
Welcome to the dotnet/coreclr Perf help The following is a list of valid commands on this PR. To invoke a command, comment the indicated phrase on the PR The following commands are valid for all PRs and repositories. Click to expand
The following jobs are launched by default for each PR against dotnet/coreclr:master. Click to expand
The following optional jobs are available in PRs against dotnet/coreclr:master. Click to expand
Have a nice day! |
|
@dotnet-bot test Windows_NT x64 Checked jitstress1 |
|
@dotnet-bot test Windows_NT x64 Checked jitstress1_tiered |
|
The x64 stress failures all appear to be #20322 |
|
@dotnet-bot test Ubuntu x64 Checked jitstress1 |
This extracts the code to build the `fgArgInfo` on a call from the code that modifies the arguments. Eliminated a pre-existing repeated traversal of the argList by changing `EvalToTmp` to take the `fgArgTabEntry` which the caller always has available.
f6a3fda to
e0b2a9f
Compare
|
@dotnet-bot test this please |
|
@CarolEidt Are you planning to do further refactoring in the call area? I think I'm going to try to get rid of GT_LIST and it would be nice to know how big the risk of conflict is, especially that it's likely going to take a while. |
|
@mikedn - Yes, I plan to extract the argInfo such that it can be used for both call sites as well as incoming args. However, I am planning to take some time off from refactoring to work on getting some actual improvement in struct passing (i.e. eliminating copies). |
Likely still a linked list but not of
Hmm, it seems difficult to avoid conflicts in this area and there's no hurry anyway, I can wait. However, there's one thing that may be mentioned before additional refactoring is done. I kind of hope that once |
|
I would like to unify the handling of incoming and outgoing args. That is, I would like to split the |
Sorry but I don't quite understand how we go from the "split" to "just one type", they seem opposite :)
Ha ha, I have lots of those. I even dream that perhaps we can get rid of "late args", they're rather strange. Well, I'm pretty sure it's doable, the main problem is how to get there in small enough incremental changes. BTW, I downloaded the changes in this PR and it looks great. For some reason I couldn't quite make much of it in the GitHub diff view. |
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
* Extract argInfo building from fgMorphArgs This extracts the code to build the `fgArgInfo` on a call from the code that modifies the arguments. Eliminated a pre-existing repeated traversal of the argList by changing `EvalToTmp` to take the `fgArgTabEntry` which the caller always has available.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
This leverages the work done in dotnet#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change.
* Use fgInitArgInfo in fgCanFastTailCall This leverages the work done in #19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before #19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change. * Address pr feedback for FastTailCallCandidates * Update variable name based on rebase * Fix comment headers based on feedback * More feedback addressed * Fix rebase error * Address feedback and correctly bail on byref args * Restrict to only bail on byref struct args * Correctly used computations from fgArgInfo * Correctly re-init arg infor for explicit tailcalls This is required because morphTailCall modifies the argument list. It would be nice in the future to go through each argument on subsequent calls to fgInitArgInfo to verify there have been no changes. * Fix x86 assert * Address PR feedback related to recomputing initArgs * Fix comment * Update fgCanFastTailCall header * Address PR feedback * Apply format patch * Correctly reset arginfo after morphtailcall * Commit does several small things, and fixes a bug Addresses PR feedback Removes dead code Changes to bail a fast tail call when the caller has a multi slot struct * Fix windows build * Address pr feedback * Fix arm and x86 builds * apply format patch and remove #25885 * Rebase on master * Apply format patch
* Use fgInitArgInfo in fgCanFastTailCall This leverages the work done in dotnet/coreclr#19658 to use fgInitArgInfo in fgCanFastTailCall. This removes much of the duplicate ABI code that was required before dotnet/coreclr#19658. In addition, the change more clearly explains the issues with LowerFastTailCall and more clearly bails on when it can and cannot fastTailCall. Note: Part of the old logic would report no to a fastTailCall on Windows AMD64, if the struct was a power of two (engregisterable). This has been removed; however, it is unclear currently whether LowerFastTailCall/Codgen will support the change. * Address pr feedback for FastTailCallCandidates * Update variable name based on rebase * Fix comment headers based on feedback * More feedback addressed * Fix rebase error * Address feedback and correctly bail on byref args * Restrict to only bail on byref struct args * Correctly used computations from fgArgInfo * Correctly re-init arg infor for explicit tailcalls This is required because morphTailCall modifies the argument list. It would be nice in the future to go through each argument on subsequent calls to fgInitArgInfo to verify there have been no changes. * Fix x86 assert * Address PR feedback related to recomputing initArgs * Fix comment * Update fgCanFastTailCall header * Address PR feedback * Apply format patch * Correctly reset arginfo after morphtailcall * Commit does several small things, and fixes a bug Addresses PR feedback Removes dead code Changes to bail a fast tail call when the caller has a multi slot struct * Fix windows build * Address pr feedback * Fix arm and x86 builds * apply format patch and remove dotnet/coreclr#25885 * Rebase on master * Apply format patch Commit migrated from dotnet/coreclr@f8f0c55
This extracts the code to build the
fgArgInfoon a call from the code that modifies the arguments.