Skip to content

Conversation

@kunalspathak
Copy link
Contributor

@kunalspathak kunalspathak commented Oct 11, 2022

After building intervals, check if any of them interferes with a call. In other words, scan all the intervals and check if there are refpositions that are created before a call and used after the call. If yes, then update such intervals preference to use callee-save registers. To accomplish this, I keep track of LsraLocation just before adding RefTypeKill and also record the killmask. After building intervals, go through all the call locations and see if any of the interval collides and if yes, then set its preferCalleeSave = true.

I am assuming a TP cost to do this because we will be doing non-linear search, but hopefully, it will be not as bad, and we do get some good results out of it. I do add various checks to skip iteration if we know that scanning it won't make sense. We cannot do this at the time of building interval, because we do not have information about the future uses of the interval under consideration.

Related conversation: #75565 (comment)

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2022
@ghost ghost assigned kunalspathak Oct 11, 2022
@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

After building intervals, check if any of them interferes with a call. In other words, scan all the intervals and check if there are refpositions that are created before a call and used after the call. If yes, then update such intervals preference to use callee-save registers. To accomplish this, I keep track of LsraLocation just before adding RefTypeKill and also record the killmask. After building intervals, go through all the call locations and see if any of the interval collides and if yes, then set its preferCalleeSave = true.

I am assuming a TP cost to do this because we will be doing non-linear search, but hopefully, it will be not as bad, and we do get some good results out of it. I do add various checks to skip iteration if we know that scanning it won't make sense.

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Contributor Author

I am assuming a TP cost to do this because we will be doing non-linear search, but hopefully, it will be not as bad,

It is significantly worse. Need to think of some better way to handle it.

image

I could probably do something smarter:

  1. Track recentKillPosition and update it every time we build RefTypeKill.
  2. Whenever we create Refpositions other than RefTypeDef and probably few others like RefTypeBB, etc., get the corresponding definition node and check if its location is after recentKillPosition. If yes, then no-op, but if it was before recentKillPosition, then that interval interferes with the call and we mark that interval as preferCalleeSave.

I will replace this PR with above strategy.

@kunalspathak
Copy link
Contributor Author

The throughput has increased from last time, but is still regressing.

image

@ghost ghost closed this Nov 13, 2022
@ghost
Copy link

ghost commented Nov 13, 2022

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 13, 2022
@kunalspathak
Copy link
Contributor Author

I took a stab at this today with better handling of call interference and marking preferCalleeSave and I was seeing lot of asmdiff regression. Want to check what was the state like for the implementation in this PR.

@kunalspathak kunalspathak reopened this Sep 27, 2023
@kunalspathak
Copy link
Contributor Author

Replaced by #92744

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant