x64: Break more data dependencies in float-related instructions #7818
fitzgen merged 3 commits into bytecodealliance:main
This commit takes a stab at bytecodealliance#7816 without diving a whole lot into it. I noticed that the loop started with `vcvtss2sd` which is along the same lines as previous false dependencies found earlier in PRs such as bytecodealliance#7098. I had forgotten these instructions at the time and meant to go back and touch them up and bytecodealliance#7731 has provided sufficient motivation to do so! Locally this takes that test case from 1.6s to 0.4s for me.
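The kind of false dependency mentioned above can be illustrated with a toy model. This is only a sketch and not Cranelift code: `Inst`, `critical_path`, `convert_loop`, the register names, and the 4-cycle latency are all hypothetical and chosen for illustration. The model assumes perfect register renaming and unlimited execution units, so only read-after-write dependencies delay an instruction. A merging instruction such as `vcvtss2sd` copies the upper bits of its result from its first source operand, so if that operand is the stale destination register, every conversion has to wait for the previous writer of that register even though the conversions are logically independent.

```rust
use std::collections::HashMap;

/// A toy instruction: the registers it reads, the register it writes,
/// and its latency in cycles. All values here are hypothetical.
struct Inst {
    reads: Vec<&'static str>,
    writes: &'static str,
    latency: u32,
}

/// Returns the cycle at which the last result becomes ready, assuming
/// perfect renaming and unlimited execution units, so that only
/// read-after-write dependencies delay an instruction.
fn critical_path(insts: &[Inst]) -> u32 {
    let mut ready: HashMap<&str, u32> = HashMap::new();
    let mut finish = 0;
    for inst in insts {
        // An instruction can start once all of its inputs are ready.
        let start = inst
            .reads
            .iter()
            .map(|r| *ready.get(r).unwrap_or(&0))
            .max()
            .unwrap_or(0);
        let done = start + inst.latency;
        ready.insert(inst.writes, done);
        finish = finish.max(done);
    }
    finish
}

/// Builds `iters` logically independent conversions from `xmm1` into `xmm0`.
/// When `dst_is_input` is true, each conversion also reads `xmm0`, which
/// models passing the stale destination as the first source operand.
fn convert_loop(iters: usize, dst_is_input: bool) -> Vec<Inst> {
    (0..iters)
        .map(|_| Inst {
            reads: if dst_is_input {
                vec!["xmm0", "xmm1"]
            } else {
                vec!["xmm1"]
            },
            writes: "xmm0",
            latency: 4, // assumed latency, not a measured value
        })
        .collect()
}

fn main() {
    let with_false_dep = critical_path(&convert_loop(100, true));
    let without_false_dep = critical_path(&convert_loop(100, false));
    println!("with false dependency:    {with_false_dep} cycles");
    println!("without false dependency: {without_false_dep} cycles");
}
```

In this model the chained version serializes all 100 conversions, while the version that does not read the destination lets them overlap. Real hardware has finite execution resources, so the model only shows the direction of the effect and makes no claim about its size.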
fitzgen left a comment:
Nice!! Out of curiosity, how did you end up root causing that perf bug to this false dependency?
Ah, it was mostly from previous experience. In the back of my mind I knew there was a set of instructions for which we still did the "fake the output register as the input" trick for AVX (e.g. the instructions modified here) and when I ran