Minor JIT improvements #11654
Conversation
facebook-github-bot
apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from 288640d to 71140ab
Force-pushed from 71140ab to a95ec20
zdevito
Still looks good.
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in `requires_grad` propagation.

Pull Request resolved: pytorch/pytorch#11654
Reviewed By: colesbury
Differential Revision: D9813739
Pulled By: apaszke
fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
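As an illustration of the `stack.h` point, here is a minimal, self-contained sketch (not the actual PyTorch code; `BoxedValue`, `push_with_temporary`, and `push_in_place` are made-up stand-ins for `IValue` and the real push helpers) showing how forwarding the argument straight into `emplace_back` constructs the boxed value directly in the stack's storage instead of first building a temporary on the C++ stack:

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Stand-in for IValue: logs which constructors run.
struct BoxedValue {
  std::string payload;
  explicit BoxedValue(std::string s) : payload(std::move(s)) { std::puts("construct"); }
  BoxedValue(const BoxedValue& o) : payload(o.payload) { std::puts("copy"); }
  BoxedValue(BoxedValue&& o) noexcept : payload(std::move(o.payload)) { std::puts("move"); }
};

using Stack = std::vector<BoxedValue>;

// Old style: build a temporary BoxedValue, then move it into the vector.
template <typename T>
void push_with_temporary(Stack& stack, T&& arg) {
  BoxedValue tmp(std::forward<T>(arg));   // temporary on the C++ stack
  stack.push_back(std::move(tmp));        // plus a move into the vector
}

// New style: construct the BoxedValue directly in the vector's storage.
template <typename T>
void push_in_place(Stack& stack, T&& arg) {
  stack.emplace_back(std::forward<T>(arg));  // single construction, no temporary
}

int main() {
  Stack s;
  s.reserve(4);  // avoid reallocation noise in the log
  std::puts("-- push_with_temporary --");
  push_with_temporary(s, std::string("a"));  // prints: construct, move
  std::puts("-- push_in_place --");
  push_in_place(s, std::string("b"));        // prints: construct
  return 0;
}
```

With the temporary variant the log shows a construction plus a move; with the in-place variant only the single construction remains.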
@apaszke, this is causing a ~200ms latency regression for NMT models. Here are the top lines from perf before/after this change: Before:
After:
Is there more info I can provide to help debug this? Also cc @jamesr66a who is familiar with these models and how to optimize them.
@jmp84 can you please provide me with some example tensor sizes that appear as inputs to your GEMMs? It looks like we started triggering the transposed kernels in some cases, which was not the case previously (TN vs NN).
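For readers unfamiliar with the TN/NN shorthand, here is a hedged sketch (assuming an OpenBLAS/MKL-style CBLAS interface is linked; the sizes are made up) of the same logical matrix product issued once as an NN GEMM and once with a transposed operand, which is what dispatches to a different kernel:

```cpp
#include <cblas.h>
#include <cstdio>
#include <vector>

int main() {
  const int M = 4, K = 3, N = 2;
  std::vector<float> A(M * K, 1.0f);   // M x K
  std::vector<float> B(K * N, 1.0f);   // K x N (row-major, "N" layout)
  std::vector<float> Bt(N * K, 1.0f);  // N x K, i.e. B stored transposed
  std::vector<float> C(M * N, 0.0f);

  // NN: C = A * B, neither operand transposed.
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              M, N, K, 1.0f, A.data(), K, B.data(), N, 0.0f, C.data(), N);

  // Transposed variant: C = A * Bt^T, same logical result, but BLAS now
  // dispatches to a kernel that reads one operand with a transposed layout.
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
              M, N, K, 1.0f, A.data(), K, Bt.data(), K, 0.0f, C.data(), N);

  std::printf("C[0] = %f\n", C[0]);  // 3.0 in both cases
  return 0;
}
```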
Finally, how do you run those models? Is that via ONNX export or something else?
@apaszke I think what's going on here is that we are not hitting the special quantized implementation of FC in the caffe2 backend, due to the ONNX-caffe2 backend emitting "MatMul" + "Add" instead of "FC". This can be seen from:
So for the addmm optimization: does "not helpful at all" mean neutral or negative? If neutral, why not keep it if it matters for ONNX export?
It means negative in some RNN use cases I saw |
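To make the op-level difference concrete, here is a small libtorch sketch (assuming libtorch is available; this is not the JIT pass itself, just the rewrite the fusion targets) comparing the unfused `matmul` + add form with a single `addmm`:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::randn({8, 16});   // activations
  auto w = torch::randn({32, 16});  // weight, (out_features, in_features)
  auto b = torch::randn({32});      // bias

  // Unfused form: two ops, a matmul followed by a broadcasted add.
  auto y1 = torch::matmul(x, w.t()) + b;

  // Fused form: a single addmm (bias + x @ w^T in one call).
  auto y2 = torch::addmm(b, x, w.t());

  std::cout << torch::allclose(y1, y2) << std::endl;  // prints 1
  return 0;
}
```

Whether the single `addmm` is actually faster depends on which backend GEMM kernel it ends up dispatching to, which is what the regression discussion above is about.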