addmm: port to structured kernel
#57417
Conversation
💊 CI failures summary (as of commit 335025a): 💚 Looks good so far! There are no failures yet. (Comment automatically generated by Dr. CI.)
Force-pushed from c2ecf29 to 5f54466.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #57417    +/-   ##
========================================
  Coverage   76.84%   76.84%
========================================
  Files        1986     1986
  Lines      197902   197902
========================================
+ Hits       152079   152081       +2
+ Misses      45823    45821       -2
```
aten/src/ATen/NamedTensorUtils.cpp
Outdated
This is a little bit of a pessimization, but not by much, since the internals of this function already always allocated a vector.
This seems to have already been discussed in https://github.com/pytorch/pytorch/pull/55746/files#r613660482. I can create a separate PR for that.
@ngimel I forgot... did I sign up to fix this in structured kernels itself? 😂 🤣 (This is OK for now, but it shouldn't be necessary, since a structured kernel's set_output should do this check automatically.)
We didn't figure out a robust way of deduplicating the size checks between a structured kernel's set_output and TensorIterator right away, and I haven't looked at it in more detail yet.
OK, filed an issue for this: #57827.
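For reference, the size checks under discussion can be sketched roughly as follows. This is a hypothetical Python illustration, not PyTorch source — the real validation lives in the C++ meta function and TensorIterator — but it shows the shape rule a structured addmm kernel must enforce before set_output allocates the result:

```python
# Hypothetical sketch (not PyTorch source) of the shape validation a
# structured addmm meta function performs before declaring its output.
def addmm_meta_shape(self_shape, mat1_shape, mat2_shape):
    # Both matrix operands must be 2-D.
    assert len(mat1_shape) == 2 and len(mat2_shape) == 2, \
        "mat1 and mat2 must be matrices"
    # Inner dimensions must agree: (n, k) @ (k, m).
    assert mat1_shape[1] == mat2_shape[0], (
        f"mat1 and mat2 shapes cannot be multiplied "
        f"({mat1_shape[0]}x{mat1_shape[1]} and {mat2_shape[0]}x{mat2_shape[1]})")
    out_shape = (mat1_shape[0], mat2_shape[1])
    # `self` must be broadcastable to the output shape (right-aligned).
    for s, o in zip(self_shape[::-1], out_shape[::-1]):
        assert s == o or s == 1, \
            f"self {self_shape} is not broadcastable to {out_shape}"
    return out_shape
```

The duplication problem is that TensorIterator performs equivalent broadcast checks again when the kernel actually runs.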
ezyang
left a comment
Thanks. This isn't the cleanest port, but it is probably about as good as it can get for now.
Rong, can you please run instruction counts on it (on CPU and GPU)?
Force-pushed from 5f54466 to 797fd04.
@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry for the late follow-up: I used
Awesome, thanks!
- move one TORCH_CHECK back to impl, since it is used in other mm functions
- skip the LSTM/GRU meta test, because it was reusing an output that then got resized
- added an in-place checker for the output
Force-pushed from 797fd04 to 335025a.
@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@walterddr merged this pull request in 002ce5c.
Summary: Port addmm to structured kernel.

Follow-ups:
- migrate `mm` and `addbmm` to structured
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta

Pull Request resolved: pytorch#57417
Reviewed By: bdhirsh
Differential Revision: D28291001
Pulled By: walterddr
fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
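For readers unfamiliar with the structured-kernel pattern the summary refers to, here is a hypothetical Python analogue — an illustration only, since the real code is C++ generated by ATen's codegen. The port splits addmm into a meta function that validates shapes and declares the output, and an impl function that fills it in:

```python
# Hypothetical Python analogue of the structured-kernel split for addmm
# (illustration only; the real meta/impl pair is generated C++).

def addmm_meta(mat1, mat2):
    # Meta step: validate shapes and declare the output size,
    # as set_output would in the real kernel.
    n, k1 = len(mat1), len(mat1[0])
    k2, m = len(mat2), len(mat2[0])
    assert k1 == k2, "mat1 and mat2 shapes cannot be multiplied"
    return (n, m)

def addmm_impl(self, mat1, mat2, beta=1, alpha=1):
    # Impl step: compute beta * self + alpha * (mat1 @ mat2)
    # into the output declared by the meta step.
    n, m = addmm_meta(mat1, mat2)
    k = len(mat2)
    return [[beta * self[i][j]
             + alpha * sum(mat1[i][t] * mat2[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]
```

Keeping the shape logic in meta is what lets the same kernel serve functional, out=, and meta-tensor dispatch.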
```cpp
auto names = at::namedinference::propagate_names_for_addmm(mat1, mat2, self);
set_output(0, IntArrayRef({mat1.sizes()[0], mat2.sizes()[1]}), {}, self.options(), names);
auto result = maybe_get_output(0);
```
For future reference, this should have been `const auto&` to avoid a refcount bump, since `maybe_get_output` returns `const Tensor&`.
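The name-propagation rule invoked in that snippet can be sketched as follows. This is an assumption about what `propagate_names_for_addmm` computes (result names taken from mat1's row dim and mat2's column dim, then unified with self's names), not its actual source:

```python
# Hypothetical sketch of addmm named-tensor propagation (an assumption
# about propagate_names_for_addmm, not the ATen implementation).
def propagate_names_for_addmm(mat1_names, mat2_names, self_names):
    def unify(a, b):
        # Two names unify if either is unnamed (None) or they match.
        assert a is None or b is None or a == b, \
            f"names {a} and {b} do not match"
        return a if a is not None else b
    # Result row name from mat1, column name from mat2.
    out = [mat1_names[0], mat2_names[1]]
    # self's names must be compatible (right-aligned, as in broadcasting).
    for i, name in enumerate(reversed(self_names)):
        out[-1 - i] = unify(out[-1 - i], name)
    return out
```

Passing the computed names into `set_output`, as the snippet does, lets the structured kernel attach them when it allocates the result.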