Conversation

@walterddr
Contributor

@walterddr walterddr commented May 1, 2021

Port addmm to structured kernel

Follow ups

  • migrate `mm` and `addbmm` to structured kernels
  • move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to the meta function

@facebook-github-bot facebook-github-bot added oncall: jit Add this issue/PR to JIT oncall triage queue cla signed labels May 1, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented May 1, 2021

💊 CI failures summary and remediations

As of commit 335025a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@walterddr walterddr removed the oncall: jit Add this issue/PR to JIT oncall triage queue label May 1, 2021
@walterddr walterddr force-pushed the structure_kernel_addmm branch 3 times, most recently from c2ecf29 to 5f54466 Compare May 5, 2021 23:32
@codecov

codecov bot commented May 6, 2021

Codecov Report

Merging #57417 (335025a) into master (c911c30) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #57417   +/-   ##
=======================================
  Coverage   76.84%   76.84%           
=======================================
  Files        1986     1986           
  Lines      197902   197902           
=======================================
+ Hits       152079   152081    +2     
+ Misses      45823    45821    -2     

@walterddr walterddr marked this pull request as ready for review May 6, 2021 15:34
@walterddr walterddr requested a review from ezyang as a code owner May 6, 2021 15:34
Contributor

This is a little bit of a pessimization, but not by much, since the internals of this function already always allocated a vector.

Contributor

@ngimel I forgot... did I sign up to fix this in structured kernels itself? 😂 🤣 (this is OK for now, but it shouldn't be necessary: the structured kernel's set_output should do this check automatically)

Collaborator

We didn't figure out a robust way of deduplicating size checks between the structured kernel's set_output and TensorIterator right away, and I haven't looked at it in more detail yet.

Contributor

OK, filed an issue for this: #57827

Contributor

@ezyang ezyang left a comment

Thanks; this isn't the cleanest port, but it's probably about as good as it can get for now.

@ngimel
Collaborator

ngimel commented May 7, 2021

Rong, can you please run instruction counts on it (on CPU and GPU)? addmm is perf-sensitive; @swolchok spent a lot of time clamping down on its overhead, so it would be bad to regress it.

@facebook-github-bot
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@walterddr
Contributor Author

Sorry for the late follow-up: I used the benchmarks/instruction_counts suite to count instructions for a (20x40 x 40x10) matmul on the regular, in-place, and out= variants. Here is the instruction-count differential before/after:

addmm_cpu      -0.29%
addmm_cpu_     -0.02%
addmm_cpu_out   0.01%
addmm_cuda     -0.86%
addmm_cuda_    -0.53%
addmm_cuda_out  0.15%

@ngimel
Collaborator

ngimel commented May 12, 2021

Awesome, thanks!

Rong Rong added 6 commits May 12, 2021 10:00
- move one TORCH_CHECK back to impl since it is used in other mm funcs
- skip LSTM/GRU meta test because it was reusing output then resized
- added in-place checker for output
@walterddr walterddr force-pushed the structure_kernel_addmm branch from 797fd04 to 335025a Compare May 12, 2021 21:12
@facebook-github-bot
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@walterddr merged this pull request in 002ce5c.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Port addmm to structured kernel

Follow ups
- migrate `mm` and `addbmm` to structured kernels
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to the meta function

Pull Request resolved: pytorch#57417

Reviewed By: bdhirsh

Differential Revision: D28291001

Pulled By: walterddr

fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
facebook-github-bot pushed a commit that referenced this pull request May 23, 2021
Summary:
Related to #57417.

Pull Request resolved: #57755

Reviewed By: ezyang

Differential Revision: D28426111

Pulled By: walterddr

fbshipit-source-id: 943d3e36433ca846990b940177fb040553961156
@bdhirsh bdhirsh changed the title port addmm to structure kernel addmm: port to structured kernel May 24, 2021

auto names = at::namedinference::propagate_names_for_addmm(mat1, mat2, self);
set_output(0, IntArrayRef({mat1.sizes()[0], mat2.sizes()[1]}), {}, self.options(), names);
auto result = maybe_get_output(0);
Contributor

For future reference, this should have been `const auto&` to avoid a refcount bump; `maybe_get_output` returns `const Tensor&`.
