speed up kl div loss #10336
Conversation
Nice. Is the benchmark on cpu or cuda?

updated with more complete benchmark info
facebook-github-bot left a comment:
li-roy has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot retest this please
aten/src/ATen/nn.yaml (Outdated)
aten/src/ATen/native/Loss.cpp (Outdated)
ssnl left a comment:
Looks good. But please make sure that maths is correct before merging.
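For checking the maths: PyTorch's `kl_div` takes log-probabilities as input and probabilities as target, computing pointwise `target * (log(target) - input)`; with `reduction="mean"` the sum is divided by the total element count. A minimal sanity check against that analytic formula (variable names are illustrative, not from the PR):

```python
import torch
import torch.nn.functional as F

# input must be log-probabilities, target plain probabilities
inp = torch.log_softmax(torch.randn(4, 5), dim=1)
tgt = torch.softmax(torch.randn(4, 5), dim=1)

# pointwise KL term, then 'mean' divides by numel (not batch size)
manual = (tgt * (tgt.log() - inp)).sum() / inp.numel()
auto = F.kl_div(inp, tgt, reduction="mean")
```

Note that `reduction="mean"` averages over every element, which is why the total KL divergence of a batch is usually recovered with `"batchmean"` instead.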
aten/src/ATen/native/Loss.cpp (Outdated)
```cpp
namespace at { namespace native {

Tensor apply_loss_reduction(const Tensor& unreduced, int64_t reduction) {
```
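The `apply_loss_reduction` helper above folds the reduction mode into the unreduced loss tensor. A minimal Python sketch of the same logic (the function name mirrors the ATen helper; the string-mode interface is an illustrative simplification of the `Reduction` enum):

```python
import torch

def apply_loss_reduction(unreduced: torch.Tensor, reduction: str) -> torch.Tensor:
    # Mirrors the ATen helper: 'mean' averages over all elements,
    # 'sum' adds them up, anything else returns the unreduced tensor.
    if reduction == "mean":
        return unreduced.sum() / unreduced.numel()
    elif reduction == "sum":
        return unreduced.sum()
    return unreduced

loss = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
```

With this input, `"mean"` yields 2.5, `"sum"` yields 10.0, and `"none"` passes the 2x2 tensor through unchanged.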
aten/src/ATen/native/Loss.cpp
Outdated
| } | ||
|
|
||
| Tensor kl_div_backward_cpu(const Tensor& grad, const Tensor& input, const Tensor& target, int64_t reduction) { | ||
| auto grad_input = grad.type().zeros_like(input); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| return output.sum() / output.numel(); | ||
| } else if (reduction == Reduction::Sum) { | ||
| return output.sum(); | ||
| return grad_input / input.numel(); |
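The backward hunk above divides `grad_input` by `input.numel()` for the mean reduction, which matches the analytic gradient: the derivative of `target * (log(target) - input)` with respect to `input` is `-target`, scaled by `1/numel` under `"mean"`. A quick autograd check of that claim (a sketch, not the PR's test code):

```python
import torch
import torch.nn.functional as F

# leaf tensor of log-probabilities with gradients enabled
inp = torch.log_softmax(torch.randn(3, 4), dim=1).detach().requires_grad_(True)
tgt = torch.softmax(torch.randn(3, 4), dim=1)

loss = F.kl_div(inp, tgt, reduction="mean")
loss.backward()

# d/d(input) of target * (log(target) - input) is -target;
# 'mean' reduction divides by the total element count.
expected = -tgt / inp.numel()
```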
@pytorchbot retest this please
Summary:
Moved kl div loss to aten.
benchmarks for 5000 iterations on input size (1000,100)
New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]
cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```
Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]
cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: pytorch/pytorch#10336
Differential Revision: D9213636
Pulled By: li-roy
fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
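The timings above are seconds for 5000 iterations on a (1000,100) input. A benchmark of that shape can be sketched with a simple timing loop; this is a minimal stand-in for the harness used in the PR (iteration count reduced for a quick smoke test), not the exact script:

```python
import timeit
import torch
import torch.nn.functional as F

# same input shape as the PR's benchmark: (1000, 100)
inp = torch.log_softmax(torch.randn(1000, 100), dim=1)
tgt = torch.softmax(torch.randn(1000, 100), dim=1)

def forward_pass():
    return F.kl_div(inp, tgt, reduction="mean")

# the PR timed 5000 iterations; 100 is enough to smoke-test the loop
elapsed = timeit.timeit(forward_pass, number=100)
print(f"forward x100: {elapsed:.3f}s")
```

For a CUDA measurement the tensors would be moved with `.cuda()` and `torch.cuda.synchronize()` called before reading the clock, since kernel launches are asynchronous.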