add mkldnn batch_norm backward #37147
Conversation
albanD left a comment:
Thanks for the PR!
Just some comments on the test and missing checks.
ideep::batch_normalization_backward::compute(
    x, m, v, grady, w, gradx, gradw, gradb, eps);

if (weight.is_mkldnn()) {
There seem to be a lot of assumptions about the input types here. Can we have the corresponding checks for both the forward and backward functions?
Now it just uses the second path.
    affine=affine,
    track_running_stats=track_running_stats).float().train(train)
if (train or not track_running_stats):
    mkldnn_bn = copy.deepcopy(bn)
Could you explain why in this case you don't send the module to mkldnn?
For the training case, the module's parameters are always dense tensors, so there is no need to call mkldnn_utils.to_mkldnn.
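A minimal sketch of that distinction (not the actual test code; `torch.utils.mkldnn.to_mkldnn` is assumed as the conversion helper, and `train` is a hypothetical flag mirroring the test's parameter):

```python
import copy
import torch
import torch.utils.mkldnn as mkldnn_utils

bn = torch.nn.BatchNorm2d(3).float()
train = True  # hypothetical flag mirroring the test's `train` parameter

if train:
    # Training path: parameters stay dense, so a plain deepcopy is enough.
    mkldnn_bn = copy.deepcopy(bn)
else:
    # Inference path: convert the module so its parameters become MKLDNN tensors.
    mkldnn_bn = mkldnn_utils.to_mkldnn(copy.deepcopy(bn).eval())
```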
Oh, ok. But why is track_running_stats here?
test/test_mkldnn.py (outdated)
loss2.backward()
self.assertEqual(x1.grad, x2.grad.to_dense())
np.testing.assert_allclose(
    bn.weight.grad, mkldnn_bn.weight.grad, rtol=1e-3, atol=1e-3)
Why is the tolerance so high here, given that you assert that y1 and y2 are exactly equal below?
diff_weight is computed as a sum over MB*H*W of diff_dst[i] * x_normalized[i]. The summation order of floating-point numbers can change the result: MKLDNN splits the whole job into pieces for multi-threading, so its summation order can differ from the native path's. I also tested the CUDA case, which has the same problem. For y1 and y2 there are only element-wise operations, so there is no big difference between the MKLDNN and native paths.
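As a toy illustration of the ordering effect (not code from this PR): summing the same float32 values sequentially versus in chunks, the way a multi-threaded reduction would, gives slightly different results.

```python
import numpy as np

rng = np.random.default_rng(0)
vals = rng.standard_normal(100000).astype(np.float32)

# Sequential left-to-right accumulation in float32.
seq = np.float32(0.0)
for v in vals:
    seq = seq + v

# Chunked reduction: per-chunk partial sums combined at the end,
# mimicking a different (multi-threaded) summation order.
chunked = np.float32(sum(chunk.sum() for chunk in np.array_split(vals, 8)))

print(seq, chunked)  # typically differ in the last few bits
```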
But given that you work with small Tensors and the only problem is ordering, a tolerance of 1e-5 should be enough, no?
ghstack-source-id: a122475
Pull Request resolved: pytorch#37147
Stack from ghstack:
Differential Revision: D22440965