
Conversation

Collaborator

@XiaobingSuper commented Apr 23, 2020

Stack from ghstack:

Differential Revision: D22440965


dr-ci bot commented Apr 23, 2020

💊 CI failures summary and remediations

As of commit 78bb6d6 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@zhangguanheng66 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) and module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration) labels Apr 23, 2020
XiaobingSuper added a commit that referenced this pull request Apr 24, 2020
ghstack-source-id: c7d4b27
Pull Request resolved: #37147
Collaborator

@albanD left a comment


Thanks for the PR!
Just some comments on the test and missing checks.

ideep::batch_normalization_backward::compute(
x, m, v, grady, w, gradx, gradw, gradb, eps);

if (weight.is_mkldnn()) {
Collaborator

There seems to be a lot of assumptions on the input types here. Can we have the corresponding checks both for the forward and backward functions?

Collaborator Author

Now it just uses the second path.
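The checks asked for above would live in the C++ forward/backward entry points (as TORCH_CHECK calls); as a purely hypothetical sketch of the intended validation logic, using a toy stand-in class rather than the real tensor types:

```python
# Toy stand-in for the input-layout checks the review asks for; the
# real implementation would use TORCH_CHECK(...) in the C++ functions.
class Tensor:
    def __init__(self, layout):
        self.layout = layout

    def is_mkldnn(self):
        return self.layout == "mkldnn"

def check_bn_inputs(inp, weight, bias):
    if not inp.is_mkldnn():
        raise TypeError("mkldnn batch_norm: expected an MKLDNN input")
    # weight and bias should agree: both dense (training path) or both
    # MKLDNN (inference module converted with to_mkldnn)
    if weight.is_mkldnn() != bias.is_mkldnn():
        raise TypeError("mkldnn batch_norm: weight/bias layout mismatch")

# MKLDNN input with dense parameters (the training case) passes:
check_bn_inputs(Tensor("mkldnn"), Tensor("dense"), Tensor("dense"))
```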

affine=affine,
track_running_stats=track_running_stats).float().train(train)
if (train or not track_running_stats):
mkldnn_bn = copy.deepcopy(bn)
Collaborator

Could you explain why in this case you don't send the module to mkldnn?

Collaborator Author

For the training case, the module's parameters are always dense tensors, so there is no need to call mkldnn_utils.to_mkldnn.

Collaborator

Oh, OK. But why is track_running_stats checked here?

loss2.backward()
self.assertEqual(x1.grad, x2.grad.to_dense())
np.testing.assert_allclose(
bn.weight.grad, mkldnn_bn.weight.grad, rtol=1e-3, atol=1e-3)
Collaborator

Why is the tolerance so high here given that you assert that y1, y2 are exactly equal below?

Collaborator Author

diff_weight is computed as Sum_over_MB*H*W(diff_dst[i] * x_normalized[i]). Floating-point summation order can change the result: MKLDNN splits the whole reduction into per-thread pieces, so its summation order differs from the native path. I also tested the CUDA case, and it has the same problem. For y1 and y2 there are only element-wise operations, so there is no big difference between the MKLDNN and native paths.
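This order dependence is easy to reproduce in plain Python. The values below are contrived (not from the PR's test) so that the difference between a sequential and a chunked, per-thread-style reduction is deterministic rather than dependent on thread timing:

```python
# Floating-point addition is not associative, so the order in which a
# reduction's terms are combined changes the rounded result. A blocked
# multi-threaded reduction (as in MKLDNN) sums partial chunks first.
terms = [1e16, 1.0, -1e16, 1.0]

# Sequential (native-path style) left-to-right sum: the first 1.0 is
# absorbed by 1e16, but the last 1.0 survives after the cancellation.
sequential = sum(terms)  # ((1e16 + 1.0) - 1e16) + 1.0 == 1.0

# Chunked ("two threads") sum: each chunk absorbs its 1.0, then the
# two partial sums cancel exactly.
partials = [sum(terms[:2]), sum(terms[2:])]  # [1e16, -1e16]
chunked = sum(partials)  # == 0.0
```

In a real grad_weight reduction the terms are of comparable magnitude, so the discrepancy is tiny rather than this dramatic, but the mechanism is the same.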

Collaborator

But given that you work with small Tensors and the only problem is ordering, a tolerance of 1e-5 should be enough, no?
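For context on what the tolerance means: np.testing.assert_allclose passes element-wise when |actual - desired| <= atol + rtol * |desired|, so the criterion in plain Python is:

```python
def allclose(actual, desired, rtol=1e-7, atol=0.0):
    # Same criterion as numpy.testing.assert_allclose: element-wise,
    # |actual - desired| <= atol + rtol * |desired|
    return all(abs(a - d) <= atol + rtol * abs(d)
               for a, d in zip(actual, desired))

# rtol=atol=1e-3 (the PR's test) accepts differences around 2e-3...
loose = allclose([1.0000], [1.0015], rtol=1e-3, atol=1e-3)   # True
# ...while 1e-5 still comfortably covers summation-order noise.
tight = allclose([1.0], [1.0 + 5e-6], rtol=1e-5, atol=1e-5)  # True
```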

qiuxin2012 pushed a commit to qiuxin2012/pytorch that referenced this pull request Jul 27, 2020
ghstack-source-id: a122475
Pull Request resolved: pytorch#37147
@facebook-github-bot
Contributor

Hi @XiaobingSuper!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot facebook-github-bot deleted the gh/xiaobingsuper/15/head branch February 12, 2021 15:18

Labels

cla signed, module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


7 participants