
Conversation

@XiaobingSuper
Collaborator

@XiaobingSuper XiaobingSuper commented Apr 7, 2020

Stack from ghstack:

Differential Revision: D22440969

@XiaobingSuper XiaobingSuper requested review from VitalyFedyunin and removed request for albanD and apaszke April 7, 2020 03:14
@dr-ci

dr-ci bot commented Apr 7, 2020

💊 CI failures summary and remediations

As of commit 81fc4cc (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@XiaobingSuper
Collaborator Author

@VitalyFedyunin, those PRs #20567 #20570 #20571 #20572 are too old; I will re-pull them. Thanks!

@XiaobingSuper
Collaborator Author

XiaobingSuper commented Apr 7, 2020

@gchanan gchanan added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Apr 8, 2020
@XiaobingSuper
Collaborator Author

@VitalyFedyunin , could you help review this code?

@VitalyFedyunin
Contributor

Are we fine with BC-incompatible changes in mkldnn_convolution_backward_weights operator?

What is the purpose of ideep update?

@XiaobingSuper
Collaborator Author

XiaobingSuper commented Apr 26, 2020

The purposes of the ideep update:

  1. support conv3d forward and backward
  2. fix a group convolution backward performance degradation
  3. fix a batchnorm backward issue when the input is nchw4c

Why did I change mkldnn_convolution_backward_weights? For the bf16 backward path, the weight can be an fp32 tensor (the input is just converted to a float32 MKLDNN tensor) or a bf16 tensor (the input is converted to a bf16 MKLDNN tensor by calling model.to_mkldnn(torch.bfloat16)), so I need to get the data_type from the weight parameter.
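The data-type selection described above can be sketched as follows. This is a minimal illustration only; the function name `pick_data_type` is hypothetical, and the real dispatch lives in ATen's mkldnn Conv.cpp in C++:

```python
def pick_data_type(weight_dtype: str) -> str:
    # The backward-weights kernel must produce the weight gradient in the
    # same data type as the weight: fp32 when the module was converted with
    # plain to_mkldnn(), bf16 when converted with to_mkldnn(torch.bfloat16).
    mapping = {"float32": "f32", "bfloat16": "bf16"}
    if weight_dtype not in mapping:
        raise ValueError(f"unsupported weight dtype: {weight_dtype}")
    return mapping[weight_dtype]
```

This is why the operator's schema had to change: without the weight argument, the backward-weights kernel cannot tell which of the two data types to use.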

@XiaobingSuper XiaobingSuper requested a review from albanD April 26, 2020 02:12
@VitalyFedyunin
Contributor

Test errors seem to be mkldnn-related:

Traceback (most recent call last):
  File "test_mkldnn.py", line 149, in test_conv2d
    rtol=1e-5)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 928, in assertEqual
    assertTensorsEqual(x, y)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 890, in assertTensorsEqual
    torch.testing.assert_allclose(a, b, atol=atol, rtol=rtol, equal_nan=True, msg=message)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\__init__.py", line 60, in assert_allclose
    raise AssertionError(msg)
AssertionError: Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 1, 1, 2] (-1572.8182373046875 vs. -1572.835693359375) and 0 other locations (5.00%)
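For reference, the tolerance check that fails here follows the usual allclose rule. This is a simplified sketch of that rule, not torch's actual implementation:

```python
def within_tolerance(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-5) -> bool:
    # allclose-style criterion: |a - b| <= atol + rtol * |b|
    return abs(a - b) <= atol + rtol * abs(b)
```

The reported pair differs by about 0.017, while the allowed slack at this magnitude (atol + rtol * 1572.8) is only about 0.016, hence the failure.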

@VitalyFedyunin
Contributor

Please whitelist the changed operator in

mkldnn_conv2d = mkldnn_utils.to_mkldnn(copy.deepcopy(conv2d))
for train in [True, False]:
    for bias in [True, False]:
        conv2d = torch.nn.Conv2d(in_channels=C,
Contributor


This looks like an indentation problem to me.

Contributor

@VitalyFedyunin VitalyFedyunin left a comment


Overall looks good; needs some test alterations.

('aten::__or__', datetime.date(2020, 6, 30)),
('aten::__xor__', datetime.date(2020, 6, 30)),
('aten::split', datetime.date(2020, 6, 30)),
('aten::mkldnn_convolution_backward_weights', datetime.date(2020, 6, 30)),
Contributor


Sorry, you need to update this date before landing.
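For context, entries in this allowlist pair an operator name with an expiry date, and a BC-breaking schema change is only tolerated until that date passes. A minimal sketch of the idea, using a hypothetical helper rather than PyTorch's actual BC-check script:

```python
import datetime

# Allowlist entry taken from the quoted diff above.
ALLOW_LIST = [
    ("aten::mkldnn_convolution_backward_weights", datetime.date(2020, 6, 30)),
]

def is_change_allowed(op: str, today: datetime.date) -> bool:
    # A schema change for `op` is accepted only while its entry has not
    # expired, which is why the reviewer asks for the date to be bumped
    # before landing.
    return any(name == op and today <= expiry for name, expiry in ALLOW_LIST)
```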

Collaborator Author


changed.

@VitalyFedyunin
Contributor

Land fails with:

Command failed with exit code 1.

stderr: caffe2/aten/src/ATen/native/mkldnn/Conv.cpp:159:5: error: no matching function for call to 'compute'
    ideep::convolution_backward_weights::compute(
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 [removed]/build/ideep/include/ideep/operators/conv.hpp:476:15: note: candidate function not viable: no known conversion from 'dnnl::memory::data_type' to 'ideep::algorithm' (aka 'dnnl::algorithm') for 11th argument
  static void compute(const tensor& src,
              ^
[removed]/build/ideep/include/ideep/operators/conv.hpp:493:15: note: candidate function not viable: no known conversion from 'ideep::tensor' to 'const ideep::dims' (aka 'const vector<long>') for 5th argument
  static void compute(const tensor& src,
              ^
caffe2/aten/src/ATen/native/mkldnn/Conv.cpp:172:5: error: no matching function for call to 'compute'
    ideep::convolution_backward_weights::compute(
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[removed]/build/ideep/include/ideep/operators/conv.hpp:493:15: note: candidate function not viable: no known conversion from 'dnnl::memory::data_type' to 'ideep::algorithm' (aka 'dnnl::algorithm') for 10th argument
  static void compute(const tensor& src,
              ^
[removed]/build/ideep/include/ideep/operators/conv.hpp:476:15: note: candidate function not viable: no known conversion from 'std::vector<long>' to 'ideep::tensor &' for 5th argument
  static void compute(const tensor& src,
              ^
2 errors generated.

@XiaobingSuper
Collaborator Author

@VitalyFedyunin, this error means the corresponding compute overload could not be found. I can reproduce it by checking out an old ideep commit (ca7b718); could you check which ideep commit id the failed build used? PyTorch master uses 938cc68 now. Thanks!

qiuxin2012 pushed a commit to qiuxin2012/pytorch that referenced this pull request Jul 27, 2020
@ZhuJewel

@VitalyFedyunin, please let us know if you need any help with the ideep-related errors. Thanks.

@facebook-github-bot
Contributor

Hi @XiaobingSuper!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@VitalyFedyunin
Contributor

Hi! I'm ready to land it. Can you please rebase to make sure we avoid merge conflicts?

@jgong5
Collaborator

jgong5 commented Dec 12, 2020

Hi! I'm ready to land it. Can you please rebase to make sure we avoid merge conflicts.

@VitalyFedyunin Thank you. We created a new PR #48994 to ease the rebase. I copied you there. This old one is supposed to be closed. Thanks.

@VitalyFedyunin
Contributor

Are you planning to port the remaining 4 PRs?

@Jianhui-Li

Are you planning to port the remaining 4 PRs?

I think so. @XiaobingSuper

@facebook-github-bot facebook-github-bot deleted the gh/xiaobingsuper/12/head branch January 15, 2021 15:17

Labels

cla signed, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
