
Conversation

@gujinghui
Collaborator

@gujinghui gujinghui commented May 16, 2019

  1. reduce the overhead of the mkldnn-bridge itself
  2. remove redundant code and unused APIs
  3. provide new operators, including int8 inner_product, nD permute/transpose, elem_add/mul, etc.
  4. improve inner_product to support io-format weights without an implicit reorder
  5. add SoftMax support
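For context on item 3, these operators map onto familiar tensor primitives. A rough NumPy analog (illustrative only, not the mkldnn-bridge API; the int8 quantization step of int8 inner_product is omitted):

```python
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# nD permute/transpose: reorder the axes of a rank-3 tensor
y = np.transpose(x, (2, 0, 1))          # shape (4, 2, 3)

# elem_add / elem_mul: elementwise binary ops
z = x + x
w = x * x

# inner_product: flatten trailing dims, then a matrix product with the weights
weights = np.ones((12, 5), dtype=np.float32)
out = x.reshape(2, -1) @ weights        # shape (2, 5)
```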

@gujinghui
Collaborator Author

@Jianhui-Li
@jgong5
@uyongw

@gujinghui
Collaborator Author

The failure is not related to this change.

May 17 10:59:12 ======================================================================
May 17 10:59:12 FAIL: test_HalfTensor_tril_zero_stride (main.TestCuda)
May 17 10:59:12 ----------------------------------------------------------------------
May 17 10:59:12 Traceback (most recent call last):
May 17 10:59:12 File "/var/lib/jenkins/workspace/test/common_utils.py", line 338, in wrapper
May 17 10:59:12 method(*args, **kwargs)
May 17 10:59:12 File "test_cuda.py", line 648, in tmp
May 17 10:59:12 self.assertEqual(cpu_tensor, gpu_tensor, precision)
May 17 10:59:12 File "/var/lib/jenkins/workspace/test/common_utils.py", line 483, in assertEqual
May 17 10:59:12 assertTensorsEqual(x, y)
May 17 10:59:12 File "/var/lib/jenkins/workspace/test/common_utils.py", line 475, in assertTensorsEqual
May 17 10:59:12 self.assertLessEqual(max_err, prec, message)
May 17 10:59:12 AssertionError: tensor(nan, dtype=torch.float32) not less than or equal to 1e-05 :
May 17 10:59:12

@gujinghui
Collaborator Author

@yinghai
Could you help review this PR?

Thanks a lot.

@yinghai
Contributor

yinghai commented May 21, 2019

We need to update internal ideep before accepting.

@gujinghui gujinghui changed the title upgrade mkldnn-bridge upgrade MKLDNN to v0.19 and mkldnn-bridge May 27, 2019
@gujinghui
Collaborator Author

@yinghai
Could you help review this PR?
MKLDNN is upgraded to v0.19.

Thanks a lot.

@gujinghui gujinghui force-pushed the upgrade_bridge branch 2 times, most recently from 7fec349 to be814f0 Compare May 28, 2019 06:29
@gujinghui
Collaborator Author

Rebased to the latest code base.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@bddppq
Contributor

bddppq commented Jun 3, 2019

We have run into this error when compiling mkl-dnn v0.19 from source:

cpu/ref_softmax.cpp:171:25: error: 'VML_ERRMODE_NOERR' was not declared in this scope
         vmsExp(n, a, r, VML_ERRMODE_NOERR);
                         ^~~~~~~~~~~~~~~~~

@gujinghui
Collaborator Author

@bddppq It looks like the MKL version differs. Could you check which MKL version is installed on your machine?

@bddppq
Contributor

bddppq commented Jun 4, 2019

@gujinghui it's 2017.0.098

@pytorchbot pytorchbot added the module: build Build system issues label Jun 4, 2019
@gujinghui
Collaborator Author

@bddppq @VitalyFedyunin

This should be fixed now. Please try again. Thanks.

@bddppq
Contributor

bddppq commented Jun 4, 2019

@gujinghui Where did you put the fix? The release tag v0.19 in mkl-dnn repo is still pointing to the same commit.

@yinghai
Contributor

yinghai commented Jun 4, 2019

#define INTEL_MKL_VERSION 20170003

Collaborator Author

@bddppq
I disabled this flag for MKLDNN, since it caused the build failure.
The flag no longer has any impact on PyTorch.

Contributor

Does disabling MKL have perf implications on MKLDNN?

Collaborator Author

@gujinghui gujinghui Jun 5, 2019

According to the table below from MKLDNN, there should be no perf change.
In any case, we use the jit implementation.

  /* USE_MKL      USE_CBLAS       effect
   * -------      ---------       ------
   * yes          yes             use Intel(R) MKL CBLAS
   * yes          no              use jit
   * no           yes             system-dependent CBLAS
   * no           no              use jit
   */
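The USE_MKL knob above corresponds to a build-time option when compiling mkl-dnn from source. A configuration sketch (the option name appears later in this thread; the rest of the invocation is an assumption):

```shell
# Configure mkl-dnn without linking MKL, so GEMM falls back to the jit path
# ("no / no" row in the table above).
cmake -DMKLDNN_USE_MKL=OFF ..
make -j"$(nproc)"
```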

@gujinghui
Collaborator Author

@bddppq @yinghai
Softmax is moved into mkldnn-bridge. Please help review.

@gujinghui
Collaborator Author

@pytorchbot rebase this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jeffreyksmithjr jeffreyksmithjr added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 10, 2019
@bddppq
Contributor

bddppq commented Jun 11, 2019

@gujinghui We have built mkldnn v0.19 from source (with -DMKLDNN_USE_MKL=OFF) and have seen a noticeable perf drop when running resnext101 inference. I think a better "fix" for the issue would be to add MKL version detection in mkldnn and fall back to the old API call in environments with an older version of MKL.

1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product,
   nD permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights
   without implicit reorder
5. add softmax support

Signed-off-by: Gu, Jinghui <[email protected]>
@gujinghui gujinghui changed the title upgrade MKLDNN to v0.19 and mkldnn-bridge upgrade mkldnn-bridge Jun 11, 2019
@gujinghui
Collaborator Author

@bddppq
The MKLDNN v0.19 upgrade is reverted, and the USE_MKL flag is set as before.
Only the ideep update remains in this PR now. Please help review.

As for the compatibility issue between v0.19 and older MKL, we're working on it
and will get back to you with any updates.

Thanks a lot.

@bddppq
Contributor

bddppq commented Jun 11, 2019

@gujinghui Thanks!
Though currently in this PR the mkl-dnn submodule (within the ideep submodule) is still updated.

@gujinghui
Collaborator Author

@gujinghui Thanks!
Though currently in this PR the mkl-dnn submodule (within the ideep submodule) is still updated.

Yes, a trivial change to ideep is included in this PR.
But the commit ID of mkl-dnn should be identical to before.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 11, 2019
Summary:
1. reduce the overhead of mkldnn-bridge itself
2. remove redundant code and useless APIs
3. provide new operators, including int8 inner_product, nD permute/transpose, elem_add/mul, etc.
4. improve inner_product to support io format weights without implicit reorder
5. add SoftMax support
Pull Request resolved: pytorch/pytorch#20569

Reviewed By: houseroad

Differential Revision: D15558663

Pulled By: bddppq

fbshipit-source-id: 79a63aa139037924e9ffb1069f7e7f1d334efe3a
@facebook-github-bot
Contributor

@bddppq merged this pull request in 731670f.


Labels

caffe2, Merged, module: build (Build system issues), module: mkldnn (Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration), module: third_party, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

8 participants