Add aten mkldnn conv2d backward operator #20567
Conversation
@bddppq Hi, the backward integration is done; please review the code.
a6870f6 to 35ef05b
Summary:

### mkldnn backward ops list:
- [ ] (#20567) Add aten mkldnn conv2d backward operator 💛
- [ ] (#20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
- [ ] (#20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_pool2d 💛
- [ ] (#20572) Add aten mkldnn batchnorm backward operator 💛
- [ ] (#20573) Add aten mkldnn zero_ operator 💛
- [ ] (#20575) Add mkldnn mul operator 💛

Pull Request resolved: #20575
Differential Revision: D15799529
Pulled By: bddppq
fbshipit-source-id: 4887d8ef1a0e316ad9db199b657d9481fc13e486
Summary:

### mkldnn backward ops list:
- [ ] (#20567) Add aten mkldnn conv2d backward operator 💛
- [ ] (#20570) Add aten mkldnn backward ops: relu, linear and reshape 💛
- [ ] (#20571) Add aten mkldnn backward ops: max_pool2d, avg_pool2d and adaptive_avg_pool2d 💛
- [ ] (#20572) Add aten mkldnn batchnorm backward operator 💛
- [ ] (#20573) Add aten mkldnn zero_ operator 💛
- [ ] (#20575) Add mkldnn mul operator 💚

Pull Request resolved: #20573
Differential Revision: D15820477
Pulled By: bddppq
fbshipit-source-id: 35d95f5b4e013c8db1911f52148550a2e40a2e68
dzhulgakov left a comment
In general, looks OK. Two questions:
- CI is failing.
- Did you benchmark the potential perf change from introducing ideep in the backward path? It should be OK, but best to double-check.
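One way to sanity-check that overhead is a small CPU micro-benchmark that times Conv2d forward + backward with the MKL-DNN path toggled on and off. This is only a sketch, not the benchmark used in this thread; it assumes a PyTorch build with MKL-DNN support and that the `torch.backends.mkldnn.flags` context manager is available.

```python
# Sketch: compare forward+backward time of a Conv2d with the MKL-DNN path on/off.
import time
import torch
import torch.nn as nn

def bench(mkldnn_enabled, iters=50):
    torch.manual_seed(0)
    conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
    x = torch.randn(32, 64, 56, 56, requires_grad=True)
    with torch.backends.mkldnn.flags(enabled=mkldnn_enabled):
        conv(x).sum().backward()  # warm-up
        start = time.time()
        for _ in range(iters):
            conv.zero_grad()
            x.grad = None
            out = conv(x)
            out.sum().backward()  # exercises the conv2d backward kernels
        return (time.time() - start) / iters

print("mkldnn on :", bench(True))
print("mkldnn off:", bench(False))
```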
@dzhulgakov so what is the proposed system config for training, dual-socket or single-socket? However, we have a known issue here: the memory buffer is not cached (output tensors need to be allocated every time), which is going to add significant overhead in the training scenario. A customized memory allocator is a solution, but it seems to go against the current design. Suggestions?
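To make the allocation concern concrete, here is a purely illustrative sketch of the buffer-reuse idea: keep previously allocated output buffers in a small cache keyed by shape and dtype instead of allocating fresh tensors every step. The `OutputCache` class is hypothetical and is not part of PyTorch or ideep.

```python
# Hypothetical illustration of buffer reuse; not an existing PyTorch/ideep API.
import torch

class OutputCache:
    """Reuse output buffers across iterations instead of reallocating them."""

    def __init__(self):
        self._buffers = {}

    def get(self, shape, dtype=torch.float32):
        # Return a cached buffer for this (shape, dtype), allocating only once.
        key = (tuple(shape), dtype)
        if key not in self._buffers:
            self._buffers[key] = torch.empty(shape, dtype=dtype)
        return self._buffers[key]

cache = OutputCache()
for _ in range(10):
    out = cache.get((32, 128, 28, 28))            # same buffer every iteration
    torch.add(torch.randn(32, 128, 28, 28), 1.0, out=out)  # write into the reused buffer
```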
@dzhulgakov The performance issue @mingfeima pointed out is caused by …
35ef05b to d19eaab
Running this benchmark on SKX-6148 with 2 sockets, we can get a 2X performance improvement using mkldnn compared to the native path.
The following are the detailed logs:
Next step, I will share the performance using the caching allocator. Thanks!
Adding the performance using the ideep caching allocator on SKX-6148 with 2 sockets: there is at least a 1.36x performance improvement for inference with large batch size, and at least a 1.32x performance improvement for training.
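For reference, the kind of native-vs-mkldnn inference comparison described here can be run with a small script along these lines. This is a hedged sketch, not the exact benchmark from this thread; it assumes a PyTorch build with MKL-DNN enabled and uses the existing `torch.utils.mkldnn.to_mkldnn` module converter together with `Tensor.to_mkldnn()` / `to_dense()` (inference path, model in eval mode).

```python
# Sketch: native CPU path vs. mkldnn-layout path for inference with a large batch.
import time
import torch
import torch.nn as nn
from torch.utils import mkldnn as mkldnn_utils

model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
).eval()
x = torch.randn(64, 3, 224, 224)

# Native CPU path
with torch.no_grad():
    t0 = time.time()
    for _ in range(20):
        model(x)
    native = (time.time() - t0) / 20

# MKL-DNN path: convert module weights and feed mkldnn-layout inputs
mkldnn_model = mkldnn_utils.to_mkldnn(model)
with torch.no_grad():
    t0 = time.time()
    for _ in range(20):
        mkldnn_model(x.to_mkldnn()).to_dense()
    opt = (time.time() - t0) / 20

print(f"native: {native:.4f}s/iter  mkldnn: {opt:.4f}s/iter")
```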
d19eaab to 1db67f7
Adding a CPU caching allocator is something we also discussed. The problem is mostly alleviated by using a better malloc, e.g. jemalloc (http://jemalloc.net/jemalloc.3.html). Unfortunately, I don't think there's a safe way to package it with prebuilt PyTorch binaries - it has to be preloaded for the entire Python process. Overall, this PR looks pretty good to me; also cc @zheng-xq to take a look.
@zheng-xq, can you help review it?
1db67f7 to dc4a8bd
dc4a8bd to 82ac1f3
@bddppq, can you help review this code? Perhaps we can first merge this PR, which unifies the mkldnn convolution code. Thanks!
@pytorchbot rebase this please
82ac1f3 to 704376d
704376d to 3144fc6
@VitalyFedyunin, I rebased the code again; can you help review it when you have free time? Thanks!
@VitalyFedyunin, the failing test cases are not related to this PR. Thanks!
86c3d42 to 69e75e9
69e75e9 to 66d8a92
@dzhulgakov regarding jemalloc: we tried out jemalloc and TCMalloc, but both introduce extra dependencies and neither supports NUMA. TF exposes NUMA as a sub-device, so users can do NUMA-aware memory allocation. The memory allocator makes a big difference when running training or offline inference with a large batch size, since malloc of a large chunk of memory can trigger significant clear-page overhead. We observed a +30% benefit on throughput. Are there any plans for PyTorch to support a similar capability? If not, do you think it would work for us to implement one and submit a PR?
This PR was reopened as #36121.
…backward operator.
refs:pytorch#14 feat: [v1.5.0][pytorch#20567] Add aten mkldnn conv2d backward operator. See merge request postk_dl/pytorch!12
mkldnn backward ops list:
Enable mkldnn backward, which can improve the training performance by about 2x for the model resnext101.
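As a rough illustration, a training loop like the one below exercises the conv2d forward and backward kernels on CPU; the resnext101 figure above presumably comes from a setup along these lines. This is only a sketch under the assumption that torchvision is installed and that the PyTorch build has MKL-DNN support, so the mkldnn convolution kernels are selected on the dense CPU path.

```python
# Sketch: CPU training loop for resnext101 with synthetic data.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnext101_32x8d()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(32, 3, 224, 224)          # synthetic batch
target = torch.randint(0, 1000, (32,))    # synthetic labels

model.train()
for step in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()                        # exercises the conv2d backward path
    optimizer.step()
    print(step, loss.item())
```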