add mkldnn linear backward #36122
Conversation
💊 CI failures summary and remediations: as of commit 1651eef (more details on the Dr. CI page), 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI and has been revised 99 times.)
    grad3 = grad3.t();
  }
} else {
  grad1 = maybe_multiply(grad_output, beta);
I think you need to do the same checks based on grad_input_mask here to avoid computing gradients that are not required.
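For reference, a minimal sketch of such a `grad_input_mask` guard, assuming a linear-style backward where `grad_input_mask[i]` marks whether the i-th gradient is requested; the function name and the plain dense-math formulas are illustrative, not this PR's implementation:

```cpp
#include <ATen/ATen.h>
#include <array>
#include <tuple>

// Hypothetical sketch: compute only the gradients that autograd asked for.
// For a linear op out = input.mm(weight.t()) + bias, the dense-math gradients are:
//   grad_input  = grad_output.mm(weight)
//   grad_weight = grad_output.t().mm(input)
//   grad_bias   = grad_output.sum(0)
std::tuple<at::Tensor, at::Tensor, at::Tensor> linear_like_backward(
    const at::Tensor& input,
    const at::Tensor& grad_output,
    const at::Tensor& weight,
    std::array<bool, 3> grad_input_mask) {
  at::Tensor grad_input, grad_weight, grad_bias;
  if (grad_input_mask[0]) {
    grad_input = grad_output.mm(weight);      // skipped when not required
  }
  if (grad_input_mask[1]) {
    grad_weight = grad_output.t().mm(input);  // skipped when not required
  }
  if (grad_input_mask[2]) {
    grad_bias = grad_output.sum(0);           // skipped when not required
  }
  return std::make_tuple(grad_input, grad_weight, grad_bias);
}
```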
tools/autograd/derivatives.yaml (Outdated)

- name: mkldnn_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, int[] padding, int[] stride, int[] dilation, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
  grad_output, self, weight: _convolution_double_backward(grads[0], grads[1], grads[2], grad_output, weight, self, stride, padding, dilation, false, std::vector<int64_t>(padding.size(), 0), groups, false, false, false, grad_input_mask)

- name: mkldnn_linear(Tensor self, Tensor weight, Tensor? bias=None) -> Tensor
Why do you need this one here? Can this be called outside of the addmm function?
This is not used now, but when we port aten::linear, this will be needed.
Could we keep this out until then? Every entry here adds a lot of generated code. So if we can keep it low, it's better :D
Removed.
| "mkldnn_linear: input needs to has dim at least 2, input dim ", self.dim()); | ||
| TORCH_CHECK(self.is_mkldnn(), | ||
| "mkldnn_linear: input needs to be mkldnn layout"); | ||
| TORCH_CHECK(weight.is_mkldnn() && bias.is_mkldnn(), |
The backward formulas seem to actually assume that the weights are not MKLDNN tensors (because mkldnn_linear_backward_weights always converts the outputs with to_dense() before returning them). Is that true? If not, the backward formula should be updated to make sure it returns the same type as the weights.
For the backward path, we want to support two methods (a sketch of the first one follows this list):
- Input is an MKLDNN tensor and weight is a dense tensor. This is our primary method, but it always has a weight reorder at each iteration.
- Input is an MKLDNN tensor and weight is an MKLDNN tensor, obtained by calling model.to_mkldnn() to convert the weight before training. This method reduces the weight reorder time, but some ops in the optimizer step may not support MKLDNN tensors, and there is also a problem when saving the model: torch.save can't save MKLDNN tensors. We may be able to solve those problems and get better performance in the future.
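As a rough illustration of the first case, here is a minimal sketch where the weight gradient is computed from MKLDNN inputs but handed back as a dense tensor so it matches the dense weight. to_dense() is the existing layout conversion; the function name and the plain matmul math are illustrative, not this PR's actual MKLDNN kernel:

```cpp
#include <ATen/ATen.h>
#include <tuple>

// Illustrative sketch of case 1: input/grad_output live in MKLDNN layout,
// weight/bias are dense, so the returned gradients are dense as well.
// (A real implementation would use MKLDNN primitives instead of plain matmul.)
std::tuple<at::Tensor, at::Tensor> linear_backward_weights_sketch(
    const at::Tensor& grad_output,   // MKLDNN tensor
    const at::Tensor& input,         // MKLDNN tensor
    bool bias_defined) {
  // Fall back to dense math for the sketch; the point is the output layout.
  at::Tensor go = grad_output.to_dense();
  at::Tensor in = input.to_dense();
  at::Tensor grad_weight = go.t().mm(in);  // dense, matching the dense weight
  at::Tensor grad_bias;
  if (bias_defined) {
    grad_bias = go.sum(0);                 // dense, matching the dense bias
  }
  return std::make_tuple(grad_weight, grad_bias);
}
```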
OK, so the backward formula needs to be updated to properly support case 2, right?
Sorry for the confusion. For the first step, the backward formula supports case 1. I removed this check because: for inference, input, weight, and bias are MKLDNN tensors; for training, input is an MKLDNN tensor while weight and bias are dense tensors.
Sounds good.
Could you add the corresponding checks for the backward function to be sure that you have what you want? Nothing else prevents the user from doing training with weight/bias as MKLDNN tensors at the moment, right?
Added some checks for the backward function.
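For reference, a minimal sketch of the kind of layout checks meant here, assuming case 1 (MKLDNN grad_output/input, dense weight); the helper name and the error messages are illustrative, not the exact checks added in this PR:

```cpp
#include <ATen/ATen.h>
#include <c10/util/Exception.h>

// Illustrative sketch of layout checks for the backward path in case 1:
// grad_output and input must be MKLDNN tensors, weight must stay dense.
void check_mkldnn_linear_backward_inputs(
    const at::Tensor& grad_output,
    const at::Tensor& input,
    const at::Tensor& weight) {
  TORCH_CHECK(grad_output.is_mkldnn(),
      "mkldnn_linear_backward: grad_output needs to be mkldnn layout");
  TORCH_CHECK(input.is_mkldnn(),
      "mkldnn_linear_backward: input needs to be mkldnn layout");
  TORCH_CHECK(!weight.is_mkldnn(),
      "mkldnn_linear_backward: weight is expected to be a dense tensor");
}
```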
albanD left a comment:
lgtm. Thanks for updating.
Differential Revision: [D22440966](https://our.internmc.facebook.com/intern/diff/D22440966)
@albanD, may I know if this PR can be merged? Thanks.
Hey, yes, this is in the process.
ghstack-source-id: 5ca906b Pull Request resolved: pytorch#36122
Hey @albanD, if there is anything we can help with on our side for the CI errors, please feel free to let us know.
Hi @XiaobingSuper! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Stack from ghstack:
Differential Revision: D22440966