
Conversation

@XiaobingSuper
Collaborator

@XiaobingSuper XiaobingSuper commented Apr 7, 2020

Stack from ghstack:

Differential Revision: D22440966

[ghstack-poisoned]
@dr-ci

dr-ci bot commented Apr 7, 2020

💊 CI failures summary and remediations

As of commit 1651eef (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

This comment has been revised 99 times.

XiaobingSuper added a commit that referenced this pull request Apr 7, 2020
ghstack-source-id: fc475c3
Pull Request resolved: #36122
XiaobingSuper added a commit that referenced this pull request Apr 8, 2020
ghstack-source-id: 3964037
Pull Request resolved: #36122
@gchanan gchanan added the triaged label Apr 8, 2020
XiaobingSuper added a commit that referenced this pull request Apr 9, 2020
ghstack-source-id: 5ee3fb5
Pull Request resolved: #36122
XiaobingSuper added a commit that referenced this pull request Apr 23, 2020
ghstack-source-id: 84980ad
Pull Request resolved: #36122
grad3 = grad3.t();
}
} else {
grad1 = maybe_multiply(grad_output, beta);
Collaborator

I think you need to do the same checks based on grad_input_mask here to avoid computing gradients that are not required.
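
For example, something along these lines (just an illustrative sketch of the gating pattern, not the code in this PR; the function name and signature are made up):

```cpp
#include <ATen/ATen.h>
#include <array>
#include <tuple>

// Illustrative sketch only: gate each gradient on grad_input_mask so that
// gradients autograd did not request are never computed.
std::tuple<at::Tensor, at::Tensor, at::Tensor> addmm_like_backward_sketch(
    const at::Tensor& grad_output,
    const at::Tensor& mat1,
    const at::Tensor& mat2,
    const at::Scalar& alpha,
    const at::Scalar& beta,
    std::array<bool, 3> grad_input_mask) {
  at::Tensor grad_self, grad_mat1, grad_mat2;  // stay undefined when not needed
  if (grad_input_mask[0]) {
    grad_self = grad_output * beta;                // grad w.r.t. self
  }
  if (grad_input_mask[1]) {
    grad_mat1 = grad_output.mm(mat2.t()) * alpha;  // grad w.r.t. mat1
  }
  if (grad_input_mask[2]) {
    grad_mat2 = mat1.t().mm(grad_output) * alpha;  // grad w.r.t. mat2
  }
  return std::make_tuple(grad_self, grad_mat1, grad_mat2);
}
```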

- name: mkldnn_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, int[] padding, int[] stride, int[] dilation, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
grad_output, self, weight: _convolution_double_backward(grads[0], grads[1], grads[2], grad_output, weight, self, stride, padding, dilation, false, std::vector<int64_t>(padding.size(), 0), groups, false, false, false, grad_input_mask)

- name: mkldnn_linear(Tensor self, Tensor weight, Tensor? bias=None) -> Tensor
Collaborator

Why do you need this one here? Can this be called outside of the addmm function?

Collaborator Author

This is not used now, but when we port aten::linear, this will be needed.

Collaborator

Could we keep this out until then? Every entry here adds a lot of generated code. So if we can keep it low, it's better :D

Collaborator Author

Removed.

"mkldnn_linear: input needs to has dim at least 2, input dim ", self.dim());
TORCH_CHECK(self.is_mkldnn(),
"mkldnn_linear: input needs to be mkldnn layout");
TORCH_CHECK(weight.is_mkldnn() && bias.is_mkldnn(),
Collaborator

The backward formula actually seems to assume that the weights are not MKLDNN tensors (because mkldnn_linear_backward_weights always converts the outputs to dense() before returning them). Is that true? If not, the backward formula should be updated to make sure it returns the same type as the weights.
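
For instance, something like this (just a hedged sketch of what "same type as the weights" could look like, with made-up names, not the PR's implementation):

```cpp
#include <ATen/ATen.h>
#include <tuple>

// Illustrative sketch only: convert each gradient to dense only when the
// parameter it corresponds to is itself a dense tensor, so the gradient's
// layout always matches the parameter's layout.
std::tuple<at::Tensor, at::Tensor> match_param_layout(
    at::Tensor grad_weight,        // produced as an MKLDNN tensor
    at::Tensor grad_bias,          // produced as an MKLDNN tensor
    const at::Tensor& weight,
    const at::Tensor& bias) {
  if (!weight.is_mkldnn()) {
    grad_weight = grad_weight.to_dense();
  }
  if (!bias.is_mkldnn()) {
    grad_bias = grad_bias.to_dense();
  }
  return std::make_tuple(grad_weight, grad_bias);
}
```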

Collaborator Author

For the backward path, we want to support two methods (a rough sketch of the two layouts follows below):

  1. The input is an MKLDNN tensor and the weight is a dense tensor. This is our primary method, but it reorders the weight at every iteration.
  2. The input is an MKLDNN tensor and the weight is an MKLDNN tensor, obtained by calling model.to_mkldnn() to convert the weight before training. This method reduces the weight reordering time, but at the optimizer step we may hit ops that do not support MKLDNN tensors, and there is also a problem when saving the model: torch.save cannot save MKLDNN tensors. We may be able to solve those problems and get better performance in the future.
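
A minimal ATen-level sketch of the two layouts (shapes and names are made up for illustration):

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor input  = at::randn({8, 32});
  at::Tensor weight = at::randn({16, 32});

  // Case 1: MKLDNN input, dense weight. The op has to reorder the dense
  // weight into the MKLDNN format on every forward/backward call.
  at::Tensor mkldnn_input = input.to_mkldnn();

  // Case 2: also convert the weight once up front, so the reorder happens
  // only once; the trade-off is that the optimizer step and torch.save
  // then have to deal with MKLDNN tensors.
  at::Tensor mkldnn_weight = weight.to_mkldnn();
  return 0;
}
```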

Collaborator

OK, so the backward formula needs to be updated to properly support case 2, right?

Collaborator Author

Sorry for confusing you. For the first step, the backward formula supports case 1. I removed this check because: for inference, the input, weight and bias are MKLDNN tensors; for training, the input is an MKLDNN tensor and the weight and bias are dense tensors.

Collaborator

@albanD albanD Apr 29, 2020

Sounds good.
Could you add the corresponding checks for the backward function, to be sure that you have what you want? Nothing else prevents the user from doing training with weight/bias as MKLDNN Tensors at the moment, right?

Collaborator Author

Added some checks to the backward function.
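
Roughly along these lines (a sketch of the kind of checks; variable names are assumed, not copied from the PR):

```cpp
// Training (case 1) layout checks: input/grad_output are MKLDNN tensors,
// while weight stays a dense tensor.
TORCH_CHECK(self.is_mkldnn(),
    "mkldnn_linear_backward: input needs to be mkldnn layout");
TORCH_CHECK(grad_output.is_mkldnn(),
    "mkldnn_linear_backward: grad_output needs to be mkldnn layout");
TORCH_CHECK(!weight.is_mkldnn(),
    "mkldnn_linear_backward: weight needs to be a dense tensor");
```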

Collaborator

@albanD albanD left a comment

lgtm. Thanks for updating.

@ZhuJewel

> lgtm. Thanks for updating.

@albanD, may I know if the PR can be merged? Thanks

@albanD
Collaborator

albanD commented Jul 10, 2020

Hey,

Yes, this is in process.
We have some timeout issues on the internal CI, and with such a big stack it delays the landing a bit. But it's on the way!

qiuxin2012 pushed a commit to qiuxin2012/pytorch that referenced this pull request Jul 27, 2020
ghstack-source-id: 5ca906b
Pull Request resolved: pytorch#36122
@ZhuJewel

Hey @albanD, if there is anything we can help with on our side for the CI errors, please feel free to let us know.

@facebook-github-bot
Contributor

Hi @XiaobingSuper!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot facebook-github-bot deleted the gh/xiaobingsuper/13/head branch January 15, 2021 15:17

Labels

cla signed, open source, triaged

7 participants