Improve legacy QuantizedLinear functions to reduce overhead #29773
Conversation
This pull request was exported from Phabricator. Differential Revision: D18494988
Force-pushed 735d147 to 060b10a.
Force-pushed 060b10a to 2969fb5.
Force-pushed 2969fb5 to 0fce0d9.
Force-pushed 0fce0d9 to 0f4de7d.
jianyuh left a comment:
LGTM! Any optimizations applied here can also be applied to https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp? Maybe in another PR.
Nit: about the spaces added in the comments: is that some C++ coding style standard? Changing this might mess up the blame.
Done.
I wonder whether the bias.contiguous() implementation already has such checks under the hood?
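For what it's worth, the check the reviewer is asking about does exist at the library level in comparable APIs. As a hedged illustration (using NumPy as a stand-in for ATen, not ATen's actual implementation), requesting a contiguous version of an already-contiguous array is a no-op that returns the same object:

```python
import numpy as np

# Analogy only: np.ascontiguousarray plays the role of Tensor::contiguous().
a = np.arange(6, dtype=np.float32).reshape(2, 3)
assert a.flags["C_CONTIGUOUS"]

b = np.ascontiguousarray(a)
assert b is a  # already contiguous: no copy is made

# A transposed view is not C-contiguous, so a real copy is produced.
t = a.T
assert not t.flags["C_CONTIGUOUS"]
c = np.ascontiguousarray(t)
assert c is not t and c.flags["C_CONTIGUOUS"]
```

So an explicit pre-check before calling contiguous() is usually redundant; the call itself is cheap when nothing needs to be done.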
Actually, the difference is that if we use a reference here, there is no tensor shallow copy at all. But we can change it back anyway, since it is not a serious issue.
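The distinction being discussed is C++ pass-by-value versus pass-by-const-reference: passing an ATen Tensor by value makes a shallow copy (a refcount bump on the shared implementation), while a reference does no work at all. A toy Python simulation of that C++ semantics (ToyTensor and both function names are hypothetical, not ATen code) makes the difference countable:

```python
# Toy stand-in for an ATen-style refcounted tensor handle.
class ToyTensor:
    shallow_copies = 0  # counts handle copies across all instances

    def __init__(self, storage):
        self.storage = storage  # shared buffer; never deep-copied here

    def shallow_copy(self):
        # Mimics C++ pass-by-value: same storage, new handle, refcount++.
        ToyTensor.shallow_copies += 1
        return ToyTensor(self.storage)

def linear_by_value(bias):   # like C++ `void f(Tensor bias)`
    return bias.shallow_copy().storage

def linear_by_ref(bias):     # like C++ `void f(const Tensor& bias)`
    return bias.storage      # no copy of any kind

b = ToyTensor([0.0] * 4)
linear_by_value(b)
linear_by_ref(b)
assert ToyTensor.shallow_copies == 1  # only the by-value path copied
```

Per call the cost is tiny, which is why the reviewer agrees it is fine to change back; it only matters on hot paths.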
…29773)

Summary: Pull Request resolved: pytorch#29773

Improve legacy QuantizedLinear functions to reduce overhead. Separate from the stack of D18381988.

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- "quant"

Reviewed By: lly-zero-one

Differential Revision: D18494988

fbshipit-source-id: 081687ea8d1e9c67f1213930a40a482e0090a029
Force-pushed 0f4de7d to d05b479.
This pull request has been merged in 0995929.
Summary: Pull Request resolved: pytorch/pytorch#29773

Improve legacy QuantizedLinear functions to reduce overhead. Separate from the stack of D18381988.

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- "quant"

Reviewed By: lly-zero-one

Differential Revision: D18494988

fbshipit-source-id: 5627d7e8b0b7a750852eead9e28c5a9b3fa70559
jamesr66a left a comment:
All the unrelated style changes make it very hard to tell what the actual functional changes in this PR are, and they mess up the blame. Please either refrain from making such changes or separate them out into their own PR in the future.
```cpp
Tensor quantized = at::native::empty_like(
    weight_contig, weight_contig.options().dtype(at::kChar));
// Tensor quantized = at::native::empty_cpu(
//     weight_contig.sizes(), weight_contig.options().dtype(at::kChar));
```
This commented-out code probably shouldn't be here.
Summary:
Improve legacy QuantizedLinear functions to reduce overhead.
The legacy QuantizedLinear functions contain many unnecessary zero-initialization and view ops, which introduce a large amount of overhead. This PR removes them and improves the performance of existing models.
Differential Revision: D18494988
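The core saving in the diff above is switching zeros-style allocations to empty_like for buffers that are overwritten immediately afterwards: a zeroed allocation pays for a full write of every element, while an uninitialized one only allocates. A hedged NumPy analogy (not ATen itself, but the same allocate-vs-fill distinction):

```python
import numpy as np

w = np.ones((128, 256), dtype=np.float32)

# zeros_like must write every element (a memset-style fill).
q_zeroed = np.zeros_like(w, dtype=np.int8)

# empty_like only allocates; its contents are arbitrary until the
# quantization kernel fills them, so no initialization cost is paid.
q_empty = np.empty_like(w, dtype=np.int8)

assert q_zeroed.shape == q_empty.shape == (128, 256)
assert not q_zeroed.any()  # zeros_like really did initialize everything
```

The trade-off is that an empty buffer must be fully written before it is read, which holds here because the subsequent quantization step overwrites the whole tensor.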