[Pytorch][AO] Update choose_qparams_per_token op to output correct shape for scales and zp #136807
Conversation
Update choose_qparams_per_token op to output correct shape for scales and zp

- also makes scales and zp dtype reconcile with the meta impl as well as other quantized ops' representation of scales and zero point
- make sure quantize_per_token's output_dtype is respected

There are a few places where we need to reconcile on scale and zero point dtype, but that will come later. These fixes are mainly being done to enable quantized kv cache through the ET stack.

Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)

[ghstack-poisoned]
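For context, here is a minimal sketch of the behavior this PR targets. The op name comes from the PR title; the exact output shapes and dtypes in the comments are assumptions based on the docstring diff below and the meta implementation the description says it reconciles with, not an authoritative spec.

```python
# Sketch only: assumes the decomposed-op signature
# choose_qparams_per_token(Tensor input, ScalarType dtype) -> (scales, zero_points).
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # assumed registration point for quantized_decomposed ops

x = torch.randn(4, 16)  # e.g. (tokens, hidden) activations

scales, zero_points = torch.ops.quantized_decomposed.choose_qparams_per_token(x, torch.int8)

# Per-token qparams: one (scale, zero_point) per token, with a trailing singleton
# dim so they broadcast against the last (hidden) dim.
print(scales.shape, zero_points.shape)  # expected after this PR: torch.Size([4, 1]) each
# Dtypes after this PR (assumption from the docstring diff and meta impl):
print(scales.dtype, zero_points.dtype)  # e.g. torch.float64, torch.int64
```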
This pull request was exported from Phabricator. Differential Revision: D62301840
The docstring lines under review (diff residue cleaned up; the scales line changes from float32 to float64):

    input (torch.Tensor): quantized Tensor (uint8, int8 etc.)
-   scales (float32 torch.Tensor): quantization parameter for per token affine quantization
+   scales (float64 torch.Tensor): quantization parameter for per token affine quantization
    zero_points (int32 torch.Tensor): quantization parameter for per token affine quantization
should zero_points be int64?
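One way to answer this empirically is to compare the eager kernel against the meta kernel the PR description says it reconciles with. This is a minimal sketch under the assumption that `FakeTensorMode` dispatches the op to its meta implementation; it is not part of the PR.

```python
# Sketch: compare eager vs. meta dtypes for choose_qparams_per_token.
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # assumed registration point for quantized_decomposed ops
from torch._subclasses.fake_tensor import FakeTensorMode

x = torch.randn(3, 8)

# Eager kernel.
s_eager, zp_eager = torch.ops.quantized_decomposed.choose_qparams_per_token(x, torch.int8)

# Meta kernel, exercised through FakeTensorMode.
with FakeTensorMode() as mode:
    fx = mode.from_tensor(x)
    s_meta, zp_meta = torch.ops.quantized_decomposed.choose_qparams_per_token(fx, torch.int8)

# After this PR the two should agree; the meta impl reportedly uses
# float64 scales and int64 zero points (an assumption, per the PR description).
print(s_eager.dtype, zp_eager.dtype)
print(s_meta.dtype, zp_meta.dtype)
```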
Update choose_qparams_per_token op to output correct shape for scales and zp. Pull Request resolved: #136807. ghstack-source-id: 245083869. Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
* Unskip `test_choose_qparams_token_asym` in 2.6 (downstream commit). Summary: Fixes pytorch#970. The test was broken by a recent refactor in pytorch: pytorch/pytorch#136807. Test Plan: CI.
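To illustrate why such a test would break: after this PR the per-token qparams carry a trailing singleton dimension, so older comparisons against flat per-row qparams need a squeeze. This is a hypothetical reconstruction (assuming the asymmetric variant shares the `(input, dtype)` signature and shape convention), not the actual downstream patch.

```python
# Hypothetical illustration of the shape change (not the actual downstream fix).
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # assumed registration point for quantized_decomposed ops

x = torch.randn(10, 10)
scale_ref, zp_ref = torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric(
    x, torch.int8
)
# Before #136807 a test might have assumed shape (10,); afterwards the reference
# qparams are shaped (10, 1), so they are squeezed before comparing.
scale_ref = scale_ref.squeeze(-1)
zp_ref = zp_ref.squeeze(-1)
print(scale_ref.shape, zp_ref.shape)  # torch.Size([10]) torch.Size([10])
```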
Stack from ghstack (oldest at bottom):
- also makes scales and zp dtype reconcile with the meta impl as well as other quantized ops' representation of scales and zero point
- make sure quantize_per_token's output_dtype is respected (a sketch follows this description)

There are a few places where we need to reconcile on scale and zero point dtype, but that will come later. These fixes are mainly being done to enable quantized kv cache through the ET stack.
Differential Revision: D62301840
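Since the description calls out quantize_per_token's output_dtype, here is a minimal sketch of that part of the flow for the kv-cache use case. The positional signature `(input, scales, zero_points, quant_min, quant_max, dtype)` is an assumption based on the decomposed quantization ops in `torch.ops.quantized_decomposed`; this is an illustration, not the PR's implementation.

```python
# Sketch only: per-token quantization of a key/value slice, checking that the
# requested output dtype is actually honored. Signatures are assumptions.
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  # assumed registration point for quantized_decomposed ops

k = torch.randn(4, 16)  # e.g. a slice of keys to be cached

scales, zero_points = torch.ops.quantized_decomposed.choose_qparams_per_token(k, torch.int8)

qk = torch.ops.quantized_decomposed.quantize_per_token(
    k, scales, zero_points, -128, 127, torch.int8  # quant_min, quant_max, output dtype
)

# The second bullet above says the output dtype must be respected:
assert qk.dtype == torch.int8, qk.dtype
print(qk.shape, qk.dtype)
```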