
Conversation

kimishpatel (Contributor) commented Sep 27, 2024

Stack from ghstack (oldest at bottom):

  • also reconciles the scales and zp (zero point) dtypes with the meta impl, as well as
    with how other quantized ops represent scales and zero points
  • makes sure quantize_per_token's output_dtype is respected

There are a few places where we still need to reconcile the scale and zero point
dtypes, but that will come later. These fixes are mainly being done to enable the
quantized KV cache through the ET (ExecuTorch) stack; see the sketch below.

Differential Revision: D62301840
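For context, here is a minimal, hedged sketch of the per-token ops this change concerns. The op names, signatures, tensor shapes, and the float64 / int64 parameter dtypes below are assumptions based on the decomposed quantization ops under torch.ops.quantized_decomposed, not code taken from this diff:

```python
import torch

# Hedged sketch (assumed op names/signatures, not code from this PR):
# per-token quantize/dequantize with float64 scales and int64 zero points.
x = torch.randn(2, 8, 16)  # (batch, seq, hidden) -> 2 * 8 = 16 tokens
scales = torch.full((2, 8, 1), 0.05, dtype=torch.float64)  # one scale per token
zero_points = torch.zeros((2, 8, 1), dtype=torch.int64)    # one zero point per token

q = torch.ops.quantized_decomposed.quantize_per_token(
    x, scales, zero_points, -128, 127, torch.int8
)
dq = torch.ops.quantized_decomposed.dequantize_per_token(
    q, scales, zero_points, -128, 127, torch.int8, output_dtype=torch.bfloat16
)
# If output_dtype is respected, dq should come back in the requested dtype
# rather than being promoted by the float64 scales.
print(q.dtype, dq.dtype)  # expected: torch.int8 torch.bfloat16
```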

…ape for scales and zp

- also reconciles the scales and zp dtypes with the meta impl, as well as with how other
quantized ops represent scales and zero points
- makes sure quantize_per_token's output_dtype is respected

There are a few places where we still need to reconcile the scale and zero point
dtypes, but that will come later. These fixes are mainly being done to enable the
quantized KV cache through the ET stack.

Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)

[ghstack-poisoned]
pytorch-bot bot commented Sep 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136807

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job

As of commit 4e93552 with merge base 2421344:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D62301840

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Sep 27, 2024
Review thread on the updated docstring:

    input (torch.Tensor): quantized Tensor (uint8, int8 etc.)
-   scales (float32 torch.Tensor): quantization parameter for per token affine quantization
+   scales (float64 torch.Tensor): quantization parameter for per token affine quantization
    zero_points (int32 torch.Tensor): quantization parameter for per token affine quantization
A reviewer (Contributor) asked:

should zero_points be int64?
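One way to check this empirically (a hedged sketch; the op name and signature below are assumptions based on the decomposed ops in torch.ao, not something quoted from this PR):

```python
import torch

# Inspect the dtypes the decomposed choose-qparams op actually returns for
# per-token asymmetric quantization, to compare against the docstring above.
x = torch.randn(4, 16)
scales, zero_points = torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric(
    x, torch.int8
)
print(scales.dtype, zero_points.dtype)  # compare against the documented float64 / int32
```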

… correct shape for scales and zp"

- also reconciles the scales and zp dtypes with the meta impl, as well as with how other
quantized ops represent scales and zero points
- makes sure quantize_per_token's output_dtype is respected

There are a few places where we still need to reconcile the scale and zero point
dtypes, but that will come later. These fixes are mainly being done to enable the
quantized KV cache through the ET stack.

Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)

[ghstack-poisoned]
facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D62301840

kimishpatel added a commit that referenced this pull request Sep 27, 2024
…ape for scales and zp

Pull Request resolved: #136807

- also reconciles the scales and zp dtypes with the meta impl, as well as with how other
quantized ops represent scales and zero points
- makes sure quantize_per_token's output_dtype is respected

There are a few places where we still need to reconcile the scale and zero point
dtypes, but that will come later. These fixes are mainly being done to enable the
quantized KV cache through the ET stack.

Differential Revision: [D62301840](https://our.internmc.facebook.com/intern/diff/D62301840/)
ghstack-source-id: 245083869
facebook-github-bot (Contributor) commented:

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge, ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Oct 3, 2024
Summary:
Fixes: pytorch#970

The test was broken by a recent refactor in pytorch: pytorch/pytorch#136807

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Oct 4, 2024
Summary:
Fixes: pytorch#970

The test was broken by a recent refactor in pytorch: pytorch/pytorch#136807

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
msaroufim pushed a commit to pytorch/ao that referenced this pull request Oct 4, 2024
* Unskip `test_choose_qparams_token_asym` in 2.6

Summary:
Fixes: #970

The test was broken by a recent refactor in pytorch: pytorch/pytorch#136807

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

* fix
melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 7, 2024
* Unskip `test_choose_qparams_token_asym` in 2.6

Summary:
Fixes: pytorch#970

The test was broken by a recent refactor in pytorch: pytorch/pytorch#136807

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

* fix
github-actions bot deleted the gh/kimishpatel/185/head branch on October 28, 2024 02:08