quantized tensor: add support for advanced indexing#49129
quantized tensor: add support for advanced indexing#49129vkuzo wants to merge 5 commits intogh/vkuzo/184/basefrom
Conversation
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9242ec9 Pull Request resolved: #49129
💊 CI failures summary and remediationsAs of commit a75af59 (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25451651](https://our.internmc.facebook.com/intern/diff/D25451651) [ghstack-poisoned]
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 10e51c5 Pull Request resolved: #49129
| if (options.dtype() == at::kQInt8 || options.dtype() == at::kQUInt8 || | ||
| options.dtype() == at::kQInt32) { | ||
| // get the first input and copy its quantization parameters | ||
| const auto& first_input = operands_[num_outputs_]; |
There was a problem hiding this comment.
So what if the quantization parameters on the two tensors are different? That seems like a totally reasonable situation, and then copying the quantization parameters from the first input seems like a totally arbitrary choice.
There was a problem hiding this comment.
If the correct quantization of the output tensor is something that varies from operator to operator, it may make sense to force kernels to do the computation of the quantization. There is some precedent for doing this in TensorIterator with the static_shape parameter, where kernels can say "Hey, I know you've got some built-in shape computation, but I actually happen to know what the output shape is, please use this." I could easily imagine something similar for quantization.
There was a problem hiding this comment.
yeah, I don't think the current solution is super clean, I'm also not sure if the alternative would be better.
The two high level ways I saw to do this were:
- add a quantized path to the current fp32 indexing logic (current PR). This reuses the size calculations and the kernels, but has to add some metadata for quantized tensors in places where it previously did not exist.
- dispatch to something like
quantized_index, which would have to reimplement the indexing logic (with some pieces potentially reusable from fp32, but likely not all)
Thoughts on any options I might be missing here? Does 1 sound like the right call?
There was a problem hiding this comment.
for now, it's option 1, with the non-relevant parts asserted out, but I'm flexible
There was a problem hiding this comment.
@vkuzo The nuance here is that the code paths you are editing in TensorIterator aren't just for fp32 indexing: they are for every unary, binary and reduction operation in PyTorch. So if you want to add something here, there is an obligation to at least have some plausible argument why the logic here is appropriate for all other operations.
I don't mind option one, but this PR ain't it. An implementation that more closely follows the idea of (1) is to take the existing indexing kernel (not in TensorIterator) and program TensorIterator on what the intended quantization result is.
| } | ||
| } else { | ||
| op.tensor = at::empty_strided(sizes, strides, options); | ||
| } |
There was a problem hiding this comment.
Below, there is logic for handling what happens in the out= case. Imagine that you do indexing into a pre-existing quantized tensor. What should the quantization parameters be? What if the tensor is to be resized in this situation?
There was a problem hiding this comment.
asserted it out for now
|
The change to MetaBase::set_output looks incomplete. Oddly, the CI is passing, this is probably because none of the quantized operations have been made structured yet. |
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` If helpful for reviewers, the things originally broken were, in order: 1. `computeDeviceType` did not handle `DispatchKey::QuantizedCPU` (fix: added) 2. quantization params were not present in `TensorIterator::set_output` so they can be used to properly create the quantized tensor (fix: created `TensorQuantizationOptions` and threaded it through the relevant places) 3. `index` kernel was not enabled for quantized dtypes (fix: enable it) Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon, ideally we can pay the eng cost for enabling that when it is needed. Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25451651](https://our.internmc.facebook.com/intern/diff/D25451651) [ghstack-poisoned]
to clarify, do you mean accounting for the changes in #48718? Yes, that was not in the original PR, but is there in the latest update. |
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` If helpful for reviewers, the things originally broken were, in order: 1. `computeDeviceType` did not handle `DispatchKey::QuantizedCPU` (fix: added) 2. quantization params were not present in `TensorIterator::set_output` so they can be used to properly create the quantized tensor (fix: created `TensorQuantizationOptions` and threaded it through the relevant places) 3. `index` kernel was not enabled for quantized dtypes (fix: enable it) Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon, ideally we can pay the eng cost for enabling that when it is needed. Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25451651](https://our.internmc.facebook.com/intern/diff/D25451651) [ghstack-poisoned]
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 0966cb1 Pull Request resolved: #49129
…nced indexing" Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` If helpful for reviewers, the things originally broken were, in order: 1. `computeDeviceType` did not handle `DispatchKey::QuantizedCPU` (fix: added) 2. quantization params were not present in `TensorIterator::set_output` so they can be used to properly create the quantized tensor (fix: created `TensorQuantizationOptions` and threaded it through the relevant places) 3. `index` kernel was not enabled for quantized dtypes (fix: enable it) Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon, ideally we can pay the eng cost for enabling that when it is needed. Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25451651](https://our.internmc.facebook.com/intern/diff/D25451651) [ghstack-poisoned]
Summary: Implements support for the indexing of quantized tensors with lists of dims, such as ``` xq_slice = xq[:, [0], :, :] ``` At least a few things need to happen before this being ready for review: 1. Verify if the general idea of `TensorAdditionalOptions` is the right abstraction 2. Verify if `TensorAdditionalOptions` needs the same style as `TensorOptions` (no optionals, etc) Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: d4ed3de Pull Request resolved: #49129
Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
…for advanced indexing, try 2" Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25539365](https://our.internmc.facebook.com/intern/diff/D25539365) [ghstack-poisoned]
…dexing, try 2" Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25539365](https://our.internmc.facebook.com/intern/diff/D25539365) [ghstack-poisoned]
Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9540251 Pull Request resolved: #49346
…ed indexing, try 2" Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25539365](https://our.internmc.facebook.com/intern/diff/D25539365) [ghstack-poisoned]
Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: c319b52 Pull Request resolved: #49346
…exing, try 2" Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25539365](https://our.internmc.facebook.com/intern/diff/D25539365) [ghstack-poisoned]
…exing, try 2" Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25539365](https://our.internmc.facebook.com/intern/diff/D25539365) [ghstack-poisoned]
Summary: This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 7591b5f Pull Request resolved: #49346
#49346) Summary: Pull Request resolved: #49346 This is less ambitious redo of #49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25539365 fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
|
thanks for the review! We went with a simpler and less performant approach of using the fp32 kernel for now in #49346, we might revisit supporting this natively in the future. |
pytorch#49346) Summary: Pull Request resolved: pytorch#49346 This is less ambitious redo of pytorch#49129. We make the ``` xq_slice = xq[:, [0], :, :] ``` indexing syntax work if `xq` is a quantized Tensor. For now, we are making the code not crash, with an in efficient `dq -> index -> q` implementation. A future PR can optimize performance by removing the unnecessary memory copies (which will require some non-trivial changes to TensorIterator). Test Plan: ``` python test/test_quantization.py TestQuantizedOps.test_advanced_indexing ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25539365 fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
Stack from ghstack:
Summary:
Implements support for the indexing of quantized tensors with lists of
dims, such as
If helpful for reviewers, the things originally broken were, in order:
computeDeviceTypedid not handleDispatchKey::QuantizedCPU(fix: added)TensorIterator::set_outputso they can be used to properly create the quantized tensor (fix: createdTensorQuantizationOptionsand threaded it through the relevant places)indexkernel was not enabled for quantized dtypes (fix: enable it)Note: this PR only handles per-Tensor qparams. We don't expect to need this for per-channel qparams any time soon, ideally we can pay the eng cost for enabling that when it is needed.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D25451651