[pt1][quant] QTensor #18230
Conversation
Differential Revision: D14524641 Differential Version: 76207283
Differential Revision: D14524641 Differential Version: 76232282
Differential Revision: D14524641 Differential Version: 76268727
Differential Revision: D14524641 Differential Version: 76274055
Differential Revision: D14524641 Differential Version: 76276578
Need to add doc for
Differential Revision: D14524641 Differential Version: 76396588
gchanan left a comment
this looks like a nice start! I didn't look into the quantization details, I was mainly looking at the ATen/c10 structure. The main things I noticed are:
- We should not have a quantized device. As I mention below, I believe we need to change how we do TensorTypeId / Backend lookup to take the dtype into account. You might want to split this out into a separate PR, because it's useful anyway and doesn't need to be held up by the rest.
- The AT_FORALL macros are getting too complicated (not your fault). I'll take this as an action item to see what I can do.
- There's some code around exposing quantizer information to Python, which doesn't seem like a good idea yet.
Differential Revision: D14524641 Differential Version: 76418300
Differential Revision: D14524641 Differential Version: 76514776
Differential Revision: D14524641 Differential Version: 76527628
@bddppq Is this expected?
Differential Revision: D14524641 Differential Version: 76744384
aten/src/ATen/Quantizer.h (Outdated)
```cpp
#include <memory>

// TODO: move to c10 namespace after we
// unified caffe2::Tensor and at::Tensor
```
I think this TODO is stale
Why? We don't plan to merge them anymore?
Differential Revision: D14524641 Differential Version: 77428769
Summary:
Problem:
```cpp
// This function expects a `Variable` as input
inline PyObject* wrap(at::Tensor tensor) {
return THPVariable_Wrap(Variable(std::move(tensor)));
}
inline PyObject* wrap(at::Scalar scalar) {
// This function calls `wrap(at::Tensor tensor)` (the function above), but since
// `scalar_to_tensor(...)` returns a `Tensor` and not a `Variable`, the call to
// `wrap(at::Tensor tensor)` will fail with "Tensor that was converted to Variable
// was not actually a Variable", which is not what we want.
return wrap(scalar_to_tensor(scalar));
}
```
The right fix is to call `make_variable(...)` with the tensor returned from `scalar_to_tensor(scalar)`.
This unblocks #18230, as it is the only patch that hits this code path now. All other native functions that return Scalar (such as `item()` or `_local_scalar_dense()`) either have custom-defined implementations that don't go through this path, or are not exposed to Python at all.
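For reference, a minimal sketch of the fix described above, reusing the helper names from the snippet (the actual change landed in #18632, so this is not necessarily the verbatim patch):

```cpp
inline PyObject* wrap(at::Scalar scalar) {
  // Wrap the plain Tensor returned by scalar_to_tensor() in a Variable
  // before passing it on, so the Variable check in wrap(at::Tensor) passes.
  return wrap(make_variable(scalar_to_tensor(scalar)));
}
```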
Pull Request resolved: #18632
Differential Revision: D14689293
Pulled By: yf225
fbshipit-source-id: be7ba5d3de83a69533a2997de97ad92989ff78ee
@jerryzh168 #18632 is merged, feel free to pull master into your branch again :)
dzhulgakov left a comment
Looks good, modulo some non-blocking comments (please address those!)
aten/src/ATen/quantized/Quantizer.h (Outdated)
```cpp
struct CAFFE2_API PerChannelSymmetricQuantizer : public SymmetricQuantizer {
  PerChannelSymmetricQuantizer() {}
  PerChannelSymmetricQuantizer(std::vector<float> scales, std::vector<int64_t> axis)
      : SymmetricQuantizer(kPerChannelSymmetric), scales_(scales), axis_(axis) {
    AT_ASSERT(axis_.size() == 1);
```
Use AT_CHECK and add a descriptive error message.
AT_CHECK or AT_ASSERTM? I saw a comment: `// TODO: merge AT_CHECK with AT_ASSERTM. CHECK in fbcode means strict failure if not met.`
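For illustration, the suggested check might look like this; AT_CHECK takes the condition first, followed by message fragments that are concatenated into the error text (the message wording here is an assumption, not from the diff):

```cpp
AT_CHECK(
    axis_.size() == 1,
    "PerChannelSymmetricQuantizer only supports a single quantization axis, got ",
    axis_.size());
```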
@dzhulgakov For qint type conversion, I added a conversion to uint8_t, since it is the underlying type, and I think it is better than converting to uint32_t.
Differential Revision: D14524641 Differential Version: 77682011
Differential Revision: D14524641 Differential Version: 77702132
ZolotukhinM left a comment
Thanks for the work! A couple of minor nitpicks inline.
```cpp
// setters/getters for QTensorImpl fields; otherwise, you should use
// the low level setters/getters that were implemented using this.
// This may be called repeatedly, so make sure it's pretty cheap.
CAFFE2_API QTensorImpl* get_qtensorimpl(const QTensor& self);
```
Would it make sense to make this a private method of QTensor (and add all its users as friends)?
We don't have a separate QTensor class, it's the same as Tensor.
Ah, please ignore this then :)
I'm not sure it's a good idea to make QTensorImpl a friend class of Tensor, which might contain an intrusive_ptr to QTensorImpl.
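For context, get_qtensorimpl can remain a free function without any friendship; a minimal sketch, assuming the invariant that a quantized Tensor is always backed by a QTensorImpl (my reading, not necessarily the exact code in the diff):

```cpp
// Downcast the Tensor's impl to QTensorImpl, relying on the invariant
// that a quantized Tensor always holds a QTensorImpl.
QTensorImpl* get_qtensorimpl(const QTensor& self) {
  return static_cast<QTensorImpl*>(self.unsafeGetTensorImpl());
}
```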
```diff
 // define the scalar.to<int64_t>() specializations
-template<typename T>
+template <typename T>
```
It would be nice if you could commit the clang-format changes separately. This is a big, important patch, and it would be nice not to dilute it with whitespace changes.
Yeah, I didn't expect there to be so many whitespace changes; is there a way to undo them?
I think it should be fine in this case; the added files are more important than the changed files for this diff.
You could clang-format this entire file in a separate PR and rebase this PR on top, or you can `git reset HEAD^` and then `git add -p` and `git commit` to select only the meaningful parts (I hope the clang-format hook would not fire on parts you don't touch).
Thanks, just updated
Differential Revision: D14524641 Differential Version: 77786239
Differential Revision: D14524641 Differential Version: 77793094
Differential Revision: D14524641 Differential Version: 77816109
Summary: Pull Request resolved: pytorch#18230

Implementing minimum qtensor API to unblock other workstreams in quantization.

Changes:
- Added Quantizer, which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added the following user-facing APIs:
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: 4a4e4bb6dd485cffdd3b0e064cadb18b746373b3
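To make the new surface concrete, here is a usage sketch in C++ (ATen). The signatures follow the native_functions.yaml entries quoted below; the scale and zero-point values are illustrative, and q_scale()/q_zero_point() are assumed to return at::Scalar, as implied by the #18632 discussion above.

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor r = at::rand({2, 2});      // ordinary float tensor
  double scale = 0.1;                   // illustrative parameters
  int64_t zero_point = 10;

  // Quantize with per-tensor affine parameters; returns a QTensor.
  at::Tensor q = r.quantize_linear(scale, zero_point);

  // Read back the quantization parameters stored on the QTensor.
  at::Scalar s = q.q_scale();
  at::Scalar z = q.q_zero_point();

  // Recover a float tensor (up to quantization error).
  at::Tensor r2 = q.dequantize();
  return 0;
}
```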
Differential Revision: D14524641 Differential Version: 77823420
Differential Revision: D14524641 Differential Version: 77958995
This pull request has been merged in dfcd7b0.
```yaml
- func: quantize_linear(Tensor self, float scale, int zero_point) -> Tensor
  matches_jit_signature: True
  variants: function, method
  requires_tensor: True
```
@jerryzh168 @gchanan - why do we need/have requires_tensor: True here?
This is going to be removed in the next PR.
Added when I was trying to make the Python API work, since it was complaining about something related to Variable.
```yaml
- func: dequantize(Tensor self) -> Tensor
  matches_jit_signature: True
  variants: function, method
  requires_tensor: True
```
why don't we put
```yaml
dispatch:
  QuantizedQInt8: dequantize
```
here?
This is in the next PR.
How do we express channel-wise quantization, by supplying a different quantizer?
@yinghai - yes
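For instance, building on the PerChannelSymmetricQuantizer snippet quoted earlier in this review, the per-channel parameters would live on the quantizer rather than on the tensor itself (values are illustrative; the header path is the one from this PR, and namespace qualification is omitted as in the snippet above):

```cpp
#include <vector>
#include <ATen/quantized/Quantizer.h>

// One scale per channel, plus the single axis the channels live on.
std::vector<float> scales = {0.05f, 0.1f, 0.2f};
std::vector<int64_t> axis = {0};  // quantize along dim 0
PerChannelSymmetricQuantizer quantizer(scales, axis);
```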
```cpp
 * data types in the future.
 */
struct alignas(1) qint8 {
  uint8_t val_;
```
Does the uint8_t assume the quantization is asymmetric? For symmetric quantization, the value could be either unsigned or signed. How do we know? Will you put a flag in the symmetric quantizer for it?
An alternative is to share the same class for symmetric and asymmetric quantizers and decide whether the quantization is symmetric via the zero point value, i.e. 0 for signed symmetric and 128 for unsigned symmetric.
I think we can have new data types for symmetric quantization
We can add that support later if there is a need; right now we only have per-tensor affine quantization with unsigned int8 as the data type.
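For reference, this is what the per-tensor affine scheme looks like as a standalone sketch (not code from this diff; the helper names are mine). Symmetric quantization is the special case where the zero point is fixed at 0 for a signed type or 128 for an unsigned one.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Per-tensor affine quantization into uint8:
//   q = clamp(round(x / scale) + zero_point, 0, 255)
uint8_t quantize_value(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::nearbyint(x / scale)) + zero_point;
  return static_cast<uint8_t>(std::min<int32_t>(255, std::max<int32_t>(0, q)));
}

// Dequantization recovers the value up to quantization error:
//   x ~= (q - zero_point) * scale
float dequantize_value(uint8_t q, float scale, int32_t zero_point) {
  return (static_cast<int32_t>(q) - zero_point) * scale;
}
```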
Stack:
- #18765 [clang-format] For some files that are touched by the QTensor diff
- #18230 [pt1][quant] QTensor (this PR)