[pt][quant] Update the misleading comments for zero_points and scale in dynamic quant linear module #28767
Conversation
…in dynamic quant linear module

The scale and zero_point are for the output activation tensor, not for the weight tensor.

Differential Revision: [D18164949](https://our.internmc.facebook.com/intern/diff/D18164949/)
Pull Request resolved: #28767
z-a-f left a comment:
LGTM
Old:
scale: `scale` parameter of weight Quantized Tensor, type: double
zero_point: `zero_point` parameter for weight Quantized Tensor, type: long

New:
scale: `scale` parameter of output activation Quantized Tensor, type: double
zero_point: `zero_point` parameter for output activation Quantized Tensor, type: long
There is no reason to capitalize the "Quantized Tensor" -- let's keep it as "quantized tensor"
raghuramank100 left a comment:
I think this comment is misleading; please see my comments below.
If :attr:`bias` is ``True``, the values are initialized to zero.

Old:
scale: `scale` parameter of weight Quantized Tensor, type: double
zero_point: `zero_point` parameter for weight Quantized Tensor, type: long

New:
scale: `scale` parameter of output activation Quantized Tensor, type: double
For dynamic quantization there is no output scale and zero-point. We should not be exposing these in the comments. The activations are in floating point.
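A minimal sketch of this behavior, assuming the `torch.quantization.quantize_dynamic` API: with dynamic quantization only the weights are stored as quantized tensors, so the layer's output comes back as an ordinary float tensor.

```python
# Sketch under the assumption above: with dynamic quantization only the Linear
# weights are stored as int8 quantized tensors; activations stay in floating point.
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(4, 4))
dq_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 4)
out = dq_model(x)
print(out.dtype)                    # torch.float32 -- the output is not a quantized tensor
print(dq_model[0].weight().dtype)   # torch.qint8  -- only the weight is quantized
```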
We have this PR mainly because, when we print the modules, we print the scale and zero_point values:
https://github.com/pytorch/pytorch/blob/master/torch/nn/quantized/modules/linear.py#L47-L48
One example is for the RoBERTa model after dynamic quantization:
(19): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(20): TransformerEncoderLayer(
  (dropout): Dropout(p=0.1, inplace=False)
  (attention): MultiheadAttention(
    (dropout): Dropout(p=0.1, inplace=False)
    (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
    (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
  )
  (residual_mlp): ResidualMLP(
    (mlp): Sequential(
      (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
      (1): GeLU()
      (2): Dropout(p=0.1, inplace=False)
      (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
      (4): Dropout(p=0.1, inplace=False)
    )
  )
  (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
In the docstring comment, we would like to tell users that the scale=1.0 and zero_point=0 shown for the DynamicQuantizedLinear modules above are for the output activation (to be consistent with static quantization).
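A hedged sketch of the distinction the docstring update is about, assuming the `torch.quantization.quantize_dynamic` API and the `weight()` accessor of the dynamic quantized Linear module: the `scale=1.0` and `zero_point=0` shown in the repr are placeholders for the output activation, while the weight tensor carries its own quantization parameters.

```python
# Sketch under the assumptions above: compare the module-level scale/zero_point
# shown in the repr with the weight tensor's own quantization parameters.
import torch
import torch.nn as nn

dq = torch.quantization.quantize_dynamic(
    nn.Sequential(nn.Linear(1024, 3072)), {nn.Linear}, dtype=torch.qint8
)
qlinear = dq[0]

# At the time of this PR the repr looked like the RoBERTa printout above, e.g.
# DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
print(qlinear)

w = qlinear.weight()  # the quantized weight tensor
# If the weight is per-tensor quantized, it exposes its own scale/zero_point,
# which are unrelated to the placeholder values printed in the repr.
print(w.q_scale(), w.q_zero_point())
```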
… and scale in dynamic quant linear module

The scale and zero_point are for the output activation tensor, not for the weight tensor.

Differential Revision: [D18164949](https://our.internmc.facebook.com/intern/diff/D18164949/)
Per @raghuramank100's request, we removed the comments for `scale` and `zero_point`.
This pull request has been merged in b1ea19c.
Stack from ghstack:
The scale and zero_point are for the output activation tensor, not for the weight tensor.
Differential Revision: D18164949