Conversation

@jianyuh
Member

@jianyuh jianyuh commented Oct 27, 2019

Stack from ghstack:

The scale and zero_point are for the output activation tensor, not for the weight tensor;

Differential Revision: D18164949

…in dynamic quant linear module

The scale and zero_point are for the output activation tensor, not for the weight tensor;

Differential Revision: [D18164949](https://our.internmc.facebook.com/intern/diff/D18164949/)

[ghstack-poisoned]
@jianyuh jianyuh requested a review from apaszke as a code owner October 27, 2019 22:33
jianyuh added a commit that referenced this pull request Oct 27, 2019
…in dynamic quant linear module

The scale and zero_point are for the output activation tensor, not for the weight tensor;

Differential Revision: [D18164949](https://our.internmc.facebook.com/intern/diff/D18164949/)

ghstack-source-id: 92712041
Pull Request resolved: #28767

@z-a-f z-a-f left a comment

LGTM

    scale: `scale` parameter of weight Quantized Tensor, type: double
    zero_point: `zero_point` parameter for weight Quantized Tensor, type: long
    scale: `scale` parameter of output activation Quantized Tensor, type: double
    zero_point: `zero_point` parameter for output activation Quantized Tensor, type: long

There is no reason to capitalize "Quantized Tensor" -- let's keep it as "quantized tensor".

Contributor

@raghuramank100 raghuramank100 left a comment

I think this comment is misleading; please see my comments below.

    If :attr:`bias` is ``True``, the values are initialized to zero.
    scale: `scale` parameter of weight Quantized Tensor, type: double
    zero_point: `zero_point` parameter for weight Quantized Tensor, type: long
    scale: `scale` parameter of output activation Quantized Tensor, type: double

For dynamic quantization there is no output scale and zero-point. We should not be exposing these in the comments. The activations are in floating point.
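
A minimal sketch of that distinction, assuming `torch.quantization.quantize_dynamic` (this snippet is not part of this PR's diff): only the weights are converted to quantized tensors ahead of time, while the module's inputs and outputs stay in floating point.

    import torch
    import torch.nn as nn

    # Wrap the float Linear so quantize_dynamic swaps it as a child module.
    float_model = nn.Sequential(nn.Linear(8, 4))
    dq_model = torch.quantization.quantize_dynamic(
        float_model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(2, 8)   # float input
    y = dq_model(x)         # runs through the dynamically quantized linear
    print(y.dtype)          # torch.float32 -- the output activation is not quantized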

Member Author

@jianyuh jianyuh Oct 28, 2019

We have this PR mainly because, when we print the modules, we print the scale and zero_point:
https://github.com/pytorch/pytorch/blob/master/torch/nn/quantized/modules/linear.py#L47-L48

One example is for the RoBERTa model after dynamic quantization:

      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )

In the comment, we would like to tell users that the scale=1.0 and zero_point=0 for the DynamicQuantizedLinear modules above are for the output activation (to be consistent with static quantization).
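
A minimal, hypothetical sketch of how such a printout is produced (a toy model, not the RoBERTa model above); it assumes `torch.quantization.quantize_dynamic`, and the exact fields shown in the repr depend on the PyTorch version:

    import torch
    import torch.nn as nn

    toy_model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 1024),
    )
    dq_model = torch.quantization.quantize_dynamic(
        toy_model, {nn.Linear}, dtype=torch.qint8
    )

    # Printing the module tree is what surfaces the DynamicQuantizedLinear repr
    # (and, at the time of this PR, the scale/zero_point fields) discussed here.
    print(dq_model)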

@jianyuh jianyuh added the oncall: quantization Quantization support in PyTorch label Oct 28, 2019
… and scale in dynamic quant linear module"

The scale and zero_point are for the output activation tensor, not for the weight tensor;

Differential Revision: [D18164949](https://our.internmc.facebook.com/intern/diff/D18164949/)

[ghstack-poisoned]
@jianyuh
Member Author

jianyuh commented Oct 29, 2019

Per @raghuramank100 's request, we removed the comments for scale and zero_points, and added the extra_repr in #28827.

@facebook-github-bot
Contributor

This pull request has been merged in b1ea19c.

@facebook-github-bot facebook-github-bot deleted the gh/jianyuh/38/head branch November 2, 2019 14:17