[Intel GPU] XPUInductorQuantizer for XPU int8 recipe customization #139578
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139578
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit d4de5ec with merge base b18bbc9.
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
        quantization_config: Optional[QuantizationConfig],
        filter_fn: Optional[FilterFn] = None,
    ):
        return None
```
`raise NotImplementedError`?
We override the implementation here to skip annotation. We do not annotate these ops because we expect them to fall back to fp32: every op the backend does not support is left in the dq - fp32 op - q form generated by `_generate_qdq_quantized_model`. So we prefer not to raise an error here, to keep a better out-of-the-box experience.
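
For reference, a minimal sketch of what such a no-op override could look like. The method name `_annotate_maxpool2d` and the import paths here are assumptions for illustration; the parameter list follows the diff above.

```python
from typing import Optional

from torch.fx import Node
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    FilterFn,
    QuantizationConfig,
    X86InductorQuantizer,
)


class XPUInductorQuantizer(X86InductorQuantizer):
    # Hypothetical override: the base class annotates this op, the XPU
    # backend does not implement it yet, so we annotate nothing here.
    def _annotate_maxpool2d(
        self,
        node: Node,
        quantization_config: Optional[QuantizationConfig],
        filter_fn: Optional[FilterFn] = None,
    ) -> None:
        # Intentionally do nothing: the op stays un-annotated, so the
        # prepared graph keeps it as dq -> fp32 op -> q and it falls back
        # to the fp32 kernel instead of raising an error.
        return None
```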
```python
@functools.lru_cache
def get_default_xpu_inductor_quantization_config():
```
hi, @jerryzh168 @EikanWang, I've made some updates in the latest commit (056dfaf). Would you mind taking a look? Your feedback would be greatly appreciated.
Just to provide some context: on the GPU side we prefer the s8 data type over u8, because oneDNN performs better when the zero-point is 0. So with s8 + symmetric quantization we can reach optimal performance on the GPU. Currently we set the qscheme to `torch.per_tensor_affine`, but users can adjust the qscheme to `torch.per_tensor_symmetric` if they want to improve performance further.
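
For anyone who wants to try that, a rough sketch of such a customization is below. The observer choices, quant ranges, and import paths are assumptions rather than the exact default recipe.

```python
import torch
from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
from torch.ao.quantization.quantizer.quantizer import QuantizationSpec
from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import QuantizationConfig
from torch.ao.quantization.quantizer.xpu_inductor_quantizer import XPUInductorQuantizer

# s8 activations with a symmetric qscheme -> zero-point is always 0,
# which lets oneDNN pick its fastest integer convolution kernels.
act_spec = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-128,
    quant_max=127,
    qscheme=torch.per_tensor_symmetric,
    is_dynamic=False,
    observer_or_fake_quant_ctr=HistogramObserver.with_args(reduce_range=False),
)

# Per-channel symmetric s8 weights (a common choice; the observer is an assumption).
weight_spec = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-128,
    quant_max=127,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    is_dynamic=False,
    observer_or_fake_quant_ctr=PerChannelMinMaxObserver,
)

quantizer = XPUInductorQuantizer()
quantizer.set_global(
    QuantizationConfig(
        input_activation=act_spec,
        output_activation=act_spec,
        weight=weight_spec,
        bias=None,
        is_qat=False,
    )
)
# `quantizer` can then be passed to prepare_pt2e as usual.
```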
# Motivation

This PR enables XPU quantized convolution. The operators it registers are `onednn::qconv_prepack`, `onednn::qconv1d_pointwise`, `onednn::qconv2d_pointwise`, and `onednn::qconv3d_pointwise`. We share the same operator schemas as the Intel CPU backend, as both call kernels implemented in the oneDNN library.

# Details

The implemented operators will be further integrated into the pt2e quant flow. In this PR, we validate the kernel functionality via the UTs in `test/inductor/test_mkldnn_pattern_matcher.py`, where the CPU backend defines a series of UTs for quantized convolution. We also extend device support in the Inductor lowering pass and Inductor IR defined in `torch/_inductor/fx_passes/quantization.py` and `torch/_inductor/mkldnn_ir.py`. The overall picture is that the CPU and GPU backends share the general optimization pass (op fusion) and the quantization Inductor IR; after lowering, the final kernel is dispatched to the corresponding implementation in the oneDNN library.

In this PR, we share the same int8 quantizer as the CPU, namely `X86InductorQuantizer`. In the next PR #139578, we will add an `XPUInductorQuantizer`, which customizes the pt2e behavior for the XPU backend. The capability of `XPUInductorQuantizer` will grow along with the development of quantized operators on XPU.

# Validation

* UT testing

```bash
python test/inductor/test_mkldnn_pattern_matcher.py -v \
  -k test_qconv2d_xpu \
  -k test_qconv2d_silu_xpu \
  -k test_qconv2d_relu6_xpu \
  -k test_qconv2d_hardtanh_xpu \
  -k test_qconv2d_hardswish_xpu
```

* Runtime exemplification

```bash
#qconv2d
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_u8::blocked:acdb::f0 wei_s8::blocked:acdb::f0 bia_undef::undef::: dst_f32::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+wei:1:f32 attr-zero-points:src0:0:s32 attr-post-ops:binary_add:f32:2+eltwise_linear:1,alg:convolution_direct,mb1_ic128oc128_ih6oh4kh3sh1dh0ph0_iw6ow4kw3sw1dw0pw0,0.0668945
#qconv2d_silu
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_u8::blocked:acdb::f0 wei_s8::blocked:acdb::f0 bia_undef::undef::: dst_u8::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+wei:1:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_swish:1+binary_add:f32:2+eltwise_linear:0.0124779:22,alg:convolution_direct,mb1_ic3oc128_ih8oh6kh3sh1dh0ph0_iw8ow6kw3sw1dw0pw0,0.0881348
```

Pull Request resolved: #133080
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#139578)

# Motivation

This PR adds `XPUInductorQuantizer`, which defines the int8 quantization recipe for the XPU backend.

# Details

`XPUInductorQuantizer` is a class derived from `X86InductorQuantizer`, as both quantizers take advantage of the highly optimized operators in the oneDNN library (qconv, qlinear, and qconv/qlinear fusions). We share the same recipe as `X86InductorQuantizer`, so we have the same `annotate_xxxx` methods. In the ideal situation, `XPUInductorQuantizer` would have no class body at all, since every implementation could be inherited from the base class.

In this PR, we override the `annotate_xxx` methods for operators that have NOT been implemented yet. Any operator the XPU backend does not implement falls back to the fp32 implementation, since its node stays in the graph as a `dq-op-q` pair. This helps provide good OOB usability for the XPU backend. The implemented operators, on the other hand, use the `annotate_op` methods from the base class and can be lowered successfully.

Pull Request resolved: pytorch#139578
Approved by: https://github.com/EikanWang, https://github.com/leslie-fang-intel, https://github.com/CuiYifeng, https://github.com/jerryzh168
ghstack dependencies: pytorch#133080
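
For context, here is a hedged end-to-end sketch of how this quantizer plugs into the pt2e flow on an XPU device. The toy model, shapes, and the capture entry point are assumptions (the capture API has shifted across PyTorch releases), and enabling freezing may be needed for the Inductor fusion passes to kick in.

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xpu_inductor_quantizer import (
    XPUInductorQuantizer,
    get_default_xpu_inductor_quantization_config,
)

# Toy model and calibration data; shapes and layers are illustrative only.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval().to("xpu")
example_inputs = (torch.randn(1, 3, 32, 32, device="xpu"),)

# Capture the model (export_for_training is one option; older flows used a different API).
exported = torch.export.export_for_training(model, example_inputs).module()

# Annotate with the XPU recipe, calibrate, then convert to a dq-op-q graph.
quantizer = XPUInductorQuantizer()
quantizer.set_global(get_default_xpu_inductor_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass
converted = convert_pt2e(prepared)

# Lower through Inductor: supported patterns fuse into oneDNN qconv kernels,
# anything un-annotated stays as a dq - fp32 op - q fallback.
# (Freezing, e.g. TORCHINDUCTOR_FREEZING=1, may be required for the fusions.)
with torch.no_grad():
    compiled = torch.compile(converted)
    out = compiled(*example_inputs)
```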
Fixes issues introduced in #141348 and #139578

Pull Request resolved: #142514
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <[email protected]>
# Motivation

This PR adds `XPUInductorQuantizer`, which defines the int8 quantization recipe for the XPU backend.

# Details

`XPUInductorQuantizer` is a class derived from `X86InductorQuantizer`, as both quantizers take advantage of the highly optimized operators in the oneDNN library (qconv, qlinear, and qconv/qlinear fusions).

We share the same recipe as `X86InductorQuantizer`, so we have the same `annotate_xxxx` methods. In the ideal situation, `XPUInductorQuantizer` would have no class body at all, since every implementation could be inherited from the base class.

In this PR, we override the `annotate_xxx` methods for operators that have NOT been implemented yet. Any operator the XPU backend does not implement falls back to the fp32 implementation, since its node stays in the graph as a `dq-op-q` pair. This helps provide good OOB usability for the XPU backend. The implemented operators, on the other hand, use the `annotate_op` methods from the base class and can be lowered successfully.

Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov