[Inductor][Quant] Change the schema of QLinear Binary #129049

leslie-fang-intel · 2024-06-19T08:54:38Z

Stack from ghstack (oldest at bottom):

Summary
We change the schema of QLinear Binary so that it is easier to enable the corresponding GEMM template.

The extra input of the binary post-op (`other`) is a tensor that needs to be an input node for autotuning, so we move it in front of `output_scale`, which is a scalar.
We also move it in front of `bias`: `bias` is an optional tensor for this fusion, whereas `other` is required for linear binary fusion.
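
The reordering described above can be illustrated with a small sketch. The argument lists here are hypothetical and heavily simplified (the real operator schema carries additional scales, zero points, and post-op attributes that are left out), and the helper names `move_before` and `leading_tensor_args` are made up for illustration. The point is only that, after the move, every tensor argument sits ahead of the first scalar and the required `other` precedes the optional `bias`.

```python
# Hypothetical, simplified argument lists; the real operator schema has more
# arguments (scales, zero points, post-op attributes) that are omitted here.
TENSOR_ARGS = {"x", "weight", "other", "bias"}

OLD_ORDER = ["x", "weight", "bias", "output_scale", "other"]


def move_before(order, name, anchor):
    """Return a copy of `order` with `name` moved directly before `anchor`."""
    reordered = [arg for arg in order if arg != name]
    reordered.insert(reordered.index(anchor), name)
    return reordered


def leading_tensor_args(order):
    """Collect the tensor arguments that appear before the first scalar."""
    prefix = []
    for arg in order:
        if arg not in TENSOR_ARGS:
            break
        prefix.append(arg)
    return prefix


NEW_ORDER = move_before(OLD_ORDER, "other", "bias")

print(NEW_ORDER)
print(leading_tensor_args(OLD_ORDER))
print(leading_tensor_args(NEW_ORDER))
```

In this toy model the old order leaves `other` stranded behind a scalar, while the new order groups all tensor inputs into one leading run, which is the property the summary says the autotuning input nodes need.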

Test Plan

python -u -m pytest -s -v test/quantization/core/test_quantized_op.py -k qlinear
python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k qlinear

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-06-19T08:54:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129049


✅ You can merge normally! (4 Unrelated Failures)

As of commit 66237d6 with merge base dabaebd:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
Unexpected HTTP response: 429
trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu) (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 1
trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-13) (gh) (matched macos rule in flaky-rules.json)
Failure: There is only 6099216KB free space left in /, which is less than the minimum requirement of
trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-14) (gh) (similar failure)
test_mps.py::TestMPS::test_mps_allocator_module

This comment was automatically generated by Dr. CI and updates every 15 minutes.


leslie-fang-intel · 2024-06-28T01:25:33Z

Hi @jansel, this PR may also need your approval.


leslie-fang-intel · 2024-06-30T09:54:14Z

@pytorchbot merge

pytorchmergebot · 2024-06-30T09:56:47Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-06-30T15:55:38Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

leslie-fang-intel · 2024-07-01T00:27:11Z

@pytorchbot merge

pytorchmergebot · 2024-07-01T00:28:45Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-07-01T06:27:38Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

leslie-fang-intel · 2024-07-02T12:34:37Z

@pytorchbot merge

pytorchmergebot · 2024-07-02T12:36:20Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…ion (#129103)

**Summary**
Based on previous PR, add the config to support quantized linear binary - optional(unary) post op fusion.
- Activation dtype: uint8
- Weight dtype: int8
- Output dtype: float32/bfloat16/uint8
- Post Op Fusion: with binary and optional[Unary] post operator fusion

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_with_pointwise_binary
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [ ] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129103
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049

**Summary**
Add the AMX micro gemm kernel with int8 data type.

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_amx
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [✓] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129220
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103

…129221)

**Summary**
This PR mainly refactors two things:
1. Pass the weight's data type explicitly in `create_micro_gemm` as `input2.dtype`. When registering `CppMicroGemmConfig`, `input.dtype` is reused if `input2.dtype` is not explicitly registered.
2. Add a util function to get the output data type and compute data type from the input data type.

Pull Request resolved: #129221
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049, #129103, #129220
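
The dtype fallback in the first point might look roughly like the sketch below. This is not the actual `CppMicroGemmConfig` from Inductor: the class name `MicroGemmConfigSketch`, its fields, and the use of plain strings for dtypes are all assumptions made for illustration. It only shows the stated rule that the weight dtype falls back to the input dtype when no separate weight dtype is registered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MicroGemmConfigSketch:
    """Hypothetical stand-in for a micro-GEMM config entry (not the real class)."""

    input_dtype: str                     # dtype of the activation
    input2_dtype: Optional[str] = None   # dtype of the weight, if registered

    @property
    def weight_dtype(self) -> str:
        # Reuse the input dtype when no weight dtype was registered explicitly.
        if self.input2_dtype is None:
            return self.input_dtype
        return self.input2_dtype


same_dtype = MicroGemmConfigSketch("float32")
mixed_dtype = MicroGemmConfigSketch("uint8", "int8")

print(same_dtype.weight_dtype)
print(mixed_dtype.weight_dtype)
```

The mixed case mirrors the uint8 activation and int8 weight combination listed for the quantized linear template, where the two dtypes genuinely differ and must be registered separately.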

… template (#129470)

**Summary**
Remove redundant INT8-specific logic in the INT8 GEMM template to unify the code structure with FP32/BF16/FP16 GEMM Template.

**Test Plan**
```
numactl -C 56-111 -m 1 python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear
```

Pull Request resolved: #129470
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103, #129220, #129221

[Inductor][Quant] Change the schema of QLinear Binary

d0fb7e0

[ghstack-poisoned]

leslie-fang-intel requested review from digantdesai, jerryzh168, jianyuh, kimishpatel and salilsdesai as code owners June 19, 2024 08:54

This was referenced Jun 19, 2024

[Inductor][CPP] Fix the symbolic size cast issue in GEMM Benchmark #128824

Closed

[Inductor][Quant] Use output dtype torch.uint8 explicitly #128804

Closed

This was referenced Jun 19, 2024

[Inductor][CPP] Fallback QLinear Binaryfusion from postop sum to binary add when others is view #128808

Closed

[Inductor][CPP] Enable Quantized Linear GEMM Template with FP32 output #128825

Closed

pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor release notes: quantization release notes category labels Jun 19, 2024

leslie-fang-intel mentioned this pull request Jun 19, 2024

[Inductor][CPP] Enable Quantized Linear GEMM Template with INT8 output and Unary Post Op #129048

Closed

2 tasks

leslie-fang-intel added a commit that referenced this pull request Jun 19, 2024

[Inductor][Quant] Change the schema of QLinear Binary

44e0481

ghstack-source-id: a5645c5 Pull Request resolved: #129049

leslie-fang-intel marked this pull request as draft June 19, 2024 08:57

pytorchbot added the open source label Jun 19, 2024

leslie-fang-intel mentioned this pull request Jun 20, 2024

[Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion #129103

Closed

1 task

leslie-fang-intel mentioned this pull request Jun 20, 2024

Add mkldnn_ir.py into merge rule #129107

Closed

leslie-fang-intel mentioned this pull request Jun 20, 2024

[Test] Int8 nemurical with reduce range of activation #129125

Closed

leslie-fang-intel added a commit to leslie-fang-intel/pytorch that referenced this pull request Jun 21, 2024

[Inductor][Quant] Change the schema of QLinear Binary

4713813

ghstack-source-id: add6622 Pull Request resolved: pytorch#129049

leslie-fang-intel mentioned this pull request Jun 21, 2024

[Inductor][CPP] Enable Quantized Linear with AMX MicroGEMM #129220

Closed

leslie-fang-intel added 4 commits June 23, 2024 23:55

leslie-fang-intel mentioned this pull request Jun 25, 2024

[Inductor][CPP] Remove redundant INT8-specific logic in the INT8 GEMM template #129470

Closed

leslie-fang-intel requested a review from jansel June 28, 2024 01:24

jansel approved these changes Jun 28, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 30, 2024

pytorchmergebot added the merging label Jun 30, 2024

pytorchmergebot added the Merged label Jul 2, 2024

pytorchmergebot closed this in 86e2d16 Jul 2, 2024

pytorchmergebot removed the merging label Jul 2, 2024

github-actions bot deleted the gh/leslie-fang-intel/119/head branch August 2, 2024 01:56

desertfire mentioned this pull request Mar 11, 2025

[AOTI] Remove aoti_torch_cpu__weight_int4pack_mm_cpu_tensor #148907

Closed


6 participants
