[Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion #129103

leslie-fang-intel · 2024-06-20T01:54:50Z

Stack from ghstack (oldest at bottom):

Summary
Building on the previous PR, add the config to support quantized linear with a binary post op fusion, optionally followed by a unary post op.

Activation dtype: uint8
Weight dtype: int8
Output dtype: float32/bfloat16/uint8
Post Op Fusion: binary post op, optionally followed by a unary post op
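
To make the dtype combination above concrete, here is a small pure-Python reference of the arithmetic. It is only a sketch, not the Inductor template: it assumes per-tensor scale and zero point for the activation, a symmetric per-tensor scale for the weight, an `add` binary post op, and an optional unary post op such as ReLU. The function name `qlinear_binary_ref` and all of its parameters are hypothetical.

```python
from typing import Callable, List, Optional


def qlinear_binary_ref(
    x_u8: List[List[int]],      # activation, uint8 values, shape [M][K]
    x_scale: float,
    x_zp: int,
    w_s8: List[List[int]],      # weight, int8 values, shape [N][K]
    w_scale: float,
    other: List[List[float]],   # second input of the binary post op, shape [M][N]
    unary: Optional[Callable[[float], float]] = None,
    out_scale: Optional[float] = None,  # if set, requantize the result to uint8
    out_zp: int = 0,
) -> List[List[float]]:
    """Reference math: dequantized linear, then add, then optional unary op."""
    out = []
    for m, row in enumerate(x_u8):
        out_row = []
        for n, w_row in enumerate(w_s8):
            # Integer accumulation of (x - zero_point) * w over K.
            acc = sum((x - x_zp) * w for x, w in zip(row, w_row))
            # Dequantize the accumulator to a real value.
            y = acc * x_scale * w_scale
            # Binary post op (assumed here to be an elementwise add).
            y = y + other[m][n]
            # Optional unary post op.
            if unary is not None:
                y = unary(y)
            # Optional requantization to uint8 (Python rounds half to even).
            if out_scale is not None:
                y = max(0, min(255, round(y / out_scale) + out_zp))
            out_row.append(y)
        out.append(out_row)
    return out


def relu(v: float) -> float:
    return max(0.0, v)


x = [[130, 126]]           # real values 1.0 and -1.0 with scale 0.5, zero point 128
w = [[2, 4], [-2, 0]]      # real values [[0.5, 1.0], [-0.5, 0.0]] with scale 0.25
fp32_out = qlinear_binary_ref(x, 0.5, 128, w, 0.25, [[1.0, 0.0]], unary=relu)
u8_out = qlinear_binary_ref(x, 0.5, 128, w, 0.25, [[1.0, 0.0]], unary=relu, out_scale=0.25)
```

With these inputs the float output is `[[0.5, 0.0]]` and the uint8 output is `[[2, 0]]`. A fused kernel would apply the same steps on each output tile before writing it back, rather than materializing the intermediate results.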

Test Plan

clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_with_pointwise_binary

Next Step

[✓] Unary post op fusion
[✓] Int8 output
[✓] Binary Fusion
[ ] AMX int8 MicroGEMM Kernel

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-06-20T01:54:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129103

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 4f24cf8 with merge base dabaebd:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
Unexpected HTTP response: 429
trunk / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
Unexpected HTTP response: 429

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf, 1, 1, linux.24xl.spr-metal, unstable) (gh) (#126993)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel · 2024-07-02T12:38:04Z

@pytorchbot merge

pytorchmergebot · 2024-07-02T12:39:44Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

**Summary**
Add the AMX micro gemm kernel with int8 data type.

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_amx
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [✓] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129220
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103
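
For readers unfamiliar with what an int8 micro-GEMM computes, the following pure-Python model shows the arithmetic only: unsigned 8-bit values from A are multiplied with signed 8-bit values from B and accumulated as integers, with K consumed in groups of four as AMX int8 dot-product instructions do. This is an illustration under assumptions, not the kernel added by #129220; the function name `micro_gemm_u8s8s32` and the plain row-major layout are hypothetical.

```python
from typing import List


def micro_gemm_u8s8s32(
    a_u8: List[List[int]],  # shape [M][K], values in [0, 255]
    b_s8: List[List[int]],  # shape [K][N], values in [-128, 127]
    group: int = 4,         # K elements combined per dot-product step
) -> List[List[int]]:
    """Compute C[m][n] = sum_k A[m][k] * B[k][n] with integer accumulation."""
    m_dim, k_dim, n_dim = len(a_u8), len(b_s8), len(b_s8[0])
    assert k_dim % group == 0, "this sketch assumes K is a multiple of the group size"
    assert all(0 <= v <= 255 for row in a_u8 for v in row), "A must hold uint8 values"
    assert all(-128 <= v <= 127 for row in b_s8 for v in row), "B must hold int8 values"

    c = [[0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0
            # Walk K in groups; each group contributes one partial dot product.
            for k0 in range(0, k_dim, group):
                acc += sum(a_u8[m][k0 + i] * b_s8[k0 + i][n] for i in range(group))
            c[m][n] = acc
    return c


a = [[1, 2, 3, 4, 255, 0, 0, 0]]
b = [[1, -1] for _ in range(8)]
c = micro_gemm_u8s8s32(a, b)
```

Here `c` is `[[265, -265]]`. A hardware kernel would additionally pack B so that each group of four K values is contiguous, which this sketch does not model.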

…129221)

**Summary**
This PR mainly refactors 2 things:
1. Pass the weight's data type explicitly in `create_micro_gemm` as `input2.dtype`. When registering `CppMicroGemmConfig`, `input.dtype` is reused if `input2.dtype` is not explicitly registered.
2. Add a util function to get the output data type and compute data type from the input data type.

Pull Request resolved: #129221
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049, #129103, #129220
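
The two refactors described above can be pictured with a short sketch. It does not reproduce the real `CppMicroGemmConfig` or the util function from #129221: the class `MicroGemmConfigSketch`, the function `output_and_compute_dtype`, the string dtype names, and the dtype mapping itself are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroGemmConfigSketch:
    """Hypothetical stand-in for a registered micro-GEMM config."""

    input_dtype: str
    input2_dtype: Optional[str] = None  # weight dtype; optional at registration

    def __post_init__(self) -> None:
        # Reuse the activation dtype when the weight dtype is not given.
        if self.input2_dtype is None:
            object.__setattr__(self, "input2_dtype", self.input_dtype)


def output_and_compute_dtype(input_dtype: str) -> Tuple[str, str]:
    """Return (output dtype, compute dtype) for an input dtype.

    Assumed mapping: integer accumulation for uint8 input, float32 otherwise.
    """
    if input_dtype == "uint8":
        return "int32", "int32"
    if input_dtype in ("float32", "bfloat16", "float16"):
        return "float32", "float32"
    raise ValueError(f"unsupported input dtype: {input_dtype}")


bf16_cfg = MicroGemmConfigSketch("bfloat16")        # weight dtype falls back to bfloat16
int8_cfg = MicroGemmConfigSketch("uint8", "int8")   # weight dtype given explicitly
```

The fallback keeps existing floating-point registrations unchanged while letting the quantized path state that the activation is uint8 and the weight is int8.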

… template (#129470)

**Summary**
Remove redundant INT8-specific logic in the INT8 GEMM template to unify the code structure with FP32/BF16/FP16 GEMM Template.

**Test Plan**
```
numactl -C 56-111 -m 1 python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear
```

Pull Request resolved: #129470
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103, #129220, #129221

Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion

ed10f34

[ghstack-poisoned]

leslie-fang-intel mentioned this pull request Jun 19, 2024

[Inductor][CPP] Enable Quantized Linear GEMM Template with INT8 output and Unary Post Op #129048

Closed

2 tasks

pytorch-bot bot added ciflow/inductor module: inductor labels Jun 20, 2024

leslie-fang-intel mentioned this pull request Jun 19, 2024

[Inductor][Quant] Change the schema of QLinear Binary #129049

Closed

leslie-fang-intel marked this pull request as draft June 20, 2024 01:55

leslie-fang-intel added the ciflow/trunk and topic: not user facing labels Jun 20, 2024

pytorchbot added the open source label Jun 20, 2024

leslie-fang-intel mentioned this pull request Jun 20, 2024

Add mkldnn_ir.py into merge rule #129107

Closed

leslie-fang-intel added a commit that referenced this pull request Jun 20, 2024

Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion

cc661e6

ghstack-source-id: 17dde76 Pull Request resolved: #129103

leslie-fang-intel mentioned this pull request Jun 20, 2024

[Test] Int8 nemurical with reduce range of activation #129125

Closed

leslie-fang-intel added 2 commits June 20, 2024 17:52

leslie-fang-intel added a commit that referenced this pull request Jun 21, 2024

Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion

7ba03ec

ghstack-source-id: 1e0f015 Pull Request resolved: #129103

leslie-fang-intel changed the title ~~Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion~~ [Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion Jun 21, 2024

leslie-fang-intel requested a review from chunyuan-w June 21, 2024 08:31

jgong5 requested changes Jun 21, 2024

test/inductor/test_cpu_select_algorithm.py

torch/_inductor/codegen/cpp_gemm_template.py (outdated)

torch/_inductor/codegen/cpp_gemm_template.py (outdated)

leslie-fang-intel added 2 commits June 22, 2024 16:35

leslie-fang-intel requested a review from jgong5 June 24, 2024 07:09

jgong5 requested changes Jun 25, 2024

torch/_inductor/codegen/cpp_gemm_template.py (outdated)

leslie-fang-intel requested a review from jgong5 June 25, 2024 07:51

jgong5 reviewed Jun 25, 2024

torch/_inductor/codegen/cpp_gemm_template.py

leslie-fang-intel mentioned this pull request Jun 25, 2024

[Inductor][CPP] Remove redundant INT8-specific logic in the INT8 GEMM template #129470

Closed

leslie-fang-intel requested a review from jgong5 June 25, 2024 13:12

jgong5 approved these changes Jun 26, 2024

leslie-fang-intel added 3 commits June 26, 2024 02:39

jansel approved these changes Jun 28, 2024

pytorchmergebot added the merging label Jul 2, 2024

pytorchmergebot added the Merged label Jul 2, 2024

pytorchmergebot closed this in a796358 Jul 2, 2024

pytorchmergebot removed the merging label Jul 2, 2024

github-actions bot deleted the gh/leslie-fang-intel/120/head branch August 2, 2024 01:56

jgong5 mentioned this pull request Aug 24, 2024

[RFC] Add Cpp Template for GEMM related ops via max-autotune for Inductor CPU #125683

Open

18 tasks

6 participants
