[Inductor][CPP] Enable Quantized Linear with AMX MicroGEMM #129220

leslie-fang-intel · 2024-06-21T07:42:13Z

Stack from ghstack (oldest at bottom):

Summary
Add the AMX micro gemm kernel with int8 data type.
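
As a rough illustration of the arithmetic an int8 micro GEMM performs (this is not the generated AMX kernel), the sketch below multiplies an unsigned 8-bit activation block by a signed 8-bit weight block and accumulates the products as 32-bit-range integers. The u8 x s8 -> s32 combination is an assumption about the quantized linear path, and the name `int8_gemm_reference` is hypothetical, not something defined in this PR.

```python
def int8_gemm_reference(a_u8, b_s8):
    """Plain-Python reference: C[m][n] = sum over k of A[m][k] * B[k][n].

    a_u8: M x K list of lists, values assumed to be in [0, 255] (uint8).
    b_s8: K x N list of lists, values assumed to be in [-128, 127] (int8).
    The result stands in for the int32 accumulator of a real kernel.
    """
    m, k = len(a_u8), len(a_u8[0])
    n = len(b_s8[0])
    if len(b_s8) != k:
        raise ValueError("inner dimensions do not match")
    if any(not 0 <= v <= 255 for row in a_u8 for v in row):
        raise ValueError("activation values must fit in uint8")
    if any(not -128 <= v <= 127 for row in b_s8 for v in row):
        raise ValueError("weight values must fit in int8")

    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # wide accumulator, so 8-bit products cannot overflow it
            for p in range(k):
                acc += a_u8[i][p] * b_s8[p][j]
            c[i][j] = acc
    return c


a = [[1, 2, 3, 4], [255, 0, 255, 0]]
b = [[1, -1], [2, -2], [3, -3], [4, -4]]
c = int8_gemm_reference(a, b)
```

A reference like this is mainly useful for checking a fast kernel's output on small shapes; it says nothing about tiling or register layout.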

Test Plan

clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_amx

Next Step

[✓] Unary post op fusion
[✓] Int8 output
[✓] Binary Fusion
[✓] AMX int8 MicroGEMM Kernel

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-06-21T07:42:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129220

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2fe32be with merge base dabaebd:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
Unexpected HTTP response: 429

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: f2a861d Pull Request resolved: #129220

test/inductor/test_cpu_select_algorithm.py

torch/_inductor/codegen/cpp_micro_gemm.py

leslie-fang-intel · 2024-07-02T12:45:58Z

@pytorchbot merge

pytorchmergebot · 2024-07-02T12:48:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…129221)

**Summary**
This PR mainly refactors two things:

1. Pass the weight's data type explicitly to `create_micro_gemm` as `input2.dtype`. When registering a `CppMicroGemmConfig`, `input.dtype` is reused if `input2.dtype` is not explicitly registered.
2. Add a utility function that returns the output data type and the compute data type for a given input data type.

Pull Request resolved: #129221
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049, #129103, #129220
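
The two refactors described in that commit message could look roughly like the sketch below. Every name here (`MicroGemmConfigSketch`, `weight_dtype`, `output_and_compute_dtype`) is hypothetical, dtypes are plain strings so the example runs without PyTorch, and the specific dtype mapping is an assumption rather than something stated in the PR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroGemmConfigSketch:
    """Hypothetical stand-in for a micro GEMM config entry."""

    input_dtype: str
    input2_dtype: Optional[str] = None  # weight dtype; None means "not registered"

    def weight_dtype(self) -> str:
        # Reuse the activation dtype when no weight dtype was given explicitly.
        return self.input_dtype if self.input2_dtype is None else self.input2_dtype


def output_and_compute_dtype(input_dtype: str) -> Tuple[str, str]:
    """Return (output dtype, compute dtype) for an input dtype.

    Assumed mapping for illustration: uint8 inputs accumulate and emit int32,
    everything else computes and emits float32.
    """
    if input_dtype == "uint8":
        return ("int32", "int32")
    return ("float32", "float32")


bf16_cfg = MicroGemmConfigSketch(input_dtype="bfloat16")
int8_cfg = MicroGemmConfigSketch(input_dtype="uint8", input2_dtype="int8")
```

Defaulting the weight dtype keeps existing same-dtype registrations unchanged, while mixed-dtype cases such as uint8 activations with int8 weights state both types.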

… template (#129470)

**Summary**
Remove redundant INT8-specific logic in the INT8 GEMM template to unify the code structure with the FP32/BF16/FP16 GEMM template.

**Test Plan**
```
numactl -C 56-111 -m 1 python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear
```

Pull Request resolved: #129470
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103, #129220, #129221

[Inductor][CPP] Enable Quantized Linear with AMX MicroGEMM

5fba440

[ghstack-poisoned]

This was referenced Jun 21, 2024

[Inductor][Quant] Use output dtype torch.uint8 explicitly #128804

Closed

[Inductor][CPP] Fallback QLinear Binaryfusion from postop sum to binary add when others is view #128808

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Jun 21, 2024

This was referenced Jun 21, 2024

[Inductor][CPP] Enable Quantized Linear GEMM Template with FP32 output #128825

Closed

[Inductor][CPP] Enable Quantized Linear GEMM Template with INT8 output and Unary Post Op #129048

Closed

This was referenced Jun 21, 2024

[Inductor][Quant] Change the schema of QLinear Binary #129049

Closed

[Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion #129103

Closed

leslie-fang-intel added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Jun 21, 2024

leslie-fang-intel marked this pull request as draft June 21, 2024 07:44

pytorchbot added the open source label Jun 21, 2024

leslie-fang-intel added a commit that referenced this pull request Jun 21, 2024

[Inductor][CPP] Enable Quantized Linear with AMX MicroGEMM

109ceb6

ghstack-source-id: f2a861d Pull Request resolved: #129220

leslie-fang-intel mentioned this pull request Jun 21, 2024

[Inductor][CPP] Pass weight dtype explicitly for cpp gemm template #129221

Closed

leslie-fang-intel added 2 commits June 22, 2024 16:35

leslie-fang-intel marked this pull request as ready for review June 24, 2024 07:09

leslie-fang-intel requested review from chunyuan-w and jgong5 June 24, 2024 07:09

jgong5 requested changes Jun 25, 2024

leslie-fang-intel requested a review from jgong5 June 25, 2024 07:51

jgong5 approved these changes Jun 25, 2024

test/inductor/test_cpu_select_algorithm.py (outdated, resolved)

leslie-fang-intel mentioned this pull request Jun 25, 2024

[Inductor][CPP] Remove redundant INT8-specific logic in the INT8 GEMM template #129470

Closed

leslie-fang-intel added 4 commits June 26, 2024 02:39

pytorchmergebot added the merging label Jul 2, 2024

pytorchmergebot closed this in 72fa864 Jul 2, 2024

pytorchmergebot added Merged and removed merging labels Jul 2, 2024

github-actions bot deleted the gh/leslie-fang-intel/124/head branch August 2, 2024 01:57

jgong5 mentioned this pull request Aug 24, 2024

[RFC] Add Cpp Template for GEMM related ops via max-autotune for Inductor CPU #125683

Open

18 tasks

5 participants
