In AMX GEMM micro-kernel, use same dtype for A & B only if B is dequantized #139906

sanchitintel · 2024-11-06T18:22:13Z

@frost-intel discovered that some Inductor auto-tuning UTs for CPU are currently broken on machines that support the AMX ISA. This is because in #136688 I reverted a change to the AMX GEMM micro-kernel that had been introduced in #131887, but some implementations added after that change rely on it, so it should not have been reverted.

Added a fix.
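The rule in the PR title can be sketched as a small dtype-selection helper. This is a minimal illustration under stated assumptions, not the actual Inductor code: the function name `select_micro_gemm_dtypes` and the string dtype names are hypothetical, and the uint8/int8 pairing is only an assumed example of a case where B is not dequantized.

```python
from typing import Tuple


def select_micro_gemm_dtypes(
    a_dtype: str, b_dtype: str, b_is_dequantized: bool
) -> Tuple[str, str]:
    """Pick the (A, B) dtypes a hypothetical AMX micro-kernel is specialized for.

    Rule from the PR title: A and B share a dtype only when B has been
    dequantized to A's dtype before the tile multiply. Otherwise B keeps
    its own dtype (assumed example: uint8 activations with int8 weights).
    """
    if b_is_dequantized:
        # e.g. int8 weights converted to bf16: the multiply runs on bf16 x bf16
        return a_dtype, a_dtype
    # no dequantization: keep B's original dtype instead of forcing A's
    return a_dtype, b_dtype


print(select_micro_gemm_dtypes("bfloat16", "int8", b_is_dequantized=True))
# ('bfloat16', 'bfloat16')
print(select_micro_gemm_dtypes("uint8", "int8", b_is_dequantized=False))
# ('uint8', 'int8')
```

The conditional is the whole point of the title: unconditionally reusing A's dtype for B would presumably mis-describe any operand pair whose B dtype legitimately differs from A's.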

Ideally, a CI machine that supports AMX should cover these UTs (test/inductor/test_cpu_select_algorithm.py). We do have at least one CI machine that supports AMX.
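To decide which runners could host these UTs, one option on Linux is to check the CPU feature flags the kernel reports in `/proc/cpuinfo` (`amx_tile`, `amx_bf16`, `amx_int8`). The sketch below is an assumption about how such a gate might look: the helper names are hypothetical, it only works on Linux, and it is not how `test_cpu_select_algorithm.py` itself detects AMX.

```python
from typing import Set

# Linux reports these flags for CPUs with AMX tile, BF16 and INT8 support
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_amx(cpuinfo_text: str) -> bool:
    """True if all AMX flags listed above appear in the cpuinfo text."""
    return AMX_FLAGS.issubset(cpu_flags(cpuinfo_text))


def host_supports_amx(path: str = "/proc/cpuinfo") -> bool:
    """Check the current host; returns False if the file cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return supports_amx(f.read())
    except OSError:
        return False
```

Reading the file fails closed (returns `False`), so a non-Linux runner would simply be treated as lacking AMX.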

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

… of dequantized B

Some Inductor auto-tuning UTs for CPU are currently broken on machines that support AMX because I had reverted a change in the AMX micro-kernel, but other implementations now rely on that change, so it should not have been reverted. Adding a workaround. Ideally, a CI machine that supports AMX should cover these UTs (test/inductor/test_cpu_select_algorithm.py).

pytorch-bot · 2024-11-06T18:22:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139906

✅ No Failures

As of commit 8cd5bdd with merge base 99deedf ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

sanchitintel · 2024-11-06T18:24:04Z

Hi @chuanqi129 @WeizhuoZhang-intel, can we add test/inductor/test_cpu_select_algorithm.py to the test plan of the CI machine that supports AMX? These UTs are fairly long-running (~1 hour), so they may be better suited to a trunk CI job. Thanks!

sanchitintel · 2024-11-07T06:54:27Z

@pytorchbot merge

pytorchmergebot · 2024-11-07T06:56:33Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@frost-intel

…ntized (pytorch#139906) @frost-intel discovered that some Inductor auto-tuning UTs for CPU are currently broken on machines supporting AMX ISA. That's because in pytorch#136688, I had reverted a change in the AMX GEMM micro-kernel that was introduced in pytorch#131887, but it looks like some other implementations introduced after the aforementioned change rely upon it, so it should not have been reverted. Added a fix. Ideally, a CI machine that supports AMX should cover these UTs (test/inductor/test_cpu_select_algorithm.py). We do have at least one CI machine that supports AMX. Pull Request resolved: pytorch#139906 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5

pytorch-bot bot added ciflow/inductor module: inductor labels Nov 6, 2024

sanchitintel added the topic: not user facing topic category label Nov 6, 2024

sanchitintel requested a review from jgong5 November 6, 2024 18:28

sanchitintel changed the title ~~In AMX GEMM micro-kernel, use same dtype for A & B only when B is dequantized~~ In AMX GEMM micro-kernel, use same dtype for A & B only if B is dequantized Nov 6, 2024

pytorchbot added the open source label Nov 6, 2024

sanchitintel requested a review from leslie-fang-intel November 6, 2024 19:03

leslie-fang-intel approved these changes Nov 7, 2024

jgong5 approved these changes Nov 7, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 7, 2024

pytorchmergebot added the merging label Nov 7, 2024

pytorchmergebot added the Merged label Nov 7, 2024

pytorchmergebot closed this in 314aa26 Nov 7, 2024

pytorchmergebot removed the merging label Nov 7, 2024

github-actions bot deleted the sanchitj/patch_amx_gemm_micro_kernel branch December 8, 2024 02:18

sanchitintel mentioned this pull request Jun 9, 2025: High-performance LLM quantization on X86 CPU with native PyTorch #155435 (Closed)

5 participants
