@leslie-fang-intel
Collaborator

@leslie-fang-intel leslie-fang-intel commented Jun 20, 2024

Stack from ghstack (oldest at bottom):

Summary
Building on the previous PR in this stack, add the config to support quantized linear with a binary post op, optionally followed by a unary post op, fused into the GEMM template.

  • Activation dtype: uint8
  • Weight dtype: int8
  • Output dtype: float32/bfloat16/uint8
  • Post Op Fusion: with binary and optional[Unary] post operator fusion
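As a rough reference for what such a fused epilogue computes, the sketch below emulates a uint8-activation, int8-weight linear followed by a binary add and an optional ReLU in plain Python. The function name, the per-tensor quantization parameters, and the choice of add and ReLU as the post ops are assumptions made for illustration; this is not the Inductor template code.

```python
def quantized_linear_binary(x_q, x_scale, x_zp, w_q, w_scale, other,
                            unary=None, out_scale=None, out_zp=0):
    """Reference semantics only (hypothetical helper, not a PyTorch API).

    x_q:   M x K list of uint8 activations
    w_q:   K x N list of int8 weights (symmetric, zero point 0 assumed)
    other: M x N list of floats, the second operand of the binary post op
    """
    m, k, n = len(x_q), len(w_q), len(w_q[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer accumulation of (activation - zero point) * weight.
            acc = sum((x_q[i][p] - x_zp) * w_q[p][j] for p in range(k))
            y = acc * x_scale * w_scale      # dequantize the accumulator
            y += other[i][j]                 # binary post op (add)
            if unary == "relu":              # optional unary post op
                y = max(y, 0.0)
            if out_scale is not None:        # optional requantization to uint8
                # Python's round() rounds halves to even.
                y = min(255, max(0, round(y / out_scale) + out_zp))
            row.append(y)
        out.append(row)
    return out


x_q = [[130, 126]]   # uint8 activations, zero point 128
w_q = [[2], [4]]     # int8 weights
float_out = quantized_linear_binary(x_q, 0.5, 128, w_q, 0.25, [[-1.0]])
relu_out = quantized_linear_binary(x_q, 0.5, 128, w_q, 0.25, [[-1.0]], unary="relu")
uint8_out = quantized_linear_binary(x_q, 0.5, 128, w_q, 0.25, [[1.0]],
                                    unary="relu", out_scale=0.25, out_zp=3)
```

Leaving `out_scale` as `None` corresponds to a floating-point output, while passing it models the uint8 output case listed above.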

Test Plan

clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_with_pointwise_binary

Next Step

  • [✓] Unary post op fusion
  • [✓] Int8 output
  • [✓] Binary Fusion
  • [ ] AMX int8 MicroGEMM Kernel

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot

pytorch-bot bot commented Jun 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129103

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 4f24cf8 with merge base dabaebd:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel added a commit that referenced this pull request Jun 21, 2024
@leslie-fang-intel leslie-fang-intel changed the title from "Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion" to "[Inductor][CPP] Enable Quantized Linear GEMM Template with Binary Fusion" Jun 21, 2024
@leslie-fang-intel leslie-fang-intel requested a review from jgong5 June 24, 2024 07:09
@leslie-fang-intel leslie-fang-intel requested a review from jgong5 June 25, 2024 07:51
@leslie-fang-intel
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
**Summary**
Add the AMX micro-GEMM kernel for the int8 data type.
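
For readers unfamiliar with the arithmetic involved, AMX int8 tile instructions such as `TDPBUSD` multiply unsigned bytes by signed bytes in groups of four along K and accumulate into int32, which requires the weight to be repacked so that four consecutive K values sit next to each other. The plain-Python emulation below illustrates that data layout and accumulation only; the helper names are hypothetical and it is not the kernel added by this commit.

```python
def pack_vnni4(b):
    """Repack a K x N int8 matrix into K/4 rows of 4*N values.

    Each group of 4 consecutive K values for a column is stored contiguously.
    K must be a multiple of 4 in this sketch.
    """
    k, n = len(b), len(b[0])
    assert k % 4 == 0, "this sketch assumes K is a multiple of 4"
    return [[b[4 * kb + r][j] for j in range(n) for r in range(4)]
            for kb in range(k // 4)]


def tile_dot_u8s8(c, a, b_packed):
    """Emulate a u8 x s8 -> s32 tile product: c += a @ b, with b in packed form."""
    m = len(a)
    n = len(b_packed[0]) // 4
    for i in range(m):
        for j in range(n):
            for kb in range(len(b_packed)):
                # One 4-byte dot product per (row, column, K-block).
                for r in range(4):
                    c[i][j] += a[i][4 * kb + r] * b_packed[kb][4 * j + r]
    return c


a = [[1, 2, 3, 4, 5, 6, 7, 8]]            # 1 x 8 uint8 activations
b = [[p - 4, 1] for p in range(8)]        # 8 x 2 int8 weights
c = tile_dot_u8s8([[0, 0]], a, pack_vnni4(b))
```

The result matches an ordinary integer matrix product of `a` and `b`; only the traversal order and the weight layout differ.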

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_amx
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [✓] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129220
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
…129221)

**Summary**
This PR mainly refactors two things:

1. Pass the weight's data type explicitly to `create_micro_gemm` as `input2.dtype`. When registering `CppMicroGemmConfig`, `input.dtype` is reused if `input2.dtype` is not explicitly registered.
2. Add a utility function that derives the output data type and compute data type from the input data type.
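
The two points above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions: the class `MicroGemmConfig`, the helpers `output_and_compute_dtype` and `matches`, the use of strings for dtypes, and the specific dtype mapping are all hypothetical stand-ins, not the code from the PR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroGemmConfig:
    """Hypothetical stand-in for a registered micro-GEMM configuration."""
    input_dtype: str
    output_dtype: str
    compute_dtype: str
    input2_dtype: Optional[str] = None  # weight dtype; None means "same as input"

    def __post_init__(self):
        # Fall back to the activation dtype when no weight dtype was registered.
        if self.input2_dtype is None:
            object.__setattr__(self, "input2_dtype", self.input_dtype)


def output_and_compute_dtype(input_dtype: str) -> Tuple[str, str]:
    """Assumed mapping: int8 GEMM accumulates in int32, everything else in float32."""
    if input_dtype == "uint8":
        return "int32", "int32"
    return "float32", "float32"


def matches(cfg: MicroGemmConfig, input_dtype: str, input2_dtype: str) -> bool:
    """Check whether a config fits the given activation and weight dtypes."""
    out_dtype, compute_dtype = output_and_compute_dtype(input_dtype)
    return (cfg.input_dtype, cfg.input2_dtype, cfg.output_dtype, cfg.compute_dtype) == (
        input_dtype, input2_dtype, out_dtype, compute_dtype)


bf16_cfg = MicroGemmConfig("bfloat16", "float32", "float32")
int8_cfg = MicroGemmConfig("uint8", "int32", "int32", input2_dtype="int8")
```

With the fallback in place, existing floating-point registrations need no change, while the int8 registration states its weight dtype explicitly.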

Pull Request resolved: #129221
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049, #129103, #129220
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
… template (#129470)

**Summary**
Remove redundant INT8-specific logic from the INT8 GEMM template to unify its code structure with the FP32/BF16/FP16 GEMM template.

**Test Plan**
```
numactl -C 56-111 -m 1 python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear
```

Pull Request resolved: #129470
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103, #129220, #129221
@github-actions github-actions bot deleted the gh/leslie-fang-intel/120/head branch August 2, 2024 01:56