Conversation

@leslie-fang-intel leslie-fang-intel commented Jun 19, 2024

Stack from ghstack (oldest at bottom):

Summary
Based on the previous PR, add the configuration to support int8 output and unary post-op fusion with ReLU and GELU

  • Activation dtype: uint8
  • Weight dtype: int8
  • Output dtype: float32/bfloat16/uint8
  • Post Op Fusion: with unary post operator fusion

Test Plan

clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_with_pointwise

Next Step

  • [✓] Unary post op fusion
  • [✓] Int8 output
  • Binary Fusion
  • AMX int8 MicroGEMM Kernel

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
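The configuration above (uint8 activation × int8 weight, int32 accumulation, requantized uint8 output with a fused unary post-op) can be sketched in pure Python. This is an illustrative model of the arithmetic, not the template's actual code; the scale/zero-point parameter names are assumptions.

```python
# Hedged sketch: emulate a quantized linear with uint8 activation, int8
# weight, and a fused ReLU post-op, requantizing the int32 accumulator
# back to uint8. All names here are illustrative, not the real schema.
def qlinear_relu_uint8(x_u8, x_scale, x_zp, w_s8, w_scale, y_scale, y_zp):
    # x_u8: M x K uint8 activations; w_s8: N x K int8 weights (lists of lists)
    M, K, N = len(x_u8), len(x_u8[0]), len(w_s8)
    out = []
    for m in range(M):
        row = []
        for n in range(N):
            # Integer GEMM: accumulate in int32 after removing the zero point.
            acc = sum((x_u8[m][k] - x_zp) * w_s8[n][k] for k in range(K))
            y = acc * (x_scale * w_scale)       # dequantize the accumulator
            y = max(y, 0.0)                     # fused unary post-op: ReLU
            q = round(y / y_scale) + y_zp       # requantize to uint8 output
            row.append(min(255, max(0, q)))     # clamp to uint8 range
        out.append(row)
    return out

y = qlinear_relu_uint8([[10, 200]], 0.1, 128, [[5, -3], [-7, 2]],
                       0.05, 0.2, 0)
```

Because the ReLU runs on the dequantized accumulator before requantization, no separate elementwise kernel is needed, which is the point of fusing the unary op into the GEMM template.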

pytorch-bot bot commented Jun 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129048

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 1e6d1a7 with merge base dabaebd:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel added a commit to leslie-fang-intel/pytorch that referenced this pull request Jun 21, 2024
…t and Unary Post Op

ghstack-source-id: 234e69c
Pull Request resolved: pytorch#129048
```
exact_dtype=True,
)
-        self.assertEqual(counters["inductor"]["select_algorithm_autotune"], 1)
+        self.assertEqual(counters["inductor"]["select_algorithm_autotune"], 2)
```
Why?
Also, please add the check: `self.assertEqual(counters["inductor"]["cpp_epilogue_fusion_counter"], 1)`

The test case has been changed from a single Linear module to two adjacent Linear modules, which hits autotuning twice.
Thanks; added the check `self.assertEqual(counters["inductor"]["cpp_epilogue_fusion_counter"], 0)`, since the ReLU/GELU is fused into `onednn.qlinear` as part of the template rather than fused by the scheduler as an epilogue.

@leslie-fang-intel leslie-fang-intel requested a review from jansel June 28, 2024 01:23
@leslie-fang-intel

Hi @jansel, this PR may also need your kind review since I have modified torch/_inductor/utils.py.

@leslie-fang-intel

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
**Summary**
We change the schema of QLinear Binary so it will be easier to enable the corresponding GEMM template.

- The extra input of the binary post-op is a tensor that needs to be an input node for autotuning, so we move it in front of `output_scale`, which is a scalar.
- We also move it in front of `bias`, since `bias` is an optional tensor for this fusion while `other` is required for linear binary fusion.
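The ordering rationale above can be illustrated with a simplified Python signature. This is not the real `onednn.qlinear` schema; the function and parameter names are assumptions chosen to mirror the description.

```python
# Hedged sketch of the reordered signature: the required binary-post-op
# tensor 'other' comes before the optional 'bias' and the scalar
# 'output_scale', so every leading argument is a tensor input node.
def qlinear_binary_sketch(x, w, other, bias=None, output_scale=1.0):
    acc = [xi * w + oi for xi, oi in zip(x, other)]  # gemm + binary add
    if bias is not None:
        acc = [a + bias for a in acc]                # optional bias
    return [a * output_scale for a in acc]           # scalar requant scale

out = qlinear_binary_sketch([1, 2], 3, other=[10, 20])
```

Putting required tensors first means an autotuning harness can treat the leading positional arguments uniformly as input nodes, without special-casing a scalar or an absent optional in the middle of the list.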

**Test Plan**
```
python -u -m pytest -s -v test/quantization/core/test_quantized_op.py -k qlinear
python -u -m pytest -s -v test/inductor/test_mkldnn_pattern_matcher.py -k qlinear
```

Pull Request resolved: #129049
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
…ion (#129103)

**Summary**
Based on the previous PR, add the config to support quantized linear binary (with optional unary) post-op fusion.

- Activation dtype: uint8
- Weight dtype: int8
- Output dtype: float32/bfloat16/uint8
- Post Op Fusion: with binary and optional[Unary] post operator fusion

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_with_pointwise_binary
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [ ] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129103
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049
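The binary-plus-optional-unary pattern described in this commit can be modeled as a small pure-Python sketch. The names are illustrative, not the fusion's actual API.

```python
# Hedged sketch of the fusion pattern: GEMM accumulator -> binary post-op
# (elementwise add with 'other') -> optional unary post-op (ReLU).
def fused_binary_optional_unary(acc, other, unary=None):
    out = [a + o for a, o in zip(acc, other)]  # binary post-op
    if unary == "relu":                        # optional unary post-op
        out = [max(v, 0) for v in out]
    return out

r1 = fused_binary_optional_unary([-5, 2], [1, 1])
r2 = fused_binary_optional_unary([-5, 2], [1, 1], unary="relu")
```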
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
**Summary**
Add the AMX micro gemm kernel with int8 data type.

**Test Plan**
```
clear && python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear_amx
```

**Next Step**
- [✓] Unary post op fusion
- [✓] Int8 output
- [✓] Binary Fusion
- [✓] AMX int8 MicroGEMM Kernel

Pull Request resolved: #129220
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103
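The contract the AMX micro-GEMM kernel implements can be modeled in plain Python: uint8 × int8 products accumulated into an int32 C tile. The real kernel uses AMX tile instructions and blocked tiles; the sizes and names below are illustrative.

```python
# Hedged model of the int8 micro-GEMM contract: widening uint8 x int8
# multiply-adds accumulated in int32 (the real kernel does this with
# AMX tile instructions; this loop nest only shows the arithmetic).
def micro_gemm_s32(a_u8, b_s8):
    # a_u8: M x K uint8, b_s8: K x N int8; returns M x N int32 accumulator.
    M, K, N = len(a_u8), len(b_s8), len(b_s8[0])
    c = [[0] * N for _ in range(M)]
    for m in range(M):
        for k in range(K):
            for n in range(N):
                c[m][n] += a_u8[m][k] * b_s8[k][n]  # widening multiply-add
    return c

c = micro_gemm_s32([[255, 1]], [[1], [-128]])
```

Accumulating in int32 is what makes the uint8 × int8 GEMM safe: a single product can be as large as 255 × 127, far outside int8/uint8 range.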
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
…129221)

**Summary**
This PR mainly refactors two things:

1. Pass the weight's data type explicitly to `create_micro_gemm` as `input2.dtype`. When registering `CppMicroGemmConfig`, we reuse `input.dtype` if `input2.dtype` is not explicitly registered.
2. Add a utility function to get the output data type and compute data type from the input data type.

Pull Request resolved: #129221
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #128825, #129048, #129049, #129103, #129220
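The second refactor point can be sketched as a small lookup helper. The function name and the exact mapping are assumptions for illustration, not the actual helper in `torch/_inductor`.

```python
# Hedged sketch of a dtype utility: derive (output dtype, compute dtype)
# from the micro-GEMM input dtype. Names and mapping are illustrative.
def get_gemm_out_and_compute_dtype(input_dtype):
    if input_dtype in ("uint8", "int8"):
        return ("int32", "int32")      # int8 GEMM accumulates in int32
    if input_dtype in ("float32", "bfloat16", "float16"):
        return ("float32", "float32")  # fp GEMM accumulates in fp32
    raise ValueError(f"unsupported input dtype: {input_dtype}")

pair = get_gemm_out_and_compute_dtype("uint8")
```

Centralizing this mapping keeps the int8 and floating-point GEMM templates from each hard-coding their accumulator types.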
pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2024
… template (#129470)

**Summary**
Remove redundant INT8-specific logic in the INT8 GEMM template to unify the code structure with FP32/BF16/FP16 GEMM Template.

**Test Plan**
```
numactl -C 56-111 -m 1 python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_quantized_linear
```

Pull Request resolved: #129470
Approved by: https://github.com/jgong5
ghstack dependencies: #128825, #129048, #129049, #129103, #129220, #129221
pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request Jul 2, 2024
…t and Unary Post Op (pytorch#129048)

Pull Request resolved: pytorch#129048
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: pytorch#128825
@github-actions github-actions bot deleted the gh/leslie-fang-intel/118/head branch July 31, 2024 01:50