[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend #134124
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134124
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c25abbf with merge base 8136daf.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Caused by:
@pytorchbot label "ciflow/linux-aarch64" "module:arm"
Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.
@pytorchbot label "module:arm"
Didn't find following labels among repository labels: module:arm
@pytorchbot label "module: arm"
@pytorchbot label "ciflow/linux-aarch64"
Can't add following labels to PR: ciflow/linux-aarch64. Please ping one of the reviewers for help.
snadampal left a comment
Thanks for the contribution. Please find my comments inline.
Force-pushed from 53a9e89 to fd6f73b.
@malfet can you please help in merging this PR?
I believe the pre-requisite for merging is a passing build and test for the specific target, and this PR clearly fails the aarch64 build right now, see https://github.com/pytorch/pytorch/actions/runs/10616020253/job/29425421107?pr=134124. For comparison, here is the result of the build/test for the recent trunk commit: https://github.com/pytorch/pytorch/actions/runs/10636148674/job/29487324878
Hello @malfet, please let me know your thoughts on this and whether it addresses all your concerns regarding the PR.
Force-pushed from fd6f73b to 9cf8231.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Rebase failed due to Command
@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Description:
1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.
2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well.
3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias: packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features). Input parameters should include in_features and out_features (the same as the Linear layer's corresponding parameters).
4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights: output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, scales_and_zeros, bias, groupsize, in_features, out_features). Inputs required include the input tensor, packed_weights, scales, bias, groupsize, and the in_features and out_features.
Model Perf:
7B Transformer model: Prefill: 340 t/s, Decode: 40 t/s
2B Transformer model: Prefill: 747 t/s, Decode: 80 t/s
Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight (Ran 1 test in 0.016s, OK)
python test/test_linalg.py -k test__dyn_quant_matmul_4bit (Ran 8 tests in 0.077s, OK)
python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit (Ran 8 tests in 11.454s)
Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: I0a9c864c56a9d1b4e6179dc3059cd37b11525c8d
Description: The scale and bias tensors are no longer needed in the dynamic quantized 4-bit matmul call.
Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: Ia34466eea5fefc5780d418a4321fa9b78c142799
…yn_quant_pack_4bit_weight
Tests:
python test/inductor/test_torchinductor.py -k test__dyn_quant_matmul_4bit (Ran 1 test in 0.326s, OK)
python test/inductor/test_torchinductor.py -k test__dyn_quant_pack_4bit_weight (Ran 1 test in 5.664s, OK)
Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: I5ed5d8d761769c1af611f75d05e9fc6e9fc64cb4
Signed-off-by: Nikhil Gupta <[email protected]>
Change-Id: I08dcdc5780831771a66325ac5e8b45e0805cf990
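The commit messages above describe the weight quantization scheme (symmetric 4-bit values, two packed per uint8, channel-wise or group-wise with a group size that is a multiple of 32) without showing it. Below is a minimal channel-wise sketch in plain PyTorch; the function name and the packed layout are illustrative assumptions and may not match the exact layout the aten op or the KleidiAI kernels expect.

```python
import torch

def quantize_4bit_symmetric_channelwise(weight: torch.Tensor):
    # Sketch only: symmetric per-output-channel 4-bit quantization of a Linear
    # weight of shape (out_features, in_features), packing two 4-bit values per
    # uint8. The layout expected by the packing/matmul ops may differ; this just
    # illustrates the scheme described in the commit message.
    out_features, in_features = weight.shape
    assert in_features % 2 == 0, "need an even in_features to pack two nibbles per byte"
    # Symmetric per-channel scale: map [-max|w|, +max|w|] onto the int4 range [-8, 7].
    scales = (weight.abs().amax(dim=1, keepdim=True) / 7.0).clamp_min(1e-8)
    qweight = torch.clamp(torch.round(weight / scales), -8, 7).to(torch.int8)
    # Shift to unsigned nibbles [0, 15] and pack pairs into one uint8 (low nibble first).
    nibbles = (qweight + 8).to(torch.uint8).view(out_features, in_features // 2, 2)
    packed = nibbles[..., 0] | (nibbles[..., 1] << 4)
    return packed, scales.squeeze(1)

# Example: quantize a 64x128 weight; packed is (64, 64) uint8, scales is (64,) float32.
w = torch.randn(64, 128)
packed, scales = quantize_4bit_symmetric_channelwise(w)
```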
Force-pushed from 6d178af to c25abbf.
@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 1 jobs have failed, first few of them are: Meta Internal-Only Changes Check
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge -f 'Diff has been landed internally'
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Description: Allow int8_dynamic_activation_intx_weight to work with the aten _dyn_quant_matmul_4bit op.
Needs: pytorch/pytorch#134124 or PyTorch > 2.6.0
Signed-off-by: Nikhil Gupta <[email protected]>
…I Backend (pytorch#134124)" This reverts commit 94737e8.
Mitigation for #145273. Reverting #134124 and #144074.
Pull Request resolved: #145392
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai
#134124 was reverted by #145392 due to a KleidiAI clone issue.
1. This reverts commit 0940eb6 (#145392) and fixes the KleidiAI mirror issue.
2. KleidiAI is now cloned from the GitHub mirror instead of Arm GitLab.
Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2
Signed-off-by: Nikhil Gupta <[email protected]>
#145505) #134124 was reverted by #145392 due to a KleidiAI clone issue.
1. This reverts commit 0940eb6 (#145392) and fixes the KleidiAI mirror issue.
2. KleidiAI is now cloned from the GitHub mirror instead of Arm GitLab.
Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2
Fixes #145273
Pull Request resolved: #145505
Approved by: https://github.com/malfet
Description:
1. Quantize Linear Layer Weights to 4-bits:
   - Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
   - Pack two 4-bit weights into one uint8 container.
   - Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.
2. Prepare Quantized Weights, Scales, and Optional Bias:
   - After quantizing, obtain the quantized_weights, scales, and groupsize.
   - If the original Linear layer has a bias, prepare it as well.
3. Pack the Weights Efficiently:
   - Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
   - Input parameters should include in_features and out_features (the same as the Linear layer’s corresponding parameters).
4. Perform Dynamic Quantized Matrix Multiplication:
   - Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with the quantized weights.
   - Inputs required include the input tensor, packed_weights, groupsize, and the in_features and out_features (a usage sketch follows this list).
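Putting the four steps together, here is a minimal usage sketch based on the op signatures above. The shapes, the scales layout, and the random stand-in data are illustrative assumptions; see the API usage issue (#143289) for the exact tensor layouts the ops expect.

```python
import torch

# Illustrative sizes; a real Linear layer supplies in_features and out_features.
in_features, out_features, groupsize = 4096, 4096, 32

# Assume `quantized_weights` (two 4-bit values per uint8) and `scales_and_zeros`
# were produced by a step-1/step-2 quantizer; random data stands in for them
# here. `bias` is optional and may be None.
quantized_weights = torch.randint(0, 256, (out_features, in_features // 2), dtype=torch.uint8)
scales_and_zeros = torch.rand(out_features, in_features // groupsize, dtype=torch.float32)
bias = None

# Step 3: pack weights, scales, and the optional bias into the backend's layout.
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(
    quantized_weights, scales_and_zeros, bias, groupsize, in_features, out_features
)

# Step 4: dynamically quantize the fp32 activation and run the 4-bit matmul.
x = torch.randn(1, in_features, dtype=torch.float32)
out = torch.ops.aten._dyn_quant_matmul_4bit(
    x, packed_weights, groupsize, in_features, out_features
)
print(out.shape)  # expected: (1, out_features)
```

The packing call is typically performed once when the model is loaded, while the matmul runs on every forward pass with dynamically quantized activations.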
API Usage: #143289
Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s
Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s
OK
python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s
OK
python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s
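test_compile_dyn_quant_matmul_4bit exercises the matmul op under torch.compile. A minimal sketch of such a wrapper follows; the module name and sizes are assumptions for illustration only.

```python
import torch

class DynQuantLinear(torch.nn.Module):
    # Hypothetical wrapper around packed 4-bit weights produced by
    # torch.ops.aten._dyn_quant_pack_4bit_weight.
    def __init__(self, packed_weights, groupsize, in_features, out_features):
        super().__init__()
        self.packed_weights = packed_weights
        self.groupsize = groupsize
        self.in_features = in_features
        self.out_features = out_features

    def forward(self, x):
        return torch.ops.aten._dyn_quant_matmul_4bit(
            x, self.packed_weights, self.groupsize, self.in_features, self.out_features
        )

# compiled = torch.compile(DynQuantLinear(packed_weights, 32, in_features, out_features))
# y = compiled(torch.randn(1, in_features))
```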
Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @malfet @snadampal @milpuz01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov