fadara01 (Collaborator) commented Mar 5, 2025

Stack from ghstack (oldest at bottom):

This enables a fast path for eager mode static quantization for AArch64 through Arm Compute Library (ACL) directly.

PR #145942 addressed the high overhead in qlinear_dynamic on AArch64 (due to redundant weight pretranspositions and reductions) by enabling a path that calls ACL directly.
This PR does the same for (static) qlinear.
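For context, the sketch below shows the standard eager-mode static quantization flow whose Linear op (qlinear) this PR accelerates. The flow is plain PyTorch; the model and shapes are illustrative, and the sketch picks whatever quantization engine the local build defaults to, since on AArch64 builds the quantized Linear dispatches through the oneDNN/ACL path discussed here.

```python
import torch
import torch.ao.quantization as tq

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()       # float -> quint8 at runtime
        self.fc = torch.nn.Linear(16, 8)  # lowered to a quantized Linear (qlinear)
        self.dequant = tq.DeQuantStub()   # quint8 -> float

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = TinyModel().eval()
# On AArch64 the engine routes Linear through oneDNN -> ACL; here we use
# the default engine of the local build so the sketch runs anywhere.
model.qconfig = tq.get_default_qconfig(torch.backends.quantized.engine)
tq.prepare(model, inplace=True)
model(torch.randn(4, 16))        # calibration pass: observers record ranges
tq.convert(model, inplace=True)  # fc is now a quantized Linear module
out = model(torch.randn(4, 16))
```

With the fast path enabled, this exact user code is unchanged; only the backend implementation behind the quantized Linear differs.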

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

[ghstack-poisoned]

pytorch-bot bot commented Mar 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148586

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7018fbb with merge base 6c3492b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: cpu CPU specific problem (e.g., perf, algorithm) release notes: quantization release notes category labels Mar 5, 2025
fadara01 added a commit that referenced this pull request Mar 5, 2025
…tly.

ghstack-source-id: 05435a0
Pull Request resolved: #148586
fadara01 added a commit that referenced this pull request Mar 5, 2025
This enables a fast path for eager mode static/dynamic quantization for AArch64 through Arm Compute Library (ACL) directly.

Context: PRs #126687 and #139887 enabled an optimized implementation of qlinear[_dynamic] for AArch64 through ideep → oneDNN → ACL, which improved performance by ~10x over the previous implementation.
However, the current qlinear[_dynamic] path (ideep → oneDNN → ACL) suffers from high overhead due to the API friction between the stateless oneDNN API and the stateful ACL low-precision GEMM (lowp_gemm) API. For example, ACL's lowp_gemm objects cache information such as weight reductions and weights in an optimized memory format; oneDNN cannot exploit such caching due to its stateless nature.
Hence, ACL currently runs a (redundant) sum of columns and a pre-transposition (to the GEMM kernel's optimal format) for each GEMM operation.

This PR addresses the suboptimalities above by introducing PackedLinearWeightsACL (a subclass of PackedLinearWeightsOnednn) with an implementation of qlinear[_dynamic] that calls ACL directly.
ghstack-source-id: 05435a0
Pull Request resolved: #148585

ghstack-source-id: 05435a0
Pull Request resolved: #148586
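To make the stateless-vs-stateful friction described in the commit message concrete, here is a plain-Python sketch (no torch, oneDNN, or ACL; all names are illustrative, not real APIs) of why a stateful GEMM object avoids the redundant per-call weight preparation:

```python
# Counts how many times the expensive weight preparation runs.
PREP_CALLS = 0

def _prepack(w):
    """Pre-transpose the weight and compute its column sums
    (the reduction a quantized GEMM needs for zero-point correction)."""
    global PREP_CALLS
    PREP_CALLS += 1
    w_t = [list(col) for col in zip(*w)]        # pre-transposition
    col_sums = [sum(col) for col in zip(*w)]    # weight reduction
    return w_t, col_sums

def matmul(x, w_t):
    # x: M x K, w_t: N x K (pre-transposed weight) -> M x N
    return [[sum(a * b for a, b in zip(row, wrow)) for wrow in w_t]
            for row in x]

def qgemm_stateless(x, w):
    # A stateless API (like oneDNN's) has nowhere to keep the prepared
    # weight, so the preparation is redone on every call.
    w_t, _col_sums = _prepack(w)
    return matmul(x, w_t)

class QGemmStateful:
    # A stateful object (like ACL's lowp_gemm) prepares the weight once
    # at construction and reuses it for every subsequent GEMM.
    def __init__(self, w):
        self.w_t, self.col_sums = _prepack(w)
    def __call__(self, x):
        return matmul(x, self.w_t)

w = [[1, 2], [3, 4]]      # K x N weight
x = [[1, 0], [0, 1]]      # M x K input (identity, so output == w)
qgemm_stateless(x, w)     # prep work done...
qgemm_stateless(x, w)     # ...and redone
gemm = QGemmStateful(w)   # prep work done once
gemm(x)
gemm(x)                   # reused, no extra prep
```

Holding the prepared weight in a packed-weights object (as PackedLinearWeightsACL does) is what lets the per-call cost drop to the GEMM itself.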

fadara01 commented Mar 6, 2025

Sorry, this got raised by mistake.

@fadara01 fadara01 closed this Mar 6, 2025
@github-actions github-actions bot deleted the gh/fadara01/6/head branch April 11, 2025 02:32