
Add support for NEON ISA in the Inductor C++ backend #104729

@Rohanjames1997

Description


🚀 The feature, motivation and pitch

Context: The TorchInductor C++ backend currently supports vectorization in its C++ codegen through two Intel ISAs, AVX2 and AVX512, as mentioned in the Update 5 blog. While the ATen library does support Arm, we have yet to leverage its NEON/SVE ISAs to generate optimized kernels. The blog also mentions that the VecISA class can be subclassed to support other ISAs.

Proposal: I am working on adding NEON ISA support to TorchInductor's C++ backend. In particular, I intend to provide a NEON implementation of the vec_reduce_all() function, which currently has optimized AVX2 and AVX512 intrinsics implementations for x86 processors (introduced by @mingfeima in #73953) and falls back to a slow path on other processors, including Arm. I have implemented a NEON version of the function, wired up Inductor's generated C++ to invoke this NEON path on Arm CPUs, and observed performance improvements, particularly in the Softmax operation. A small sketch of the idea follows below.
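To make the idea concrete, here is a minimal, self-contained sketch of the kind of NEON horizontal reduction involved, for the sum case. This is not the actual ATen vec_reduce_all() code path; the free function neon_reduce_sum and its signature are hypothetical and used only for illustration, while the real patch hooks into the existing Vectorized<float> machinery instead.

```cpp
#include <arm_neon.h>
#include <cstddef>

// Hypothetical standalone example: horizontal sum over a float buffer using
// AArch64 NEON intrinsics. The actual change would integrate with ATen's
// Vectorized<float> machinery rather than use a free function like this.
float neon_reduce_sum(const float* data, std::size_t n) {
  float32x4_t acc = vdupq_n_f32(0.0f);          // 4-lane accumulator, zero-initialized
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {                  // main loop: 4 floats per iteration
    acc = vaddq_f32(acc, vld1q_f32(data + i));  // lane-wise add of the next 4 elements
  }
  float sum = vaddvq_f32(acc);                  // horizontal add across the 4 lanes (AArch64)
  for (; i < n; ++i) {                          // scalar tail for leftover elements
    sum += data[i];
  }
  return sum;
}
```

Compared with the element-by-element slow path, this accumulates four lanes per iteration and collapses them with vaddvq_f32, which is the kind of saving that shows up in reductions such as the one inside Softmax.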

Posting this here for any discussion before raising a PR.

Alternatives

No response

Additional context

No response

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @Xia-Weiwen @ngimel

Metadata


Labels

feature: A request for a proper, new feature.
module: inductor
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.
