Skip to content

Conversation

@swolchok
Copy link
Contributor

@swolchok swolchok commented Oct 10, 2024

Stack from ghstack (oldest at bottom):

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: D64197046

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137722

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1f40b24 with merge base 0786b37 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Oct 10, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

@swolchok
Copy link
Contributor Author

swolchok commented Oct 10, 2024

looks like this needs to be identical to the gating surrounding vec256_float_neon

@swolchok
Copy link
Contributor Author

swolchok commented Oct 10, 2024

looks like this needs to be identical to the gating surrounding vec256_float_neon

specifically, 1) we just use __aarch64__ to gate that implementation, so we should follow suit here
2) we also need to check !defined(CPU_CAPABILITY_SVE256) because this apparently messes with SVE

…PABILITY_NEON"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

@swolchok swolchok changed the title [PyTorch] Check if __ARM_NEON is defined instead of CPU_CAPABILITY_NEON [PyTorch] Check defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE256) instead of defined(CPU_CAPABILITY_NEON) Oct 10, 2024
@swolchok
Copy link
Contributor Author

hooray for green tests

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2024
@abhishek-iitmadras
Copy link
Collaborator

Hi @swolchok
Wouldn't it be better to define CPU_CAPABILITY_SVE instead of just CPU_CAPABILITY_SVE256?

@swolchok
Copy link
Contributor Author

Hi @swolchok Wouldn't it be better to define CPU_CAPABILITY_SVE instead of just CPU_CAPABILITY_SVE256?

it is CPU_CAPABILITY_SVE256 because that is the gating surrounding vec256_float_neon.h: https://www.internalfb.com/code/fbsource/[0660504a24502363bb08b1b37504bc3ef6d878eb]/fbcode/caffe2/aten/src/ATen/cpu/vec/vec256/vec256.h?lines=10-15

…BILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

…BILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strips the ability of generating non-NEON accelerated code in torch.comple, which feels like a regression...
I.e. please undo changes to cpp_prefix.h/cpu_vec_isa.py

…h64__) && !defined(CPU_CAPABILITY_SVE256)` instead of `defined(CPU_CAPABILITY_NEON)`"

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

swolchok added a commit that referenced this pull request Oct 15, 2024
…56)` instead of `defined(CPU_CAPABILITY_NEON)`

Pull Request resolved: #137722

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704
ghstack-source-id: 248161061
@exported-using-ghexport

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for the update

KnAwnime pushed a commit to KnAwnime/Biblioteka that referenced this pull request Oct 16, 2024
…56)` instead of `defined(CPU_CAPABILITY_NEON)`

Pull Request resolved: pytorch/pytorch#137722

The CPU_CAPABILITY system is for rebuilding kernels multiple times with different vector ISA targets. CPU_CAPABILITY_NEON was not being used for that, just as an extra flag for inductor. As a result, CPU_CAPABILITY_NEON-gated code was unnecessarily unavailable outside inductor. Fixes #137704
ghstack-source-id: 247383134
@exported-using-ghexport

Differential Revision: [D64197046](https://our.internmc.facebook.com/intern/diff/D64197046/)
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64197046

pytorchmergebot pushed a commit that referenced this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants