Contributor

@swolchok swolchok commented Oct 9, 2024

Stack from ghstack (oldest at bottom):

NEON vectors are 128 bits wide, so they don't belong with the 256-bit vector code.

Differential Revision: D64143615

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
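
To make the width argument concrete, here is a minimal sketch of the lane arithmetic, not code from this PR. The helper `lanes_per_register` and the constants are hypothetical names introduced for illustration, and `std::uint16_t` merely stands in for a 16-bit fp16 storage type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many lanes of type T fit in a SIMD register
// that is `register_bits` wide.
template <typename T>
constexpr std::size_t lanes_per_register(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

constexpr std::size_t kNeonBits = 128;  // NEON register width
constexpr std::size_t kAvx2Bits = 256;  // AVX2 register width, for contrast

// std::uint16_t is an assumed stand-in for 16-bit fp16 storage.
static_assert(lanes_per_register<std::uint16_t>(kNeonBits) == 8,
              "a 128-bit register holds 8 half-precision lanes");
static_assert(lanes_per_register<float>(kNeonBits) == 4,
              "a 128-bit register holds 4 single-precision lanes");
static_assert(lanes_per_register<float>(kAvx2Bits) == 8,
              "a 256-bit register holds 8 single-precision lanes");

// Widening one full 128-bit register of fp16 values to fp32 therefore
// needs two 128-bit registers of floats.
constexpr std::size_t kFloatRegistersPerHalfRegister =
    lanes_per_register<std::uint16_t>(kNeonBits) /
    lanes_per_register<float>(kNeonBits);
static_assert(kFloatRegistersPerHalfRegister == 2,
              "fp16 -> fp32 doubles the register count");
```

Under these assumptions, conversion code written for NEON works in 128-bit units throughout, which is consistent with grouping it with other 128-bit code rather than the 256-bit code.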

…vec128_convert

NEON vectors are 128-bit and don't belong with 256 stuff.

Differential Revision: [D64143615](https://our.internmc.facebook.com/intern/diff/D64143615/)

[ghstack-poisoned]
pytorch-bot bot commented Oct 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137661

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 59a2c71 with merge base b9618c9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 9, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64143615

swolchok added a commit that referenced this pull request Oct 9, 2024
…vec128_convert

NEON vectors are 128-bit and don't belong with 256 stuff.

Differential Revision: [D64143615](https://our.internmc.facebook.com/intern/diff/D64143615/)

ghstack-source-id: 247178186
Pull Request resolved: #137661

…vec256_convert to vec128_convert"

NEON vectors are 128-bit and don't belong with 256 stuff.

Differential Revision: [D64143615](https://our.internmc.facebook.com/intern/diff/D64143615/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]

swolchok added a commit that referenced this pull request Oct 10, 2024
…vec128_convert

Pull Request resolved: #137661

NEON vectors are 128-bit and don't belong with 256 stuff.
ghstack-source-id: 247393002
@exported-using-ghexport

Differential Revision: [D64143615](https://our.internmc.facebook.com/intern/diff/D64143615/)
@swolchok swolchok added the topic: not user facing topic category label Oct 11, 2024
@swolchok swolchok requested review from jgong5, kimishpatel and malfet and removed request for kimishpatel and malfet October 11, 2024 01:25
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2024
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
…Vectorized (#137912)

Migrated as much as possible and convenient; focusing on fp16
for now. (This is building toward enabling these fast paths on x86 for
machines without AVX-512fp16/bf16 to fix
pytorch/torchchat#1253.)

Differential Revision: [D64218206](https://our.internmc.facebook.com/intern/diff/D64218206/)

Pull Request resolved: #137912
Approved by: https://github.com/malfet
ghstack dependencies: #137661, #137911
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
…pu/ (#137914)

This is in preparation for supporting x86 as well; we need to
be in this directory so that we can get rebuilt with different
CPU_CAPABILITY settings (AVX2/AVX-512). Also incidentally starts
fulfilling request from @malfet to split the ARM64 fast path stuff
into its own file. BFloat16 will be in a later diff.

Differential Revision: [D64265755](https://our.internmc.facebook.com/intern/diff/D64265755/)

Pull Request resolved: #137914
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #137661, #137911, #137912, #137913
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
…whole vector register instead of half (#137916)

The fixup loop doesn't really need to vectorize the last 7 elements, and not doing so will make migrating to x86 simpler.

Differential Revision: [D64280689](https://our.internmc.facebook.com/intern/diff/D64280689/)

Pull Request resolved: #137916
Approved by: https://github.com/malfet
ghstack dependencies: #137661, #137911, #137912, #137913, #137914, #137915
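
The loop shape described in #137916 (whole registers in the main loop, leftover elements handled one at a time) can be sketched as follows. This is an illustrative stand-in, not the PR's kernel: the function name `dot_with_scalar_tail`, the choice of a dot product, and the lane count of 8 are assumptions made for the example, and the inner lane loop only models what a single vector instruction would do.

```cpp
#include <cstddef>

// Assumed lane count: 8 elements per vector step (for example, 8 fp16
// values in one 128-bit register). With 8 lanes, the tail is at most 7
// elements long.
constexpr std::size_t kLanes = 8;

// Hypothetical example kernel: dot product with a whole-register main loop
// and a plain scalar fixup loop for the remainder.
inline float dot_with_scalar_tail(const float* a, const float* b,
                                  std::size_t n) {
  float acc[kLanes] = {};  // one partial sum per lane
  const std::size_t vec_end = n - (n % kLanes);

  // Main loop: consume exactly one full "register" per iteration.
  for (std::size_t i = 0; i < vec_end; i += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      acc[lane] += a[i + lane] * b[i + lane];
    }
  }

  // Reduce the per-lane partial sums to a single value.
  float sum = 0.0f;
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    sum += acc[lane];
  }

  // Fixup loop: the remaining n % kLanes elements, with no half-register
  // vector step in between.
  for (std::size_t i = vec_end; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

Keeping the tail scalar means the only architecture-specific quantity in the loop structure is the lane count, which may be part of why the commit message expects this to simplify the x86 migration.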
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…Vectorized (pytorch#137912)

Migrated as much as possible and convenient; focusing on fp16
for now. (This is building toward enabling these fast paths on x86 for
machines without AVX-512fp16/bf16 to fix
pytorch/torchchat#1253.)

Differential Revision: [D64218206](https://our.internmc.facebook.com/intern/diff/D64218206/)

Pull Request resolved: pytorch#137912
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#137661, pytorch#137911
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…pu/ (pytorch#137914)

This is in preparation for supporting x86 as well; we need to
be in this directory so that we can get rebuilt with different
CPU_CAPABILITY settings (AVX2/AVX-512). Also incidentally starts
fulfilling request from @malfet to split the ARM64 fast path stuff
into its own file. BFloat16 will be in a later diff.

Differential Revision: [D64265755](https://our.internmc.facebook.com/intern/diff/D64265755/)

Pull Request resolved: pytorch#137914
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…whole vector register instead of half (pytorch#137916)

The fixup loop doesn't really need to vectorize the last 7 elements, and not doing so will make migrating to x86 simpler.

Differential Revision: [D64280689](https://our.internmc.facebook.com/intern/diff/D64280689/)

Pull Request resolved: pytorch#137916
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#137661, pytorch#137911, pytorch#137912, pytorch#137913, pytorch#137914, pytorch#137915
@github-actions github-actions bot deleted the gh/swolchok/651/head branch November 29, 2024 02:13
Labels

ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged module: cpu CPU specific problem (e.g., perf, algorithm) topic: not user facing topic category