swolchok commented Oct 14, 2024

Stack from ghstack (oldest at bottom):

float16_t is ARM-specific. Half is not.

Differential Revision: D64218427

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
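The portability point can be illustrated with a minimal sketch. `float16_t` is provided by ARM compiler headers such as `<arm_neon.h>` and is generally unavailable when targeting x86, so a function signature that names it is ARM-only. A signature written against a portable 16-bit storage type compiles on both. The `Half` struct, `half_to_float`, and `fp16_sum` below are hypothetical stand-ins written for this example. They are not PyTorch's `c10::Half` or the signatures this PR actually changed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical portable half-precision storage type: just the 16 raw bits
// of an IEEE 754 binary16 value. A stand-in for illustration, not c10::Half.
struct Half {
  uint16_t bits;
};

// Decode binary16 bits to float (normals, subnormals, zeros, inf, NaN).
float half_to_float(Half h) {
  const uint32_t sign = (h.bits >> 15) & 0x1u;
  const uint32_t exp = (h.bits >> 10) & 0x1Fu;
  const uint32_t mant = h.bits & 0x3FFu;
  float magnitude;
  if (exp == 0) {
    // Zero or subnormal: mant * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mant), -24);
  } else if (exp == 31) {
    // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
    magnitude = (mant == 0) ? INFINITY : NAN;
  } else {
    // Normal: (1024 + mant) * 2^(exp - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mant | 0x400u),
                           static_cast<int>(exp) - 25);
  }
  return sign ? -magnitude : magnitude;
}

// Hypothetical kernel: because the signature only names the portable type,
// it compiles unchanged for x86 and ARM targets.
float fp16_sum(const Half* x, int64_t n) {
  assert(n >= 0);
  float acc = 0.0f;
  for (int64_t i = 0; i < n; ++i) {
    acc += half_to_float(x[i]);
  }
  return acc;
}

#if defined(__aarch64__)
#include <arm_neon.h>
// float16_t is only available on the ARM side; ARM-specific code can use it
// internally while the public signature above stays portable.
static_assert(sizeof(float16_t) == sizeof(Half), "both are 16-bit types");
#endif
```

For example, `fp16_sum` over the binary16 encodings `0x3C00` (1.0), `0x4000` (2.0) and `0xBC00` (-1.0) returns 2.0. Keeping the public signature on the portable type and confining `float16_t` to an `__aarch64__`-guarded block is one way to let a single declaration serve both architectures.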

pytorch-bot bot commented Oct 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137913

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ca2264e with merge base b9618c9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 14, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D64218427

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2024
…signatures"

float16_t is ARM-specific. Half is not.

Differential Revision: [D64218427](https://our.internmc.facebook.com/intern/diff/D64218427/)

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
malfet left a comment

Is this to make it accessible on x86? Otherwise float16_t feels like a reasonable type, doesn't it?

pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
…pu/ (#137914)

This is in preparation for supporting x86 as well; we need to
be in this directory so that we can get rebuilt with different
CPU_CAPABILITY settings (AVX2/AVX-512). Also incidentally starts
fulfilling a request from @malfet to split the ARM64 fast path stuff
into its own file. BFloat16 will be in a later diff.

Differential Revision: [D64265755](https://our.internmc.facebook.com/intern/diff/D64265755/)

Pull Request resolved: #137914
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #137661, #137911, #137912, #137913
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
…whole vector register instead of half (#137916)

The fixup loop doesn't really need to vectorize the last 7 elements, and not doing so will make migrating to x86 simpler.

Differential Revision: [D64280689](https://our.internmc.facebook.com/intern/diff/D64280689/)

Pull Request resolved: #137916
Approved by: https://github.com/malfet
ghstack dependencies: #137661, #137911, #137912, #137913, #137914, #137915
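The #137916 message describes a main loop over full vector registers followed by a fixup loop that handles the leftover elements without vectorizing them. A minimal portable sketch of that structure follows. It assumes 8 lanes per register (so at most 7 leftover elements, matching the message) and uses plain `float` instead of fp16. `dot_with_tail` and `kLanes` are hypothetical names, and this is not the PyTorch kernel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical vector width: 8 lanes, e.g. eight 16-bit values in one
// 128-bit register. At most kLanes - 1 = 7 elements are left for the tail.
constexpr int64_t kLanes = 8;

float dot_with_tail(const float* x, const float* y, int64_t n) {
  assert(n >= 0);
  // One accumulator per lane, standing in for a vector register.
  std::array<float, kLanes> acc{};
  const int64_t n_main = n - n % kLanes;
  // Main loop: processes full groups of kLanes elements; a compiler can
  // map the inner loop onto SIMD instructions.
  for (int64_t i = 0; i < n_main; i += kLanes) {
    for (int64_t lane = 0; lane < kLanes; ++lane) {
      acc[lane] += x[i + lane] * y[i + lane];
    }
  }
  // Horizontal reduction of the per-lane accumulators.
  float sum = 0.0f;
  for (float a : acc) {
    sum += a;
  }
  // Fixup loop: the remaining (at most 7) elements are handled one at a
  // time instead of with a partial vector.
  for (int64_t i = n_main; i < n; ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}
```

Keeping the tail scalar means it has no dependence on a particular register width, which is consistent with the commit's remark that this simplifies the migration to x86.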
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…pu/ (pytorch#137914)

rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…whole vector register instead of half (pytorch#137916)

@github-actions github-actions bot deleted the gh/swolchok/661/head branch November 29, 2024 02:13

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
fb-exported
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
topic: not user facing (topic category)


6 participants