Abstract common helpers from the CUDA implementation for code reuse in upcoming SYCL kernels #117234
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117234
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit d0f7eaa with merge base b830751. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…upcoming SYCL kernels, e.g. OffsetCalculator, IntegerDivider, MemoryAccess utilities, ElementwiseInvoke. feature: #114835. The abstracted helpers cover integer arithmetic and C++ standard template helpers for load, store, and custom functor invocation. They are general to any kernel language that supports basic integer arithmetic and C++ standard templates, except where a backend-specific intrinsic is involved, such as CUDA __umulhi; backend-specific macros isolate that divergence. They are independent of a kernel's concurrency algorithm, backend-specific data types (e.g. CUDA half2), and backend-specific concurrent resource configuration (e.g. CUDA block size, thread work size). Signed-off-by: Feng Yuan <feng1.yuan@intel.com> ghstack-source-id: 0db78a6 Pull Request resolved: #117234
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
e.g. OffsetCalculator, IntegerDivider, MemoryAccess utilities, ElementwiseInvoke.
feature: #114835
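As a rough illustration of what an IntegerDivider-style helper does: runtime division by a fixed divisor is replaced by a precomputed "magic number" multiply-high, an add, and a shift. The names and exact formulas below are illustrative, not the PR's code; it assumes a divisor and dividend below 2^31.

```cpp
#include <cstdint>

// Hypothetical sketch of the magic-number trick behind an IntegerDivider-style
// helper: one multiply-high, one add, and one shift instead of a hardware divide.
struct MagicDivider {
  std::uint32_t magic;
  std::uint32_t shift;

  explicit MagicDivider(std::uint32_t d) {
    // Assumes 1 <= d < 2^31.
    shift = 0;
    while ((std::uint64_t{1} << shift) < d) ++shift;
    // Chosen so that n / d == (mulhi(n, magic) + n) >> shift for n < 2^31.
    magic = static_cast<std::uint32_t>(
        ((std::uint64_t{1} << 32) * ((std::uint64_t{1} << shift) - d)) / d + 1);
  }

  std::uint32_t div(std::uint32_t n) const {
    // Portable multiply-high; a CUDA build could use __umulhi here instead.
    std::uint64_t t = (static_cast<std::uint64_t>(n) * magic) >> 32;
    return static_cast<std::uint32_t>((t + n) >> shift);
  }
};
```

Because only integer arithmetic and standard C++ are involved, the same structure compiles unchanged for CUDA or SYCL device code.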
The abstracted helpers cover integer arithmetic and C++ standard template helpers for load, store, and custom functor invocation.
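A minimal sketch of what such a load/store/functor-invoke helper looks like in portable standard C++ (the names here are hypothetical, not the PR's API):

```cpp
#include <cstddef>

// Hypothetical backend-agnostic helper: load each input at a linear index,
// apply a user-supplied functor, and store the result. Only standard C++
// templates are required, so the same code can back a CUDA or SYCL kernel.
template <typename Func, typename Out, typename... In>
void elementwise_invoke(Func f, Out* out, std::size_t i, const In*... in) {
  out[i] = f(in[i]...);
}
```

In a real kernel, the index `i` would come from the backend's thread/work-item ID and the loads could be vectorized; those concerns stay outside the shared helper.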
They are general to any kernel language that supports basic integer arithmetic and C++ standard templates. The exception is backend-specific intrinsics, such as CUDA __umulhi; backend-specific macros are used to isolate that divergence.
They are independent of a kernel's concurrency algorithm, backend-specific data types (e.g. CUDA half2), and backend-specific concurrent resource configuration (e.g. CUDA block size, thread work size).
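The macro-isolation idea for backend intrinsics can be sketched as follows (a hedged example, not the PR's exact code): the high 32 bits of a 32x32-bit unsigned multiply, using the CUDA intrinsic in device compilation and a portable 64-bit fallback elsewhere. A SYCL path would select its own intrinsic behind the same interface.

```cpp
#include <cstdint>

// Hypothetical illustration of isolating a backend intrinsic behind a macro.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

HOST_DEVICE inline std::uint32_t mul_hi_u32(std::uint32_t a, std::uint32_t b) {
#if defined(__CUDA_ARCH__)
  return __umulhi(a, b);  // CUDA hardware intrinsic on device
#else
  // Portable fallback: widen to 64 bits and take the top half.
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(a) * b) >> 32);
#endif
}
```

Callers see one function; only the intrinsic line diverges per backend, which is exactly the divergence the macro confines.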
Signed-off-by: Feng Yuan [email protected]