Abstract common helpers from the CUDA implementation for code reuse in upcoming SYCL kernels #117234
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117234
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit d0f7eaa with merge base b830751. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…upcoming SYCL kernels, e.g. OffsetCalculator, IntegerDivider, MemoryAccess utilities, ElementwiseInvoke. feature: #114835. The abstracted helpers cover integer arithmetic and C++ standard template helpers for load, store, and custom functor invocation. They are general to any kernel language that supports basic integer arithmetic and C++ standard templates, except where a backend-specific intrinsic is involved, such as CUDA __umulhi; backend-specific macros isolate that divergence. They are independent of a kernel's concurrency algorithm, backend-specific data types (e.g. CUDA half2), and backend-specific concurrent resource configuration (e.g. CUDA block size, thread work size). Signed-off-by: Feng Yuan <feng1.yuan@intel.com> ghstack-source-id: 0db78a6 Pull Request resolved: #117234
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
e.g. OffsetCalculator, IntegerDivider, MemoryAccess utilities, ElementwiseInvoke.
feature: #114835
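As a rough illustration of what an IntegerDivider-style helper does: runtime division by a fixed divisor is replaced by a precomputed "magic number" multiply-high, an add, and a shift. The names and exact formulas below are illustrative, not the PR's code; it assumes a divisor and dividend below 2^31.

```cpp
#include <cstdint>

// Hypothetical sketch of the magic-number trick behind an IntegerDivider-style
// helper: one multiply-high, one add, and one shift instead of a hardware divide.
struct MagicDivider {
  std::uint32_t magic;
  std::uint32_t shift;

  explicit MagicDivider(std::uint32_t d) {
    // Assumes 1 <= d < 2^31.
    shift = 0;
    while ((std::uint64_t{1} << shift) < d) ++shift;
    // Chosen so that n / d == (mulhi(n, magic) + n) >> shift for n < 2^31.
    magic = static_cast<std::uint32_t>(
        ((std::uint64_t{1} << 32) * ((std::uint64_t{1} << shift) - d)) / d + 1);
  }

  std::uint32_t div(std::uint32_t n) const {
    // Portable multiply-high; a CUDA build could use __umulhi here instead.
    std::uint64_t t = (static_cast<std::uint64_t>(n) * magic) >> 32;
    return static_cast<std::uint32_t>((t + n) >> shift);
  }
};
```

Because only integer arithmetic and standard C++ are involved, the same structure compiles unchanged for CUDA or SYCL device code.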
The abstracted helpers cover integer arithmetic and C++ standard template helpers for load, store, and custom functor invocation.
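A minimal sketch of what such a load/store/functor-invoke helper looks like in portable standard C++ (the names here are hypothetical, not the PR's API):

```cpp
#include <cstddef>

// Hypothetical backend-agnostic helper: load each input at a linear index,
// apply a user-supplied functor, and store the result. Only standard C++
// templates are required, so the same code can back a CUDA or SYCL kernel.
template <typename Func, typename Out, typename... In>
void elementwise_invoke(Func f, Out* out, std::size_t i, const In*... in) {
  out[i] = f(in[i]...);
}
```

In a real kernel, the index `i` would come from the backend's thread/work-item ID and the loads could be vectorized; those concerns stay outside the shared helper.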
They are general to any kernel language that supports basic integer arithmetic and C++ standard templates. The exception is backend-specific intrinsics, such as CUDA __umulhi; backend-specific macros are used to isolate that divergence.
They are independent of a kernel's concurrency algorithm, backend-specific data types (e.g. CUDA half2), and backend-specific concurrent resource configuration (e.g. CUDA block size, thread work size).
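The macro-isolation idea for backend intrinsics can be sketched as follows (a hedged example, not the PR's exact code): the high 32 bits of a 32x32-bit unsigned multiply, using the CUDA intrinsic in device compilation and a portable 64-bit fallback elsewhere. A SYCL path would select its own intrinsic behind the same interface.

```cpp
#include <cstdint>

// Hypothetical illustration of isolating a backend intrinsic behind a macro.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

HOST_DEVICE inline std::uint32_t mul_hi_u32(std::uint32_t a, std::uint32_t b) {
#if defined(__CUDA_ARCH__)
  return __umulhi(a, b);  // CUDA hardware intrinsic on device
#else
  // Portable fallback: widen to 64 bits and take the top half.
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(a) * b) >> 32);
#endif
}
```

Callers see one function; only the intrinsic line diverges per backend, which is exactly the divergence the macro confines.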
Signed-off-by: Feng Yuan [email protected]