Conversation

@jerrymannil
Collaborator

Enable *_load_dwordx4 ISA for BFloat16 and Half by using vector size of 8

Co-author: @akadutta

@okakarpa
Collaborator

Jenkins build for 1626a877c9703e7ad341e8710b7bcf62e0ece7d7 commit finished as FAILURE

Detected error during Pytorch building:

Warning: Unused direct dependencies:
	/var/lib/jenkins/pytorch/build/lib/libshm.so
	/opt/rocm/lib/libhsa-runtime64.so.1
	/lib/x86_64-linux-gnu/libm.so.6
[8000/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention.hip:84:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@pruthvistony
Collaborator

@jerrymannil
Can we upstream this change?

@pruthvistony merged commit 8f9b9d3 into ROCm:release/2.4 Nov 21, 2024
1 check failed
@jerrymannil
Collaborator Author

Can we upstream this change?

Let's wait for QA testing to complete. Also, @jeffdaily has asked me to do a full UT suite run; doing that now.

@jithunnair-amd changed the title [ROCm] Enable vector size for 8 for half precision types in elementwise kernels (#1671) [release/2.4] [ROCm] Enable vector size for 8 for half precision types in elementwise kernels (#1671) Nov 25, 2024
@jerrymannil
Collaborator Author

!cherry-pick --onto release/2.5

rocm-mici pushed a commit that referenced this pull request Jan 13, 2025
…se kernels (#1671) (#1738)

Enable *_load_dwordx4 ISA for BFloat16 and Half by using vector size of 8

Co-author: @akadutta
@rocm-mici

Created branch release/2.5_cherry-pick_pr-1738 and #1831

pruthvistony pushed a commit that referenced this pull request Jan 14, 2025
…f precision types in elementwise kernels (#1831)

Cherry-pick of #1738

Co-authored-by: Jerry Mannil <[email protected]>
jithunnair-amd pushed a commit that referenced this pull request Mar 17, 2025
…se kernels (#1671) (#1738)

Enable *_load_dwordx4 ISA for BFloat16 and Half by using vector size of 8

Co-author: @akadutta
