
Conversation

@naromero77amd (Collaborator) commented Apr 11, 2025

This PR adds support for submatrices in offline tuning for:

  • GEMM
  • GEMM and bias
  • ScaledGEMM
  • Batch Strided GEMM

New UTs cover the submatrix cases. Submatrices for the strided batch API are not part of this PR and will be done separately.
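
For context, below is a minimal sketch of how a submatrix GEMM might be captured for offline tuning. It is not the PR's test code; it assumes a ROCm build of PyTorch with TunableOp, the `PYTORCH_TUNABLEOP_*` environment variables, and the `torch.cuda.tunable` API, and the untuned-results file name is illustrative.

```python
# Minimal sketch (not the PR's test code): recording a submatrix GEMM for
# offline tuning. Assumes a ROCm build of PyTorch with TunableOp; the
# environment-variable names and the untuned-results file name follow the
# TunableOp docs and should be verified for your build.
import os
import torch

os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"         # enable TunableOp
os.environ["PYTORCH_TUNABLEOP_TUNING"] = "0"          # do not tune inline
os.environ["PYTORCH_TUNABLEOP_RECORD_UNTUNED"] = "1"  # record GEMMs for offline tuning

A = torch.randn(512, 512, device="cuda", dtype=torch.bfloat16)
B = torch.randn(512, 512, device="cuda", dtype=torch.bfloat16)

# Submatrix views: the leading dimension (512) no longer matches the GEMM
# sizes (128), which is the case this PR adds to the offline-tuning path.
sub_A = A[32:160, 64:192]
sub_B = B[64:192, 32:160]
C = torch.mm(sub_A, sub_B)

# In a separate run, the recorded entries can then be tuned offline, e.g.:
# torch.cuda.tunable.tune_gemm_in_file("tunableop_untuned0.csv")
```

The same pattern applies to the GEMM-and-bias, ScaledGEMM, and batch strided GEMM paths listed above.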

There is also a bug fix for offline tuning of the full-matrix GEMM-and-bias path in the NT case. Offline and online UTs were updated to cover this corner case.

To improve code readability, the definitions of transA and transB were swapped.
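
As a rough illustration of the NT corner case, the sketch below triggers the GEMM-and-bias path with one transposed operand via `torch.addmm`. Whether TunableOp records this layout as NT or TN depends on its column-major convention, so that mapping is an assumption here.

```python
# Minimal sketch: a GEMM-and-bias call with a transposed operand, the kind of
# layout behind the NT corner case mentioned above. Whether this is recorded
# as NT or TN depends on TunableOp's (column-major) convention, so the exact
# signature is an assumption.
import torch

bias = torch.randn(256, device="cuda", dtype=torch.bfloat16)
a = torch.randn(128, 512, device="cuda", dtype=torch.bfloat16)
b = torch.randn(256, 512, device="cuda", dtype=torch.bfloat16)

# bias + a @ b.t(): one operand transposed, exercising the GEMM-and-bias
# offline-tuning entry fixed in this PR.
y = torch.addmm(bias, a, b.t())  # shape (128, 256)
```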

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@pytorch-bot bot commented Apr 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151138

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (3 Unrelated Failures)

As of commit fa0e6a9 with merge base 0ad2c5d:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm, release notes: linalg_frontend, and ciflow/rocm labels Apr 11, 2025
@naromero77amd naromero77amd marked this pull request as draft April 11, 2025 21:57
@naromero77amd naromero77amd requested a review from jeffdaily April 11, 2025 21:57
@naromero77amd naromero77amd removed the ciflow/rocm label Apr 11, 2025
@pytorch-bot pytorch-bot bot added the ciflow/rocm label Apr 14, 2025
@naromero77amd (Collaborator, Author) commented:

Confirming that new UTs ran in shard2 linux-focal-rocm-py3.10/test (default, 2, 6, linux.rocm.gpu.mi300.2):

test/test_linalg.py::TestLinalgCUDA::test_gemm_bias_submatrix_offline_tunableop_cuda_bfloat16, test/test_linalg.py::TestLinalgCUDA::test_gemm_bias_tunableop_cuda_bfloat16, 

and

test/test_linalg.py::TestLinalgCUDA::test_mm_cuda_float64, test/test_linalg.py::TestLinalgCUDA::test_mm_submatrix_offline_tunableop_cuda_float32, 

@mikaylagawarecki mikaylagawarecki added the triaged label Apr 16, 2025
@naromero77amd naromero77amd added the ciflow/rocm label and removed the ciflow/rocm-mi300 label Apr 16, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/rocm label Apr 17, 2025
@naromero77amd naromero77amd added the ciflow/rocm label Apr 17, 2025
@pytorch-bot pytorch-bot bot added and removed the ciflow/rocm label Apr 18, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/rocm label Apr 18, 2025
@naromero77amd naromero77amd added the ciflow/rocm label Apr 18, 2025
@naromero77amd (Collaborator, Author) commented:

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Apr 19, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@naromero77amd naromero77amd deleted the feature_tunableop_support_slices branch April 19, 2025 05:35
naromero77amd added a commit to ROCm/pytorch that referenced this pull request May 8, 2025
This PR adds support for submatrices in offline tuning for:
- GEMM
- GEMM and bias
- ScaledGEMM
- Batch Strided GEMM

New UTs to cover submatrices. Submatrices for the strided batch API are not part of this PR and will be done separately.

There is also a bug fix for offline tuning for full matrix for GEMM and bias in the `NT` case. Offline and online UTs were updated to cover this corner case.

To improve code readability, swapped definition of transA and transB.

Pull Request resolved: pytorch#151138
Approved by: https://github.com/jeffdaily

(cherry picked from commit f6c1cf0)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 8, 2025
…ledGEMM rowwise fix (#2106)

Align TunableOp UTs, features, and bug fixes with upstream PyTorch main

UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463

Feature: offline tuning for submatrices:
pytorch#151138

Bug Fix: ScaledGEMM rowwise
pytorch#152403

---------

Co-authored-by: Jeff Daily <[email protected]>