Conversation

@naromero77amd
Collaborator

@naromero77amd naromero77amd commented Mar 27, 2025

Improvements to unit tests and warnings for unsupported cases in offline tuning. Here are more details:

  • Previously, we compared only the OpSig of the untuned and tuned entries. That check was not strict enough, so we now compare OpSig+ParamSig.
  • The main offline and online UTs are now stricter, so that they exercise the code paths for all four combinations of transA and transB.
  • Offline tuning does not support some tensor shapes. For those shapes, we now emit a warning and skip tuning.
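
To illustrate why comparing only the OpSig can miss problems, here is a minimal sketch in plain Python. It is not the code from this PR. It assumes each row has the layout `OpSig,ParamSig` (untuned) or `OpSig,ParamSig,solution,time` (tuned); the file contents, the solution name `Default`, and the helper names `signature_keys` and `missing_from_tuned` are all hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical file contents for illustration only. Real TunableOp CSV files
# may contain extra header rows and different solution names or timings.
UNTUNED_CSV = """\
GemmTunableOp_float_NN,nn_64_32_16
GemmTunableOp_float_NN,nn_128_32_16
GemmTunableOp_float_NT,nt_64_32_16
GemmTunableOp_float_TN,tn_64_32_16
GemmTunableOp_float_TT,tt_64_32_16
"""

TUNED_CSV = """\
GemmTunableOp_float_NN,nn_64_32_16,Default,0.010
GemmTunableOp_float_NT,nt_64_32_16,Default,0.011
GemmTunableOp_float_TN,tn_64_32_16,Default,0.012
GemmTunableOp_float_TT,tt_64_32_16,Default,0.013
"""


def signature_keys(text, strict=True):
    """Collect comparison keys from CSV text.

    strict=True  -> key is (OpSig, ParamSig)
    strict=False -> key is (OpSig,) only, the weaker check
    """
    keys = set()
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        op_sig, param_sig = row[0].strip(), row[1].strip()
        keys.add((op_sig, param_sig) if strict else (op_sig,))
    return keys


def missing_from_tuned(untuned_text, tuned_text, strict=True):
    """Return the untuned keys that have no matching tuned entry, sorted."""
    untuned = signature_keys(untuned_text, strict)
    tuned = signature_keys(tuned_text, strict)
    return sorted(untuned - tuned)


loose = missing_from_tuned(UNTUNED_CSV, TUNED_CSV, strict=False)
strict = missing_from_tuned(UNTUNED_CSV, TUNED_CSV, strict=True)
print("OpSig only:      ", loose)
print("OpSig + ParamSig:", strict)
```

In this made-up data all four transpose layouts (NN, NT, TN, TT) appear in both files, so the OpSig-only comparison reports nothing missing. The stricter comparison still flags the `nn_128_32_16` entry, which was never tuned.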

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@pytorch-bot

pytorch-bot bot commented Mar 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150142

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit b637fde with merge base 965784e:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch release notes: linalg_frontend release notes category labels Mar 27, 2025
@naromero77amd naromero77amd requested a review from jeffdaily March 27, 2025 22:48
@naromero77amd naromero77amd added the topic: not user facing topic category label Mar 27, 2025
@naromero77amd naromero77amd added the ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 label Mar 28, 2025
Collaborator

@jeffdaily jeffdaily left a comment

approved, with minor nit

@pytorch-bot pytorch-bot bot removed the ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 label Mar 28, 2025
@jeffdaily jeffdaily added the ciflow/rocm Trigger "default" config CI on ROCm label Mar 28, 2025
@naromero77amd
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased tunableop_ut_improvements onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout tunableop_ut_improvements && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the tunableop_ut_improvements branch from e62566f to 895ed5a Compare March 29, 2025 01:07
@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Mar 29, 2025
@naromero77amd naromero77amd added the ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 label Mar 29, 2025
@naromero77amd
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 29, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

@naromero77amd
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #150142, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@naromero77amd
Collaborator Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased tunableop_ut_improvements onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout tunableop_ut_improvements && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the tunableop_ut_improvements branch from 895ed5a to b637fde Compare March 29, 2025 20:04
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Mar 29, 2025
@naromero77amd naromero77amd added ciflow/trunk Trigger trunk jobs on your pull request and removed ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Mar 29, 2025
@naromero77amd
Collaborator Author

ROCm tests are passing, but I am getting unexplained build failures for CUDA. I will try rebasing on main and then merging.

@naromero77amd
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@naromero77amd naromero77amd deleted the tunableop_ut_improvements branch March 31, 2025 04:14
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
…ytorch#150142)

Pull Request resolved: pytorch#150142
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
naromero77amd added a commit to ROCm/pytorch that referenced this pull request May 8, 2025
…ytorch#150142)

Pull Request resolved: pytorch#150142
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
(cherry picked from commit ca2ffc2)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 8, 2025
…ledGEMM rowwise fix (#2106)

Align TunableOp UTs, features, and bug fixes with upstream PyTorch main

UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463

Feature: offline tuning for submatrices:
pytorch#151138

Bug Fix: ScaledGEMM rowwise
pytorch#152403

---------

Co-authored-by: Jeff Daily <[email protected]>
Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch open source release notes: linalg_frontend release notes category topic: not user facing topic category

4 participants