
Conversation


@naromero77amd naromero77amd commented Apr 1, 2025

This PR fixes two race conditions that occur when the unit tests (UTs) are run:

  • In a particular order within a single shard.
  • Concurrently in multiple shards. Each test now writes to a unique filename derived from the test name.

There were two other minor improvements to the UTs:

  • matmul_offline_mgpu could occasionally fail when run on 8 GPUs; its pass criteria were relaxed.
  • bmm_tunableop_rocm now checks that the rotating buffer size is non-zero; otherwise the test is not useful.

Additionally, several UTs took over 1 minute to run. Their duration was reduced by a combination of setting max tuning iterations to one, setting the rotating buffer size to zero, and/or reducing the matrix dimensions.
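The duration reductions above map onto TunableOp's environment-variable knobs. A minimal sketch follows; the variable names are TunableOp's documented settings, but the values shown are illustrative of the approach described here, and in practice they must be set before tuning begins:

```python
import os

# Configure TunableOp for fast unit tests rather than high-fidelity tuning.

# Benchmark each candidate solution for a single iteration instead of many:
os.environ["PYTORCH_TUNABLEOP_MAX_TUNING_ITERATIONS"] = "1"

# A rotating buffer size of zero disables buffer rotation between iterations,
# trading tuning fidelity for speed (fine for correctness-oriented UTs):
os.environ["PYTORCH_TUNABLEOP_ROTATING_BUFFER_SIZE"] = "0"
```

Equivalent setters are also exposed at runtime through the `torch.cuda.tunable` Python API (for example `set_max_tuning_iterations`), which is what a test can use to adjust and restore these settings around its body.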

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang


pytorch-bot bot commented Apr 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150463

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 541642d with merge base 783f045:

BROKEN TRUNK - The following jobs failed but were also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch topic: not user facing topic category ciflow/rocm Trigger "default" config CI on ROCm labels Apr 1, 2025
@naromero77amd naromero77amd removed the ciflow/rocm Trigger "default" config CI on ROCm label Apr 1, 2025
@naromero77amd naromero77amd requested a review from jeffdaily April 1, 2025 20:41
@naromero77amd naromero77amd changed the title [ROCm][TunableOp] Fix UT race condition and reduce duration [ROCm][TunableOp] Fix UT race condition and reduce UT duration. Apr 1, 2025
@pytorch-bot pytorch-bot bot added the ciflow/rocm Trigger "default" config CI on ROCm label Apr 1, 2025
@naromero77amd naromero77amd removed the ciflow/rocm Trigger "default" config CI on ROCm label Apr 1, 2025
@naromero77amd naromero77amd marked this pull request as draft April 1, 2025 22:12
@pytorch-bot pytorch-bot bot added the ciflow/rocm Trigger "default" config CI on ROCm label Apr 2, 2025
@naromero77amd naromero77amd removed the ciflow/rocm Trigger "default" config CI on ROCm label Apr 2, 2025
@naromero77amd naromero77amd marked this pull request as ready for review April 2, 2025 17:57
@jeffdaily jeffdaily added ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Apr 3, 2025
@naromero77amd
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 3, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
…rch#150463)

Pull Request resolved: pytorch#150463
Approved by: https://github.com/jeffdaily
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
…rch#150463)

Pull Request resolved: pytorch#150463
Approved by: https://github.com/jeffdaily
naromero77amd added a commit to ROCm/pytorch that referenced this pull request May 8, 2025
…rch#150463)

Pull Request resolved: pytorch#150463
Approved by: https://github.com/jeffdaily

(cherry picked from commit d0026fa)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 8, 2025
…ledGEMM rowwise fix (#2106)

Align TunableOp UTs, features, and bug fixes with upstream PyTorch main

UTs:
pytorch#148982
pytorch#149930
pytorch#150142
pytorch#150463

Feature: offline tuning for submatrices:
pytorch#151138

Bug Fix: ScaledGEMM rowwise
pytorch#152403

---------

Co-authored-by: Jeff Daily <[email protected]>
@naromero77amd naromero77amd deleted the fix_tunableop_ut_race_condition branch October 29, 2025 22:35

Labels

  • ciflow/rocm: Trigger "default" config CI on ROCm
  • ciflow/rocm-mi300: Trigger "default" config CI on ROCm MI300
  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • module: rocm: AMD GPU support for Pytorch
  • open source
  • topic: not user facing: topic category

4 participants