Skip to content

Conversation

@malfet
Copy link
Contributor

@malfet malfet commented Apr 4, 2025

By addressing a feedback requested at #145746

@malfet malfet requested review from eqy and syed-ahmed as code owners April 4, 2025 20:49
@pytorch-bot
Copy link

pytorch-bot bot commented Apr 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150705

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 2 Unrelated Failures

As of commit 482e98a with merge base 5b368fa (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: cuda release notes category label Apr 4, 2025
@malfet malfet added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 4, 2025
@malfet malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from fcdd985 to f65fbb5 Compare April 4, 2025 21:13
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should USE_ROCM here also be inverted if the CUDA_VERSION condition is >= 12080

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think so. Before #145746 vec8_alignment were only available to USE_ROCM, after it was enabled unconditionally and I want it to be enabled for either ROCM or CUDA newer than 12.6

@malfet malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from f65fbb5 to 837e8b2 Compare April 4, 2025 21:27
Copy link
Contributor

@atalman atalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@malfet malfet added the ci-no-td Do not run TD on this PR label Apr 4, 2025
ZainRizvi
ZainRizvi previously approved these changes Apr 4, 2025
return 8;
} else
#else
#elif defined(CUDA_VERSION) && CUDA_VERSION >= 12080
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be some logic to handle the case when CUDA_VERSION < 12080?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ZainRizvi this is basically redoing #145746 only if CUDA >= 12.8

Copy link
Contributor

@atalman atalman Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hence this code should not be applied by default but only for CUDA 12.8+

if (address % vec8_alignment == 0) {
   return 8;
  } else

@ZainRizvi ZainRizvi dismissed their stale review April 4, 2025 21:46

accidentally clicked approve

@nWEIdia
Copy link
Collaborator

nWEIdia commented Apr 5, 2025

Do we need to tag ciflow binary to check the size reduction?

@malfet
Copy link
Contributor Author

malfet commented Apr 5, 2025

Do we need to tag ciflow binary to check the size reduction?

No, not really, one generated in ciflow/trunk should be sufficient.
I.e. from https://github.com/pytorch/pytorch/actions/runs/14274488251/job/40014846340?pr=150705#step:16:136 binary size is 808845730 vs 854474525 on trunk

@atalman atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Apr 5, 2025
@atalman
Copy link
Contributor

atalman commented Apr 5, 2025

attached ciflow/binaries just in case, to validate that binaries are built correctly and we see difference between cu 126 and cu 128

@atalman
Copy link
Contributor

atalman commented Apr 7, 2025

Hi @malfet looks like we are getting /var/lib/jenkins/workspace/aten/src/ATen/test/cuda_vectorized_test.cu:50: Failure. I think test also need to be updated for this PR.

@atalman
Copy link
Contributor

atalman commented Apr 8, 2025

@pytorchmergebot merge -f "trunk jobs look good, manywheel jobs as well"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@atalman
Copy link
Contributor

atalman commented Apr 8, 2025

@pytorchbot cherry-pick --onto release/2.7 -c critical

pytorchbot pushed a commit that referenced this pull request Apr 8, 2025
@pytorchbot
Copy link
Collaborator

Cherry picking #150705

The cherry pick PR is at #150818 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team Raised by workflow job

atalman pushed a commit that referenced this pull request Apr 8, 2025
[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150705)

By addressing a feedback requested at #145746
Pull Request resolved: #150705
Approved by: https://github.com/atalman

(cherry picked from commit 5228986)

Co-authored-by: Nikita Shulga <[email protected]>
@atalman
Copy link
Contributor

atalman commented Apr 8, 2025

@pytorchmergebot revert -c nosignal -m "break periodic tests"

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

@malfet your PR has been successfully reverted.

@malfet malfet added ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR no-runner-experiments Bypass Meta/LF runner determinator labels Apr 8, 2025
@malfet
Copy link
Contributor Author

malfet commented Apr 9, 2025

Let's see what will happen if I'll just try to revert that PR completely: #150895

timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
@malfet
Copy link
Contributor Author

malfet commented Apr 23, 2025

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased malfet/cuda-do-not-vec128-on-12.6 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout malfet/cuda-do-not-vec128-on-12.6 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from 5043e31 to 2ef5638 Compare April 23, 2025 15:33
@malfet malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from 2ef5638 to 482e98a Compare April 24, 2025 18:35
@malfet malfet closed this May 28, 2025
@github-actions github-actions bot deleted the malfet/cuda-do-not-vec128-on-12.6 branch June 29, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged no-runner-experiments Bypass Meta/LF runner determinator release notes: cuda release notes category Reverted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants