[CUDA] Only use vec128 if CUDA version is newer than 12.8 #150705

malfet · 2025-04-04T20:49:11Z

Addresses feedback requested in #145746

pytorch-bot · 2025-04-04T20:49:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150705

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 2 Unrelated Failures

As of commit 482e98a with merge base 5b368fa:

NEW FAILURES - The following jobs have failed:

periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 1, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 2, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_cuda_expandable_segments.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_disabled_allow_unused_input
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 3, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
'test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_byte_cuda_float32'
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 4, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 5, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 6, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_32_backend_cublas_cuda
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 7, 7, ephemeral.linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh)
test_cuda.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_disabled_allow_unused_input

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

periodic / linux-focal-cuda12.6-py3-gcc11-slow-gradcheck / test (default, 4, 8, ephemeral.linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck) (gh) (similar failure)
inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_gating
periodic / linux-focal-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.4, module:rocm, oncall:distributed) (gh) (similar failure)
distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardNDTraining::test_2d_mlp_with_nd_mesh

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eqy · 2025-04-04T21:21:23Z

aten/src/ATen/native/cuda/MemoryAccess.cuh

Should USE_ROCM here also be inverted if the CUDA_VERSION condition is >= 12080?

No, I don't think so. Before #145746, vec8_alignment was only available under USE_ROCM; after it, it was enabled unconditionally. I want it enabled for either ROCm or CUDA newer than 12.6 (that is, CUDA_VERSION >= 12080).

atalman

lgtm

ZainRizvi · 2025-04-04T21:45:09Z

aten/src/ATen/native/cuda/MemoryAccess.cuh

    return 8;
  } else
-#else
+#elif defined(CUDA_VERSION) && CUDA_VERSION >= 12080


Shouldn't there be some logic to handle the case when CUDA_VERSION < 12080?

Hi @ZainRizvi, this basically redoes #145746, but only if CUDA >= 12.8.

Hence this code should not be applied by default, only for CUDA 12.8+:

if (address % vec8_alignment == 0) { return 8; } else

accidentally clicked approve

nWEIdia · 2025-04-05T00:20:00Z

Do we need to tag ciflow binary to check the size reduction?

malfet · 2025-04-05T00:26:01Z

Do we need to tag ciflow binary to check the size reduction?

No, not really, one generated in ciflow/trunk should be sufficient.
I.e. from https://github.com/pytorch/pytorch/actions/runs/14274488251/job/40014846340?pr=150705#step:16:136 binary size is 808845730 vs 854474525 on trunk

atalman · 2025-04-05T13:48:48Z

Attached ciflow/binaries just in case, to validate that the binaries are built correctly and that we see a difference between cu126 and cu128.

atalman · 2025-04-07T19:03:37Z

Hi @malfet, it looks like we are getting /var/lib/jenkins/workspace/aten/src/ATen/test/cuda_vectorized_test.cu:50: Failure. I think the test also needs to be updated for this PR.

atalman · 2025-04-08T00:44:30Z

@pytorchmergebot merge -f "trunk jobs look good, manywheel jobs as well"

pytorchmergebot · 2025-04-08T00:45:57Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

atalman · 2025-04-08T01:20:58Z

@pytorchbot cherry-pick --onto release/2.7 -c critical

By addressing a feedback requested at #145746 Pull Request resolved: #150705 Approved by: https://github.com/atalman (cherry picked from commit 5228986)

pytorchbot · 2025-04-08T01:25:27Z

Cherry picking #150705

The cherry pick PR is at #150818 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

[v.2.7.0] Release Tracker #149044 (comment)

Details for Dev Infra team

Raised by workflow job

[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150705)

By addressing a feedback requested at #145746

Pull Request resolved: #150705
Approved by: https://github.com/atalman

(cherry picked from commit 5228986)

Co-authored-by: Nikita Shulga

atalman · 2025-04-08T16:27:26Z

@pytorchmergebot revert -c nosignal -m "break periodic tests"

pytorchmergebot · 2025-04-08T16:28:56Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

…50705)" This reverts commit 5228986. Reverted #150705 on behalf of https://github.com/atalman due to break periodic tests ([comment](#150705 (comment)))

pytorchmergebot · 2025-04-08T16:29:08Z

@malfet your PR has been successfully reverted.

malfet · 2025-04-09T01:27:38Z

Let's see what happens if I just revert that PR completely: #150895

…0705) By addressing a feedback requested at pytorch#145746 Pull Request resolved: pytorch#150705 Approved by: https://github.com/atalman

…torch#150705)" This reverts commit 5228986. Reverted pytorch#150705 on behalf of https://github.com/atalman due to break periodic tests ([comment](pytorch#150705 (comment)))


malfet · 2025-04-23T15:32:13Z

@pytorchbot rebase

pytorchmergebot · 2025-04-23T15:33:48Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-04-23T15:33:51Z

Successfully rebased malfet/cuda-do-not-vec128-on-12.6 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout malfet/cuda-do-not-vec128-on-12.6 && git pull --rebase)

malfet requested review from eqy and syed-ahmed as code owners April 4, 2025 20:49

pytorch-bot bot added the release notes: cuda release notes category label Apr 4, 2025

malfet added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 4, 2025

malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from fcdd985 to f65fbb5 Compare April 4, 2025 21:13

eqy reviewed Apr 4, 2025


malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from f65fbb5 to 837e8b2 Compare April 4, 2025 21:27

atalman approved these changes Apr 4, 2025


malfet added the ci-no-td Do not run TD on this PR label Apr 4, 2025

ZainRizvi previously approved these changes Apr 4, 2025


atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Apr 5, 2025

pytorchmergebot added the merging label Apr 8, 2025

pytorchmergebot closed this in 5228986 Apr 8, 2025

pytorchmergebot added Merged and removed merging labels Apr 8, 2025

pytorchbot mentioned this pull request Apr 8, 2025

[v.2.7.0] Release Tracker #149044

Closed

pytorchmergebot added the Reverted label Apr 8, 2025

pytorchmergebot reopened this Apr 8, 2025

malfet added the ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR) and no-runner-experiments (Bypass Meta/LF runner determinator) labels Apr 8, 2025

pytorchmergebot force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from 5043e31 to 2ef5638 Compare April 23, 2025 15:33

eqy approved these changes Apr 23, 2025


malfet added 5 commits April 24, 2025 10:22

[CUDA] Only use vec128 if CUDA version is newer than 12.8

6c74d5d

Fix test

f99c658

And hide those as well

087cf53

Was that a mistake?

d8c6a47

And this one

482e98a

malfet force-pushed the malfet/cuda-do-not-vec128-on-12.6 branch from 2ef5638 to 482e98a Compare April 24, 2025 18:35

malfet closed this May 28, 2025

github-actions bot deleted the malfet/cuda-do-not-vec128-on-12.6 branch June 29, 2025 02:20

8 participants
