@Aidyn-A Aidyn-A commented Aug 6, 2025

The test fails with:

RuntimeError: var_mean only support floating point and complex dtypes

cc @ptrblck @msaroufim @eqy @jerryzh168

@Aidyn-A Aidyn-A requested a review from eqy August 6, 2025 06:25
@Aidyn-A Aidyn-A self-assigned this Aug 6, 2025
@Aidyn-A Aidyn-A added the module: cuda (Related to torch.cuda, and CUDA support in general) and topic: not user facing (topic category) labels Aug 6, 2025
pytorch-bot bot commented Aug 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159939

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 712d30f with merge base a53d14d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy eqy left a comment

approving assuming that this never ran before and therefore this is not a regression

Aidyn-A commented Aug 6, 2025

> approving assuming that this never ran before and therefore this is not a regression

Indeed, it never ran before. The only machine which has that much memory is GB300.

Aidyn-A commented Aug 7, 2025

Hmm, those are strange failures

@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Successfully rebased test_sort_large_float16 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout test_sort_large_float16 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the test_sort_large_float16 branch from a321cf0 to 712d30f August 7, 2025 17:41
@Aidyn-A Aidyn-A added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 7, 2025
Aidyn-A commented Aug 8, 2025

@pytorchbot merge

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@jithunnair-amd

> > approving assuming that this never ran before and therefore this is not a regression
>
> Indeed, it never ran before. The only machine which has that much memory is GB300.

@Aidyn-A Did this updated test run in any of the CI jobs? It failed in ROCm CI (because the MI325 has >200GB memory) with the error "Cannot sort dimension of length 8192", but I can't tell if it passed for CUDA at all.

Aidyn-A commented Aug 8, 2025

> > > approving assuming that this never ran before and therefore this is not a regression
> >
> > Indeed, it never ran before. The only machine which has that much memory is GB300.
>
> @Aidyn-A Did this updated test run in any of the CI jobs? It failed in ROCm CI (because the MI325 has >200GB memory) with the error "Cannot sort dimension of length 8192", but I can't tell if it passed for CUDA at all.

Yes, it is passing on GB300. Was it passing prior to my changes? I believe it should have failed, because var_mean does not support integer dtypes.

hinriksnaer pushed a commit to hinriksnaer/pytorch that referenced this pull request Aug 8, 2025
The test fails with:
>RuntimeError: var_mean only support floating point and complex dtypes

Pull Request resolved: pytorch#159939
Approved by: https://github.com/eqy
BLOrange-AMD pushed a commit to ROCm/pytorch that referenced this pull request Aug 19, 2025
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Aug 25, 2025
Currently, std::min -> ::min does not work as expected on ROCm when input
values are >= 2147483648

Replace std::min with a ternary expression
Alternatively, std::min can be given an explicit type: std::min<int64_t>

fixes on ROCm:

test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_cuda_float16
error:
RuntimeError: Cannot sort dimension of length 8192

Combines upstream PRs:
- pytorch#161054 to fix std::min on ROCm
- pytorch#155546 fix python test
- pytorch#159939 change test dtype from int8 to float16

Fixes: SWDEV-526432
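
The two workarounds named in the commit message above can be sketched in plain C++. This is only an illustration under stated assumptions: narrow_min is a hypothetical helper that stands in for a min overload taking 32-bit integers, which is one plausible way a value >= 2147483648 could be mishandled. It is not the actual ROCm code path, and none of the helper names below come from the PyTorch sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in (assumption): a min that works on 32-bit integers.
// Values >= 2^31 cannot be represented after the cast, so the result can
// never equal such an input.
inline std::int32_t narrow_min(std::int64_t a, std::int64_t b) {
    const std::int32_t a32 = static_cast<std::int32_t>(a);
    const std::int32_t b32 = static_cast<std::int32_t>(b);
    return a32 < b32 ? a32 : b32;
}

// Workaround 1: a ternary expression. Both operands stay 64-bit.
inline std::int64_t ternary_min(std::int64_t a, std::int64_t b) {
    return a < b ? a : b;
}

// Workaround 2: std::min with the template argument spelled out.
inline std::int64_t typed_min(std::int64_t a, std::int64_t b) {
    return std::min<std::int64_t>(a, b);
}

// Small self-check; call it from main() to exercise the three helpers.
inline void check_examples() {
    const std::int64_t big = 2147483648LL;     // 2^31, one past INT32_MAX
    const std::int64_t bigger = 4294967296LL;  // 2^32
    assert(ternary_min(big, bigger) == big);
    assert(typed_min(big, bigger) == big);
    // A 32-bit result can never hold 2^31, whatever the cast produces.
    assert(static_cast<std::int64_t>(narrow_min(big, bigger)) != big);
}
```

Calling check_examples() from a main function runs the three comparisons. Both workarounds keep the full 64-bit value, while the 32-bit stand-in cannot.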
dhonnappa-amd pushed a commit to ROCm/pytorch that referenced this pull request Aug 25, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Labels

- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- module: cuda (Related to torch.cuda, and CUDA support in general)
- open source
- topic: not user facing (topic category)