Skip to content

Conversation

@Flamefire
Copy link
Collaborator

On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native targeted optimizations. I.e. it fails with -march=znver2 but succeeds with -march=znver1.

I assume some operator fusing is being used by GCC. Small differences like using vmovdqa can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb

The greatest differences are consistent and the same on both CPU architectures:

Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed)

Hence I assume this is in the expected tolerances especially as complex128 and all other types pass.

@Flamefire Flamefire requested a review from mruberry as a code owner April 29, 2025 12:35
@pytorch-bot
Copy link

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152424

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b489969 with merge base 6737e2c (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: python_frontend python frontend release notes category label Apr 29, 2025
@Flamefire Flamefire added topic: not user facing topic category and removed release notes: python_frontend python frontend release notes category labels Apr 29, 2025
@Skylion007
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 5, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

@Flamefire
Copy link
Collaborator Author

This is the failure:

The job has exceeded the maximum execution time of 5h0m0s

hanging in the ROCM setup step. Can we ignore that for the merge?

@Flamefire
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

@Flamefire
Copy link
Collaborator Author

@Skylion007 The failure seems to be unrelated and there is no message what exactly failed. What shall we do with that?

@github-actions
Copy link
Contributor

github-actions bot commented Aug 2, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 2, 2025
@Flamefire
Copy link
Collaborator Author

Any updates here?

@github-actions github-actions bot closed this Sep 3, 2025
@Flamefire Flamefire reopened this Sep 3, 2025
@Flamefire
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small
differences when compiled with native targeted optimizations.
I.e. it fails with `-march=znver2` but succeeds with `-march=znver1`.
@pytorchmergebot
Copy link
Collaborator

Successfully rebased decomp-tolerance onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout decomp-tolerance && git pull --rebase)

@Flamefire
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@Flamefire Flamefire deleted the decomp-tolerance branch September 4, 2025 13:29
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native targeted optimizations. I.e. it fails with `-march=znver2` but succeeds with `-march=znver1`.

I assume some operator fusing is being used by GCC. Small differences like using `vmovdqa` can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb

The greatest differences are consistent and the same on both CPU architectures:
```
Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed)
```

Hence I assume this is in the expected tolerances  especially as `complex128` and all other types pass.
Pull Request resolved: pytorch#152424
Approved by: https://github.com/malfet
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native targeted optimizations. I.e. it fails with `-march=znver2` but succeeds with `-march=znver1`.

I assume some operator fusing is being used by GCC. Small differences like using `vmovdqa` can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb

The greatest differences are consistent and the same on both CPU architectures:
```
Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed)
```

Hence I assume this is in the expected tolerances  especially as `complex128` and all other types pass.
Pull Request resolved: pytorch#152424
Approved by: https://github.com/malfet
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
On Zen 2 (AMD EPYC) and Intel Sapphire Rapids this fails with small differences when compiled with native targeted optimizations. I.e. it fails with `-march=znver2` but succeeds with `-march=znver1`.

I assume some operator fusing is being used by GCC. Small differences like using `vmovdqa` can be seen in the minimized code of the baddbmm kernel: https://godbolt.org/z/jsxMa91Wb

The greatest differences are consistent and the same on both CPU architectures:
```
Greatest absolute difference: 3.43852152582258e-05 at index (1, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 3.6034286949870875e-06 at index (1, 2, 1) (up to 1.3e-06 allowed)
```

Hence I assume this is in the expected tolerances  especially as `complex128` and all other types pass.
Pull Request resolved: pytorch#152424
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source Stale topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants