@bertmaher
Contributor

@bertmaher bertmaher commented Oct 29, 2024

Bump the Triton pin to the release candidate commit for Triton 3.2.

A few changes beyond the pin bump itself are needed:

  • Remove the script that adds a git version hash suffix to the Triton wheel, since as of triton-lang/triton#4812 ("Add git commit to the version as a suffix") Triton adds that itself
  • Add pybind11 to the Triton build setup, since Triton now depends on it
  • Use manylinux-2.28 for the Triton wheel builder, and use clang+lld for building to pick up the right glibc

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Oct 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139206

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 53f4666 with merge base 740d1eb (image):

NEW FAILURES - The following jobs have failed:

  • inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
    convnext_base
  • inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh)
    convnext_base
  • linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test (gh)
    ERROR: Could not find a version that satisfies the requirement pytorch-triton==3.2.0+35c6c7c628; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.13" (from torch) (from versions: 2.0.0+0d7e753227, 2.0.0+3aa3d7024e, 2.0.0+af76c989eb, 2.0.0+b8b470bc59, 2.0.0+c8bfe3f548, 2.0.0+d54c04abe2, 2.1.0, 2.1.0+2c32f43999, 2.1.0+3c400e7818, 2.1.0+440fd1bf20, 2.1.0+46672772b4, 2.1.0+6e4932cda8, 2.1.0+7d1a95b046, 2.1.0+9820899b38, 2.1.0+9e3e10c5ed, 2.1.0+bcad9dabe1, 2.1.0+e6216047b8, 2.1.0+e650d3708b, 2.2.0+e28a256d71, 3.0.0+45fff310c8, 3.0.0+757b6a61e7, 3.0.0+901819d2b6, 3.0.0+989adb9a29, 3.0.0+a9bc1a3647, 3.0.0+dedb7bdf33, 3.1.0+5fe38ffd73, 3.1.0+cf34004b8a, 3.2.0+git35c6c7c6)
  • linux-binary-manywheel / manywheel-py3_9-cuda12_4-test / test (gh)
    ERROR: Could not find a version that satisfies the requirement pytorch-triton==3.2.0+35c6c7c628; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.13" (from torch) (from versions: 2.0.0+0d7e753227, 2.0.0+3aa3d7024e, 2.0.0+af76c989eb, 2.0.0+b8b470bc59, 2.0.0+c8bfe3f548, 2.0.0+d54c04abe2, 2.1.0, 2.1.0+2c32f43999, 2.1.0+3c400e7818, 2.1.0+440fd1bf20, 2.1.0+46672772b4, 2.1.0+6e4932cda8, 2.1.0+7d1a95b046, 2.1.0+9820899b38, 2.1.0+9e3e10c5ed, 2.1.0+bcad9dabe1, 2.1.0+e6216047b8, 2.1.0+e650d3708b, 2.2.0+e28a256d71, 3.0.0+45fff310c8, 3.0.0+757b6a61e7, 3.0.0+901819d2b6, 3.0.0+989adb9a29, 3.0.0+a9bc1a3647, 3.0.0+dedb7bdf33, 3.1.0+5fe38ffd73, 3.1.0+cf34004b8a, 3.2.0+git35c6c7c6)
  • linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test (gh)
    ERROR: Could not find a version that satisfies the requirement pytorch-triton==3.2.0+35c6c7c628; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.13" (from torch) (from versions: 2.0.0+0d7e753227, 2.0.0+3aa3d7024e, 2.0.0+af76c989eb, 2.0.0+b8b470bc59, 2.0.0+c8bfe3f548, 2.0.0+d54c04abe2, 2.1.0, 2.1.0+2c32f43999, 2.1.0+3c400e7818, 2.1.0+440fd1bf20, 2.1.0+46672772b4, 2.1.0+6e4932cda8, 2.1.0+7d1a95b046, 2.1.0+9820899b38, 2.1.0+9e3e10c5ed, 2.1.0+bcad9dabe1, 2.1.0+e6216047b8, 2.1.0+e650d3708b, 2.2.0+e28a256d71, 3.0.0+45fff310c8, 3.0.0+757b6a61e7, 3.0.0+901819d2b6, 3.0.0+989adb9a29, 3.0.0+a9bc1a3647, 3.0.0+dedb7bdf33, 3.1.0+5fe38ffd73, 3.1.0+cf34004b8a, 3.2.0+git35c6c7c6)
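The three manywheel failures above all list a `3.2.0` candidate that differs from the requested pin only in the part after the `+`. Below is a minimal sketch of how such a near miss could be surfaced from pip's "from versions" list. The helper names `split_local` and `near_misses` are hypothetical and not part of pip or PyTorch's tooling, and the candidate list is shortened to three entries taken from the log.

```python
def split_local(version: str) -> tuple[str, str]:
    """Split 'X.Y.Z+local' into ('X.Y.Z', 'local'); local is '' when absent."""
    public, _, local = version.partition("+")
    return public, local


def near_misses(required: str, available: list[str]) -> list[str]:
    """Return candidates that share the required public version but are not an exact match."""
    want_public, _ = split_local(required)
    return [
        v for v in available
        if v != required and split_local(v)[0] == want_public
    ]


# Values copied from the failing job output (candidate list shortened).
required = "3.2.0+35c6c7c628"
available = ["3.1.0+5fe38ffd73", "3.1.0+cf34004b8a", "3.2.0+git35c6c7c6"]

print(near_misses(required, available))
```

A candidate showing up here means the wheel for the right release was published, but under a different local version label than the one the pin requests, so an exact `==` requirement cannot be satisfied.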

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), and topic: not user facing (topic category) labels Oct 29, 2024
@jataylo jataylo added the ciflow/rocm Trigger "default" config CI on ROCm label Oct 29, 2024
@davidberard98
Contributor

davidberard98 commented Oct 30, 2024

I'm going to look at the cumsum failure

Edit: issue here #139348

pytorchmergebot pushed a commit that referenced this pull request Nov 23, 2024
The binary build has been failing on trunk since #139206 landed; see, for example, https://github.com/pytorch/pytorch/actions/runs/11981181986/job/33410250461#step:17:539. The issue is tricky to spot: PyTorch sets `3.2.0+35c6c7c628`, while Triton produces `3.2.0+git35c6c7c6`. Look closely: one hash is 10 characters long, the other 8.

Triton now has its own nightly build logic in triton-lang/triton#4812 that takes only 8 characters by default while the original logic from PT took 10. So, PT nightly couldn't find the dependency.
Pull Request resolved: #141410
Approved by: https://github.com/seemethere, https://github.com/malfet
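To make the mismatch described in this commit message concrete, here is a small sketch that derives both version strings from the same commit prefix. The function names are hypothetical, and this does not reproduce the actual PyTorch or Triton build scripts; it only mirrors the two strings quoted above (a 10-character suffix on the PyTorch side, a `git`-prefixed 8-character suffix on the Triton side). Only the first ten characters of the commit hash appear in this thread, so that is all the example uses.

```python
def pytorch_style_version(base: str, commit: str, length: int = 10) -> str:
    """Assumed shape of the old PyTorch-side pin: base + first `length` hash chars."""
    return f"{base}+{commit[:length]}"


def triton_style_version(base: str, commit: str, length: int = 8) -> str:
    """Assumed shape of the Triton-side wheel version: base + 'git' + first `length` hash chars."""
    return f"{base}+git{commit[:length]}"


# Only the first ten characters of the hash are visible in this thread.
commit_prefix = "35c6c7c628"

pinned = pytorch_style_version("3.2.0", commit_prefix)
built = triton_style_version("3.2.0", commit_prefix)

print(pinned)           # the version torch asks for
print(built)            # the version the wheel is published under
print(pinned == built)  # an exact '==' pin can only match if these are identical
```

Both strings come from the same commit, yet they differ in prefix and length, which is why the exact pin cannot match the published wheel.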
@guangyey
Collaborator

Thanks @bertmaher, we will rebase our PR #137886 to update XPU triton.

guangyey added a commit to intel/intel-xpu-backend-for-triton that referenced this pull request Nov 25, 2024
This reverts commit 74cde3c.

# Motivation
The PyTorch community has upgraded Triton to 3.2.0 in
pytorch/pytorch#139206. We should follow suit.

# Solution
Revert this commit introduced in
#2716
pytorchmergebot pushed a commit that referenced this pull request Nov 28, 2024
The Triton XPU build was temporarily disabled by #139206 until the Triton XPU upgrade PR #137886 landed.

Works for #139722 and #114850

Pull Request resolved: #141775
Approved by: https://github.com/atalman
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Bump the Triton pin to the release candidate commit for Triton 3.2.

A few changes beyond the pin bump itself are needed:
* Remove the script that adds a git version hash suffix to the Triton wheel, since as of triton-lang/triton#4812 Triton adds that itself
* Add `pybind11` to the Triton build setup, since Triton now depends on it
* Use manylinux-2.28 for the Triton wheel builder, and use clang+lld for building to pick up the right glibc

Pull Request resolved: pytorch#139206
Approved by: https://github.com/malfet, https://github.com/atalman

Co-authored-by: Andrey Talman <[email protected]>
facebook-github-bot pushed a commit to meta-pytorch/tritonbench that referenced this pull request Dec 5, 2024
Summary:
Following upstream Triton pin update (pytorch/pytorch#139206), we can now enable more kernels in the CI.

Pull Request resolved: #96

Reviewed By: adamomainz

Differential Revision: D66763281

Pulled By: xuzhao9

fbshipit-source-id: 943472995c2da0f4cd0d01665b53786614f122ae
brad-mengchi added a commit to pytorch/FBGEMM that referenced this pull request Dec 11, 2024
Update triton version to align with PyTorch: pytorch/pytorch#139206
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Dec 12, 2024
…4a (#3497)

Summary:
X-link: facebookresearch/FBGEMM#577

Update triton version to align with PyTorch: pytorch/pytorch#139206

Pull Request resolved: #3497

Reviewed By: q10

Differential Revision: D67075647

Pulled By: brad-mengchi

fbshipit-source-id: 6adc76484d1f6b827d62615c4311cf4b9fffd6b6
@cpuhrsch
Contributor

This seems to break int_mm #144705. I'll try to produce a standalone Triton kernel to reproduce this error.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
…4a (pytorch#577)

Summary:
Pull Request resolved: facebookresearch/FBGEMM#577

Update triton version to align with PyTorch: pytorch/pytorch#139206

X-link: pytorch#3497

Reviewed By: q10

Differential Revision: D67075647

Pulled By: brad-mengchi

fbshipit-source-id: 6adc76484d1f6b827d62615c4311cf4b9fffd6b6

Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
  • ciflow/rocm (Trigger "default" config CI on ROCm)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • keep-going (Don't stop on first failure, keep running tests until the end)
  • Merged
  • module: inductor
  • Reverted
  • topic: not user facing (topic category)
