Let aotriton.cmake detect the best binary package to use, and deprecate aotriton_version.txt #137443

xinyazhang · 2024-10-07T20:41:54Z

We no longer need `install_aotriton.sh` and `aotriton_version.txt`, since `aotriton.cmake` now installs the best-matching binary release package by default when building PyTorch.

This should remove the need for a pre-installed AOTriton package when building PyTorch for ROCm from source, which is not feasible when building PyTorch outside a CI Docker image. With this change, a user can point the build at an AOTriton already installed in their environment by setting the `AOTRITON_INSTALLED_PREFIX` environment variable, or let the build automatically detect and install a compatible version. As a third option, the user can force AOTriton to be built from source by setting the `AOTRITON_INSTALL_FROM_SOURCE` environment variable.
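How the build picks among these three options can be sketched in CMake. This is a hedged illustration, not the code in `aotriton.cmake`: only the two environment variable names come from this PR, while the function name `aotriton_choose_install_mode`, the mode strings, and the precedence (a pre-installed prefix wins over a from-source request) are assumptions.

```cmake
cmake_minimum_required(VERSION 3.14)

# Hypothetical sketch (not aotriton.cmake): map the environment variables named
# in this PR to one of three install modes. Only the presence of each variable
# is checked; the precedence order is an assumption.
function(aotriton_choose_install_mode OUT_VAR)
  if(DEFINED ENV{AOTRITON_INSTALLED_PREFIX})
    # Reuse an AOTriton that is already installed under that prefix.
    set(_mode "PREINSTALLED")
  elseif(DEFINED ENV{AOTRITON_INSTALL_FROM_SOURCE})
    # The user explicitly asked for a from-source build.
    set(_mode "FROM_SOURCE")
  else()
    # Default: detect the system ROCm version and fetch a matching binary package.
    set(_mode "DOWNLOAD_BINARY")
  endif()
  set(${OUT_VAR} "${_mode}" PARENT_SCOPE)
endfunction()

aotriton_choose_install_mode(_aotriton_mode)
message(STATUS "AOTriton install mode: ${_aotriton_mode}")
```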

In addition, with this PR the CMake build process takes care of copying the AOTriton .so and the images directory from `torch/lib` to the installation path.
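The copy step can be approximated with `file(COPY)`. This is a sketch under stated assumptions: the function name `aotriton_copy_runtime` and the directory layout (shared library and `aotriton.images` side by side in one lib directory) are hypothetical; the names `libaotriton_v2.so` and `aotriton.images` come from the commit messages on this PR. In a real CMake project an `install(FILES ...)` / `install(DIRECTORY ...)` rule is the more idiomatic choice; `file(COPY)` is used here only so the sketch also runs in script mode (`cmake -P`).

```cmake
cmake_minimum_required(VERSION 3.14)

# Hypothetical helper: copy the AOTriton runtime (shared library plus the
# aotriton.images kernel directory) from SRC_LIB_DIR into DEST_LIB_DIR.
function(aotriton_copy_runtime SRC_LIB_DIR DEST_LIB_DIR)
  # Pick up libaotriton_v2.so and any versioned variants of it.
  file(GLOB _aotriton_libs "${SRC_LIB_DIR}/libaotriton_v2.so*")
  if(NOT _aotriton_libs)
    message(FATAL_ERROR "libaotriton_v2.so not found in ${SRC_LIB_DIR}")
  endif()
  file(COPY ${_aotriton_libs} DESTINATION "${DEST_LIB_DIR}")
  # Since AOTriton 0.8b the GPU kernels live outside the .so, so copy them too.
  if(IS_DIRECTORY "${SRC_LIB_DIR}/aotriton.images")
    # No trailing slash: the directory itself is copied into DEST_LIB_DIR.
    file(COPY "${SRC_LIB_DIR}/aotriton.images" DESTINATION "${DEST_LIB_DIR}")
  endif()
endfunction()
```

Without the images directory, a wheel built from the install tree would ship a library whose kernels cannot be found at run time.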

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot · 2024-10-07T20:41:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137443

✅ No Failures

As of commit 79ad25f with merge base ebeb433:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xinyazhang · 2024-10-07T20:52:43Z

@jithunnair-amd shall we use this PR to bump to 0.7.1b as well?

jithunnair-amd

LGTM, let's make sure various CI workflows run successfully

jithunnair-amd · 2024-10-07T21:13:18Z

> @jithunnair-amd shall we use this PR to bump to 0.7.1b as well?

No, I'd prefer we do that in a different PR

- Fix tarball's suffix
- Read rocm version from variables set by LoadHIP.cmake
- Copy AOTRITON_INSTALLED_PREFIX/* to torch/
  Otherwise bdist_wheel will miss libaotriton_v2.so
- Fix problems in aotriton.cmake
- Guard against new ROCM environment
  This also updates aotriton_version.txt's format
- Supply ROCM versions and SHA256 checksums as lists.
- Let install_aotriton.sh parse the new format of aotriton_version.txt
- Do not hardcode __AOTRITON_ARCH
- Updates to aotriton build steps and Dockerfiles (#1599)
  Changes cherry-picked from pytorch#137443
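One of the commits above supplies the ROCm versions and SHA256 checksums as lists. A minimal sketch of keeping two parallel lists in sync and querying them follows; the function name `aotriton_sha256_for` and the placeholder checksum strings are hypothetical, and it assumes entry *i* of the checksum list belongs to entry *i* of the ROCm list.

```cmake
cmake_minimum_required(VERSION 3.14)

# Hypothetical helper: look up the checksum that pairs with ROCM_STR, assuming
# ROCM_LIST and SHA256_LIST are parallel lists (same length, same order).
function(aotriton_sha256_for ROCM_STR ROCM_LIST SHA256_LIST OUT_VAR)
  list(LENGTH ROCM_LIST _n_rocm)
  list(LENGTH SHA256_LIST _n_sha)
  if(NOT _n_rocm EQUAL _n_sha)
    message(FATAL_ERROR "ROCm list has ${_n_rocm} entries but checksum list has ${_n_sha}")
  endif()
  list(FIND ROCM_LIST "${ROCM_STR}" _idx)
  if(_idx EQUAL -1)
    message(FATAL_ERROR "No AOTriton package listed for ${ROCM_STR}")
  endif()
  list(GET SHA256_LIST ${_idx} _sha)
  set(${OUT_VAR} "${_sha}" PARENT_SCOPE)
endfunction()

# Placeholder values for illustration only, not real AOTriton checksums.
set(_rocm_list "rocm6.1" "rocm6.2" "rocm6.4")
set(_sha_list "sha-for-6.1" "sha-for-6.2" "sha-for-6.4")
aotriton_sha256_for("rocm6.2" "${_rocm_list}" "${_sha_list}" _expected_sha)
message(STATUS "Expected SHA256 for rocm6.2: ${_expected_sha}")
```

The looked-up value could then be passed to CMake's `file(DOWNLOAD <url> <file> EXPECTED_HASH SHA256=<hash>)` or `ExternalProject_Add(... URL_HASH SHA256=<hash>)`, so a corrupted or mismatched tarball fails the build.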

xinyazhang · 2024-10-10T18:53:07Z

Confirmed the new algorithm's correctness with:

```cmake
set(__AOTRITON_ROCM_LIST
    "rocm6.1"
    "rocm6.2"
    "rocm6.4"
)

set(detection_lists
    "6.0"
    "6.1"
    "6.2"
    "6.3"
    "6.4"
)
foreach(__AOTRITON_SYSTEM_ROCM IN LISTS detection_lists)
  list(GET __AOTRITON_ROCM_LIST 0 __AOTRITON_ROCM_DEFAULT_STR)
  # Initialize __AOTRITON_ROCM to lowest version, in case all builds > system's ROCM
  string(SUBSTRING ${__AOTRITON_ROCM_DEFAULT_STR} 4 -1 __AOTRITON_ROCM)
  foreach(AOTRITON_ROCM_BUILD_STR IN LISTS __AOTRITON_ROCM_LIST)
    # len("rocm") == 4
    string(SUBSTRING ${AOTRITON_ROCM_BUILD_STR} 4 -1 AOTRITON_ROCM_BUILD)
    # Find the last build that <= system's ROCM
    # Assume the list is from lower to higher
    if(AOTRITON_ROCM_BUILD VERSION_GREATER __AOTRITON_SYSTEM_ROCM)
      break()
    endif()
    set(__AOTRITON_ROCM ${AOTRITON_ROCM_BUILD})
  endforeach()
  message(STATUS "System ROCM ${__AOTRITON_SYSTEM_ROCM}, Selected Package ${__AOTRITON_ROCM}")
endforeach()
```

Output

```
-- System ROCM 6.0, Selected Package 6.1
-- System ROCM 6.1, Selected Package 6.1
-- System ROCM 6.2, Selected Package 6.2
-- System ROCM 6.3, Selected Package 6.2
-- System ROCM 6.4, Selected Package 6.4
```
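The loop above can also be packaged as a function, so that each system ROCm version can be checked in isolation. This is a minimal sketch, not code from `aotriton.cmake`: the function name `aotriton_select_rocm_build` is hypothetical, and it keeps the script's assumptions (build list sorted from lower to higher, entries of the form `rocmX.Y`, fall back to the lowest build when every build is newer than the system).

```cmake
cmake_minimum_required(VERSION 3.14)

# Hypothetical helper (not part of aotriton.cmake): pick the newest AOTriton
# build whose ROCm version is <= SYSTEM_ROCM.
# Assumes BUILD_LIST is sorted from lower to higher and each entry is "rocmX.Y".
function(aotriton_select_rocm_build SYSTEM_ROCM BUILD_LIST OUT_VAR)
  # Default to the lowest build, in case every build is newer than the system.
  list(GET BUILD_LIST 0 _default_str)
  string(SUBSTRING "${_default_str}" 4 -1 _selected)  # len("rocm") == 4
  foreach(_build_str IN LISTS BUILD_LIST)
    string(SUBSTRING "${_build_str}" 4 -1 _build)
    if(_build VERSION_GREATER SYSTEM_ROCM)
      break()  # remaining builds are even newer
    endif()
    set(_selected "${_build}")
  endforeach()
  set(${OUT_VAR} "${_selected}" PARENT_SCOPE)
endfunction()

set(_builds "rocm6.1" "rocm6.2" "rocm6.4")
aotriton_select_rocm_build(6.3 "${_builds}" _pkg)
message(STATUS "System ROCM 6.3, Selected Package ${_pkg}")
```

With builds `rocm6.1`, `rocm6.2` and `rocm6.4`, a system ROCm of 6.3 selects 6.2, matching the output above.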

xinyazhang · 2024-11-07T18:15:03Z

Current plan: let's send varlen/nestedtensor support, which requires some fixes in AOTriton 0.7.2b, and then update this PR.

huydhn · 2024-11-11T19:50:10Z

@pytorchbot rebase

pytorchmergebot · 2024-11-11T19:51:37Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-11T19:51:39Z

Successfully rebased xinyazhang/aotriton-multiple_binary_packages onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xinyazhang/aotriton-multiple_binary_packages && git pull --rebase)

jithunnair-amd · 2025-01-08T09:57:43Z

@jeffdaily Please review and approve if it looks good.

jithunnair-amd · 2025-01-08T23:58:18Z

@pytorchbot merge -f "Rocm CI jobs passed. Wheel build jobs also passed, installing all aotriton files during Pytorch build as expected"

pytorchmergebot · 2025-01-08T23:59:45Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

This is backported from upstream PRs pytorch#140172, pytorch#137443 and pytorch#139432. Original commit message of pytorch#140172:

Notable new features for SDPA operators on AMD systems from AOTriton 0.8b:
1. Nestedtensor support;
2. MQA/GQA support;
3. Restore Efficient attention support for causal=True and seqlen_q != seqlen_k cases;
   + The kernel should use top-left alignment; bottom-right alignment will be added later.
4. Move gfx1100 (RX7900/W7800/W7900) out of experimental support status. However, users are strongly recommended to update to ROCm 6.2.4, notably for its firmware updates.

Related unit tests are enabled as well.

Notable related changes from AOTriton 0.8b:
1. AOTriton 0.8b moves the GPU kernels out of libaotriton.so into a separate directory, `aotriton.images`;
2. LZMA replaces ZSTD as the GPU kernel compression algorithm for a better compression ratio: AOTriton 0.8b (.so + aotriton.images) takes 350MB, compared to 800MB for the AOTriton 0.7b .so;
3. The compression cannot be disabled now, and `liblzma` is a hard run-time dependency.
   + This should not be a problem, since `lzma` is part of the Python Standard Library.

Pull Request resolved: pytorch#140172
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
Co-authored-by: Jithun Nair

The previous port of upstream PR pytorch#137443 is incomplete

The previous port of upstream PR pytorch#137443 is incomplete Fixes #SWDEV-509002

…te aotriton_version.txt (pytorch#137443) We do not need `install_aotriton.sh` and `aotriton_version.txt` any more since `aotriton.cmake` now installs the best binary release package as the default option when building pytorch. This should resolve the issue of needing a pre-installed aotriton package when building PyTorch for ROCm from source, which is not feasible if building PyTorch *outside* a CI docker image. With this change, a user can have a pre-installed AOTriton in their environment, if desired, and have the build pick it up by specifying the `AOTRITON_INSTALLED_PREFIX` env var, or have the build automatically detect and install the compatible version. As a third option, the user can also force AOTriton to build from source instead, using the `AOTRITON_INSTALL_FROM_SOURCE` env var. Also, with the changes in this PR, the cmake build process handles the tasks of copying aotriton .so and images directory from `torch/lib` to the installation path. Pull Request resolved: pytorch#137443 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily Co-authored-by: Jithun Nair <[email protected]>

…pport

- Let aotriton.cmake detect the best binary package to use, and deprecate aotriton_version.txt (pytorch#137443) (cherry picked from commit bc57635)
- Bump AOTriton to 0.8.2b (#1853). Fixes SWDEV-508774 (cherry picked from commit 4bed249)
- Enable head_dim == 512 with AOTriton 0.8.1 (cherry picked from commit 6edd36f)
- Add unit tests for head dimension 512 (cherry picked from commit 85290fa)

pytorch-bot added the release notes: releng (release notes category) label Oct 7, 2024

pytorchbot added the open source label Oct 7, 2024

jithunnair-amd approved these changes Oct 7, 2024

jithunnair-amd added the ciflow/binaries (Trigger all binary build and upload jobs on the PR) and ciflow/rocm (Trigger "default" config CI on ROCm) labels Oct 7, 2024

jithunnair-amd mentioned this pull request Oct 8, 2024

[release/2.3] AOTriton build refactor: download binary from github, NO BUILD FROM SOURCE option ROCm/pytorch#1623

Closed

xinyazhang mentioned this pull request Oct 8, 2024

Download pre-compiled AOTriton from GitHub unless AOTRITON_INSTALL_FROM_SOURCE=1 is set #135560

Closed

jithunnair-amd reviewed Oct 9, 2024

Review comment on cmake/External/aotriton.cmake (outdated, resolved)

jithunnair-amd mentioned this pull request Oct 9, 2024

[release/2.4] AOTriton build refactor: download binary from github, NO BUILD FROM SOURCE option ROCm/pytorch#1626

Closed

jithunnair-amd requested changes Oct 9, 2024

Review comment on cmake/External/aotriton.cmake (outdated, resolved)

xinyazhang requested a review from jithunnair-amd October 10, 2024 18:48

ethanwee1 pushed a commit to ROCm/pytorch that referenced this pull request Oct 29, 2024

Aotriton build refactor from pytorch/pull/137443

4667f82

ethanwee1 pushed a commit to ROCm/pytorch that referenced this pull request Oct 29, 2024

Adding in aotriton refactor from pytorch/pull/137443

0aff8a0

ethanwee1 pushed a commit to ROCm/pytorch that referenced this pull request Oct 29, 2024

Adding in aotriton refactor from pytorch/pull/137443

7cc09a7

ethanwee1 pushed a commit to ROCm/builder that referenced this pull request Oct 29, 2024

Adding in aotriton refactor from pytorch/pytorch/pull/137443

dabc67b

ethanwee1 pushed a commit to ROCm/builder that referenced this pull request Oct 29, 2024

Adding in aotriton refactor from pytorch/pytorch/pull/137443

e481b2b

xinyazhang force-pushed the xinyazhang/aotriton-multiple_binary_packages branch from a112857 to 9ce8535 November 9, 2024 00:09

pytorchmergebot force-pushed the xinyazhang/aotriton-multiple_binary_packages branch from 9ce8535 to ecbdf92 November 11, 2024 19:51

pruthvistony added the module: rocm (AMD GPU support for PyTorch) and rocm (This tag is for PRs from ROCm team) labels Jan 8, 2025

jeffdaily approved these changes Jan 8, 2025

pytorchmergebot added the merging label Jan 8, 2025

pytorchmergebot closed this in bc57635 Jan 9, 2025

pytorchmergebot added Merged and removed merging labels Jan 9, 2025

xinyazhang mentioned this pull request Jan 13, 2025

[release/2.5] Support head dimension 512 with AOTriton 0.8.1b ROCm/pytorch#1832

Merged

xinyazhang added a commit to ROCm/pytorch that referenced this pull request Jan 14, 2025

Fix SWDEV-509002

d95c445

The previous port of upstream PR pytorch#137443 is incomplete

xinyazhang mentioned this pull request Jan 14, 2025

Fix SWDEV-509002 ROCm/pytorch#1833

Merged

xinyazhang added a commit to ROCm/pytorch that referenced this pull request Jan 14, 2025

Fix SWDEV-509002 (#1833)

d3c942e

The previous port of upstream PR pytorch#137443 is incomplete Fixes #SWDEV-509002

xinyazhang mentioned this pull request Jan 29, 2025

Bump to AOTriton 0.8.2b, and related build system changes ROCm/TransformerEngine#110

Merged

7 participants