
Conversation

@xinyazhang xinyazhang commented Oct 31, 2024

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Oct 31, 2024
linux-foundation-easycla bot commented Oct 31, 2024

CLA Signed


The committers listed above are authorized under a signed CLA.

pytorch-bot bot commented Oct 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139432

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7e1f559 with merge base 8c22e09:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xinyazhang
Collaborator Author

@pytorchbot label "topic: not user facing"

@xinyazhang
Collaborator Author

@pytorchbot label "rocm"

@pytorch-bot pytorch-bot bot added the rocm This tag is for PRs from ROCm team label Nov 8, 2024
Collaborator

@jithunnair-amd jithunnair-amd left a comment


LGTM

@jithunnair-amd jithunnair-amd marked this pull request as ready for review November 22, 2024 21:35
@jithunnair-amd
Collaborator

@pytorchbot merge -f "Unrelated CI failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

xinyazhang added a commit to ROCm/pytorch that referenced this pull request Jan 13, 2025
This is backported from upstream PRs pytorch#140172, pytorch#137443, and pytorch#139432.

Original commit message of pytorch#140172:

Notable new features for SDPA operators on AMD systems from AOTriton 0.8b:

1. NestedTensor support;
2. MQA/GQA support;
3. Restored Efficient Attention support for the causal=True and seqlen_q != seqlen_k cases;
    + The kernel uses top-left alignment for now; bottom-right alignment will be added later
4. Moved gfx1100 (RX7900/W7800/W7900) out of experimental support status.
   However, users are strongly recommended to update to ROCm 6.2.4, notably for
   its firmware updates.
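The causal-mask alignment mentioned in item 3 can be illustrated with a minimal, framework-free sketch (the `causal_mask` helper and its boolean-matrix representation are illustrative, not AOTriton's API). With top-left alignment the causal diagonal is anchored at the first query/key pair, so when seqlen_q != seqlen_k the two conventions admit different positions:

```python
def causal_mask(seqlen_q, seqlen_k, align="top-left"):
    """Build a boolean attention mask; True = position may be attended to.

    Top-left alignment anchors the causal diagonal at (0, 0); bottom-right
    alignment anchors it at (seqlen_q - 1, seqlen_k - 1).
    """
    offset = 0 if align == "top-left" else seqlen_k - seqlen_q
    return [[kj <= qi + offset for kj in range(seqlen_k)]
            for qi in range(seqlen_q)]

# With seqlen_q=2 and seqlen_k=4, top-left alignment lets the first query
# attend only to the first key, while bottom-right lets it see three keys.
print(causal_mask(2, 4, "top-left"))      # [[True, False, False, False], [True, True, False, False]]
print(causal_mask(2, 4, "bottom-right"))  # [[True, True, True, False], [True, True, True, True]]
```

When seqlen_q == seqlen_k the two alignments coincide, which is why the distinction only matters for the uneven-length cases this PR restores.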

Related unit tests are enabled as well.

Notable related changes from AOTriton 0.8b:

1. AOTriton 0.8b moves the GPU kernels out of libaotriton.so into a separate directory, `aotriton.images`;
2. LZMA replaces ZSTD as the GPU kernel compression algorithm for a better compression ratio: AOTriton 0.8b (.so + aotriton.images) takes 350 MB, compared to 800 MB for the AOTriton 0.7b .so;
3. The compression can no longer be disabled, and `liblzma` is a hard run-time dependency.
    + This should not be a problem, since the `lzma` module is part of the Python Standard Library.
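Since the `lzma` module ships in the Python Standard Library (backed by the same `liblzma` the run-time dependency refers to), LZMA payloads round-trip with no extra packages. A minimal sketch, where the payload bytes are a made-up stand-in rather than an actual AOTriton kernel image:

```python
import lzma

# Stand-in bytes for a GPU kernel image; real kernel blobs are much larger.
payload = b"gfx1100 kernel image" * 64
blob = lzma.compress(payload, preset=9)

# Round-trip: the stdlib lzma module (wrapping liblzma) restores the original.
assert lzma.decompress(blob) == payload
print(f"compressed {len(payload)} bytes down to {len(blob)}")
```

Highly repetitive data like this compresses well, which mirrors the motivation for switching from ZSTD to LZMA for the packed kernel images.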

Pull Request resolved: pytorch#140172
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily

Co-authored-by: Jithun Nair <[email protected]>
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Jan 16, 2025

Labels: Merged · module: rocm (AMD GPU support for PyTorch) · open source · rocm (this tag is for PRs from the ROCm team) · topic: not user facing


6 participants