@nWEIdia
Collaborator

@nWEIdia nWEIdia commented Mar 11, 2024

@nWEIdia nWEIdia requested a review from a team as a code owner March 11, 2024 23:23
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 11, 2024
@pytorch-bot

pytorch-bot bot commented Mar 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121684

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 456757f with merge base a03b9a2:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@johnnynunez
Contributor

johnnynunez commented Mar 13, 2024

@ptrblck @nWEIdia Why not use cuDNN 9.0? It is a major improvement for flash attention, and I saw that the headers were updated to cuDNN 9.0.
https://docs.nvidia.com/deeplearning/cudnn/release-notes.html

@ptrblck
Collaborator

ptrblck commented Mar 13, 2024

@johnnynunez It's not available as described in the review: #121684 (comment)

@johnnynunez
Contributor

> @johnnynunez It's not available as described in the review: #121684 (comment)

I see.. thanks

@drisspg drisspg added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 13, 2024
@ptrblck ptrblck added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Mar 15, 2024
@johnnynunez
Contributor

When will it be merged? :)

nWEIdia added a commit to nWEIdia/builder that referenced this pull request Apr 11, 2024
GHA results show this is needed to fix errors in pytorch/pytorch#121684
Reference: pytorch#1374
@atalman
Contributor

atalman commented Apr 11, 2024

@nWEIdia While the Windows AMI is not yet in place, we would need to add only the Linux part of the change.

malfet pushed a commit to pytorch/builder that referenced this pull request Apr 12, 2024
GHA results show this is needed to fix errors in pytorch/pytorch#121684 

Reference: #1374
@malfet malfet added ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR and removed ciflow/binaries Trigger all binary build and upload jobs on the PR labels Apr 16, 2024
@nWEIdia
Collaborator Author

nWEIdia commented Apr 16, 2024

Hi @malfet, for pip wheels, the build seems to succeed, but the test job failed: installing the wheel requires the cu124/ AWS directory to exist. https://github.com/pytorch/pytorch/actions/runs/8701283769/job/23866760262
Could you please help prepare the cu124 directory? https://download.pytorch.org/whl/nightly/cu124 Thanks!
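
For context, the failing test step installs wheels from a per-CUDA nightly index. Below is a minimal sketch of how a CUDA version could map to that index, assuming the `cuXYZ` directory naming seen in the URL above. The helper names `nightly_index_url` and `pip_install_command` are hypothetical and are not taken from the PyTorch CI scripts.

```python
from typing import List

# Base of the nightly wheel index, as quoted in the comment above.
NIGHTLY_BASE = "https://download.pytorch.org/whl/nightly"


def nightly_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '12.4' to its nightly index URL (assumed naming)."""
    major, minor = cuda_version.split(".")[:2]
    return f"{NIGHTLY_BASE}/cu{major}{minor}"


def pip_install_command(cuda_version: str) -> List[str]:
    """Build a pip command that installs the nightly torch wheel from that index."""
    return [
        "pip", "install", "--pre", "torch",
        "--index-url", nightly_index_url(cuda_version),
    ]


if __name__ == "__main__":
    print(" ".join(pip_install_command("12.4")))
```

If the `cu124` directory does not exist on the server, an install of this shape has nothing to resolve against, which would be consistent with the test failure described above.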

nWEIdia and others added 11 commits April 30, 2024 10:26
We still need to keep 12.1 here, since this is the default wheel we will be uploading to PyPI. 12.4 should remain an experimental build for now.

Co-authored-by: Andrey Talman <[email protected]>
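
The constraint in this commit message can be pictured as a small build-matrix rule. The sketch below is illustrative only: `CUDA_VERSIONS`, `DEFAULT_PYPI_CUDA`, and `wheel_matrix` are hypothetical names and do not reflect the actual CI configuration.

```python
from typing import Dict, List

# Hypothetical inputs: the two CUDA versions discussed in this PR.
CUDA_VERSIONS = ["12.1", "12.4"]
# Per the commit message, 12.1 stays the default wheel uploaded to PyPI.
DEFAULT_PYPI_CUDA = "12.1"


def wheel_matrix(versions: List[str], default: str) -> List[Dict[str, object]]:
    """Mark exactly one CUDA version as the PyPI default; the rest are experimental."""
    if default not in versions:
        raise ValueError(f"default CUDA {default} is not in the build list")
    return [
        {
            "desired_cuda": "cu" + v.replace(".", ""),
            "upload_to_pypi": v == default,
            "experimental": v != default,
        }
        for v in versions
    ]


if __name__ == "__main__":
    for entry in wheel_matrix(CUDA_VERSIONS, DEFAULT_PYPI_CUDA):
        print(entry)
```

Keeping the default in a single constant means that promoting 12.4 later would be a one-line change in a layout like this one.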
@atalman
Contributor

atalman commented Apr 30, 2024

@pytorchmergebot merge -f "All required jobs are passing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
@nWEIdia nWEIdia mentioned this pull request Jul 31, 2024
pytorchmergebot pushed a commit that referenced this pull request Aug 20, 2024
Trying to keep a record of the steps before I lose track of it.

- 1st commit: similar to pytorch/builder#1720
- 2nd commit: update CUDA 12.4 CI CUDA versions from 12.4.0 to 12.4.1, mapping to the changes in https://github.com/pytorch/pytorch/pull/125944/files
- 3rd commit: update the aarch64 install_cuda_aarch64.sh docker step
- 4th commit aaa456e: related to #121684
- Synchronization point: Meta helps upload the PyPI CUDA dependencies specified in .github/scripts/generate_binary_build_matrix.py
- The above PyPI upload is done (thanks Andrey!); restarted jobs like https://github.com/pytorch/pytorch/actions/runs/10188203670/job/28369471321
- 5th commit 7753234: use temporary docker containers (generated from a previous successful container build). If merged, these containers would be rebuilt, so they are being tested now.
- 6th commit 5f93c62: revert the 5th commit. Update: done, but still have to debug seemingly unrelated failures (rocm/xpu/mps).

Pull Request resolved: #132202
Approved by: https://github.com/Skylion007, https://github.com/eqy, https://github.com/atalman

Labels

ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR
ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
skip-pr-sanity-checks
topic: not user facing topic category
triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
