Conversation

@Skylion007
Collaborator

@Skylion007 Skylion007 commented Jun 4, 2024

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

@Skylion007 Skylion007 requested review from albanD, eqy, ezyang and malfet June 4, 2024 16:55
@Skylion007 Skylion007 requested a review from a team as a code owner June 4, 2024 16:55
@pytorch-bot

pytorch-bot bot commented Jun 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127925

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures, 1 Cancelled Job, 2 Unrelated Failures

As of commit 6a28122 with merge base f8a5b71:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 4, 2024
@Skylion007 Skylion007 added the better-engineering Relatively self-contained tasks for better engineering contributors label Jun 4, 2024
@Skylion007
Collaborator Author

On the off chance we can squeeze this into 2.4, or in case it solves any of the inductor issues we are having.

@Skylion007 Skylion007 added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Jun 4, 2024
@ezyang ezyang requested a review from atalman June 5, 2024 14:36
@Skylion007
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased skylion007/update-cudnn-9-1-1-17 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout skylion007/update-cudnn-9-1-1-17 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the skylion007/update-cudnn-9-1-1-17 branch from 0557788 to 159c0c8 Compare June 7, 2024 14:05
@Skylion007 Skylion007 added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 7, 2024
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch from 159c0c8 to decaa27 Compare July 14, 2024 19:00
@Skylion007 Skylion007 changed the title [BE] Update CUDNN to 9.1.1.17 [BE] Update CUDNN to 9.2.1.18 Jul 14, 2024
@albanD albanD removed their request for review July 16, 2024 21:34
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch from decaa27 to a5f9cfe Compare August 2, 2024 14:30
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch from a5f9cfe to 0d6ad43 Compare August 13, 2024 14:37
@Skylion007 Skylion007 changed the title [BE] Update CUDNN to 9.2.1.18 [BE] Update CUDNN to 9.3.0.75 Aug 13, 2024
@Skylion007
Collaborator Author

Skylion007 commented Aug 13, 2024

@atalman Updated it again to the latest CUDNN version

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch from 0d6ad43 to 299ca52 Compare August 13, 2024 14:40
@Skylion007 Skylion007 requested a review from jeffdaily as a code owner August 13, 2024 14:45
@eqy
Collaborator

eqy commented Aug 13, 2024

We have a few internal (conv?) tests that seem to be newly failing with 9.3.0.75; we will try to provide public reproducers that we can test...

@Skylion007
Collaborator Author

@eqy Are these issues fixed in the newly released 9.4?

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch 5 times, most recently from 75d27aa to c1d95f1 Compare September 28, 2024 16:48
@Skylion007 Skylion007 changed the title [BE] Update CUDNN to 9.3.0.75 [BE] Update CUDNN to 9.4.0.58 Sep 28, 2024
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch 2 times, most recently from c2c03a1 to 2e46d77 Compare September 28, 2024 17:14
@eqy
Collaborator

eqy commented Oct 3, 2024

Tentatively signing off on this; I think we are OK for now.

@nWEIdia
Collaborator

nWEIdia commented Oct 10, 2024

Have the inductor tests been run with this PR yet? It would be good to know whether any models show accuracy fluctuations.

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch 2 times, most recently from 892e328 to ef99783 Compare October 14, 2024 16:14
@Skylion007
Collaborator Author

@nWEIdia Updated the benchmarks; tinynet_a now passes the inductor accuracy tests.

Collaborator

@nWEIdia nWEIdia left a comment

Looks great to me!

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-1-1-17 branch from ef99783 to 6a28122 Compare October 14, 2024 16:30
@Skylion007
Collaborator Author

Skylion007 commented Oct 14, 2024

@nWEIdia I need to get the binaries uploaded to the pytorch_builder repo, that's the main blocker.

@nWEIdia
Collaborator

nWEIdia commented Oct 14, 2024

> @nWEIdia I need to get the binaries uploaded to the pytorch_builder repo, that's the main blocker.

Do you mean uploading the cuDNN PyPI dependency packages to S3 (download.pytorch.org)?
cc @atalman for help.

Labels

better-engineering Relatively self-contained tasks for better engineering contributors
ciflow/inductor
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR
ciflow/trunk Trigger trunk jobs on your pull request
module: dynamo
open source
topic: not user facing topic category

6 participants