Skip to content

Conversation

@atalman
Copy link
Contributor

@atalman atalman commented Sep 23, 2025

Preload logic no longer works with CUDA 13.0
See the installation path:

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem

Test using script from : #162367

Kernel test passed!

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163661

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs, 1 Unrelated Failure

As of commit ee91517 with merge base 5f0c7cb (image):

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman added the topic: not user facing topic category label Sep 23, 2025
@nWEIdia
Copy link
Collaborator

nWEIdia commented Sep 23, 2025

Linking the changes in CUDA 13 that requires this PR: https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/#wheel_package_changes_to_cuda_130

Copy link
Collaborator

@nWEIdia nWEIdia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, LGTM!

@tinglvv
Copy link
Collaborator

tinglvv commented Sep 23, 2025

Thanks for the fix, this should resolve the issue - #162367.

@atalman
Copy link
Contributor Author

atalman commented Sep 24, 2025

@pytorchmergebot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 24, 2025
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • functorch (kshitij12345, srossross, chillee, zou3519, guilhermeleobas)
  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@atalman atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Sep 24, 2025
@atalman
Copy link
Contributor Author

atalman commented Sep 24, 2025

@pytorchmergebot merge -f "signal looks good"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@atalman
Copy link
Contributor Author

atalman commented Sep 24, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "Critical CI fix" -c critical

pytorchbot pushed a commit that referenced this pull request Sep 24, 2025
Preload logic no longer works with CUDA 13.0
See the installation path:
```
ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem
```

Test using script from : #162367
```
Kernel test passed!
```
Pull Request resolved: #163661
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll

(cherry picked from commit 141fc72)
@pytorchbot
Copy link
Collaborator

Cherry picking #163661

The cherry pick PR is at #163766 and it is linked with issue Critical CI fix. The following tracker issues are updated:

Details for Dev Infra team Raised by workflow job

atalman added a commit that referenced this pull request Sep 25, 2025
[CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163661)

Preload logic no longer works with CUDA 13.0
See the installation path:
```
ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem
```

Test using script from : #162367
```
Kernel test passed!
```
Pull Request resolved: #163661
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll

(cherry picked from commit 141fc72)

Co-authored-by: atalman <[email protected]>
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…#163661)

Preload logic no longer works with CUDA 13.0
See the installation path:
```
ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem
```

Test using script from : pytorch#162367
```
Kernel test passed!
```
Pull Request resolved: pytorch#163661
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll
jainapurva pushed a commit that referenced this pull request Sep 29, 2025
Preload logic no longer works with CUDA 13.0
See the installation path:
```
ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem
```

Test using script from : #162367
```
Kernel test passed!
```
Pull Request resolved: #163661
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants