@ptrblck
Collaborator

@ptrblck ptrblck commented Jun 6, 2023

@ptrblck ptrblck requested a review from a team as a code owner June 6, 2023 16:31
@pytorch-bot

pytorch-bot bot commented Jun 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103091

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit f00c2e3:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 6, 2023
@atalman
Contributor

atalman commented Jun 6, 2023

Hi @ptrblck, I believe this section also needs to be modified to match 12.1:

"nvidia-cuda-runtime-cu11==11.7.99; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu11==11.7.101; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufft-cu11==10.9.0.58; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-curand-cu11==10.2.10.91; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusolver-cu11==11.4.0.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparse-cu11==11.7.4.91; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu11==2.14.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu11==11.7.91; platform_system == 'Linux' and platform_machine == 'x86_64'",

A call to

regenerate.sh

also needs to be made.
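
For illustration, the pinned requirement string quoted above could be assembled from a mapping of package names to versions instead of being edited by hand. This is only a sketch: `build_requirements`, `CU12_PINS`, and `LINUX_X86_64_MARKER` are hypothetical names that do not come from the PyTorch repository, and the mapping holds only the three cu12 pins that show up in this PR's CI log. The remaining cu12 versions are deliberately left out rather than guessed.

```python
# Hypothetical helper; names are illustrative and not taken from the PyTorch repo.
LINUX_X86_64_MARKER = "platform_system == 'Linux' and platform_machine == 'x86_64'"

# Only the pins that appear in this PR's CI log; other cu12 packages are omitted.
CU12_PINS = {
    "nvidia-cuda-nvrtc-cu12": "12.1.105",
    "nvidia-cuda-cupti-cu12": "12.1.105",
    "nvidia-cublas-cu12": "12.1.3.1",
}


def build_requirements(pins, marker=LINUX_X86_64_MARKER):
    """Join 'name==version; marker' entries with ' | ', mirroring the quoted snippet."""
    return " | ".join(
        f"{name}=={version}; {marker}" for name, version in pins.items()
    )


if __name__ == "__main__":
    print(build_requirements(CU12_PINS))
```

Keeping the environment marker in one constant avoids repeating it on every line, which is where the cu11 and cu12 variants could otherwise drift apart.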

@jbschlosser jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 6, 2023
@ptrblck ptrblck added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Jun 9, 2023
@atalman
Contributor

atalman commented Jun 9, 2023

@ptrblck we need to add the CUDA 12 libraries to download.pytorch.org. I will do it now; then we will need to rerun the CI on this PR.

@ptrblck
Collaborator Author

ptrblck commented Jun 9, 2023

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cu121_small_wheel onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cu121_small_wheel && git pull --rebase)

@ptrblck
Collaborator Author

ptrblck commented Jun 10, 2023

Some failing builds seem to be unrelated to this PR, e.g. conda-py3_9-cuda12_1-build fails with:

2023-06-09T23:49:36.2752617Z CMake Error at /opt/conda/conda-bld/pytorch_1686354481724/_build_env/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
2023-06-09T23:49:36.2753522Z   Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
2023-06-09T23:49:36.2753855Z   system variable OPENSSL_ROOT_DIR: Found unsuitable version "3.0.8", but
2023-06-09T23:49:36.2754127Z   required is exact version "1.1" (found
2023-06-09T23:49:36.2754856Z   /opt/conda/conda-bld/pytorch_1686354481724/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib/libcrypto.so,
2023-06-09T23:49:36.2755328Z   )
2023-06-09T23:49:36.2755502Z Call Stack (most recent call first):
2023-06-09T23:49:36.2755995Z   /opt/conda/conda-bld/pytorch_1686354481724/_build_env/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:592 (_FPHSA_FAILURE_MESSAGE)
2023-06-09T23:49:36.2756583Z   /opt/conda/conda-bld/pytorch_1686354481724/_build_env/share/cmake-3.22/Modules/FindOpenSSL.cmake:574 (find_package_handle_standard_args)
2023-06-09T23:49:36.2756932Z   third_party/gloo/gloo/CMakeLists.txt:80 (find_package)
2023-06-09T23:49:36.2757093Z 
2023-06-09T23:49:36.2757102Z 
2023-06-09T23:49:36.2769664Z -- Configuring incomplete, errors occurred!

Others seem to be real and fail with e.g.:

Processing /final_pkgs/torch-2.1.0.dev20230609+cu121.with.pypi.cudnn-cp38-cp38-linux_x86_64.whl
ERROR: Could not find a version that satisfies the requirement nvidia-cuda-cupti-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)
ERROR: No matching distribution found for nvidia-cuda-cupti-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64"
...
ERROR: Could not find a version that satisfies the requirement nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)
ERROR: No matching distribution found for nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64"
...
ERROR: Could not find a version that satisfies the requirement nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)
ERROR: No matching distribution found for nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64"

which should all be available:

(tmp) pbialecki@ptrblck-srv:~$ pip install nvidia-cuda-nvrtc-cu12==12.1.105
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 15.3 MB/s eta 0:00:00
Installing collected packages: nvidia-cuda-nvrtc-cu12
Successfully installed nvidia-cuda-nvrtc-cu12-12.1.105
(tmp) pbialecki@ptrblck-srv:~$ pip install nvidia-cublas-cu12==12.1.3.1
Collecting nvidia-cublas-cu12==12.1.3.1
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 58.4 MB/s eta 0:00:00
Installing collected packages: nvidia-cublas-cu12
Successfully installed nvidia-cublas-cu12-12.1.3.1
(tmp) pbialecki@ptrblck-srv:~$ pip install nvidia-cuda-cupti-cu12==12.1.105
Collecting nvidia-cuda-cupti-cu12==12.1.105
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 34.8 MB/s eta 0:00:00
Installing collected packages: nvidia-cuda-cupti-cu12
Successfully installed nvidia-cuda-cupti-cu12-12.1.105
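
To repeat this check for every failing pin without copying them by hand, the unresolved requirements can be pulled out of the pip log with a small script. This is a rough sketch, assuming the log uses the "No matching distribution found for ..." wording shown above. `missing_pins` and `SAMPLE_LOG` are hypothetical names, and the sample text is a subset of the errors quoted in this comment.

```python
import re

# Matches pip's "No matching distribution found for <name>==<version>" lines.
NO_MATCH = re.compile(
    r"No matching distribution found for "
    r"(?P<name>[A-Za-z0-9_.-]+)==(?P<version>[^;\s]+)"
)


def missing_pins(log_text):
    """Return the sorted, de-duplicated (name, version) pairs pip failed to resolve."""
    return sorted({(m["name"], m["version"]) for m in NO_MATCH.finditer(log_text)})


# Subset of the errors quoted in the comment above.
SAMPLE_LOG = """\
ERROR: Could not find a version that satisfies the requirement nvidia-cuda-cupti-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)
ERROR: No matching distribution found for nvidia-cuda-cupti-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64"
ERROR: No matching distribution found for nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64"
"""

if __name__ == "__main__":
    # Print one install command per missing pin so each can be tried manually.
    for name, version in missing_pins(SAMPLE_LOG):
        print(f"pip install {name}=={version}")
```

The script only lists the pins; whether each one resolves still depends on which package index pip is pointed at.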

@atalman do you think pytorch/builder#1420 was still running jobs?

@atalman
Contributor

atalman commented Jun 12, 2023

@pytorchbot merge -f "All required workflows with small wheel are passing"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR Merged open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

6 participants