
@atalman
Contributor

@atalman atalman commented Oct 23, 2024

Fixes #123649
Use manylinux_2_28 Docker images for PyTorch nightly builds

This moves the wheel builds to a Docker image that uses `quay.io/pypa/manylinux_2_28_x86_64` as its base rather than `centos:7`, which reached EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
- Toolchain: GCC 13
- Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
  - Debian 10+
  - Ubuntu 18.10+
  - Fedora 29+
  - CentOS/RHEL 8+
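The compatibility rule above follows from the tag format: per PEP 600, a `manylinux_X_Y_<arch>` wheel requires at minimum glibc X.Y or newer on a matching architecture (for comparison, CentOS 7 ships glibc 2.17). The Python sketch below illustrates that check under stated assumptions: the helper names are hypothetical, only the PEP 600 form of the tag is handled (not legacy aliases such as `manylinux2014`), and the runtime lookup relies on `os.confstr("CS_GNU_LIBC_VERSION")`, which is only available on glibc-based Linux.

```python
import os
import platform
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; not taken from PyTorch, pip or packaging.
# PEP 600 tag layout: manylinux_<glibc major>_<glibc minor>_<arch>
_TAG_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")


def parse_manylinux_tag(tag: str) -> Optional[Tuple[int, int, str]]:
    """Split e.g. "manylinux_2_28_x86_64" into (2, 28, "x86_64")."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None  # legacy aliases like "manylinux2014_x86_64" are not handled
    major, minor, arch = match.groups()
    return int(major), int(minor), arch


def parse_glibc_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from strings like "glibc 2.28"."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def runtime_glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running system's glibc version, or None if it cannot be read."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.28"
    except (AttributeError, ValueError, OSError):
        return None  # non-Linux platform, or a libc that does not report this name
    return parse_glibc_version(value) if value else None


def wheel_tag_supported(tag: str, glibc: Optional[Tuple[int, int]], arch: str) -> bool:
    """A manylinux_X_Y_<arch> wheel needs glibc >= X.Y and the same architecture."""
    parsed = parse_manylinux_tag(tag)
    if parsed is None or glibc is None:
        return False
    major, minor, tag_arch = parsed
    return tag_arch == arch and glibc >= (major, minor)


if __name__ == "__main__":
    machine = platform.machine()  # e.g. "x86_64" or "aarch64" on Linux
    glibc = runtime_glibc_version()
    tag = f"manylinux_2_28_{machine}"
    print(f"glibc={glibc} arch={machine} {tag} supported: {wheel_tag_supported(tag, glibc, machine)}")
```

With these helpers, a CentOS 7 host (glibc 2.17) is rejected for the 2_28 tag, while Ubuntu 18.10 or newer (glibc 2.28+) is accepted.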

This migration should enable us to move to the latest cuDNN version and land this PR: #137978

@atalman atalman requested a review from a team as a code owner October 23, 2024 19:12
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 23, 2024
@pytorch-bot

pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138732

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 10 New Failures, 1 Unrelated Failure

As of commit 20e5128 with merge base c1bf714:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Oct 23, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2024
Enabling Manywheel builds here: #138732

During the build, I observed the following failure in the CUDA jobs:

```
-- Compiler does not support SVE extension. Will not build perfkernels.
-- Found CUDA: /usr/local/cuda (found version "11.8")
-- The CUDA compiler identification is unknown
CMake Error at cmake/public/cuda.cmake:47 (enable_language):
  No CMAKE_CUDA_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:851 (include)
```

The correct sequence should be:
```
-- Found CUDA: /usr/local/cuda (found version "11.8")
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
```

The issue turned out to be a missing PATH setting in the 2_28 Dockerfile. This section exists in the CentOS Dockerfile here:
https://github.com/pytorch/pytorch/blob/main/.ci/docker/manywheel/Dockerfile#L174-L175

(Please note that these Docker images are not used yet; #138732 should enable them.)
Pull Request resolved: #139631
Approved by: https://github.com/malfet, https://github.com/huydhn
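The failing log is consistent with `nvcc` not being discoverable: the CMake error itself says the compiler must be given via `CUDACXX`, `CMAKE_CUDA_COMPILER`, or found on `PATH`. The Python sketch below illustrates that simplified lookup and the kind of PATH adjustment the fix restores. It assumes the toolkit lives at `/usr/local/cuda` (as reported in the log), the function names are hypothetical, and it does not reproduce CMake's actual detection logic or the exact Dockerfile lines linked above.

```python
import os
import shutil
from typing import Mapping, Optional

# Hypothetical helpers for illustration only; CMake's real CUDA detection is more involved.
CUDA_HOME = "/usr/local/cuda"  # toolkit location reported by "Found CUDA" in the log above


def prepend_cuda_bin(path: str, cuda_home: str = CUDA_HOME) -> str:
    """Return `path` with <cuda_home>/bin prepended, unless it is already an entry."""
    cuda_bin = os.path.join(cuda_home, "bin")
    entries = [entry for entry in path.split(os.pathsep) if entry]
    if cuda_bin in entries:
        return path
    return os.pathsep.join([cuda_bin] + entries)


def locate_cuda_compiler(env: Mapping[str, str]) -> Optional[str]:
    """Simplified lookup suggested by the CMake error: CUDACXX first, then nvcc on PATH."""
    explicit = env.get("CUDACXX")
    if explicit:
        return explicit
    # shutil.which returns None when nvcc is not found in any PATH entry.
    return shutil.which("nvcc", path=env.get("PATH", ""))


if __name__ == "__main__":
    env = dict(os.environ)
    env["PATH"] = prepend_cuda_bin(env.get("PATH", ""))
    compiler = locate_cuda_compiler(env)
    print(f"CUDA compiler: {compiler or 'not found (set CUDACXX or fix PATH)'}")
```

In the expected log, CMake picks up `/usr/local/cuda/bin/nvcc`, which lives in the directory `prepend_cuda_bin` adds.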
pytorchmergebot pushed a commit that referenced this pull request Nov 5, 2024
… builds (#139636)

Install setuptools and wheel dependencies for cp312, cp313, cp313t on Manylinux 2_28 images.
This should resolve the following error seen on PR #138732:
```
ModuleNotFoundError: No module named 'setuptools'
```

This issue was already addressed for the XPU images. We should apply the same fix to the rest of the images instead of keeping it XPU-specific.
Pull Request resolved: #139636
Approved by: https://github.com/huydhn, https://github.com/chuanqi129
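A pre-build check can surface this class of failure before the build itself starts. The Python sketch below probes the current interpreter for the modules named in this commit (`setuptools` and `wheel`). `missing_modules` and `BUILD_MODULES` are hypothetical names, and this is not how #139636 fixes the images (the commit installs the packages instead).

```python
import importlib.util
import sys
from typing import Iterable, List

# Build-time modules this commit installs for cp312, cp313 and cp313t (hypothetical constant).
BUILD_MODULES = ("setuptools", "wheel")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(BUILD_MODULES)
    if missing:
        # Same root cause as the ModuleNotFoundError above; install these before building.
        print(f"{sys.executable}: missing build modules: {', '.join(missing)}")
    else:
        print(f"{sys.executable}: all build modules available")
```

In a CI job you would likely exit with a non-zero status instead of printing, so the failure is reported before the compile step.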
@atalman atalman force-pushed the test_manywheel2_28_images branch from 9ed8215 to 91efbbf November 5, 2024 12:43
Contributor

@malfet malfet left a comment

LGTM

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 5, 2024
@malfet
Contributor

malfet commented Nov 5, 2024

@pytorchbot merge -f "Binary builds looks fine"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@atalman
Contributor Author

atalman commented Nov 6, 2024

@pytorchmergebot revert -c nosignal -m "Reverting for now will be relanding"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Nov 6, 2024
@pytorchmergebot
Collaborator

@atalman your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Nov 6, 2024
@jondea
Contributor

jondea commented Nov 8, 2024

I think #137696 is necessary when building for AArch64 using manylinux_2_28.

atalman added a commit to atalman/pytorch that referenced this pull request Nov 11, 2024
@atalman atalman force-pushed the test_manywheel2_28_images branch from e74c907 to 20e5128 November 13, 2024 22:02
@atalman
Contributor Author

atalman commented Nov 14, 2024

@pytorchmergebot merge -f "lint is green"

@pytorchmergebot
Collaborator

Merge started


atalman added a commit to pytorch/test-infra that referenced this pull request Nov 14, 2024
zero000064 pushed a commit to zero000064/pytorch that referenced this pull request Nov 14, 2024
zero000064 pushed a commit to zero000064/pytorch that referenced this pull request Nov 14, 2024
Fixes pytorch#123649
Use manylinux_2_28 Docker images for PyTorch nightly builds

This moves the wheel builds to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as its base rather than ``centos:7``, which reached EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to move to the latest cuDNN version and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
NicolasHug pushed a commit to NicolasHug/test-infra that referenced this pull request Nov 18, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
fmo-mt pushed a commit to fmo-mt/pytorch that referenced this pull request Dec 11, 2024
fmo-mt pushed a commit to fmo-mt/pytorch that referenced this pull request Dec 11, 2024

Labels

- ci-no-td (Do not run TD on this PR)
- ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- Reverted
- topic: not user facing (topic category)

Development

Successfully merging this pull request may close these issues.

[RFC] PyTorch next wheel build platform: manylinux-2.28

6 participants