Skip to content

Conversation

@afrittoli
Copy link
Collaborator

@afrittoli afrittoli commented Nov 6, 2024

This is a first step towards removing builds dependency to conda.

Currently we build magma as a conda package in a pytorch conda channel, implemented in https://github.com/pytorch/builder/tree/a1b372dbda2e9e3bd946cf135aa3b0137dfdf052/magma.

This commit adapts the logic from pytorch/builder as follows:

  • use pytorch/manylinux-cuda as base image
  • apply patches and invoke the build.sh script directly (not anymore through conda build)
  • stores license and build files along with the built artifact, in an info subfolder
  • create a tarball file which resembles that created by conda, without any conda-specific metadata

A new matrix workflow is added, which runs the build for each supported cuda version, and uploads the binaries to pyorch s3 bucket.

For the upload, define an upload.sh script, which will be used by the magma windows job as well, to upload to s3://ossci-* buckets.

The build runs on PR and push, upload runs in DRY_RUN mode in case of PR.

Fixes #139397

@afrittoli afrittoli requested a review from a team as a code owner November 6, 2024 15:07
@pytorch-bot
Copy link

pytorch-bot bot commented Nov 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139888

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e2987ff with merge base 3abbde9 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: releng release notes category label Nov 6, 2024
@afrittoli afrittoli force-pushed the magma_build_no_conda branch from 4549d3f to 8518b99 Compare November 6, 2024 15:12
@afrittoli afrittoli force-pushed the magma_build_no_conda branch 3 times, most recently from a8f0c87 to c1c4fd6 Compare November 6, 2024 15:44
Copy link
Contributor

@atalman atalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Please create another PR removing these magma builds from builder repo.
Also would make sence to move the windows magma builds workflows here as well.

@afrittoli
Copy link
Collaborator Author

This looks great! Please create another PR removing these magma builds from builder repo. Also would make sence to move the windows magma builds workflows here as well.

Thanks - I need to check what the Windows will look like without conda, once I figure it out I will make another PR like this one. I don't have a windows environment handy to test the Windows builds though, so there may be a bit more of trial and error there.

@afrittoli
Copy link
Collaborator Author

@atalman would it make sense to configure the linux-focal-py* jobs to skip for PRs that only touch the magma folder and/or this workflow?

@afrittoli afrittoli force-pushed the magma_build_no_conda branch from c1c4fd6 to 76e070e Compare November 6, 2024 16:42
@afrittoli
Copy link
Collaborator Author

This looks great! Please create another PR removing these magma builds from builder repo. Also would make sence to move the windows magma builds workflows here as well.

Thanks - I need to check what the Windows will look like without conda, once I figure it out I will make another PR like this one. I don't have a windows environment handy to test the Windows builds though, so there may be a bit more of trial and error there.

It looks like the windows path does not use conda, so it should be only a matter of copying the files over and fixing whatever linter rules break in the new repo.
I noticed that in the windows workflow we push to a dedicated bucket, not the pytorch one. Shall I do the same here?

          OSSCI_WINDOWS_S3: s3://ossci-windows/

https://github.com/pytorch/builder/blob/a1b372dbda2e9e3bd946cf135aa3b0137dfdf052/.github/workflows/build-magma-windows.yml#L46-L55

@atalman
Copy link
Contributor

atalman commented Nov 6, 2024

@atalman would it make sense to configure the linux-focal-py* jobs to skip for PRs that only touch the magma folder and/or this workflow?

I think so magma builds are pretty much standalone. They are used in Docker builds after being published.

@afrittoli afrittoli force-pushed the magma_build_no_conda branch from 76e070e to 1a7d9c4 Compare November 6, 2024 18:17
Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will edit PR description to specify specific commit those magma scripts are picked from

afrittoli added a commit to afrittoli/pytorch that referenced this pull request Nov 6, 2024
Add a version of the manylinux 2.28 image with cuda 12.6.

Once this is done, cuda 12.6 can be enable for the new
magma non-conda distribution provided by pytorch#139888

Signed-off-by: Andrea Frittoli <[email protected]>
@afrittoli afrittoli force-pushed the magma_build_no_conda branch 2 times, most recently from 813278e to 11b6f8e Compare November 6, 2024 20:49
@afrittoli afrittoli force-pushed the magma_build_no_conda branch from 11b6f8e to 9376f96 Compare November 6, 2024 21:47
@afrittoli afrittoli force-pushed the magma_build_no_conda branch 4 times, most recently from da548c0 to f7e02ad Compare November 7, 2024 18:07
pytorchmergebot pushed a commit that referenced this pull request Nov 8, 2024
Add a version of the manylinux 2.28 image with cuda 12.6.

Once this is done, cuda 12.6 can be enable for the new magma non-conda distribution provided by #139888

Partially-fixes #139397

Pull Request resolved: #139909
Approved by: https://github.com/atalman
@atalman
Copy link
Contributor

atalman commented Nov 8, 2024

@pytorchbot merge -f "signal is green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

afrittoli added a commit to afrittoli/pytorch that referenced this pull request Nov 8, 2024
Copy the magma for windows job and script from pytorch/builder
https://github.com/pytorch/builder/blob/c9aac65e12734a16d9c285c366bcbf6cc0c67e43/.github/workflows/build-magma-windows.yml

The linux version is moved here in pytorch#139888

Use the upload_aws_ossci.sh upload script used by the linux
magma builds as well.

Signed-off-by: Andrea Frittoli <[email protected]>
frances720 pushed a commit to Promptless/pytorch-test that referenced this pull request Nov 8, 2024
Add a version of the manylinux 2.28 image with cuda 12.6.

Once this is done, cuda 12.6 can be enable for the new magma non-conda distribution provided by pytorch#139888

Partially-fixes pytorch#139397

Pull Request resolved: pytorch#139909
Approved by: https://github.com/atalman
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Add a version of the manylinux 2.28 image with cuda 12.6.

Once this is done, cuda 12.6 can be enable for the new magma non-conda distribution provided by pytorch#139888

Partially-fixes pytorch#139397

Pull Request resolved: pytorch#139909
Approved by: https://github.com/atalman
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
This is a first step towards removing builds dependency to conda.

Currently we build magma as a conda package in a pytorch conda channel, implemented in https://github.com/pytorch/builder/tree/a1b372dbda2e9e3bd946cf135aa3b0137dfdf052/magma.

This commit adapts the logic from pytorch/builder as follows:
- use pytorch/manylinux-cuda<cuda-version> as base image
- apply patches and invoke the build.sh script directly (not anymore through conda build)
- stores license and build files along with the built artifact, in an info subfolder
- create a tarball file which resembles that created by conda, without any conda-specific metadata

A new matrix workflow is added, which runs the build for each supported cuda version, and uploads the binaries to pyorch s3 bucket.

For the upload, define an upload.sh script, which will be used by the magma windows job as well, to upload to `s3://ossci-*` buckets.

The build runs on PR and push, upload runs in DRY_RUN mode in case of PR.

Fixes pytorch#139397

Pull Request resolved: pytorch#139888
Approved by: https://github.com/atalman, https://github.com/malfet, https://github.com/seemethere
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2025
Resolves #151707 for CUDA Nvidia Magma builds.
Removes deprecated cuda 12.4 build.

Using `pytorch/manylinux2_28-builder` image for magma build creates circular dependency.

For a while for magma builds we used `conda-builder` image since it does not have circular dependency:
https://github.com/pytorch/builder/blob/release/2.4/magma/Makefile#L13
However during migration to pytorch/pytorch: #139888 we introduced circular dependency using Manylinux 2.28 docker image.

Hence using almalinux image which suppose to be general usage image

Please note: Magma builds using Docker build : https://github.com/pytorch/pytorch/blob/main/.ci/magma/README.md we can look into migrating them to Docker images if required as a followup BE change if needed

TODO: Make same change for rocm builds. I believe some more work for rocm is required, since maga-rocm is requires rocm dev, utils and lib to be installed : https://github.com/pytorch/pytorch/blob/main/.ci/docker/common/install_rocm.sh

Pull Request resolved: #152358
Approved by: https://github.com/nWEIdia, https://github.com/huydhn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Build a non-conda package for magma-cuda

6 participants