
Difficulty creating magma tarballs when new ROCm or CUDA versions are deployed #151707

@jeffdaily

Description

There is a chicken-and-egg problem with magma tarballs. Building magma for ROCm or CUDA is done in the manylinux builder image, for example:

pytorch/manylinux2_28-builder:rocm${DESIRED_CUDA}-main

However, this image is built from a Dockerfile that calls install_magma.sh (for CUDA) or install_rocm_magma.sh (for ROCm), and these scripts only fetch the prebuilt tarball. Building the magma tarball requires the image to exist, but building the image properly requires the magma tarball. It's a circular dependency.

The recent ROCm 6.4 upgrade required three PRs, landed in sequence, to update the magma packages. PR 1 created the new builder image but temporarily allowed the magma tarball fetch to fail with a warning. PR 2 updated the magma workflows to add the new ROCm version. PR 3 reverted the changes from PRs 1 and 2 while also updating the GHA nightly wheel workflows to build ROCm 6.4.

  1. [ROCm][CI/CD] create ROCm 6.4 images, part 1, skip magma tarball #151236
  2. [ROCm][CI/CD] Create ROCm6.4 magma tarball #151345
  3. [ROCm] upgrade nightly wheels to rocm6.4 #151355
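PR 1's temporary workaround, letting the magma fetch fail with a warning instead of failing the image build, can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: the function `fetch_magma`, the `MAGMA_ALLOW_MISSING` switch, and the use of `curl` are hypothetical and are not taken from install_magma.sh or install_rocm_magma.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a magma tarball fetch that can be relaxed while a new
# ROCm/CUDA builder image is being bootstrapped. Not the real install_rocm_magma.sh.
set -euo pipefail

# fetch_magma URL DEST
# Downloads the magma tarball from URL to DEST.
# If the download fails and MAGMA_ALLOW_MISSING=1 (hypothetical switch), print a
# warning and return 0 so the image build continues without magma.
# Otherwise print an error and return 1 so the image build fails.
fetch_magma() {
  local url="$1" dest="$2"
  if curl -fsSL -o "$dest" "$url"; then
    echo "fetched magma tarball to $dest"
    return 0
  fi
  # Remove any partial download left behind by the failed fetch.
  rm -f "$dest"
  if [[ "${MAGMA_ALLOW_MISSING:-0}" == "1" ]]; then
    echo "WARNING: magma tarball not found at $url; continuing without magma" >&2
    return 0
  fi
  echo "ERROR: magma tarball not found at $url" >&2
  return 1
}
```

During a bootstrap like PR 1, the image build would run with `MAGMA_ALLOW_MISSING=1`; once PR 2 has produced the tarball, dropping the switch restores the strict default, roughly analogous to PR 3 reverting the temporary change. Keeping the relaxation behind an explicit opt-in means a missing tarball still fails loudly in normal builds.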

cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @seemethere @malfet @pytorch/pytorch-dev-infra

Metadata

Labels

better-engineering (Relatively self-contained tasks for better engineering contributors)
module: ci (Related to continuous integration)
module: rocm (AMD GPU support for PyTorch)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

Status

Done

