Draft: Add ppc64le pipelines for ML builds #39174
nicholas-sly wants to merge 44 commits into spack:develop from
Conversation
I'd rather name everything "ppc64le" to be consistent with
scottwittenburg left a comment
Thanks for contributing some new stacks @nicholas-sly! I've added a few comments; addressing the one about the missing docker image may let your stacks get into the actual pipeline-generation step on GitLab.
Also, @eugeneswalker manages the only Power runners available to us; let's see what he has to say about their capacity to handle this extra workload.
I pinged @kwryankrattiger in my review as well, since he came up with the new scheme for organizing the gitlab ci configs.
Thanks again!
share/spack/gitlab/cloud_pipelines/stacks/ml-linux-power64le-cuda/spack.yaml
share/spack/gitlab/cloud_pipelines/stacks/ml-linux-power64le-cpu/spack.yaml
share/spack/gitlab/cloud_pipelines/stacks/ml-linux-power64le-cpu/spack.yaml
```yaml
- build-job:
    image:
      name: ecpe4s/fedora36-runner-ppc64le:2023-01-01
      entrypoint: ['']
```
I think one of these build-job sections should be removed. @kwryankrattiger can correct me if I have this backwards, but I think these are applied bottom-up, so your final job configuration will just be the first one. In that case, you can remove the second.
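If the bottom-up merge described here is accurate, a minimal sketch of the situation (image names are taken from this PR's diffs; the merge behavior is as claimed in this comment, not independently verified):

```yaml
ci:
  pipeline-gen:
  - build-job:        # under bottom-up merging, this entry "wins"
      image:
        name: ecpe4s/ubuntu20.04-runner-ppc64le:2023-01-01
        entrypoint: ['']
  - build-job:        # this entry would be overwritten, so it could be removed
      image:
        name: ecpe4s/fedora36-runner-ppc64le:2023-01-01
        entrypoint: ['']
```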
So, I assume you mean to have only one of the ubuntu or fedora jobs. I'm actually intentionally targeting both: many of these packages, if they support Linux at all, are built for Ubuntu, and if users are installing these packages locally on a Linux system, it is very likely Ubuntu. But most of the HPC systems I've interacted with are SLES- or RHEL-based. As such, I think it is useful to ensure that these packages are tested against both OSes as best we can.
If I'm just specifying it incorrectly to achieve that end, then I'm happy to change it.
I think testing both ubuntu and rhel is overkill right now. We don't even do that for x86_64. This is something worth adding someday but idk if we have the CI bandwidth to do both yet.
I've already experienced builds that succeeded on ubuntu but not on rhel8. I'm leaving it in because tests that pass on ubuntu do us no good when the majority of HPC systems are on a rhel-based OS.

It really comes down to what our goal is with the CI system. If we want to ensure that changes to these packages won't have a significant impact on most of our users who try to build these recipes on most of the systems they might use, then we need to ensure that our CI is representative of those systems and users. If we just want a build that works, even if it doesn't represent anything an end user might use, we can probably ignore ppc64le altogether.

As for CI bandwidth, most changes in Spack shouldn't even trigger this. The point (as I see it) is to test for any changes that will impact these builds. Once these pipelines are integrated, they shouldn't be triggered often, unless they need to be. I'm not sure how we judge this bandwidth, or whether such limiting factors are publicly available, but I think this is necessary for now. Obviously, once all of these systems have been decommissioned, we can remove these pipelines. Until then, they are a useful tool for getting these ML packages working on the machines we want them to build on.
share/spack/gitlab/cloud_pipelines/stacks/ml-linux-power64le-cuda/spack.yaml
@scottwittenburg @adamjstewart It seems the tests are all passing now. Let me know if you have any further changes to request. GitHub says there's one unaddressed request, but everything I'm seeing is either outdated or addressed. Thanks.
What's the reason for no ROCm?
Are there any machines that have ppc64le processors and AMD GPUs? I'm not aware of any and I'm under the impression that IBM isn't going to be making new machines. If that's not the case, we can try.
I have no idea
adamjstewart left a comment
We can figure out the Bazel stuff another day
Every pipeline needs to write into its own mirror. Concurrent pipelines writing to the same destination create race conditions that result in weird errors.
Currently the solution is to have a spack.yaml for every pair of generate/build jobs. But once we have #39939, that won't be necessary, as the spack.yaml will no longer be where we specify the mirrors for spack pipelines.
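A hedged sketch of the current per-stack pattern described above (the mirror name and URL here are placeholders for illustration, not the project's actual buildcache configuration):

```yaml
# One spack.yaml per generate/build pair, each pointing at its own
# mirror so concurrent pipelines never write to the same destination.
spack:
  mirrors:
    buildcache-destination: s3://example-bucket/ml-linux-power64le-cpu
```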
```yaml
ci:
  pipeline-gen:
  - build-job:
      image:
        name: ecpe4s/ubuntu20.04-runner-ppc64le:2023-01-01
        entrypoint: ['']
  - build-job:
      image:
        name: ecpe4s/rhel7-runner-ppc64le:2023-01-01
        entrypoint: ['']
```
Did you mean to have some submapping here, rather than just build-job? If this was working, I'm not sure how. Given that you were trying to have two spack.yaml files for the four pipelines, you may have been able to do something like this instead:
Example:

```yaml
...
- match_behavior: first
  submapping:
  - match:
    - os=ubuntu20.04
    build-job:
      image:
        name: ecpe4s/ubuntu20.04-runner-ppc64le:2023-01-01
        entrypoint: ['']
```

But that's irrelevant if you change to one stack per pipeline. If you wait until #39939 is merged, then I think you could stick with only two spack.yaml files for your four pipelines, and the above example would be how you map build jobs to docker images.
Not sure what the issue is here. I want each stack to build on both ubuntu and rhel8. In YAML, the pipeline-gen key takes a list, so I just put two elements in that list. You can see from the CI build that this properly broke the pipeline out into separate jobs, each using the appropriate container.
I am planning to wait for #39939 to be merged to avoid breaking this into multiple files only to be able to deduplicate after that PR goes through.
It looks to me like it didn't break them out properly. This job was supposed to be rhel8, but it ran on the ubuntu container.
Interesting indeed. The container does seem to indicate ubuntu for the rhel job. But then `uname -a` indicates a rhel OS; presumably that's the host, so fair enough. The `spack arch` output seems to corroborate the ubuntu OS. But then the gnuconfig dependency installation, along with the libiconv installation directories, indicates a rhel OS. I guess that's because the generate jobs ran in the correct container. I just pushed a commit that tries to be more explicit about which container image should be used for the CI jobs.
For reasons I cannot fathom, a CI job ending in -build will not accept an image key in the .gitlab-ci.yaml file. Likewise, it can "extend" another job that does take an image key, but it does not honor that same image when building. I can try modifying the spack.yaml files according to your example above, but without proper documentation I'll be doing a good bit of guess-and-check with CI parsing to try to get it right.
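For context, a minimal sketch of how `extends` normally behaves in plain GitLab CI (the job names here are hypothetical; the image is one of the containers from this PR), illustrating the inheritance this comment expected but did not observe:

```yaml
# Standard GitLab CI `extends` semantics: the child job inherits the
# parent's `image` unless the child overrides that key itself.
.ppc64le-rhel7:
  image:
    name: ecpe4s/rhel7-runner-ppc64le:2023-01-01
    entrypoint: ['']

example-build:
  extends: .ppc64le-rhel7
  script:
    - spack install
```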
Forgot about this comment, but want to resurface this:

> I think testing both ubuntu and rhel is overkill right now. We don't even do that for x86_64. This is something worth adding someday but idk if we have the CI bandwidth to do both yet.
I stand by my response to that comment: #39174 (comment)
The latest pipeline run https://gitlab.spack.io/spack/spack/-/pipelines/500642 is an example of exactly this happening.
FYI, Spack is slowly moving in the direction of modeling glibc such that packages will no longer require any system dependencies. Once this is done, Ubuntu and RHEL will be the same system, and there should be no real reason to test both.
While I appreciate your optimism in this respect, the demonstrable differences between the two OSes mean I'm going to have to see the two pipelines succeed side by side, with a significant portion of the ML packages we're interested in, before I'm willing to downsample to a single OS.
Based on Slack discussion, it seems like ppc64le is on its way out and our CI runners are limited anyway. Let's scrap this unless IBM decides to contribute to this effort.
No description provided.