
Add aws-pcluster[-aarch64] stacks #37627

Merged
scottwittenburg merged 15 commits into spack:develop from stephenmsachs:aws-pcluster-stacks
May 17, 2023

Conversation

@stephenmsachs
Contributor

These stacks build packages defined in
https://github.com/spack/spack-configs/tree/main/AWS/parallelcluster

They use a custom container from https://github.com/spack/gitlab-runners which
includes the necessary ParallelCluster software to link and build against, as well
as an upstream spack installation with a current GCC and its dependencies.

Intel and ARM software is installed and used during the build stage but removed
from the buildcache before the signing stage.

Files `configs/linux/{arch}/ci.yaml` select the necessary providers in order to
build for specific architectures (icelake, skylake, neoverse_{n,v}1).
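For illustration only, a per-arch config of that shape might look like the sketch below. The path, tag, and variable names here are assumptions, not the actual contents of this PR's files:

```yaml
# Hypothetical sketch of a configs/linux/icelake/ci.yaml; the real files
# in this PR may differ in both keys and values.
ci:
  pipeline-gen:
  - build-job:
      tags: ["icelake"]
      variables:
        SPACK_TARGET_ARCH: "icelake"
```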

Stephen Sachs added 2 commits May 12, 2023 10:23
@spackbot-app bot added the `core` (PR affects Spack core functionality) and `gitlab` (Issues related to gitlab integration) labels May 12, 2023
- - /bin/bash "${SPACK_ARTIFACTS_ROOT}/postinstall.sh" -fg
- spack config --scope site add "packages:all:target:\"target=${SPACK_TARGET_ARCH}\""
- signing-job:
before_script:
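As a side note on the `spack config --scope site add` line above: `spack config add` takes a colon-separated path with the final quoted token as the value, so (assuming `SPACK_TARGET_ARCH=neoverse_v1`) the mechanical result in the site scope should be roughly the sketch below. This is an unverified illustration, not taken from these pipelines:

```yaml
# Approximate site-scope packages.yaml after the `spack config add`
# above, assuming SPACK_TARGET_ARCH=neoverse_v1 (illustrative only).
packages:
  all:
    target: ["target=neoverse_v1"]
```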
Contributor Author


I have checked that the rebuild-index job will succeed after deleting a few packages in the signing job. But will there be consequences when downloading?

Contributor

@scottwittenburg left a comment


Looking at this, I'm noticing that you've created 4 new pipelines, associated with only 2 new stacks. Can it be re-organized to add one pipeline (generate/build pair) per stack? We try to avoid allowing multiple jobs to write binaries for the same hash in parallel, as the race conditions that can result often cause checksum mismatches between the generated binaries and associated meta-data files.

But let me know if I've just missed something.

@stephenmsachs
Contributor Author

Looking at this, I'm noticing that you've created 4 new pipelines, associated with only 2 new stacks. Can it be re-organized to add one pipeline (generate/build pair) per stack? We try to avoid allowing multiple jobs to write binaries for the same hash in parallel, as the race conditions that can result often cause checksum mismatches between the generated binaries and associated meta-data files.

But let me know if I've just missed something.

Thanks for the review. I made it one stack per pipeline per your suggestion. I guess there would be potential for race conditions otherwise, as some of the build targets overlap.

@stephenmsachs
Contributor Author

@spackbot re-run pipeline

@spackbot-app

spackbot-app bot commented May 17, 2023

I've started that pipeline for you!

- $optimized_libs


mirrors: { "mirror": "s3://spack-binaries/develop/aws-pcluster-aarch64" }
Contributor Author


I will change the mirror names to match the pipelines in another commit after the pipelines have run, though.

Contributor

@kwryankrattiger left a comment


Some comments. Given the deadline, maybe some of these can be a follow-on.

- - curl -LfsS "https://github.com/JuliaBinaryWrappers/GNUMake_jll.jl/releases/download/GNUMake-v4.3.0+1/GNUMake.v4.3.0.x86_64-linux-gnu.tar.gz" -o gmake.tar.gz
- printf "fef1f59e56d2d11e6d700ba22d3444b6e583c663d6883fd0a4f63ab8bd280f0f gmake.tar.gz" | sha256sum --check --strict --quiet
- tar -xzf gmake.tar.gz -C /usr bin/make 2> /dev/null
tags: ["x86_64_v4"]
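The fetch-verify-unpack pattern above can be exercised offline with a scratch file; everything below (file names, contents) is illustrative, not from the job:

```shell
# Illustrative only: verify a scratch file the same way the job
# verifies gmake.tar.gz before unpacking it.
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/payload"

# Record the expected digest, then re-check it with the same flags as
# the job: fail on any malformed or mismatched checksum line.
expected=$(sha256sum "$tmpdir/payload" | awk '{print $1}')
if printf '%s  payload\n' "$expected" | (cd "$tmpdir" && sha256sum --check --strict --quiet); then
  result="checksum OK"
else
  result="checksum FAILED"
fi
echo "$result"
rm -r "$tmpdir"
```

Pinning the digest this way (as the job does) makes the download tamper-evident even over an untrusted connection.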
Contributor


It seems to me like icelake and skylake could be reduced to x86_64_v4.

The stack spack.yamls can specify the SPACK_TARGET_ARCH under their own variables.

Contributor Author


You are right. I did not have a separate stack spack.yaml when I made these files.

- gettext

- compiler_target:
- '%[email protected] target=x86_64_v3'
Contributor


This doesn't seem consistent with the icelake configs above. Am I missing something?

Contributor Author


The architecture used is defined by the cluster head node. But clusters may have different architecture compute nodes, so I want the compilers to be as broad as possible. If I made this _v4 and added a zen2 compute node, I would not be able to use the compiler for the compute node.
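The compatibility argument can be made concrete with a toy model: a binary built for target T runs on node N only if N supports every feature T assumes. The feature lists below are simplified stand-ins, not real archspec data:

```shell
# Toy model of target compatibility; feature sets are made up.
v3="avx avx2 fma"
icelake="avx avx2 fma avx512f avx512vnni"
zen2="avx avx2 fma sha"

# can_run "<binary target features>" "<node features>": succeed iff
# every feature the binary assumes is present on the node.
can_run() {
  for f in $1; do
    case " $2 " in *" $f "*) ;; *) return 1 ;; esac
  done
}

can_run "$v3" "$zen2" && echo "x86_64_v3 binaries run on zen2"
can_run "$icelake" "$zen2" || echo "icelake binaries do not run on zen2"
```

This is why a compiler pinned to the broad x86_64_v3 level stays usable if, say, a zen2 compute node joins the cluster, while an icelake-targeted one would not.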

Contributor


If that is a concern, then I don't understand why this stack is specifically building for icelake.


ci:
pipeline-gen:
- build-job:
Contributor


It seems like this part could be moved into share/spack/gitlab/cloud_pipelines/configs/pcluster/ci.yaml and included or added to the generate jobs configs.

Contributor Author


Yes, I can do that.

view: false

definitions:
- compiler_specs:
Contributor


Since this stuff is identical between the stacks, it may be worthwhile to merge the aarch64 stacks and the x86_64_v4 stacks, and use a matrix spec and submappings to assign the runner tags appropriately:

ci:
  pipeline-gen:
  - match_behavior: first
    submapping:
    - match:
      - target=neoverse_n1
      build-job:
        tags: ["graviton2"]
    - match:
      - target=neoverse_v1
      build-job:
        tags: ["graviton3"]

I am not sure if both neoverse_n1 and neoverse_v1 can be concretized on graviton3 runners, but I am pretty sure icelake and skylake can be, so I would think the same backwards compatibility might apply to ARM architectures.

Contributor Author


Let me take a look at this after the deadline. It seems I need to learn some more syntax.

@scottwittenburg
Contributor

I wonder what's going on in this early-stage pkgconf job. It seems to find no dependencies in the buildcache and have to install them all from source.

before_script:
# Do not distribute Intel & ARM binaries
- - for i in $(aws s3 ls --recursive ${SPACK_REMOTE_MIRROR_OVERRIDE}/build_cache/ | grep intel-oneapi | awk '{print $4}' | sed -e 's?^.*build_cache/??g'); do aws s3 rm ${SPACK_REMOTE_MIRROR_OVERRIDE}/build_cache/$i; done
- for i in $(aws s3 ls --recursive ${SPACK_REMOTE_MIRROR_OVERRIDE}/build_cache/ | grep armpl | awk '{print $4}' | sed -e 's?^.*build_cache/??g'); do aws s3 rm ${SPACK_REMOTE_MIRROR_OVERRIDE}/build_cache/$i; done
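The grep/awk/sed key extraction in these loops can be exercised offline on a fake listing; the bucket paths and hashes below are made up, not real mirror contents:

```shell
# Fake `aws s3 ls --recursive` output: date, time, size, key.
listing='2023-05-17 10:00:00 123456 develop/aws-pcluster/build_cache/linux-amzn2-x86_64_v4-gcc-intel-oneapi-mkl-abc123.spack
2023-05-17 10:00:01 789 develop/aws-pcluster/build_cache/linux-amzn2-x86_64_v4-gcc-zlib-def456.spack'

# Keep only intel-oneapi entries, take the key column ($4), and strip
# everything up to and including "build_cache/" -- same filter as the job.
extracted=$(printf '%s\n' "$listing" \
  | grep intel-oneapi \
  | awk '{print $4}' \
  | sed -e 's?^.*build_cache/??g')
echo "$extracted"
```

Only the intel-oneapi key survives the filter; the zlib entry is left alone, which matches the job's intent of removing just the vendor binaries.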
Contributor


Do I understand this correctly? You are deleting from the mirror some (possibly large) subset of the binaries just built, in order to avoid re-distribution? I thought it was going to be ok to leave these in the stack-specific mirrors, as long as we don't include them at the root. The way it is here, isn't it going to force every matching spec to be rebuilt from source on every pipeline?

Contributor Author


No, I am only deleting software that is distributed as binaries by Intel/ARM anyway; this is only oneAPI and ArmPL software. On the one hand we are on the safe side regarding re-distribution, and on the other hand the installation "from source" downloads pre-built binaries anyway.

Contributor


What I should have said here, but I just didn't think of at the time, is that some time ago we made it an error (see here) for dependencies in a pipeline rebuild job to be installed from source. The definition of "from source" there doesn't consider that it might be downloading pre-built binaries, it's either from a spack buildcache or it's "from source".

@scottwittenburg
Contributor

I'm also curious if you know whether this is something to worry about: https://gitlab.spack.io/spack/spack/-/jobs/7020733#L67

@stephenmsachs
Contributor Author

I wonder what's going on in this early-stage pkgconf job. It seems to find no dependencies in the buildcache and have to install them all from source.

I am afraid it's still using the "old" container which built [email protected] target=neoverse_n1 by mistake. If you pull the container now it has [email protected] target=aarch64 as is required in the job.

@stephenmsachs
Contributor Author

I'm also curious if you know whether this is something to worry about: https://gitlab.spack.io/spack/spack/-/jobs/7020733#L67

Same issue with the "old" container with the wrong gcc target. I will re-run the pipeline after it completes. Not sure how long the containers are cached, but I pushed the "new" one 8h ago. I hope this is sufficient now.

@stephenmsachs
Contributor Author

Looking at this, I'm noticing that you've created 4 new pipelines, associated with only 2 new stacks. Can it be re-organized to add one pipeline (generate/build pair) per stack? We try to avoid allowing multiple jobs to write binaries for the same hash in parallel, as the race conditions that can result often cause checksum mismatches between the generated binaries and associated meta-data files.

But let me know if I've just missed something.

Not sure how this works in GitHub, but it seems the request for these changes is blocking the PR. @scottwittenburg I already pushed one pipeline per stack. Do I need to do anything else to get it resolved?

@scottwittenburg
Contributor

I will re-run the pipeline after it completed.

I know you ran into a problem with spackbot causing duplicated pipelines, but normally if you push your PR branch while a pipeline is running, gitlab is supposed to notice and cancel the previous one running on the same ref. Usually it's fairly reliable, but as @alalazo pointed out in slack, there are some gitlab bugs filed around it. I'd go ahead and push your branch if you think a bunch of specs are going to change hashes and have to rebuild anyway. But I'll leave the decision up to you.

@scottwittenburg self-requested a review May 17, 2023 17:57
@scottwittenburg
Contributor

I already pushed one pipeline per stack. Do I need to do anything else to get it resolved?

Sorry I wasn't more clear about it, but I was hoping you would end up with two pipelines for two stacks, not add two more pipelines; I get that this was easier to accomplish, though. The way I think it should be done (but let me know if you agree @kwryankrattiger) is that the specs which can be concretized/built on the same image should be combined into a single stack, and then mapping rules defined in yaml configs should take care of making sure the correct arch, etc., is used for each.

But I understand it's not so clear how to achieve that, and you're under a tight deadline at this point. So maybe you can just file an issue to clean up these new stacks/pipelines and re-examine what's going on here when time is not so tight. Maybe in the issue, you can link to this comment for details.

Once everything builds, I'll be ok with merging this and revisiting it in a subsequent PR.

@stephenmsachs
Contributor Author

@scottwittenburg can we merge? I opened an issue to address merging the stacks.

@scottwittenburg merged commit 125c20b into spack:develop May 17, 2023
RikkiButler20 pushed a commit to RikkiButler20/spack that referenced this pull request May 23, 2023