Rethink release workflow in terms of environments #11612
tgamblin merged 6 commits into spack:develop from
Conversation
@reviewers: I'd like to take a lot of the PR description here and start working it into the documentation, once we decide this is getting close to the direction we want to go.
90e137e to c9e8124
opadron left a comment

Looks great! Added a few notes.
.gitlab-ci.yml
Outdated
```yaml
- "./bin/generate-gitlab-ci-yml.sh"
tags:
- "spack-k8s"
image: "scottwittenburg/spack_builder_ubuntu_18.04"
```
At some point, this should be updated to `spack/ci:ubuntu-18.04`. See #11621.
Yep, makes sense. I'll take a look at that some time today and provide some feedback as well. If we can get that merged and the images pushed to Dockerhub, I could fix this and have one less thing to wait around for.
bin/generate-gitlab-ci-yml.sh
Outdated
```shell
else
  echo "Cloning environment repo: ${SPACK_RELEASE_ENVIRONMENT_REPO}" >&2
  mkdir /work
  cd /work
```
This is fine for our CI images, but will likely be a problem for sites running directly on their systems. Better would be to rely on `mktemp -d` and adding a `trap` line to ensure that the temp dir is cleaned up before the script exits.

Of course, you'll need the Python equivalents to the above (the `tempfile` and `atexit` modules) if/once you move this over to a Python implementation.
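A minimal Python sketch of that suggestion (using only the stdlib `tempfile`, `shutil`, and `atexit` modules; the directory prefix is made up):

```python
import atexit
import os
import shutil
import tempfile

# Create a private scratch directory instead of a hard-coded path like /work
work_dir = tempfile.mkdtemp(prefix="spack-ci-")

# Register cleanup so the directory is removed however the process exits,
# the Python analogue of a shell `trap ... EXIT`
atexit.register(shutil.rmtree, work_dir, ignore_errors=True)

print("working in", work_dir)
```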
> This is fine for our CI images, but will likely be a problem for sites running directly on their systems. Better would be to rely on `mktemp -d` and adding a `trap` line to ensure that the temp dir is cleaned up before the script exits.
Good suggestion, thanks. I'll do this on the next pass.
> Of course, you'll need the Python equivalents to the above (the `tempfile` and `atexit` modules) if/once you move this over to a Python implementation.
Until you mentioned it here, I hadn't imagined we'd make this a Python implementation. I was considering that for the `rebuild-package.sh` script instead. In that case, there's a lot of work to do, and it's mostly spack commands, so it makes some sense to me to make the whole thing a spack command. Do you think there's a good argument to be made for doing that here as well?
Not particularly, no. Feel free to ignore my point if that doesn't turn out to be a direction you decide to take this work.
> This is fine for our CI images, but will likely be a problem for sites running directly on their systems. Better would be to rely on `mktemp -d` and adding a `trap` line to ensure that the temp dir is cleaned up before the script exits.
@opadron Do you know what signals I should provide to trap to make sure the cleanup always happens, no matter how the script exits?
`trap 'rm -rf "$tempdir"' INT TERM QUIT EXIT`
bin/generate-gitlab-ci-yml.sh
Outdated
```shell
cd $env_dir

token_file=/tmp/cdash_auth_token
```
Put this in your generated temp dir to avoid name collisions.
Yes, good suggestion, will do.
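For instance, assuming the script already creates a per-run scratch directory (names here are illustrative):

```python
import os
import tempfile

# One scratch dir per run; a token path inside it cannot collide with
# other jobs writing to a fixed location like /tmp/cdash_auth_token
tempdir = tempfile.mkdtemp(prefix="spack-ci-")
token_file = os.path.join(tempdir, "cdash_auth_token")

print(token_file)
```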
bin/generate-gitlab-ci-yml.sh
Outdated
```shell
echo "git commit"
git commit -m "$commit_msg"
echo "git commit-tree/reset"
git reset "$( git commit-tree HEAD^{tree} -m ${commit_msg} )"
```
Might want to add a comment explaining what's going on with this line.
At one point @tgamblin also suggested just pushing the single new commit as well. Perhaps it's worth having a slightly more involved discussion about how we want to handle this?
Otherwise, yes, a comment will be nice here 😄 will do.
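For reference, the effect of that `git commit-tree`/`git reset` pair can be demonstrated with a self-contained sketch (driving git from Python; the repository contents are made up):

```python
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in repo and return its stripped stdout."""
    result = subprocess.run(["git", "-C", repo] + list(args),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

repo = tempfile.mkdtemp()
git(repo, "init", "-q")
git(repo, "config", "user.email", "ci@example.com")
git(repo, "config", "user.name", "ci")
with open(repo + "/f", "w") as f:
    f.write("one\n")
git(repo, "add", "f")
git(repo, "commit", "-qm", "first")
with open(repo + "/f", "a") as f:
    f.write("two\n")
git(repo, "commit", "-qam", "second")

# commit-tree writes a new commit whose content is HEAD's tree but which
# has no parents; reset then points the current branch at it, collapsing
# history to one commit so a push sends the working copy, not all history
squashed = git(repo, "commit-tree", "HEAD^{tree}", "-m", "squashed")
git(repo, "reset", "-q", squashed)
print(git(repo, "rev-list", "--count", "HEAD"))
```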
It really boils down to whether the downstream repo is prepopulated with whatever is the main branch upstream. In this case, sending history should be (much) faster than sending the working copy, since most of the WC is already there. However, to maintain this benefit long term requires that the main downstream branch be continuously synchronized with the main upstream branch. Otherwise (given enough history), sending history becomes much slower.
So, the real trade-off made here is that we are always sending what amounts to a tarball of the WC, but in exchange we avoid the need for any external processes to keep the main branches synchronized across the two repos. If there's an easy way to set up this synchronization (that doesn't require a paid GL license), then I'd be all for pushing the commits directly.
Maybe we could add that among the final "housekeeping" tasks we generate in CI?
bin/generate-gitlab-ci-yml.sh
Outdated
```shell
git status
echo "git push"
git push --force "$DOWNSTREAM_CI_REPO" \
    "___multi_ci___:$current_branch"
```
1. Might want to add a prefix to the pushed branch name, maybe something like `multi-ci-$current_branch`. Should make these generated branches easier to identify.
2. Not sure if you've already thought of this, but at some point at the end of the generated workflow, we're going to want to come back and clean out these auto-generated branches.
> 1. Might want to add a prefix to the pushed branch name, maybe something like `multi-ci-$current_branch`. Should make these generated branches easier to identify.
Yeah, I think that could make sense. Are these branches going to be publicly visible ultimately? If so, it would be nice for people to recognize their branch names, in which case maybe I should tack something onto the end of the name rather than the beginning. If not, it may not matter much. But I agree that having something on there that helps us identify the branches created/pushed by the workflow makes sense.
> 2. Not sure if you've already thought of this, but at some point at the end of the generated workflow, we're going to want to come back and clean out these auto-generated branches.
It occurred to me at some point, but I haven't done anything about it yet. Perhaps we could talk about it if/when we have the discussion I mentioned above. I've been thinking it could be nice to have a history of the generated .gitlab-ci.yml files maintained somehow. But I'm not married to the idea in any way.
Regarding point 1: My thinking falls on two points. First, pushing a branch with nothing but a hash for the name does nothing to communicate the branch's purpose. I don't expect human eyes to be looking at these branches regularly, but neither should someone poking around in these internals be left scratching their heads wondering what these branches are all about.
Second, while we expect that in most use cases the downstream repo would exist exclusively for triggering the next phase of CI, I don't think we should ignore the possibility that someone would use a downstream repo that has other purposes. In this case, we would certainly expect more human eyes to be on these generated branches, but more importantly, I'd want to take every measure to avoid name collisions. Let's not assume we're the only automated process that is generating branches with hashes in their names.
> Regarding point 1: My thinking falls on two points. First, pushing a branch with nothing but a hash for the name does nothing to communicate the branch's purpose. I don't expect human eyes to be looking at these branches regularly, but neither should someone poking around in these internals be left scratching their heads wondering what these branches are all about.
Just to be clear, we're not pushing a branch with nothing but a hash for the name, right? We're pushing a branch with whatever name the developer gave their own branch. Now, maybe that's not the best idea, because it's not really their branch, but their branch with the appropriate `.gitlab-ci.yml` added.
> Second, while we expect that in most use cases the downstream repo would exist exclusively for triggering the next phase of CI, I don't think we should ignore the possibility that someone would use a downstream repo that has other purposes. In this case, we would certainly expect more human eyes to be on these generated branches, but more importantly, I'd want to take every measure to avoid name collisions. Let's not assume we're the only automated process that is generating branches with hashes in their names.
Agreed, I'll add the prefix you suggested.
```yaml
# a part of the CI workflow
tags:
- spack-k8s
image: scottwittenburg/spack_builder_ubuntu_18.04

tags:
- spack-k8s
- some-other-tag
image: scottwittenburg/spack_builder_centos_7
```
ping @aashish24
6fe4682 to c48b779
lib/spack/spack/cmd/release_jobs.py
Outdated
```python
if not compiler_job:
    job_dependencies.append(root_spec.compiler, spec_arch, build_group)

job_variables = {
```
Minor request: what are your thoughts on adding something like `PIPELINE_` (or `SPACK_RELEASE_`, since it's already used?) as a prefix to all variables created as part of this process? Similar to how GitLab CI starts the majority of the variables it includes in a job with `CI_`.
Thanks @paulbry. I've been thinking something like that would make sense in order to distinguish clearly between the variables we generate automatically and the ones you may put in your environment yaml (in the `runner-attributes` section).
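One way to apply such a prefix uniformly when building the job description (the variable names below are purely illustrative):

```python
# Hypothetical variables generated for one job
job_variables = {
    "MIRROR_URL": "https://mirror.example.com",
    "JOB_SPEC_NAME": "readline",
}

# Prefix everything generated automatically, so user-provided variables
# from the environment yaml can be told apart, as GitLab CI does with CI_
prefixed = {"SPACK_" + name: value for name, value in job_variables.items()}

print(sorted(prefixed))
```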
a68dca1 to 1233f63
lib/spack/spack/cmd/release_jobs.py
Outdated
```python
ci_mappings = yaml_root['gitlab-ci']['mappings']

mirror_url = args.mirror_url
ci_cdash = yaml_root['cdash']
```
Will currently lead to `==> Error: 'cdash'` if the `cdash:` object is not defined in the environment.
Glad you're trying it out @paulbry, thanks. I'm aware of this issue and it's fixed in my current working branch, which I'll bring over to this PR sometime today. The new work is mostly about bootstrapping compilers, which is optional and which you should be able to ignore.
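A defensive lookup along these lines avoids the `KeyError` when the section is absent (the dictionary below stands in for the parsed environment):

```python
# Stand-in for a parsed environment with no cdash: section
yaml_root = {"gitlab-ci": {"mappings": []}}

# dict.get avoids the KeyError raised by yaml_root['cdash'] when the
# optional cdash: section is missing; reporting is simply disabled
ci_cdash = yaml_root.get("cdash")
reporting_enabled = ci_cdash is not None

print(reporting_enabled)
```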
e34bd5f to f89afb8
59d5a0a to 3a77378
d331a07 to 458c16f
458c16f to 71254ea
71254ea to 79bb20c
@tgamblin I hadn't noticed, did it need to be rebased?
I caused a conflict by flake8-cleaning
tgamblin left a comment

@scottwittenburg: this LGTM, and I think we can merge as-is, as I think it's more important to get these features moving with the mainline than it is to ensure that they are perfect. This is not on the critical path for most users.
I don't have any functional concerns with this at the moment, though I have some high-level maintenance concerns about the code that should be addressed in follow-up PRs:
1. We need to get rid of all (or nearly all) of the shell code. Spack's a Python project, and Python is easier to maintain for a lot of this logic. So I'd like to see the shell code moved to neatly abstracted commands (or sets of commands) and core Spack classes/components if needed. Right now the CI pipeline code is hard to reuse across different parts of the project, and probably more complex than it needs to be because it is in bash.
2. We should move the shell scripts in `bin` to `share/spack/qa`, or some other non-user-facing directory. The only user-facing command in the end should be `spack`, and we should have a `spack pipeline` command or something to house the CI logic.
All of this is really for follow-on PRs -- like I said, the functionality here is awesome and we just need to get it out of bash and into the core of Spack.
This was addressed in the Spack Stacks PR (#11057)
@opadron: do you have outstanding concerns?
No, I think we should go ahead and merge! For the move away from bash, I thought we might want to split up large pieces of the functionality and expose them as commands under a
etc.
- improve mirror git repo management
- minio s3 implementation needs endpoint_url
- Remove image key from rebuild-index job
- Remove image, rely on tags instead
79bb20c to 14e9a97
Omar approves per his comment above.
@scottwittenburg @opadron: the other thing I did not mention above is that this stuff needs tests. This is yet another reason that it should be moved to Python.
This PR supplants #10771, as it provides much of the same (or similar) functionality in a cleaner way. This PR is based on #11057, as it takes advantage of some of the features added to environments there (definitions, matrix). The goals of this PR, at a high level, are:
Use of Spack environments as release descriptors
In this section, we describe the structure of the new `gitlab-ci` and `cdash` sub-sections of the environment schema and how they can be used to describe a release. This PR contains an example `spack.yaml` file illustrating the newly proposed bits (located in `lib/spack/docs/example_files`), while a shorter example is shown below for reference:

The `definitions` section, as well as the `matrix` within the `specs` section, illustrate why this PR is based on #11057, although it is possible that everything in this PR might work fine if specs were explicitly enumerated in the `spack.yaml` environment file instead.

The `gitlab-ci` section describes a set of gitlab runners and the conditions under which the specs described in the environment should be assigned to be built by one of the runners. Currently this section has only one sub-section, `mappings`, and so perhaps the `mappings` intermediate level could be removed. At any rate, each entry within the list of `mappings` corresponds to a known gitlab runner, where the `match` section is used in assigning a release spec to one of the runners, and the `runner-attributes` section is used to configure the spec/job for that particular runner.

The optional `cdash` section provides information that will be used by the `release-jobs` command for reporting to CDash. All the jobs generated from this environment will belong to a "build group" within CDash that can be tracked over time. As the release progresses, this build group may have jobs added or removed. The `url`, `project`, and `site` are used to specify the CDash instance to which build results should be reported.

Assignment of specs to runners
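A minimal sketch of how these sections might fit together in a `spack.yaml` environment file (package names, runner tags, images, and URLs are illustrative, patterned on the short example this description refers to):

```yaml
spack:
  definitions:
    - pkgs: [readline, ncurses, pkgconf]
  specs:
    - matrix:
        - [$pkgs]
  mirrors:
    remote-mirror: https://mirror.example.com/release
  gitlab-ci:
    mappings:
      - match: ['os=centos7']
        runner-attributes:
          tags: [spack-cloud-centos]
          image: centos:7
      - match: ['os=ubuntu18.04']
        runner-attributes:
          tags: [spack-cloud-ubuntu]
          image: ubuntu:18.04
  cdash:
    url: https://cdash.example.com
    project: Spack
    site: Spack Cloud Gitlab
```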
The `mappings` section corresponds to a list of runners. During assignment of specs to runners, the list is traversed in order looking for matches; the first runner that matches a release spec is assigned to build it. The `match` section within each runner mapping is a list of specs, and if any of those specs match the release spec (the `spec.satisfies()` method is used), then that runner is considered a match.

Configuration of specs/job for a runner
Once a runner has been chosen to build a release spec, the `runner-attributes` section provides information determining details of the job in the context of the runner. The `runner-attributes` section must have a `tags` key, which is a list containing at least one tag used to select the runner from among the runners known to a gitlab instance. For `Docker executor` type runners, the `image` key is used to specify the Docker image used to build the release spec, while for other types of runners the `variables` key will be useful to pass on any information the runner needs to do its work (e.g. scheduler parameters, etc.).

Summary of `.gitlab-ci.yml` generation algorithm

All specs yielded by the matrix (or all the specs in the environment) have their dependencies computed, and the entire resulting set of specs are staged together before being run through the `gitlab-ci/mappings` entries, where each staged spec is assigned a runner. Staging is the name we have given to the process of figuring out in what order the specs should be built, taking into consideration gitlab ci rules about jobs/stages. In the staging process the goal is to maximize the number of jobs in any stage of the pipeline, while ensuring that the jobs in any stage only depend on jobs in previous stages (since those jobs are guaranteed to have completed already). As a runner is determined for a job, the information in the `runner-attributes` is used to populate various parts of the job description that will be used by gitlab ci. Once all the jobs have been assigned a runner, the `.gitlab-ci.yml` is written to disk.

The short example provided above would result in the `readline`, `ncurses`, and `pkgconf` packages being staged and built on two different runners. The runner named `spack-cloud-centos` (the names have no meaning, and can be anything) will be assigned to build all three packages for `centos7`, while the `spack-cloud-ubuntu` runner will be assigned to build the same set of packages for `ubuntu-18.04`. The resulting `.gitlab-ci.yml` will contain 6 jobs in three stages. Once the jobs have been generated, the presence of a `--cdash-credentials` argument to the `release-jobs` script would result in all of the jobs being put in a build group on CDash called `Release Testing` (that group will be created if it didn't already exist).

Comparing the previous approach to the new one
Previously, a release was described in the `release.yaml` file, which provided a file-based representation of the `CombinatorialSpecSet` type. This type allowed listing packages, package versions, compilers, and compiler versions, and would then compute and yield the full cross-product of those lists. Then the `spack release-jobs ...` command would process the resulting specs, matching them up against entries in the `os-container-mapping.yaml` file to find docker images which could build the specs, and ultimately generate the `.gitlab-ci.yml` file.

There were multiple drawbacks to the previous approach. One issue was that the `os-container-mapping.yaml` assumed that all ci jobs would be run by a gitlab runner configured with the `Docker executor` type, which made it difficult for sites to adopt the release workflow. The need for Docker containers was, itself, another drawback of the approach, but at the time the release workflow was initially developed, specs could not be guaranteed to be concretized properly if the concretization was not performed on a machine matching the `os` and `arch` of the spec. Thus, in order to generate the `.gitlab-ci.yml`, we needed first to organize the release into groups of specs that could all be concretized in the same container, then implement a micro web-service that could farm that concretization work out appropriately. While this PR doesn't specifically fix the need to concretize specs in containers, it assumes the existence of such a fix in order to simplify the code required to generate the `.gitlab-ci.yml`.

The `gitlab-ci` schema proposed in this PR is meant to replace the `os-container-mapping.yaml` used previously. For reference, the goal of the `os-container-mapping.yaml` file was to aid in mapping each spec from a set of release specs to a gitlab runner which could build the spec. The goal of the `gitlab-ci` schema, when used within an environment with a set of packages, one or more configured mirrors, and a `cdash` section, is the same. While the `gitlab-ci` schema doesn't do anything fundamentally new, it does represent a few key improvements over the old mapping file:

1. The relationship between `release.yaml` and the `os-container-mapping.yaml` wasn't clear at all. This change co-locates the `spec -> runner` mapping with the description of the specs in the release. Now every release can be provided with a custom mapping.
2. `Docker executor` gitlab runners were assumed, which made it impossible to target, for example, setuid runners designed by/for sites. This change allows the release writer to specify tags to select the runner for a job, and also allows the author to specify variables to provide to the selected runner.

Multi-repo CI approach
Another thing this PR does is take a first stab at a multi-repo approach to the release workflow. Because gitlab-ci does not yet allow for dynamic generation of jobs, a so-called "pre-ci" phase is needed to generate the `.gitlab-ci.yml` from the release description. This was previously conceived to be handled by standing up a microservice that could generate the ci file on demand, and hosting it at some known endpoint. The endpoint would allow provision of the commit sha to be checked out when generating the ci jobs (via a url parameter). Then the gitlab-ci `include` feature would be used to include the dynamically generated jobs. Unfortunately, the `include` feature does not support variable interpolation, and so the important information of which commit to check out when generating the jobs could not be communicated to the microservice. Hence the need for two stages of ci.
.gitlab-ci.yml, and then push a commit to the gitlab repo where the actual release jobs can be run.The proposed approach allows for either an environment file within the Spack repo, or else an environment file in another arbitrary repo to be used as the release description. At a high level, the pre-ci phase in this PR (the actual implementation is located in
bin/generate-gitlab-ci-yml.sh) does the following:SPACK_RELEASE_ENVIRONMENT_REPO, if that variable is found, clone the repo, otherwise, assume the Spack repo contains the release environment fileSPACK_RELEASE_ENVIRONMENT_PATH, using the location of the repo determined in the previous step as the base path, to find the release environment filespack.yamlfile from the previous stepspack release-jobs ...command.gitlab-ci.ymlfile and push the commit to the repo designated by the CI environment variableDOWNSTREAM_CI_REPO.See this repo for an example that a site could set up to desribed it's own release. In this kind of setup, the site would probably maintain three repos:
spack.yaml).gitlab-ci.ymlgeneration)Optional compiler bootstrapping
This PR also adds support for bootstrapping compilers on systems which may not already have the desired compilers installed. The idea here is that you can specify a list of things to bootstrap in your definitions, and we will guarantee those will be installed in a phase of the pipeline before your release specs, so that you can rely on those packages being available in the binary mirror when you need them later on in the pipeline. At the moment the only viable use-case for bootstrapping is to install compilers.
Here's an example of what bootstrapping some compilers looks like, taken from the example environment/stack included with this PR (`lib/spack/docs/example_files/spack.yaml`):

In the example above, we have added a list to the `definitions` called `compiler-pkgs` (you can add any number of these), which lists compiler packages we want to be staged ahead of the full matrix of release specs (which consists only of `readline` in our example). Then within the `gitlab-ci` section, we have added a `bootstrap` section which can contain a list of items, each of which refers to a list in the `definitions` section. These items can either be a dictionary or a string. If you supply a dictionary, it must have a `name` key whose value must match one of the lists in `definitions`, and it can have a `compiler-agnostic` key whose value is a boolean. If you supply a string, then it needs to match one of the lists provided in `definitions`. You can think of the `bootstrap` list as an ordered list of pipeline "phases" that will be staged before your actual release specs. While this introduces another bottleneck in the pipeline (all jobs in all stages of one phase must complete before any jobs in the next phase can begin), it also means you are guaranteed your bootstrapped compilers will be available when you need them.

The
`compiler-agnostic` key, which can be provided with each item in the `bootstrap` list, tells the `release-jobs` command that any jobs staged from that particular list should have the compiler removed from the spec, so that any compiler available on the runner where the job is run can be used to build the package.

When including a bootstrapping phase as in the example above, the result is that the bootstrapped compiler packages will be pushed to the binary mirror (and the local artifacts mirror) before the actual release specs are built. In this case, the jobs corresponding to subsequent release specs are configured to `install_missing_compilers`, so that if spack is asked to install a package with a compiler it doesn't know about, it can be quickly installed from the binary mirror first.

Since bootstrapping compilers is optional, those items can be left out of the environment/stack file, and in that case no bootstrapping will be done (only the `specs` will be staged for building) and the runners will be expected to already have all needed compilers installed and configured for spack to use.
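For reference, a sketch of what the bootstrap-related pieces described above might look like in an environment file (package versions and surrounding structure are illustrative):

```yaml
spack:
  definitions:
    - compiler-pkgs: ['gcc@6.5.0', 'gcc@5.5.0']
    - pkgs: [readline]
  specs:
    - $pkgs
  gitlab-ci:
    bootstrap:
      # each item names a list from definitions; compiler-agnostic lets
      # any compiler on the runner build the bootstrapped packages
      - name: compiler-pkgs
        compiler-agnostic: true
    mappings:
      - match: ['os=ubuntu18.04']
        runner-attributes:
          tags: [spack-k8s]
          image: ubuntu:18.04
```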