spack ci: Refine job attribute configuration #32300

blue42u wants to merge 2 commits into spack:develop

Conversation
Force-pushed from 59200ad to 690d307
Force-pushed from 690d307 to c455445
Force-pushed from c455445 to f9ebb8d
Force-pushed from f9ebb8d to b148d0b
@scottwittenburg do you see any issues with allowing whatever runner attributes the user wants? This was discussed today, and the thought was that GitLab will complain if invalid attributes are specified. (To be clear, I'm not asking you to brainstorm possible cases where this could go wrong; just whether you happen to remember a case where it did.)
@scheibelp In general I like the idea of allowing the user to specify any attributes they want. The problem I see is how to handle when … For example, we currently hard-code some …
@scottwittenburg Good catch, I didn't realize that the build jobs didn't copy all the attributes over. I'll work on that over the weekend. I think the user should always be able to override the values from …

So, proposal: the final job is generated by merging the generated job + the configured attributes:

```yaml
gitlab-ci:
  build-job-attributes:
    retry: false
    tags: [general]
    +needs:
    - custom-job
  mappings:
  - match: 'a'
    job-attributes:
      +tags: [specific]
  - match: 'b'
    job-attributes:
      tags: [unique]
```

Would result in a pipeline containing roughly:

```yaml
(spec) a:
  ...
  retry: false
  needs:
  - (spec) a-dep1
  - (spec) a-dep2
  - custom-job
  tags: [general, specific]

(spec) b:
  ...
  retry: false
  needs:
  - (spec) b-dep1
  - custom-job
  tags: [unique]
```
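The proposed merge step can be sketched in Python. This is a hedged illustration, not the PR's actual code: `apply_attributes` and the starting `job_a` shape are assumptions; plain keys overwrite, while `+`-prefixed keys append to the generated list.

```python
def apply_attributes(job, attrs):
    """Merge one attribute dict onto a generated job dict: a plain key
    overwrites the existing value, while a '+'-prefixed key appends to
    the existing list (creating it if absent)."""
    for key, value in attrs.items():
        if key.startswith("+"):
            job.setdefault(key[1:], []).extend(value)
        else:
            job[key] = value
    return job

# Generated job for spec 'a' before user attributes (assumed shape):
job_a = {"needs": ["(spec) a-dep1", "(spec) a-dep2"]}
# Apply build-job-attributes, then the matched job-attributes:
apply_attributes(job_a, {"retry": False, "tags": ["general"], "+needs": ["custom-job"]})
apply_attributes(job_a, {"+tags": ["specific"]})
```

After both applications, `job_a` matches the `(spec) a` job sketched above: `custom-job` is appended to the generated `needs` and the tags accumulate to `[general, specific]`.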
I mostly agree with this. Some exceptions might remain, e.g. we decided the … I like the proposal to use …

I made that one up; it seemed appropriate, and it happens to be one of the few characters with no special meaning in YAML.

I'd prefer to have no exceptions, or opt-in ones, if possible; those tag adjustments will break a lot when using …
Force-pushed from b148d0b to 30bae97
Rewritten to implement the new design. It feels much cleaner this time around. I left the tag adjustments in to keep it consistent with the original code, and I also kept some tests using the original format where it mattered. OP updated with explanations for the new changes.
Force-pushed from 9471e75 to 8e3b656
@scottwittenburg Does this version look good to you, or would you like further changes?
```python
after_script = None
if "after_script" in runner_attribs:
    after_script = [s for s in runner_attribs["after_script"]]

job_object["script"].insert(0, "cd {0}".format(concrete_env_dir))
```
Maybe I'm reading this wrong? It looks like it will update the outgoing job object with this command regardless of whether the user supplied their own script. Before this change, the script list that was updated here might have been completely overwritten by the user's script list elements just below.
With this PR the order of operations is reversed: the whole `job_object` is generated first, and then the user's modifications are applied afterwards. This ensures the user's configuration takes precedence in all cases and for all fields. It also keeps the modification order consistent and unsurprising (e.g. `+script` appends to the end of the default script, as one might expect). This pattern is repeated throughout the PR for the other job types as well.
After looking through it, I think it makes sense, thanks. I'd feel better if some rebuild jobs had run at any point in the past several pushes, though. I looked through the last 5 pipelines on this PR but could only find "no specs to rebuild" jobs. I'll ask spackbot to rebuild everything, then cancel the child pipelines once a few stages have run.
@spackbot rebuild everything
I've started that pipeline for you!
Actually, I decided to cancel all generation jobs but the …
Interestingly enough, the check on GitHub reports "Pipeline is running," but the pipeline itself has already completed. I assume that's a spackbot quirk?

It could be a bug in the bit responsible for updating the status, which is not spackbot but something similar. Or it could be related to me canceling most of the jobs in the pipeline. Pinging @zackgalbreath in case he has some idea.
Either way, it's great all the …
@spackbot run pipeline
I've started that pipeline for you!
I think this needs a rebase to pick up a develop that no longer has any …
Force-pushed from 8e3b656 to e3573de
Rebased, all the checks are green now (aside from …).
Force-pushed from e3573de to 732cc3c
Force-pushed from 732cc3c to 583e651
Force-pushed from 583e651 to fb1e906
I am working on extending this and removing more boilerplate from the stack …

I initially liked the idea of the …

Pros: …

Cons: …

When working with the … For the …
@kwryankrattiger Thank you for the comments! There are a few points that I'm confused about or don't agree on; questions/comments below.

I can see the allure of full control over how keys are merged, but I'm also having trouble coming up with examples where the extra features you've listed would be useful: …

This is the one example I could see using …:

```yaml
gitlab-ci:
  build-job-attributes:
    before_script: &main_before
    - echo 'main script!'
  mappings:
  - match: ['os=special']
    job-attributes:
      before_script:
      - echo 'before main script (special)!'
      - *main_before
  - match: ['os=otherspecial']
    job-attributes:
      before_script:
      - echo 'before main script (otherspecial)!'
      - *main_before
```

For …:

```yaml
gitlab-ci:
  match_behavior: merge
  mappings:
  - match: ['os=special']
    job-attributes:
      before_script:
      - echo 'before main script (special)!'
  - match: ['os=otherspecial']
    job-attributes:
      before_script:
      - echo 'before main script (otherspecial)!'
  - match: ['@:']
    job-attributes:
      +before_script:
      - echo 'main script!'
```

I'm not as opposed to …

There are plenty of reasons for a configurable …

I also want to ensure Spack has as little knowledge of the GitLab CI specification as possible, i.e. as little special handling for particular keys as possible. The reason for this PR in the first place was that there were attributes I couldn't set without external scripts; keeping Spack fairly agnostic here makes it robust as new keys are added to the GitLab CI schema.
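The `first` vs. `merge` behavior discussed here can be sketched in Python. This is a hypothetical illustration only: `matching_attributes` is a made-up helper, and the spec is represented as a plain set of satisfied selectors rather than Spack's real spec-matching machinery.

```python
def matching_attributes(mappings, spec, match_behavior="first"):
    """Resolve which job-attributes apply to a spec: 'first' keeps only
    the first matching mapping entry, while 'merge' keeps every matching
    entry, in order, so they can all be applied to the generated job."""
    def matches(selectors, spec):
        # Stand-in predicate: the spec is modeled as a set of selectors
        # it satisfies (real Spack matching is far richer than this).
        return any(sel in spec for sel in selectors)

    found = []
    for entry in mappings:
        if matches(entry["match"], spec):
            found.append(entry["job-attributes"])
            if match_behavior == "first":
                break
    return found

# Mappings mirroring the 'merge' example above (assumed shapes):
mappings = [
    {"match": ["os=special"], "job-attributes": {"before_script": ["echo special"]}},
    {"match": ["@:"], "job-attributes": {"+before_script": ["echo main"]}},
]
spec = {"os=special", "@:"}  # selectors this spec satisfies
```

With `match_behavior: first` only the `os=special` attributes apply; with `merge`, the catch-all `@:` entry applies as well, which is what lets a common `main script!` fragment be appended without YAML anchors.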
I am going to go backwards...

On the script stuff, I meant at the mapping level. Currently it is allowed to change the script depending on the package, and I think that is maybe not the best. But as I write this response, I am feeling like not restricting it has the benefit of not making it special, and therefore there is less nuance to reason around.

Correct. I was omitting the full scope of the change to come after this; it would probably be helpful to add some more context. In order to remove most of the boilerplate from stacks, I am working on introducing a new …
Mappings is one thing that I am more unsure on. What you say is true. But what if at the top-level CI config we ask for:

```yaml
mappings:
- match_behavior: first
  matches:
  - match: [...]
    job-attributes: {...}
- match_behavior: merge
  matches:
  - match: [...]
    job-attributes: {...}
```
My feeling was it would be beneficial to have ways to specify precedence in the world of layered configs: "this stack's configs are more important than the defaults" vs. "defaults first, then mine". But reading your comments and thinking some more, I agree that for how CI is being used (or should be used) it may be sufficient to only support the "append"/"merge" behavior you have here. My goal is to be able to reuse the same merging logic for everything, to prevent behavior that is difficult to reason about.

To continue to provide some more context, I am also looking at refining the idea of "scope" in the CI configurations. By this I mean attributes that apply to all of CI, attributes that apply to specific job types, and attributes that apply to specific package rebuild job specifications. Everything would get initialized with the top-level attributes, then their job-type attributes, and finally the package job attributes from mappings. Attributes like …

An example of how I think this could all look, using your merge-logic key prefixes (though I don't think it necessarily has to, if everything is assumed to be append/insert for lists/maps, overwrite for plain values at the YAML level, and the prefix is used at the mappings level only):

General CI configs: ci.yaml

```yaml
spack-ci:
  tags: ["spack"]
  before_script:
  - . "./share/spack/setup-env.sh"
  - spack --version
  rebuild-job-attributes:
    before_script: [...] # download make/print the arch/etc.
    script: [...]
  signing-job-attributes:
    script: [...]
    image: spack-signing-image:1234-56-78
  mappings:
  - match_behavior: first
    matches:
    - match:
      - package
      job-attributes:
        tags: ["huge"]
        variables:
          KUBERNETES_CPU_REQUEST: "11000m"
  cdash: {...}
```

General Linux CI configs: ci-linux.yaml

```yaml
include:
- ci.yaml
ci:
  # Linux only package mappings
  image: ghcr.io/spack/default-linux-ci-image:2022-11-04
  +before_script: [...] # print linux system information and stuff
  +mappings: [...]
```

Linux architecture configs: ci-linux-x86_64.yaml

```yaml
include:
- ci-linux.yaml
ci:
  +tags: ["x86_64"]
  +before_script: [...] # download architecture specific gmake
```

Stack-specific configs: spack.yaml

```yaml
spack:
  include:
  - ci-linux-x86_64.yaml
  view: false
  concretizer:
    reuse: false
    unify: false
  config: {...}
  specs: [...]
  ci:
    # The specific version/configuration of packages in this stack require
    # more/less resources than the default configurations
    match_behavior: merge
    mappings: [...]
```
Thanks for the extra context. I definitely would use a …

I think trying to reuse the merge logic for the configuration will be too nuanced; there are too many fields that interact in non-trivial ways, for example …

Would it suffice for your cases to "apply" the effects of each …:

```yaml
--- # spack.yaml
spack:
  gitlab-ci:
    include:
    - a.yaml
    - b.yaml
    mappings:
    - match: [package]
      job-attributes:
        +before_script:
        - echo main
...
--- # a.yaml
gitlab-ci:
  build-job-attributes:
    before_script:
    - echo a
...
--- # b.yaml
gitlab-ci:
  build-job-attributes:
    mappings:
    - match: ['os=ubuntu']
      job-attributes:
        before_script:
        - echo b
```

Then a …

Extending the above idea a bit to allow writing "mixins," I think there would need to be a bit more control over the order in which the configurations are applied. So, what about splitting the external configurations into a part that is …:

```yaml
--- # ci-linux.yaml
apply-before:
  image: default-spack-linux-image:v1234
  build-job-attributes:
    +before_script: [...] # Setup bits, print system info
apply-after:
  build-job-attributes:
    +before_script: [...] # Necessary cleanup before the main script
```

And then an environment can …:

```yaml
spack:
  gitlab-ci:
    include:
    - mixin1.yaml
    - mixin2.yaml
    # Application order:
    #   mixin1.yaml:apply-before
    #   mixin2.yaml:apply-before
    #   spack.yaml:gitlab-ci
    #   mixin2.yaml:apply-after
    #   mixin1.yaml:apply-after
```

Or (mutually exclusive) the exact order can be explicitly listed out:

```yaml
spack:
  gitlab-ci:
    include-before:    # Application order:
    - mixin1.yaml      #   mixin1.yaml:apply-before
    - mixin2.yaml      #   mixin2.yaml:apply-before
    include-after:     #   spack.yaml:gitlab-ci
    - mixin1.yaml      #   mixin1.yaml:apply-after
    - mixin3.yaml      #   mixin3.yaml:apply-after
```

I'm not entirely sure what to do with dependencies between mixins (e.g. if …).
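The `apply-before`/`apply-after` ordering proposed here can be sketched in a few lines of Python. This is a hypothetical illustration of the ordering only (the `application_order` helper is made up, not Spack code): each include's `apply-before` runs in listed order, then the local config, then the `apply-after` sections in reverse order so the first include "wraps" everything else.

```python
def application_order(includes, local="spack.yaml:gitlab-ci"):
    """Return the order in which config sections would be applied:
    every include's apply-before in listed order, then the local
    gitlab-ci section, then each apply-after in reverse order."""
    order = ["%s:apply-before" % name for name in includes]
    order.append(local)
    order.extend("%s:apply-after" % name for name in reversed(includes))
    return order
```

For `include: [mixin1.yaml, mixin2.yaml]` this reproduces the order spelled out in the comments of the first example above.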
The way includes would work is all of …

In this example, …:

```mermaid
graph TD
    A[spack.yaml]
    B[mixin-1.yaml]
    C[mixin-2.yaml]
    D[mixin-config.yaml]
    A --> B & C
    B --> D
    C --> D
```

But I don't think this is generally going to be an issue, since I don't see the need to have diamond patterns in configs. The layers should be pretty linear for this type of thing.

```mermaid
graph TD
    A[ci-base]
    B[os-base]
    C[os-architecture]
    D[spack.yaml]
    D --> C
    C --> B
    B --> A
```
@blue42u I was looking into the config module a bit, and I found … This may be worth using.
The reason I avoided … Is it critically important in practice? Probably not, but I like expressing intention and having sanity checks 😄

I could also see adding the explicit merge-fix to the generic Spack configuration format, something like:

```yaml
key:: VALUE  # Always override key with VALUE
key+: VALUE  # Always merge VALUE with the previous VALUE (error if not possible)
key: VALUE   # Merge if possible, override otherwise
```

@kwryankrattiger Think this would be a good direction?
I think that makes sense. I think to take it a step further with the …:

```yaml
--- # a.yaml
MERGE_KEY: VALUE
OVERRIDE_KEY: VALUE
KEY: VALUE

--- # b.yaml
MERGE_KEY+: [a, b, c]
OVERRIDE_KEY:: x
KEY: [d, e, f]

--- # ab.yaml (result of merging b.yaml on top of a.yaml)
MERGE_KEY: [VALUE, a, b, c]
OVERRIDE_KEY:: x
KEY: [d, e, f]
```
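The suffix syntax above can be sketched in Python. This is a hedged sketch under assumptions, not Spack's implementation: after YAML parsing, `KEY::` arrives as the dict key `KEY:` and `KEY+:` as `KEY+`; a `::` key always overrides, a `+` key always merges (wrapping a scalar into a one-element list, matching the `MERGE_KEY` example), and a plain key merges when both values are dicts and overrides otherwise.

```python
def merge_configs(base, override):
    """Merge one parsed YAML mapping onto another using the proposed
    key-suffix markers (sketch; not actual Spack code)."""
    result = dict(base)
    for raw, value in override.items():
        if raw.endswith(":"):            # 'KEY::' in YAML -> always override
            result[raw[:-1]] = value
        elif raw.endswith("+"):          # 'KEY+:' in YAML -> always merge
            key = raw[:-1]
            old = result.get(key, [])
            old = old if isinstance(old, list) else [old]
            if not isinstance(value, list):
                raise TypeError("cannot merge key %r" % key)
            result[key] = old + value
        else:                            # plain key: merge if possible
            old = result.get(raw)
            if isinstance(old, dict) and isinstance(value, dict):
                merged = dict(old)
                merged.update(value)
                result[raw] = merged
            else:
                result[raw] = value
    return result

# The a.yaml and b.yaml documents above, as parsed dicts:
a = {"MERGE_KEY": "VALUE", "OVERRIDE_KEY": "VALUE", "KEY": "VALUE"}
b = {"MERGE_KEY+": ["a", "b", "c"], "OVERRIDE_KEY:": "x", "KEY": ["d", "e", "f"]}
ab = merge_configs(a, b)
```

Under these assumptions, `ab` carries the same values as the `ab.yaml` result shown above.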
Closing this implementation in favor of the more complete #34272.
The configuration possible in the `gitlab-ci` section is a subset of GitLab's CI pipeline schema. This is missing some critical features, including:

- The `workflow` pipeline field cannot be set, making the generated pipeline unusable depending on the parent pipeline's `workflow` settings.
- The `cache` field cannot be used for generated jobs, making it impossible to implement portable network I/O optimization (e.g. caching the Spack Git repo, or Local cache for buildcache files #32136).
- `*script` fields only accept a flat list of strings; however, GitLab will flatten a nested list of strings. Without this feature, common script fragments between build and service jobs can't be shared with YAML anchors.

In addition, the `service-job-attributes` configuration applies too broadly; in particular, the `no-specs-to-rebuild` job is special in that it does not require a working Spack or caches. Removing these from the job description can save a significant amount of time.

This patch removes the schema limitations for job attributes, allowing any keys to be inserted into the generated pipeline. These attributes can now be specified under the following keys:
- `gitlab-ci:pipeline-attributes` for the pipeline as a whole,
- `gitlab-ci:build-job-attributes` and `gitlab-ci:mappings:job-attributes` for build jobs,
- `gitlab-ci:cleanup-job-attributes` for the cleanup job,
- `gitlab-ci:reindex-job-attributes` for the `reindex-cache` job, and
- `gitlab-ci:noop-job-attributes` for the `no-specs-to-rebuild` job.

The keys listed above are "merged" (in order) into the job/pipeline `dict` after it has been generated, where "merge" here overwrites any shared keys unless they are prefixed with `+`, in which case it recursively "merges" (for `dict`) or concatenates (for `list`). In short, merge operates as follows:

- `{key: [val1, val2]}` + `{key: [new3]}` => `{key: [new3]}`
- `{key: [val1, val2]}` + `{+key: [new3]}` => `{key: [val1, val2, new3]}`
- `{key: {key1: val1, key2: val2}}` + `{+key: {key1: new1, key3: new3}}` => `{key: {key1: new1, key2: val2, key3: new3}}`

The previous schema still works (by internally emulating its effects), but will warn that it is a "legacy feature" and urge a transition to the newer format.
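The merge semantics described above can be sketched as a small recursive function. This is a hedged sketch, not the PR's actual implementation; it reproduces the three example merges.

```python
def merge_attributes(generated, user):
    """Merge user-specified attributes onto a generated dict: plain keys
    overwrite, while '+'-prefixed keys recursively merge dicts and
    concatenate lists (falling back to overwrite on type mismatch)."""
    result = dict(generated)
    for key, value in user.items():
        if key.startswith("+"):
            name = key[1:]
            old = result.get(name)
            if isinstance(old, dict) and isinstance(value, dict):
                result[name] = merge_attributes(old, value)
            elif isinstance(old, list) and isinstance(value, list):
                result[name] = old + value
            else:
                result[name] = value
        else:
            result[key] = value
    return result
```

Applying this to the three examples above yields the stated results: an unprefixed `key` replaces the list, `+key` concatenates it, and `+key` on a dict overwrites shared subkeys while keeping the rest.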
Documentation not yet included.