WIP: Package hash check binaries #8444

scottwittenburg wants to merge 10 commits into spack:develop
Conversation
tgamblin left a comment:
@scottwittenburg: this looks mostly good. The logic is good.
I think the main questions I have are how we'll farm out the jobs from a release spec-set, and how this command fits into GitLab CI.
Any ideas? The spec-set can figure out what has changed, but then runners need to be scheduled to build the different dependencies. I wish we were actually having a telcon today so that we could discuss this, but let's do it on GitHub and/or email.
```yaml
---
spec-set:
    include: [ape, atompaw, transset]
    exclude: [binutils, tk]
```
I forget if this is the case, but if there is an include list and an exclude list, the thing should start with what's in include and exclude should exclude from that set. So in this example, the exclude list wouldn't be necessary.
Makes sense, I'll keep that in mind. In the sample release file I provided (just as a start), I just left them both empty for the moment. I figured I would start populating the exclude list as I discover that some things don't work together.
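To make the include/exclude semantics concrete, here is a minimal sketch of the filtering described above: start from the `include` list (or everything, if it is empty) and then drop anything named in `exclude`. The function name and arguments are hypothetical, not Spack's actual API.

```python
def select_packages(all_packages, include=None, exclude=None):
    """Start from the include list (or everything, if include is empty),
    then remove anything named in the exclude list."""
    selected = list(include) if include else list(all_packages)
    excluded = set(exclude or [])
    return [name for name in selected if name not in excluded]
```

With this behavior, an `exclude` list only matters when it overlaps the `include` list (or when `include` is empty), matching the semantics suggested in the comment.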
```yaml
#
# This YAML file describes a Spack test suite
#
spec-set:
```
```rst
CDash test suites
-----------------

The ``spack test-suite`` command reads in a specially formatted YAML file
```
Are you going to keep test-suite? I think it needs a rework, but it would make sense if it is part of what drives the CI. The issue I see is that it's sequential, and if we want to test 2700 packages quickly, we need to farm them out to separate runners.
Well, I was going to keep it, thinking it was in some ways separate from the release process. I'll give it another look, but like you say, we'll need to parallelize the builds. My feeling is that we'll want to "fan out" to as many little jobs as possible so we have maximum flexibility in how we distribute them. I have spent quite a bit of time discussing approaches in this area with @opadron, so it will be good when we can meet and discuss them. I'll make another comment on our current thinking along those lines somewhere else.
I think the test-suite command can be retired and we can start from scratch, given that you guys have much more flexible ideas about this.
lib/spack/spack/cmd/checkbinaries.py (outdated)

```python
level = "long"

_buildcache_path = os.path.join('build_cache')
```
I don't think this needs an os.path.join. Shouldn't this location be taken from binary_distribution? Oh... I just looked at that. It's not defined as a variable there, it's just a string written in lots of places... ugh. Can you make it into a variable in binary_distribution?
Yes, I was thinking the same thing here. I'll do it in my next pass.
lib/spack/spack/cmd/checkbinaries.py (outdated)

```python
_pkg_yaml_match_regex = re.compile(r'<a href=[^>]+\>\s*([^<]+)\s*\<')


def changed_package_files(args):
```
This should be moved to some common place like spack.util.git so that this command and spack flake8 can use the same code.
I agree with you here, if we keep this logic I'll move it somewhere it can be shared by both. However, just recently I've been thinking that using git to determine what has changed is flawed in general. Maybe you can straighten me out though. If we only look at changed packages, then will we not also need, for each changed package, to somehow walk back up the dependency tree and add all dependent packages to the rebuild list? I've recently been wondering if we just need to do a full_hash on every package in the release spec-set, since that will truly discover if any package needs to be rebuilt. To avoid a lot of network traffic, we could think about keeping a slightly more detailed index on the binary mirror so we can fetch all the full hashes in a single request.
If we only look at changed packages, then will we not also need, for each changed package, to somehow walk back up the dependency tree and add all dependent packages to the rebuild list?
Yes, that's true. Look at spack dependents for some code that does that -- it constructs an inverted dependency dict and uses that to get the static dependency information, which will tell you which packages could have changed.
I think computing the full hash for the full spec set is the best way to go. We need to do something equivalent to that. It may still be a win to run the code from spack dependents to figure out which packages could have changed, then only re-hash the things from the spec set that have those names, but it's probably only a small win at this point.
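The inverted-dependency approach described above can be sketched in plain Python. This is an illustration of the idea, not the actual `spack dependents` code; the static dependency data and function names are hypothetical.

```python
def invert_dependencies(dependencies):
    """Given {pkg: [deps]}, build {dep: set of packages depending on it}."""
    dependents = {}
    for pkg, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)
    return dependents


def possibly_changed(changed, dependents):
    """Walk up from changed packages to everything that could be affected."""
    affected = set(changed)
    stack = list(changed)
    while stack:
        pkg = stack.pop()
        for parent in dependents.get(pkg, ()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)
    return affected
```

The `possibly_changed` set could then be used to narrow which specs from the spec set need to be re-hashed, as suggested above.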
Modifying the index to have more info is definitely a good idea.
lib/spack/spack/cmd/checkbinaries.py (outdated)

```python
def parse_yaml_list(index_html_contents):
    result = []
    match_list = _pkg_yaml_match_regex.findall(index_html_contents)
```
This looks good but I think adding a parseable json file alongside index.html would be good.
Yes, I agree. While I did add a yaml index to the buildcache command, I didn't yet update this to read it. Based on your comments on that PR (#8451) though, I think I'll change that to a json representation before updating this code.
Basically the undocumented policy we've been using is json for machine-generated stuff, yaml for human-edited stuff (config files). #2189 has a pretty lengthy discussion of how much faster json is.
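Following that json-for-machine-generated convention, one way to sketch the proposed mirror index is a single JSON file mapping spec names to full hashes, so all hashes can be fetched in one request. The file name and layout here are assumptions for illustration, not the format the PR settled on.

```python
import json


def write_index(path, specs):
    """Write a machine-readable mirror index mapping spec name -> full hash.

    `specs` is an iterable of (name, full_hash) pairs for everything
    on the mirror.
    """
    index = {name: full_hash for name, full_hash in specs}
    with open(path, 'w') as f:
        json.dump({'full_hashes': index}, f, indent=2)


def read_index(path):
    """Fetch every full hash on the mirror in a single read."""
    with open(path) as f:
        return json.load(f)['full_hashes']
```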
```python
    except URLError as urlErr:
        tty.error('Unable to open url %s' % index_path)
        tty.error(urlErr)
        return None
```
These `return None`s should probably raise some type of SpackError. Any SpackError raised will end the command and print the message from the error, or a stack trace in debug mode.
Sounds good.
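Raising instead of returning `None` might look like the sketch below. `SpackError` itself is real, but the error class name, the `opener` parameter, and the function shown here are made up for illustration.

```python
class CheckBinariesError(Exception):
    """Stand-in for a SpackError subclass; raising one ends the command
    and prints the message (or a stack trace in debug mode)."""


def fetch_index(index_path, opener):
    """Fetch the mirror index, raising on failure instead of returning None."""
    try:
        return opener(index_path)
    except OSError as err:
        # Raise so the CLI reports the failure rather than silently
        # propagating a None result.
        raise CheckBinariesError(
            'Unable to open url %s: %s' % (index_path, err))
```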
lib/spack/spack/cmd/checkbinaries.py (outdated)

```python
    try:
        pkg_yaml.index('-%s-' % pkg_name)
        tty.msg(' %s' % pkg_yaml)
    except Exception as expn:
```
Maybe a warning here? I dunno why this would happen.
At the time I wasn't quite sure that I could count on the form of the file name. Just a warning sounds fine though.
lib/spack/spack/cmd/checkbinaries.py (outdated)

```python
    return result


def get_mirror_rebuilds(mirror_name, mirror_url, pkg_change_set):
```
logic in here looks good to me -- code could use some polish.
```python
        help="configuration scope containing mirrors to check")


def checkbinaries(parser, args):
```
`check-binaries` would be better -- the function would be called `check_binaries` in that case. Does that work?
Fine with me, I don't see why it wouldn't work.
@tgamblin Thanks for the review. As I mentioned somewhere above, I wanted to describe what @opadron and I are currently thinking about farming out the build jobs. Please chime in, Omar, if I've misspoken. For one thing, we thought it would be nice if we were not tied too tightly to running on Kubernetes, so a purely gitlab-ci approach seemed like a good idea (though we might have more fine-grained control if we did want to tie ourselves to Kubernetes). Then, via the magic of the work @opadron has done, those jobs can be run in parallel on the cluster. I've probably left out some important details in that high-level description, so feel free to poke holes in it to determine whether I just forgot to mention something, or if there are details we really haven't considered yet. One thing I'm still not 100% clear on is where the different OSes come in. Should OS be added as another dimension in the build matrix, or should we simply run the matrix once for each of a static set of images specified up front?

ping @aashish24 @bryonbean
@scottwittenburg your description is pretty spot-on, imo. Depending on how many builds we end up running, and how deep our pockets for AWS resources are, we may want to limit the number of concurrent builds. For the OS question, I think your latter suggestion -- that we simply run our build matrix once for each of a static set of images that we specify up front -- is a perfectly fine first step that we can adjust later if needed.
@scottwittenburg @opadron: Sorry for the late reply. I, like @opadron, think the plan is great. I like the idea of generating the gitlab YAML file from a spec set. I also agree that we shouldn't tie the workflow to Kube, though kube is a useful place to run cloud containers. I agree with @opadron about where the OS configurations should come from: handling OSes by running for multiple images works for me. I don't think we can really handle that elegantly at the spec set level -- let something else run Spack and have it build for the platform it's running on.
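The "generate the gitlab YAML from a spec set" idea discussed above could be sketched roughly as follows. The job names, image list, and `spack` invocations here are all assumptions for illustration; the real generator would emit a `.gitlab-ci.yml` with whatever commands the final workflow settles on.

```python
def generate_gitlab_ci_jobs(specs, images):
    """Emit one independent job per (spec, image) pair, so gitlab runners
    can pick them up in parallel, once per OS image."""
    jobs = {}
    for image in images:
        for spec in specs:
            job_name = "%s / %s" % (spec, image)
            jobs[job_name] = {
                "image": image,
                "script": [
                    # each job first checks the mirror and exits early if a
                    # binary with a matching full hash already exists
                    "spack check-binaries --spec '%s' || spack install '%s'"
                    % (spec, spec),
                ],
            }
    return jobs
```

Serializing this dict as YAML would give one gitlab-ci job per cell of the (spec, image) matrix, which matches the "fan out to as many little jobs as possible" goal mentioned earlier.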
- add CombinatorialSpecSet in spack.util.spec_set module.
- class is iterable and encapsulates YAML parsing and validation.
- Adjust YAML format to be more generic
- YAML test-suite format now has a `matrix` section, which can contain
  multiple lists of specs, generated in different ways, including:
- specs: a raw list of specs.
- packages: a list of package names and versions
- compilers: a list of compiler names and versions
- All of the elements of `matrix` are dimensions for the build matrix;
we take the cartesian product of these lists of specs to generate a
build matrix. This means we can add things like [^mpich, ^openmpi]
to get builds with different MPI versions. It also means we can
multiply the build matrix out with lots of different parameters.
- Add a schema format for test-suites
- Bug fixes:
- fix bug with constraining an anonymous spec by name, add a test.
This reflects the more general usefulness of the class for other purposes including, for example, specifying the set of specs that comprise a release.
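The cartesian-product behavior described in the commit message can be illustrated with `itertools.product`; the spec strings and helper below are just examples, not the CombinatorialSpecSet implementation.

```python
import itertools


def expand_matrix(dimensions):
    """Take the cartesian product of the matrix dimensions and join each
    combination into a single spec string."""
    return [" ".join(combo) for combo in itertools.product(*dimensions)]


# hypothetical matrix: 2 packages x 2 compilers x 2 MPI providers = 8 builds
matrix = [
    ["hdf5", "zlib"],           # packages
    ["%gcc@7", "%clang@6"],     # compilers
    ["^mpich", "^openmpi"],     # MPI versions, as in the commit message
]
```

Each extra dimension multiplies the build matrix out, which is how a short YAML file can describe a large number of builds.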
Now it no longer uses the git diff to find changes, but rather just iterates over everything in the release spec_set. Also relies on updates to the buildcache command in another PR.
Also brings the check_binaries.py command from spack#8444, and refactors it a bit to allow it to check a single spec against a single remote binary mirror. This way each job can check its own spec against the mirror to see if it needs to do its work or just quit.
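The per-job check described here boils down to comparing a locally computed full hash against the one recorded on the mirror. A minimal sketch, assuming a simple name-to-hash index (the index format is hypothetical):

```python
def needs_rebuild(spec_name, local_full_hash, mirror_index):
    """Return True if the mirror has no binary for this spec, or has one
    built from different package or dependency content (full hash mismatch)."""
    remote_hash = mirror_index.get(spec_name)
    return remote_hash != local_full_hash
```

A job that finds `needs_rebuild(...) == False` can quit immediately instead of building.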
I'm closing this because it has been mostly superseded by #8718.
A new command addressing the steps documented in #8387