WIP: Package hash check binaries #8444 (Closed)

scottwittenburg wants to merge 10 commits into spack:develop from scottwittenburg:package-hash-check-binaries

Conversation

@scottwittenburg (Contributor) commented Jun 12, 2018

A new command addressing the steps documented in #8387

@tgamblin (Member) left a comment:

@scottwittenburg: this looks mostly good. The logic is good.

I think the main questions I have are how we'll farm out the jobs from a release spec-set, and how this command fits into GitLab CI.

Any ideas? The spec-set can figure out what has changed, but then runners need to be scheduled to build the different dependencies. I wish we were actually having a telecon today so that we could discuss this, but let's do it on GitHub and/or email.

---

```yaml
spec-set:
  include: [ape, atompaw, transset]
  exclude: [binutils, tk]
```
Member:

I forget if this is the case, but if there is an include list and an exclude list, the thing should start with what's in include and exclude should exclude from that set. So in this example, the exclude list wouldn't be necessary.
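The semantics described here can be sketched as follows (a minimal illustration with a made-up function name, not Spack's actual implementation):

```python
# Hypothetical helper illustrating the proposed include/exclude semantics:
# start from the include list (or every known package when include is
# empty), then drop anything on the exclude list.
def resolve_spec_set(all_packages, include=None, exclude=None):
    selected = set(include) if include else set(all_packages)
    return sorted(selected - set(exclude or []))
```

Under these semantics, the exclude list in the example above is indeed redundant, since binutils and tk were never included.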

@scottwittenburg (Contributor, Author):

Makes sense, I'll keep that in mind. In the sample release file I provided (just as a start), I just left them both empty for the moment. I figured I would start populating the exclude list as I discover that some things don't work together.

```yaml
#
# This YAML file describes a Spack test suite
#
spec-set:
```
Member:

I like spec-set.

```rst
CDash test suites
-----------------

The ``spack test-suite`` command reads in a specially formatted YAML file
```
Member:

Are you going to keep test-suite? I think it needs a rework, but it would make sense if it is part of what drives the CI. The issue I see is that it's sequential, and if we want to test 2700 packages quickly, we need to farm them out to separate runners.

@scottwittenburg (Contributor, Author):

Well, I was going to keep it, thinking it was in some ways separate from the release process. I'll give it another look, but like you say, we'll need to parallelize the builds. My feeling is that we'll want to "fan out" to as many little jobs as possible so we have maximum flexibility in how we distribute them. I have spent quite a bit of time discussing approaches in this area with @opadron, so it will be good when we can meet and discuss them. I'll make another comment on our current thinking along those lines somewhere else.

Member:

I think the test-suite command can be retired and we can start from scratch, given that you guys have much more flexible ideas about this.

```python
level = "long"


_buildcache_path = os.path.join('build_cache')
```
Member:

I don't think this needs an os.path.join. Shouldn't this location be taken from binary_distribution? Oh... I just looked at that. It's not defined as a variable there, it's just a string written in lots of places... ugh. Can you make it into a variable in binary_distribution?

@scottwittenburg (Contributor, Author):

Yes, I was thinking the same thing here. I'll do it in my next pass.
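A sketch of what that refactor might look like (the variable and helper names are assumptions, not the actual binary_distribution API):

```python
# Define the build-cache directory name once in binary_distribution and
# derive paths from it, instead of repeating the 'build_cache' literal
# in lots of places.
import os.path

build_cache_relative_path = 'build_cache'

def build_cache_prefix(mirror_prefix):
    """Hypothetical helper: build-cache location under a mirror prefix."""
    return os.path.join(mirror_prefix, build_cache_relative_path)
```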

```python
_pkg_yaml_match_regex = re.compile(r'<a href=[^>]+\>\s*([^<]+)\s*\<')


def changed_package_files(args):
```
Member:

This should be moved to some common place like spack.util.git so that this command and spack flake8 can use the same code.

@scottwittenburg (Contributor, Author):

I agree with you here, if we keep this logic I'll move it somewhere it can be shared by both. However, just recently I've been thinking that using git to determine what has changed is flawed in general. Maybe you can straighten me out though. If we only look at changed packages, then will we not also need, for each changed package, to somehow walk back up the dependency tree and add all dependent packages to the rebuild list? I've recently been wondering if we just need to do a full_hash on every package in the release spec-set, since that will truly discover if any package needs to be rebuilt. To avoid a lot of network traffic, we could think about keeping a slightly more detailed index on the binary mirror so we can fetch all the full hashes in a single request.

Member:

> If we only look at changed packages, then will we not also need, for each changed package, to somehow walk back up the dependency tree and add all dependent packages to the rebuild list?

Yes, that's true. Look at spack dependents for some code that does that -- it constructs an inverted dependency dict and uses that to get the static dependency information, which will tell you which packages could have changed.

I think computing the full hash for the full spec set is the best way to go. We need to do something equivalent to that. It may still be a win to run the code from spack dependents to figure out which packages could have changed, then only re-hash the things from the spec set that have those names, but it's probably only a small win at this point.
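The inverted-dependency walk described above might look roughly like this (the dependency data and function names are illustrative, not the actual spack dependents code):

```python
# Invert a static dependency mapping, then walk it upward to find every
# package that could be affected by a change.
from collections import defaultdict

def invert_dependencies(deps):
    """deps maps each package to the list of packages it depends on."""
    dependents = defaultdict(set)
    for pkg, pkg_deps in deps.items():
        for dep in pkg_deps:
            dependents[dep].add(pkg)
    return dependents

def transitive_dependents(dependents, changed):
    """All packages reachable by walking dependent edges from 'changed'."""
    result, stack = set(), list(changed)
    while stack:
        pkg = stack.pop()
        for parent in dependents.get(pkg, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result
```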

Member:

Modifying the index to have more info is definitely a good idea.


```python
def parse_yaml_list(index_html_contents):
    result = []
    match_list = _pkg_yaml_match_regex.findall(index_html_contents)
```
Member:

This looks good but I think adding a parseable json file alongside index.html would be good.

@scottwittenburg (Contributor, Author):

Yes, I agree. While I did add a yaml index to the buildcache command, I didn't yet update this to read it. Based on your comments on that PR (#8451) though, I think I'll change that to a json representation before updating this code.

Member:

Basically the undocumented policy we've been using is json for machine-generated stuff, yaml for human-edited stuff (config files). #2189 has a pretty lengthy discussion of how much faster json is.
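A minimal sketch of such a machine-generated JSON index, assuming a simple list-of-spec-files schema (the index format Spack ultimately adopted may differ):

```python
# Write an index.json alongside index.html so clients can parse the
# build-cache contents without scraping HTML.
import json

def write_json_index(spec_yaml_files):
    """Serialize the list of spec .yaml files in the build cache."""
    return json.dumps({'index_version': 1, 'specs': sorted(spec_yaml_files)})

def read_json_index(text):
    """Inverse of write_json_index: recover the list of spec files."""
    return json.loads(text)['specs']
```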

```python
except URLError as urlErr:
    tty.error('Unable to open url %s' % index_path)
    tty.error(urlErr)
    return None
```
Member:

These return None statements should probably raise some type of SpackError. Any SpackError raised will end the command and print the message from the error, or a stack trace in debug mode.
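A sketch of the suggested error handling (SpackError really lives in spack.error; a stand-in class keeps this snippet self-contained, and fetch_index is a made-up name):

```python
class SpackError(Exception):
    """Stand-in for spack.error.SpackError."""

def fetch_index(index_url, urlopen):
    """Fetch an index, raising instead of returning None on failure."""
    try:
        return urlopen(index_url).read()
    except OSError as url_err:  # urllib's URLError subclasses OSError
        raise SpackError('Unable to open url %s: %s' % (index_url, url_err))
```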

@scottwittenburg (Contributor, Author):

Sounds good.

```python
try:
    pkg_yaml.index('-%s-' % pkg_name)
    tty.msg(' %s' % pkg_yaml)
except Exception as expn:
```
Member:

Maybe a warning here? I'm not sure why this would happen.

@scottwittenburg (Contributor, Author):

At the time I wasn't quite sure that I could count on the form of the file name. Just a warning sounds fine though.
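The warning-based version discussed here could look like this (a sketch; tty.warn is the real Spack call, replaced by a print stand-in so the snippet runs on its own):

```python
def warn(msg):  # stand-in for llnl.util.tty.warn
    print('Warning: %s' % msg)

def matching_yaml_files(yaml_files, pkg_name):
    """Select build-cache .yaml files whose names mention pkg_name."""
    result = []
    for pkg_yaml in yaml_files:
        if '-%s-' % pkg_name in pkg_yaml:
            result.append(pkg_yaml)
        else:
            warn('unexpected file name in build cache: %s' % pkg_yaml)
    return result
```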

```python
    return result


def get_mirror_rebuilds(mirror_name, mirror_url, pkg_change_set):
```
Member:

Logic in here looks good to me; the code could use some polish.

```python
    help="configuration scope containing mirrors to check")


def checkbinaries(parser, args):
```
Member:

check-binaries would be better; the function would be called check_binaries in that case. Does that work?

@scottwittenburg (Contributor, Author):

Fine with me, I don't see why it wouldn't work.

@scottwittenburg (Contributor, Author) commented:

@tgamblin Thanks for the review. As I mentioned somewhere above, I wanted to mention what @opadron and I are currently thinking about farming out the build jobs. Please chime in Omar, if I've misspoken.

For one thing, we thought it would be nice if we were not tied too tightly to running on Kubernetes, so a purely gitlab-ci approach seemed like a good idea (though we might have more fine-grained control if we did want to tie ourselves to Kubernetes).

The gitlab-ci.yml documentation describes how jobs can be organized into stages, where all jobs in the same stage are run in parallel. Since GitLab does not, however, provide any way to dynamically define those jobs, we're thinking of dynamically generating the gitlab-ci.yml file (perhaps on Travis, where we could add it to a "throwaway" commit we push on top of the developer's changes, and push that to our gitlab). The jobs we define in that file would associate a single package to build with any information needed to build it (like which container is needed, where it should be pushed afterwards, etc.).

Then, via the magic of the work @opadron has done, those jobs can be run in parallel on the cluster. I've probably left out some important details in that high-level description, so feel free to poke holes in it to determine whether I just forgot to mention something, or if there are details we really haven't considered yet.
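The generation step described above could be sketched like this (the job names, container image, and script lines are all made up for illustration; the real generator would carry per-package container and mirror details):

```python
# Emit a .gitlab-ci.yml with one job per package, all in a single stage
# so the runners can execute them in parallel.
def generate_gitlab_ci(specs, image='example/spack-builder:latest'):
    lines = ['stages:', '  - build', '']
    for spec in specs:
        lines += [
            '%s-build:' % spec,
            '  stage: build',
            '  image: %s' % image,
            '  script:',
            '    - spack install %s' % spec,
            '',
        ]
    return '\n'.join(lines)
```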

One thing I'm still not 100% clear on is where the different OSes come in. Should the OS be added as another dimension in the CombinatorialSpecSet? Or should we just have a list of a few we want to build on somewhere, and then run our release process for each one?

ping @aashish24 @bryonbean

@opadron (Member) commented Jul 3, 2018

@scottwittenburg your description is pretty spot-on, imo. Depending on how many builds we end up running, and how deep our pockets are for AWS resources, we may want to limit the number of concurrent builds.

For the OS question, I think your latter suggestion, that we simply run our build matrix once for each of a static set of images that we specify up front, is a perfectly fine first step that we can adjust later if needed.

@tgamblin (Member) commented:

@scottwittenburg @opadron: Sorry for the late reply. I, like @opadron, think the plan is great. I like the idea of generating the gitlab YAML file from a spec set. I don't expect gitlab-ci.yaml to change much between releases, so I don't think this approach will cause a lot of noise in the repo, either (not that there aren't a lot of commits already).

I also agree that we shouldn't tie the workflow to Kube, though Kube is a useful place to run cloud containers. I agree with @opadron about where the OS configurations should come from. Handling OSes by running for multiple images works for me. I don't think we can really handle that elegantly at the spec set level; let something else run Spack and have it build for the platform it's running on.

scottwittenburg and others added 9 commits July 10, 2018 11:36
- add CombinatorialSpecSet in spack.util.spec_set module.
  - class is iterable and encapsulates YAML parsing and validation.

- Adjust YAML format to be more generic
  - YAML test-suite format now has a `matrix` section, which can contain
    multiple lists of specs, generated different ways. Including:
    - specs: a raw list of specs.
    - packages: a list of package names and versions
    - compilers: a list of compiler names and versions

  - All of the elements of `matrix` are dimensions for the build matrix;
    we take the cartesian product of these lists of specs to generate a
    build matrix.  This means we can add things like [^mpich, ^openmpi]
    to get builds with different MPI versions.  It also means we can
    multiply the build matrix out with lots of different parameters.

- Add a schema format for test-suites

- Bug fixes:
  - fix bug with constraining an anonymous spec by name, add a test.
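The cartesian-product expansion the commit message describes can be sketched like this (a simplified stand-in for CombinatorialSpecSet, not its actual code):

```python
# Cross every dimension of spec fragments with every other, e.g.
# packages x compilers x MPI providers, to produce the build matrix.
import itertools

def expand_matrix(dimensions):
    """dimensions: list of lists of spec fragments."""
    return [' '.join(combo) for combo in itertools.product(*dimensions)]
```

For example, two packages crossed with [^mpich, ^openmpi] yields four build specs.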
This reflects the more general usefulness of the class for other
purposes including, for example, specifying the set of specs that
comprise a release.
Now it no longer uses the git diff to find changes, but rather just
iterates over everything in the release spec_set.  Also relies on
updates to the buildcache command in another PR.
@scottwittenburg force-pushed the package-hash-check-binaries branch from 964ddf8 to 7555d65 on July 11, 2018 20:00
@scottwittenburg (Contributor, Author) commented:

Take a look at the last commit here @opadron, for a first stab at what our .gitlab-ci.yaml might look like.

scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Jul 16, 2018
Also brings check_binaries.py command from spack#8444, and refactors it a
bit to allow it to check a single spec against a single remote binary
mirror.  This way each job can check its own spec against the mirror
to see if it needs to do its work or just quit.
scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Jul 18, 2018
scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Aug 10, 2018
scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Aug 14, 2018
@scottwittenburg (Contributor, Author) commented:

I'm closing this because it has been mostly superseded by #8718

scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Aug 23, 2018
scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Oct 12, 2018
scottwittenburg added a commit to scottwittenburg/spack that referenced this pull request Dec 11, 2018