
R versions #29258

Merged
trws merged 14 commits into spack:develop from trws:r-versions
Mar 19, 2022

Conversation

@trws
Contributor

@trws trws commented Mar 1, 2022

Fix issues discussed in #29250.

trws added 2 commits February 28, 2022 16:33
This change favors urls found in a scraped page over those provided by
the package from `url_for_version`.  In most cases this doesn't matter,
but R specifically returns known bad URLs in some cases, and the
fallback path for a failed fetch uses `fetch_remote_versions` to find a
substitute.  This fixes that problem.

fixes spack#29204
Checksum was only actually scraping when called with no versions.  It
now always scrapes and then selects URLs from the set of URLs known to
exist whenever possible.

fixes spack#25831
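The two commit messages above describe a preference order that can be sketched as follows (a minimal illustration with hypothetical names, not Spack's actual code):

```python
# Minimal sketch of the behavior described above: prefer a URL found by
# scraping the remote listing page over the one the package computes via
# `url_for_version`, since the computed URL may be stale (e.g. R archives
# moved into an Archive/ subdirectory). Names are illustrative only.

def pick_url(version, computed_url, scraped_urls):
    """Return a download URL for `version`, preferring one known to exist."""
    # `scraped_urls` maps version strings to URLs found on the listing page.
    if version in scraped_urls:
        return scraped_urls[version]
    # Fall back to the package-generated URL when scraping found nothing.
    return computed_url
```

With this ordering, checksum always scrapes first and only falls back to the generated URL when the scrape turns up nothing for that version.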
@glennpj
Contributor

glennpj commented Mar 1, 2022

I can confirm that this fixes the issues discussed in #29250, including #29204 and #26977, which could be closed by this. I checked:

  1. fetching the version that appears in the package's url, when that archive has been moved on the remote and now needs list_url
  2. checksum without a specific version on the command line
  3. checksum with a specific version on the command line.

All seems to work now.

@trws trws requested a review from scheibelp March 1, 2022 22:37
@scheibelp scheibelp self-assigned this Mar 1, 2022

```python
url_dict = {}
# Otherwise, see what versions we can find online
url_dict = pkg.fetch_remote_versions()
```
Contributor:

Before, this was done only conditionally (when no args were given) because it can be extremely expensive (e.g. qt, which does a lot of spidering). What's the runtime of spack checksum [email protected] on this branch?

Contributor Author:

It is expensive, but it's also the only way to get correct behavior for packages whose generated URLs may or may not be correct. The direct answer to your question: [email protected] takes 3 minutes and 43 seconds.

As far as I can think right now, the only way to deal with this better is to make it illegal for a package to generate broken links, but then R packages can't work. We could work around that by changing the contract for generating URLs from a version to produce a list of URLs, at least one of which must be valid if the version is valid, but that would be a breaking change.

I suppose one intermediate step might be to check the provided URL first, and only fetch remote versions if the package-generated URL gives us a 404?
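That intermediate step could look roughly like this (a hypothetical sketch; `url_exists` and `resolve_url` are illustrative names, not Spack's API):

```python
import urllib.error
import urllib.request

def url_exists(url, timeout=10):
    """Probe `url` with a HEAD request; True on any non-error response."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        # Covers 404s (HTTPError is a URLError subclass) and malformed URLs.
        return False

def resolve_url(generated_url, fetch_remote_versions, version, probe=url_exists):
    """Try the cheap probe first; spider the remote only on failure."""
    if probe(generated_url):
        return generated_url
    # Expensive path: spider the remote listing pages for known URLs.
    return fetch_remote_versions().get(version, generated_url)
```

The probe costs one HTTP round-trip, so the common case (a valid generated URL) stays fast while broken URLs still get the scraped fallback.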

Contributor:

Yuck. Does it take that long to start downloading a valid version, such as 5.15.2? I know the qt "list URLs" page is a bit of an outlier, but it's going to be ugly if a new version comes out and it takes minutes to verify the URL, even when we know it exists.

Contributor Author:

For spack checksum, yes it does; for spack fetch it does not (spack fetch [email protected] takes 16 seconds total, including download time), since this only gets called if the first URL fetch fails. We could do basically the same thing for checksum to shortcut it for valid URLs; it's not quite that simple, but it is possible.

Contributor:

Yes, it would be ideal if you could make pkg.fetch_remote_versions a lazy evaluation... it's something of a nuclear option, and because it uses URL spidering it can be unstable -- best if the checksum command only calls it if really necessary.
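The lazy-evaluation idea can be sketched like this (illustrative only; Spack's real code paths differ):

```python
import functools

def lazy(fn):
    """Defer `fn` until its first call and cache the result afterwards."""
    return functools.lru_cache(maxsize=1)(fn)

calls = []

@lazy
def fetch_remote_versions():
    calls.append(1)  # stands in for minutes of URL spidering
    return {"1.0": "http://example.invalid/pkg-1.0.tar.gz"}
```

Defining the function costs nothing; only the first caller pays for the spidering, and repeat callers hit the cache.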

Contributor Author:

Ok, so actually it should use the urls attribute rather than the url attribute; that will let both be specified, and both will be tried. The way it is now, with one as list_url, will also work after this change.
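As a sketch of the difference (a hypothetical package fragment, not a real Spack package; the candidate_urls helper and {v} templates are invented for illustration):

```python
class RFoo:
    """Illustrative package with multiple candidate download locations."""

    # Several templates, tried in order; this stands in for the `urls`
    # attribute, as opposed to a single `url`.
    urls = [
        "https://cloud.r-project.org/src/contrib/foo_{v}.tar.gz",
        "https://cloud.r-project.org/src/contrib/Archive/foo/foo_{v}.tar.gz",
    ]
    # Listing page to scrape when every template fails.
    list_url = "https://cloud.r-project.org/src/contrib/Archive/foo/"

    def candidate_urls(self, version):
        # Substitute the version into each template, preserving order.
        return [u.format(v=version) for u in self.urls]
```

The point of the list is exactly the contract discussed above: at least one candidate should be valid for a valid version, and the fetcher works through them in order before resorting to scraping list_url.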

Contributor Author:

To be more complete about it: the code used to assume that the package could always provide a single URL that could be determined to be valid without doing a test fetch. That doesn't really work in the general case. This change makes the fallback cases a whole lot more explicit and uses them in more places. The fetch strategy already did some of this, but checksum and similar commands did not.

Contributor:

The urls attribute currently fails for fetching the older versions of packages listed in the package file. Interestingly, checksum works for any versions not listed in the package file. Is that something you are trying to fix here as well?

@glennpj (Contributor) commented Mar 8, 2022:

Sorry, I should have pulled the latest changes before posting. With the most recent changes in this PR, spack checksum now works with the urls attribute. Fetching/staging still fails though.

Contributor Author:

Ok, now that's interesting. The change I had but walked back would fix that; I expected the current behavior would as well. I'm tempted to say keep it as list_url for now, and we can open something to discuss whether the fetch strategy should use my new logic of fetch checks by default. Honestly I think it should, but there's a unit test that walks every version in every package and generates fetch strategies for them. That is a recipe for disaster, so I walked it back to use just the new URL-listing logic. It should check all the mirrors rather than just dying, but that seems like a separate issue, since it dies currently too.

@spackbot-app spackbot-app bot added the tests General test capability(ies) label Mar 8, 2022
@trws
Contributor Author

trws commented Mar 9, 2022

As far as I can tell, this version now solves the issues and avoids a massive performance regression on packages with heavy spidering. @sethrj, @glennpj, you good with this?

sethrj
sethrj previously approved these changes Mar 9, 2022
@trws trws enabled auto-merge (squash) March 9, 2022 16:52
@trws trws disabled auto-merge March 9, 2022 17:28
@scheibelp (Member) left a comment:

I have a couple questions and a couple of requests.

@trws
Contributor Author

trws commented Mar 9, 2022

Ok, I think that takes care of your issues @scheibelp. The one thing that feels
unfinished here is that I feel like I ought to be using find_valid_url_for_version
in _from_merged_attrs, since the fetchers do not seem to actually try all the
mirrors for some reason, and it un-breaks certain kinds of broken package links.
I backed out that change because there's a unit test that would test-fetch every
single package version in all of spack with that change in there. I still think
it would be a good idea to do it, but it clearly violates an assumption in terms
of expense to do it right now, so I'm leaving that for later.

@trws
Contributor Author

trws commented Mar 10, 2022

Any chance of a re-review @scheibelp?

@scheibelp
Member

Thanks for the edits! FYI there appears to be a unit test failing. Let me know if that implies that my suggestion in #29258 (comment) wasn't good - my goal is to avoid duplicate code though so I'm hoping it will work.

I missed this one because we call substitute on a URL that doesn't
contain a version component.  I'm not sure how that's supposed to work,
but apparently it's required by at least one mock package, so back in it
goes.
@trws
Contributor Author

trws commented Mar 13, 2022

@scheibelp it looks like the suggestion was just fine; I had missed a substitution in a case where the url doesn't have a version to replace, which seems interesting, but apparently it's required. Aside from the codecov status it's looking green.

@trws trws enabled auto-merge (squash) March 18, 2022 23:44
@trws trws merged commit 9e01e17 into spack:develop Mar 19, 2022
@glennpj glennpj mentioned this pull request Mar 19, 2022

5 participants