
R versions #29258

Merged
trws merged 14 commits into spack:develop from trws:r-versions
Mar 19, 2022

Conversation

@trws
Contributor

@trws trws commented Mar 1, 2022

Fix issues discussed in #29250.

trws added 2 commits February 28, 2022 16:33
This change favors urls found in a scraped page over those provided by
the package from `url_for_version`.  In most cases this doesn't matter,
but R specifically returns known bad URLs in some cases, and the
fallback path for a failed fetch uses `fetch_remote_versions` to find a
substitute.  This fixes that problem.

fixes spack#29204
Checksum was only actually scraping when called with no versions.  It
now always scrapes and then selects URLs from the set of URLs known to
exist whenever possible.

fixes spack#25831
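The two commit messages above describe a preference order that can be sketched as follows (a minimal illustration with hypothetical names, not Spack's actual code):

```python
# Minimal sketch of the behavior described above: prefer a URL found by
# scraping the remote listing page over the one the package computes via
# `url_for_version`, since the computed URL may be stale (e.g. R archives
# moved into an Archive/ subdirectory). Names are illustrative only.

def pick_url(version, computed_url, scraped_urls):
    """Return a download URL for `version`, preferring one known to exist."""
    # `scraped_urls` maps version strings to URLs found on the listing page.
    if version in scraped_urls:
        return scraped_urls[version]
    # Fall back to the package-generated URL when scraping found nothing.
    return computed_url
```

With this ordering, checksum always scrapes first and only falls back to the generated URL when the scrape turns up nothing for that version.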
@glennpj
Contributor

glennpj commented Mar 1, 2022

I can confirm that this fixes the issues discussed in #29250, including #29204 and #26977, which could be closed by this. I checked:

  1. fetching the version that appears in the package's url, when that archive has been moved on the remote and now needs list_url
  2. checksum without a specific version on the command line
  3. checksum with a specific version on the command line.

All seems to work now.

@trws trws requested a review from scheibelp March 1, 2022 22:37
@scheibelp scheibelp self-assigned this Mar 1, 2022

```python
url_dict = {}
# Otherwise, see what versions we can find online
url_dict = pkg.fetch_remote_versions()
```
Contributor:

Before, this was done only conditionally (when no args were given) because it can be extremely expensive (e.g. qt, which does a lot of spidering). What's the runtime of spack checksum [email protected] on this branch?

Contributor Author:

It is expensive, but it's also the only way to get correct behavior for packages whose generated URLs may or may not be correct. The direct answer to your question: [email protected] takes 3 minutes and 43 seconds.

As far as I can think right now, the only way to deal with this better is to make it illegal for a package to generate broken links, but then R packages can't work. We could work around that by changing the contract for generating URLs from a version to produce a list of URLs, at least one of which must be valid if the version is valid, but that would be a breaking change.

I suppose one intermediate step might be to check the provided URL first, and only fetch remote versions if the package-generated URL gives us a 404?
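That intermediate step could look roughly like this (a hypothetical sketch; `url_exists` and `resolve_url` are illustrative names, not Spack's API):

```python
import urllib.error
import urllib.request

def url_exists(url, timeout=10):
    """Probe `url` with a HEAD request; True on any non-error response."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        # Covers 404s (HTTPError is a URLError subclass) and malformed URLs.
        return False

def resolve_url(generated_url, fetch_remote_versions, version, probe=url_exists):
    """Try the cheap probe first; spider the remote only on failure."""
    if probe(generated_url):
        return generated_url
    # Expensive path: spider the remote listing pages for known URLs.
    return fetch_remote_versions().get(version, generated_url)
```

The probe costs one HTTP round-trip, so the common case (a valid generated URL) stays fast while broken URLs still get the scraped fallback.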

Contributor:

Yuck. Does it take that long to start downloading a valid version, such as 5.15.2? I know the qt "list URLs" page is a bit of an outlier, but it's going to be ugly if a new version comes out and it takes minutes to verify the URL, even when we know it exists.

Contributor Author:

For spack checksum, yes it does; for spack fetch it does not (spack fetch [email protected] takes 16 seconds total, including download time), since this only gets called if the first URL fetch fails. We could do basically the same thing for checksum to shortcut it for valid URLs; it's not quite that simple, but it is possible.

Contributor:

Yes, it would be ideal if you could make pkg.fetch_remote_versions a lazy evaluation... it's something of a nuclear option, and because it uses URL spidering it can be unstable -- best if the checksum command only calls it if really necessary.
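The lazy-evaluation idea can be sketched like this (illustrative only; Spack's real code paths differ):

```python
import functools

def lazy(fn):
    """Defer `fn` until its first call and cache the result afterwards."""
    return functools.lru_cache(maxsize=1)(fn)

calls = []

@lazy
def fetch_remote_versions():
    calls.append(1)  # stands in for minutes of URL spidering
    return {"1.0": "http://example.invalid/pkg-1.0.tar.gz"}
```

Defining the function costs nothing; only the first caller pays for the spidering, and repeat callers hit the cache.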

Contributor Author:

Ok, so actually it should use the urls attribute rather than the url attribute; that will let both be specified, and both will be tried. The way it is now, with one as list_url, will also work after this change.
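As a sketch of the difference (a hypothetical package fragment, not a real Spack package; the candidate_urls helper and {v} templates are invented for illustration):

```python
class RFoo:
    """Illustrative package with multiple candidate download locations."""

    # Several templates, tried in order; this stands in for the `urls`
    # attribute, as opposed to a single `url`.
    urls = [
        "https://cloud.r-project.org/src/contrib/foo_{v}.tar.gz",
        "https://cloud.r-project.org/src/contrib/Archive/foo/foo_{v}.tar.gz",
    ]
    # Listing page to scrape when every template fails.
    list_url = "https://cloud.r-project.org/src/contrib/Archive/foo/"

    def candidate_urls(self, version):
        # Substitute the version into each template, preserving order.
        return [u.format(v=version) for u in self.urls]
```

The point of the list is exactly the contract discussed above: at least one candidate should be valid for a valid version, and the fetcher works through them in order before resorting to scraping list_url.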

Contributor Author:

To be more complete about it: the code used to assume that the package could always provide a single URL that could be determined to be valid without doing a test fetch. That doesn't really work in the general case. This change makes the fallback cases a whole lot more explicit and uses them in more places. The fetch strategy already did some of this, but checksum and similar commands did not.

Contributor:

The urls attribute currently fails for fetching the older versions of packages listed in the package file. Interestingly, checksum works for any versions not listed in the package file. Is that something you are trying to fix here as well?

@glennpj (Contributor) commented Mar 8, 2022:

Sorry, I should have pulled the latest changes before posting. With the most recent changes in this PR, spack checksum now works with the urls attribute. Fetching/staging still fails though.

Contributor Author:

Ok, now that's interesting. The change I had but walked back would fix that; I expected the current behavior would as well. I'm tempted to say keep it as list_url for now, and we can open something to discuss whether the fetch strategy should use my new logic of fetch checks by default. Honestly I think it should, but there's a unit test that walks every version in every package and generates fetch strategies for them. That is a recipe for disaster, so I walked it back to use just the new URL-listing logic. It should check all the mirrors rather than just dying, but that seems like a separate issue, since it dies currently too.

@spackbot-app spackbot-app bot added the tests General test capability(ies) label Mar 8, 2022
@trws
Contributor Author

trws commented Mar 9, 2022

As far as I can tell, this version now solves the issues and avoids a massive performance regression on packages with heavy spidering. @sethrj, @glennpj, you good with this?

sethrj
sethrj previously approved these changes Mar 9, 2022
@trws trws enabled auto-merge (squash) March 9, 2022 16:52
@trws trws disabled auto-merge March 9, 2022 17:28
@scheibelp (Member) left a comment:

I have a couple questions and a couple of requests.

@trws
Contributor Author

trws commented Mar 9, 2022

Ok, I think that takes care of your issues @scheibelp. The one thing that feels
unfinished here is that I feel like I ought to be using find_valid_url_for_version
in _from_merged_attrs, since the fetchers do not seem to actually try all the
mirrors for some reason, and it un-breaks certain kinds of broken package links.
I backed out that change because there's a unit test that would test-fetch every
single package version in all of spack with that change in there. I still think
it would be a good idea to do it, but it clearly violates an assumption in terms
of expense to do it right now, so I'm leaving that for later.

@trws
Contributor Author

trws commented Mar 10, 2022

Any chance of a re-review @scheibelp?

@scheibelp
Member

Thanks for the edits! FYI there appears to be a unit test failing. Let me know if that implies that my suggestion in #29258 (comment) wasn't good - my goal is to avoid duplicate code though so I'm hoping it will work.

I missed this one because we call substitute on a URL that doesn't
contain a version component.  I'm not sure how that's supposed to work,
but apparently it's required by at least one mock package, so back in it
goes.
@trws
Contributor Author

trws commented Mar 13, 2022

@scheibelp it looks like the suggestion was just fine; I had missed a substitution in a case where the url doesn't have a version to replace, which seems interesting, but apparently it's required. Aside from the codecov status it's looking green.

@trws trws enabled auto-merge (squash) March 18, 2022 23:44
@trws trws merged commit 9e01e17 into spack:develop Mar 19, 2022
@glennpj glennpj mentioned this pull request Mar 19, 2022

5 participants