This change favors URLs found in a scraped page over those provided by the package from `url_for_version`. In most cases this doesn't matter, but R specifically returns known-bad URLs in some cases, and the fallback path for a failed fetch uses `fetch_remote_versions` to find a substitute. This fixes that problem. Fixes spack#29204.
Checksum previously only scraped when called with no versions. It now always scrapes, then selects URLs from the set of URLs known to exist whenever possible. Fixes spack#25831.
I can confirm that this fixes the issues discussed in #29250, including #29204 and #26977, which could be closed by this. I checked, and all seems to work now.
lib/spack/spack/cmd/checksum.py (outdated)

```python
url_dict = {}
# Otherwise, see what versions we can find online
url_dict = pkg.fetch_remote_versions()
```
Before, this was done only conditionally (when no args were given), because it can be extremely expensive (e.g. `qt`, which does a lot of spidering). What's the runtime of `spack checksum qt@5.15.2` on this branch?
It is expensive, but it's also the only way to get correct behavior for packages whose generated URLs may or may not be correct. The direct answer to your question: `spack checksum qt@5.15.2` takes 3 minutes and 43 seconds on this branch.
As far as I can tell, the only better way to deal with this is to make it illegal for a package to generate broken links, but then R packages can't work. We could work around that by changing the contract for generating URLs from a version: produce a list of URLs, at least one of which must be valid if the version is valid. That would be a breaking change, though.
I suppose one intermediate step might be to check the provided URL first: if we get a 404 from the package-generated URL, only then fall back to fetching remote versions?
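A minimal sketch of that intermediate step. This is not the actual Spack implementation: `url_exists`, `versions_to_checksum`, and the injected `url_for_version`/`fetch_remote_versions` callables are hypothetical stand-ins for the package methods discussed above. The cheap package-generated URL is probed with a HEAD request; the expensive spidering path runs only for versions whose generated URL fails.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request for ``url`` succeeds.

    A 404 (or any URL error) suggests the package-generated URL is bad
    and we should fall back to spidering for remote versions.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(request, timeout=timeout)
        return True
    except (urllib.error.URLError, ValueError):
        return False


def versions_to_checksum(url_for_version, fetch_remote_versions, versions,
                         exists=url_exists):
    """Map versions to URLs, spidering only when the cheap URL fails."""
    url_dict = {}
    missing = []
    for version in versions:
        url = url_for_version(version)
        if exists(url):
            url_dict[version] = url       # cheap path: generated URL is live
        else:
            missing.append(version)       # needs the expensive fallback
    if missing:
        # Expensive path (URL spidering), run at most once per invocation.
        remote = fetch_remote_versions()
        for version in missing:
            if version in remote:
                url_dict[version] = remote[version]
    return url_dict
```

The `exists` parameter is injected so the logic can be tested without network access; in real use the default HEAD probe would be called.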
Yuck. Does it take that long to start downloading a valid version, such as 5.15.2? I know the qt "list URLs" page is a bit of an outlier, but it's going to be ugly if a new version comes out and verifying its URL takes minutes, even when we know it exists.
For `spack checksum`, yes it does; for `spack fetch`, it does not (`spack fetch qt@5.15.2` takes 16 seconds total, including download time), because the fallback only runs if the first URL fetch fails. We could do basically the same thing for `checksum` to shortcut it for valid URLs. It's not quite that simple, but it is possible.
Yes, it would be ideal if you could make `pkg.fetch_remote_versions` lazily evaluated. It's something of a nuclear option, and because it uses URL spidering it can be unstable -- best if the checksum command only calls it when really necessary.
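One way to get that laziness is a small memoizing wrapper: the expensive call runs at most once, and only when someone actually asks for the result. This is an illustrative sketch, not Spack code; `LazyVersionLookup` and the `calls` counter are hypothetical, with the wrapped callable standing in for `pkg.fetch_remote_versions`.

```python
_UNSET = object()  # sentinel, so even a falsy result is cached


class LazyVersionLookup:
    """Defer an expensive lookup (e.g. URL spidering) until first use."""

    def __init__(self, fetch):
        self._fetch = fetch      # the expensive callable
        self._result = _UNSET
        self.calls = 0           # instrumentation, for illustration only

    def get(self):
        # The expensive work happens here, not at construction time,
        # and only on the first call; later calls reuse the cached result.
        if self._result is _UNSET:
            self.calls += 1
            self._result = self._fetch()
        return self._result
```

With this shape, constructing the lookup is free; code paths that never need remote versions never pay for the spidering.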
Ok, so actually it should use the `urls` attribute rather than the `url` attribute; that lets both be specified, and both will be tried. The way it is now, with one as `list_url`, will also work after this change.
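The idea of trying several package-provided URLs can be sketched as collecting every candidate for a version and letting the caller try them in order. This is a hypothetical illustration: `candidate_urls` and the `{version}` placeholder convention are invented for the sketch, and the `url`/`urls`/`extra` parameters only loosely mirror the package attributes discussed above, not their actual Spack semantics.

```python
def candidate_urls(version, url=None, urls=None, extra=()):
    """Collect deduplicated candidate URLs for ``version``, in priority order.

    ``url`` is the single package URL, ``urls`` the list-valued attribute,
    and ``extra`` any scraped URLs; at least one candidate must be valid
    for the version to be fetchable.
    """
    candidates = []
    for template in ([url] if url else []) + list(urls or []) + list(extra):
        candidate = template.replace("{version}", str(version))
        if candidate not in candidates:   # keep first occurrence only
            candidates.append(candidate)
    return candidates
```

A fetcher built on this would walk the list and stop at the first URL that resolves, which makes the fallback order explicit instead of assuming the first URL is always right.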
To be more complete about it: the code used to assume that the package could always provide a single URL whose validity could be determined without a test fetch. That doesn't really hold in the general case. This change makes the fallback cases much more explicit and uses them in more places. The fetch strategy already did some of this, but `checksum` and similar commands did not.
The `urls` attribute currently fails when fetching the older versions listed in the package file. Interestingly, `checksum` works for any versions not listed in the package file. Is that something you are trying to fix here as well?
Sorry, I should have pulled the latest changes before posting. With the most recent changes in this PR, `spack checksum` now works with the `urls` attribute. Fetching/staging still fails, though.
Ok, now that's interesting. The change I had but walked back would fix that; I expected the current behavior would as well. I'm tempted to keep it as `list_url` for now, and we can open a separate issue to discuss whether the fetch strategy should use my new logic of fetch checks by default. Honestly I think it should, but there's a unit test that walks every version in every package and generates fetch strategies for them, and that is a recipe for disaster, so I walked it back to use just the new URL-listing code. The fetch strategy should also check all the mirrors rather than just dying, but that seems like a separate issue, since it dies currently too.
Co-authored-by: Seth R. Johnson <[email protected]>
scheibelp left a comment:
I have a couple of questions and a couple of requests.
Ok, I think that takes care of your issues @scheibelp. The one thing that feels
Any chance of a re-review @scheibelp?
Thanks for the edits! FYI, there appears to be a failing unit test. Let me know if that implies my suggestion in #29258 (comment) wasn't good -- my goal is to avoid duplicate code, so I'm hoping it will work.
I missed this one because we call `substitute` on a URL that doesn't contain a version component. I'm not sure how that's supposed to work, but apparently it's required by at least one mock package, so back in it goes.
@scheibelp it looks like the suggestion was just fine; I had missed a substitution in a case where the URL doesn't have a version to replace. That seems odd, but apparently it's required. Aside from the codecov status it's looking green.
Fix issues discussed in #29250.