Fix r- package downloads from list_url#29250
Conversation
Fixes spack#26977 Fixes spack#29204 When the cran attribute is set, a url attribute is derived. This was using the first version found in the array of versions. The expectation wsa that if that version was not present at the URL that it would fall back to list_url. That was not happening. Instead, the version in the derived url was always associated with the url and was not falling back to list_url. Thus, if the most recent version of a package in spack was older than what was in CRAN, the download would fail, because the archive had been moved. Using an older version for the derived url would work but sometimes there is only one version present in spack until the package is updated. Instead, use a fake version that would never exist in the wild so that spack does not lock what it thinks is the most current version to the url attribute.
|
This change will probably impact url parsing stats. |
|
This seems more like a workaround than a solution. If a version isn't found at the specified URL, we should always fall back to |
That was the expectation but I can not say for sure that the fallback worked in the past within the window of CRAN having a newer version than what was newest in spack, with the url pointing to it. Generally speaking, the url pattern points to an older version when a new version is added. That was fairly consistent in the r- packages. With the introduction of the |
|
It's possible, I tried to reproduce the original behavior whenever the version generated didn't match anything on the page, but maybe there's a corner case where the archive URL specified is inaccurate? The behavior I intended, before I go look back at it, was to treat the archive URLs specified by a package as accurate, and not change them based on what's on the page. If the URL specified by the package is wrong, that might be different behavior than before because the old behavior was something like "use the last plausibly matching version found on the page." Let me poke at this for a minute, I didn't test this directly outside of checksum and create. |
|
Doing some more poking, if I set the url attribute to an older version, fetching fails for that version as well. So, it is not an issue of being the newest version in the package file but rather any version that the url is set to can not be fetched as the fallback is broken. I see this with tcsh as well so this is not specific to r-packages. So, yes, this PR is a workaround to get r- packages working, actually leveraging the @adamjstewart Does this sound okay?
|
|
Ok, I found at least one bug that's causing part of this problem. Have a look here: spack/lib/spack/spack/cmd/checksum.py Lines 77 to 85 in 6eef12c Those lines overwrite the selections from my updates in web, to incorrect values for these packages. For ellipsis, before removing those: After removing: That still doesn't resolve the specific version issue though, which uses a completely different codepath than what I touched. I don't see anything in here that looks like a fallback though. Still looking. |
|
Yeah, I see it. This one is caused by the fact list_url isn't ever actually checked, there's no fallback to it, there's a fallback to a dynamic fetch strategy that's set up to use fetch_remote_versions which uses list_url internally. My changes made the urls provided by the package get prioritized over versions found by web scraping in this one corner case, PR forthcoming for both issues. |
|
Closing in favor of #29258. |
Fixes #29204
When the cran attribute is set, a url attribute is derived. This was
using the first version found in the array of versions. The expectation
wsa that if that version was not present at the URL that it would fall
back to list_url. That was not happening. Instead, the version in the
derived url was always associated with the url and was not falling back
to list_url. Thus, if the most recent version of a package in spack was
older than what was in CRAN, the download would fail, because the
archive had been moved. Using an older version for the derived url
would work but sometimes there is only one version present in spack
until the package is updated. Instead, use a fake version that would
never exist in the wild so that spack does not lock what it thinks is
the most current version to the url attribute.