Skip to content

Fix r- package downloads from list_url#29250

Closed
glennpj wants to merge 1 commit intospack:developfrom
glennpj:r_downloads
Closed

Fix r- package downloads from list_url#29250
glennpj wants to merge 1 commit intospack:developfrom
glennpj:r_downloads

Conversation

@glennpj
Copy link
Copy Markdown
Contributor

@glennpj glennpj commented Feb 28, 2022

Fixes #29204

When the cran attribute is set, a url attribute is derived. This was
using the first version found in the array of versions. The expectation
wsa that if that version was not present at the URL that it would fall
back to list_url. That was not happening. Instead, the version in the
derived url was always associated with the url and was not falling back
to list_url. Thus, if the most recent version of a package in spack was
older than what was in CRAN, the download would fail, because the
archive had been moved. Using an older version for the derived url
would work but sometimes there is only one version present in spack
until the package is updated. Instead, use a fake version that would
never exist in the wild so that spack does not lock what it thinks is
the most current version to the url attribute.

Fixes spack#26977
Fixes spack#29204

When the cran attribute is set, a url attribute is derived. This was
using the first version found in the array of versions. The expectation
wsa that if that version was not present at the URL that it would fall
back to list_url. That was not happening. Instead, the version in the
derived url was always associated with the url and was not falling back
to list_url. Thus, if the most recent version of a package in spack was
older than what was in CRAN, the download would fail, because the
archive had been moved. Using an older version for the derived url
would work but sometimes there is only one version present in spack
until the package is updated. Instead, use a fake version that would
never exist in the wild so that spack does not lock what it thinks is
the most current version to the url attribute.
@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Feb 28, 2022

This change will probably impact url parsing stats.

@adamjstewart
Copy link
Copy Markdown
Member

This seems more like a workaround than a solution. If a version isn't found at the specified URL, we should always fall back to list_url. @trws could this be caused by your recent spack checksum changes?

@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Feb 28, 2022

This seems more like a workaround than a solution. If a version isn't found at the specified URL, we should always fall back to list_url

That was the expectation but I can not say for sure that the fallback worked in the past within the window of CRAN having a newer version than what was newest in spack, with the url pointing to it. Generally speaking, the url pattern points to an older version when a new version is added. That was fairly consistent in the r- packages. With the introduction of the cran attribute the url was always getting set to point to the most recent version that spack knew about. Prior to the cran attribute that scenario is not something that would have been tested. Now that the cran attribute has been added to all CRAN packages, the problem will be more prevalent. My goal with this PR is to get r- package downloads working ASAP. I understand that there could be a more systemic problem that needs to be dealt with.

@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Feb 28, 2022

@trws could this be caused by your recent spack checksum changes?

I do not think so because #26977 was prior to that. At the time of that issue there were only a few r- packages using the cran attribute.

@trws
Copy link
Copy Markdown
Contributor

trws commented Feb 28, 2022

It's possible, I tried to reproduce the original behavior whenever the version generated didn't match anything on the page, but maybe there's a corner case where the archive URL specified is inaccurate?

The behavior I intended, before I go look back at it, was to treat the archive URLs specified by a package as accurate, and not change them based on what's on the page. If the URL specified by the package is wrong, that might be different behavior than before because the old behavior was something like "use the last plausibly matching version found on the page." Let me poke at this for a minute, I didn't test this directly outside of checksum and create.

@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Feb 28, 2022

Doing some more poking, if I set the url attribute to an older version, fetching fails for that version as well. So, it is not an issue of being the newest version in the package file but rather any version that the url is set to can not be fetched as the fallback is broken. I see this with tcsh as well so this is not specific to r-packages.

So, yes, this PR is a workaround to get r- packages working, actually leveraging the cran attribute to fake the version to ensure that the version does not equal what is in the url attribute. I am going to try to gather more information and create an issue specific for the failure to fall back to list_url.

@adamjstewart Does this sound okay?

  1. Get r-packages working for now because there will be more breaking as time goes on, and there are a lot of them.
  2. Have an issue focused on url/list_url, not specific to r-packages, as this is not specific to R.
  3. Revert this PR once things are fixed?

@trws
Copy link
Copy Markdown
Contributor

trws commented Feb 28, 2022

Ok, I found at least one bug that's causing part of this problem. Have a look here:

# And ensure the specified version URLs take precedence, if available
try:
explicit_dict = {}
for v in pkg.versions:
if not v.isdevelop():
explicit_dict[v] = pkg.url_for_version(v)
url_dict.update(explicit_dict)
except spack.package.NoURLError:
pass

Those lines overwrite the selections from my updates in web, to incorrect values for these packages. For ellipsis, before removing those:

==> [2022-02-28-15:37:02.702619] Found 8 versions of r-ellipsis:

  0.3.2    https://cloud.r-project.org/src/contrib/ellipsis_0.3.2.tar.gz
  0.3.1    https://cloud.r-project.org/src/contrib/ellipsis_0.3.1.tar.gz
  0.3.0    https://cloud.r-project.org/src/contrib/ellipsis_0.3.0.tar.gz
  0.2.0.1  https://cloud.r-project.org/src/contrib/ellipsis_0.2.0.1.tar.gz
  0.2.0    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.2.0.tar.gz
  0.1.0    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.1.0.tar.gz
  0.0.2    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.0.2.tar.gz
  0.0.1    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.0.1.tar.gz

After removing:

  0.3.2    https://cloud.r-project.org/src/contrib/ellipsis_0.3.2.tar.gz
  0.3.1    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.3.1.tar.gz
  0.3.0    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.3.0.tar.gz
  0.2.0.1  https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.2.0.1.tar.gz
  0.2.0    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.2.0.tar.gz
  0.1.0    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.1.0.tar.gz
  0.0.2    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.0.2.tar.gz
  0.0.1    https://cloud.r-project.org/src/contrib/Archive/ellipsis/ellipsis_0.0.1.tar.gz

That still doesn't resolve the specific version issue though, which uses a completely different codepath than what I touched. I don't see anything in here that looks like a fallback though. Still looking.

@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Feb 28, 2022

Here is some more info. I created a branch to the commit just prior to #28989. That gets fetching from list_url working again for r-packages and tcsh.

I also realized that #26977 is actually a different problem, and thus not fixed by this PR workaround.

@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Mar 1, 2022

@trws The bug you found is #25831, which was also discussed a bit in #26977.

@trws
Copy link
Copy Markdown
Contributor

trws commented Mar 1, 2022

Yeah, I see it. This one is caused by the fact list_url isn't ever actually checked, there's no fallback to it, there's a fallback to a dynamic fetch strategy that's set up to use fetch_remote_versions which uses list_url internally. My changes made the urls provided by the package get prioritized over versions found by web scraping in this one corner case, PR forthcoming for both issues.

@trws trws mentioned this pull request Mar 1, 2022
@glennpj
Copy link
Copy Markdown
Contributor Author

glennpj commented Mar 7, 2022

Closing in favor of #29258.

@glennpj glennpj closed this Mar 7, 2022
@glennpj glennpj deleted the r_downloads branch March 7, 2022 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fetching not up-to-date versions fails

3 participants