Skip to content

PythonPackage: add pypi attribute to infer homepage/url/list_url#17587

Merged
alalazo merged 7 commits intospack:developfrom
adamjstewart:features/pypi-versions
Dec 29, 2020
Merged

PythonPackage: add pypi attribute to infer homepage/url/list_url#17587
alalazo merged 7 commits intospack:developfrom
adamjstewart:features/pypi-versions

Conversation

@adamjstewart
Copy link
Copy Markdown
Member

@adamjstewart adamjstewart commented Jul 19, 2020

I just discovered https://pypi.org/simple/ and PEP 503. This finally allows us to use spack versions and spack checksum on PyPI packages properly! For example:

Before

$ spack versions py-numpy
==> Safe versions (already checksummed):
  master  1.18.4  1.18.1  1.17.4  1.17.1  1.16.5  1.16.2  1.15.4  1.15.1  1.14.5  1.14.2  1.13.3  1.12.1  1.11.2  1.10.4  1.9.1
  1.19.0  1.18.3  1.18.0  1.17.3  1.17.0  1.16.4  1.16.1  1.15.3  1.15.0  1.14.4  1.14.1  1.13.1  1.12.0  1.11.1  1.9.3
  1.18.5  1.18.2  1.17.5  1.17.2  1.16.6  1.16.3  1.16.0  1.15.2  1.14.6  1.14.3  1.14.0  1.13.0  1.11.3  1.11.0  1.9.2
==> Remote versions (not yet checksummed):
==> Warning: Found no unchecksummed versions for py-numpy

After

$ spack versions py-numpy
==> Safe versions (already checksummed):
  master  1.18.4  1.18.1  1.17.4  1.17.1  1.16.5  1.16.2  1.15.4  1.15.1  1.14.5  1.14.2  1.13.3  1.12.1  1.11.2  1.10.4  1.9.1
  1.19.0  1.18.3  1.18.0  1.17.3  1.17.0  1.16.4  1.16.1  1.15.3  1.15.0  1.14.4  1.14.1  1.13.1  1.12.0  1.11.1  1.9.3
  1.18.5  1.18.2  1.17.5  1.17.2  1.16.6  1.16.3  1.16.0  1.15.2  1.14.6  1.14.3  1.14.0  1.13.0  1.11.3  1.11.0  1.9.2
==> Remote versions (not yet checksummed):
  1.19.0rc2  1.17.0rc2  1.16.0rc1  1.14.0rc1  1.12.1rc1  1.12.0b1   1.11.0rc2  1.10.2        1.9.0  1.8.0  1.7.0  1.6.0
  1.19.0rc1  1.17.0rc1  1.15.0rc2  1.13.0rc2  1.12.0rc2  1.11.2rc1  1.11.0rc1  1.10.1        1.8.2  1.7.2  1.6.2
  1.18.0rc1  1.16.0rc2  1.15.0rc1  1.13.0rc1  1.12.0rc1  1.11.1rc1  1.11.0b3   1.10.0.post2  1.8.1  1.7.1  1.6.1

This PR modifies PythonPackage to accept a pypi variable that defines the homepage, url, and list_url for every Python package like @alalazo and others have done for GNU/SourceForge/etc. Progress so far:

  • Add pypi variable to PythonPackage base class
  • Auto-populate homepage, url, and list_url if pypi is set
  • Update spack create to set pypi instead of url
  • Add spack versions/checksum/create support for PyPI URLs
  • Update PythonPackage documentation

If this works well, we can do the same for cran, cpan, and other similar language-specific software repositories.

Closes #2281
Closes #2335
Alternative to #2718

@adamjstewart adamjstewart force-pushed the features/pypi-versions branch from 91497e2 to 36d13e3 Compare December 22, 2020 02:03
Copy link
Copy Markdown
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really really awesome direction to see and it made me excited right as I was about to close my laptop! I think this was a very well-executed change.

Separately, broke out #20504 in case that could make this kind of pattern more robust by not pinning packages to the global pypi repo. I don't think that should block this workstream at all, but rather should be considered as part of a separate arc for options and configuration in general.


if not hasattr(self, 'list_depth'):
self.list_depth = 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change.

# However, SourceForge downloads still need to end in '/download'.
url_regex += r'(\/download)?$'
url_regex += r'(\/download)?'
# PyPI adds #sha256=... to the end of the URL
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker and I would have made the same choice, but if I extend this to CRAN/CPAN, I think that point would be a good time to break some of this method out by repo.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this more generic and use #... but there are so many URL formats that I'm not sure if this is a good idea.

Copy link
Copy Markdown
Member

@becker33 becker33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM

Copy link
Copy Markdown
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like everything about this.

return 'https://pypi.org/project/' + name + '/'

@property
def url(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I love this. allows non-pypi packages to keep working just fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add:

else:
    raise AttributeError('Must define `pypi` or `url`')

for these fields? We do this for other mixins like sourceforce/gnu mirrors. But I don't think it's strictly necessary.

@adamjstewart adamjstewart changed the title Add support for spack versions/checksum on PyPI PythonPackage: add pypi attribute to infer homepage/url/list_url Dec 22, 2020
Comment on lines +62 to +63
Note: this function is called by `spack versions` and `spack checksum`
but not by `spack fetch` or `spack install`.
Copy link
Copy Markdown
Member Author

@adamjstewart adamjstewart Dec 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@becker33 This is another complaint I have about our current API. The URL processing logic for spack versions/checksum/create and spack fetch/install is almost entirely separate. This leads to a lot of duplication, and a lot of logic that only exists in one or another.

@adamjstewart
Copy link
Copy Markdown
Member Author

adamjstewart commented Dec 23, 2020

Hit a new sticking point with this. PyPI has a lot of URL formats!

Simple PyPI URLs:

PyPI URLs containing hash:

It basically boils down to the following formats:

https://<hostname>/packages/<type>/<first character of project>/<project>/<download file>
https://<hostname>/packages/<two character hash>/<two character hash>/<longer hash>/<download file>

There are two problems with this:

  1. <type> is not always source. For wheels, it can be anything like py3, cp37, py2.py3, etc.
  2. PyPI URLs containing a hash do not contain the <project> or <type>. <project> usually matches the <download file> name, but not always. For example, opencensus-context/opencensus_context-0.1.1-py2.py3-none-any.whl

We could add <type> to the pypi specification, like pypi = 'source/numpy/numpy-1.19.4.zip', but we would have to guess source for all hashed URLs. Also, PythonPackage doesn't currently support wheels, you have to extend Package, so I think I'm going to ignore wheels for now.

For now, we'll just have to guess the correct <project> from the hashed URL. It's usually the same...

# https://<hostname>/packages/<two character hash>/<two character hash>/<longer hash>/<download file>
# e.g. https://pypi.io/packages/c5/63/a48648ebc57711348420670bb074998f79828291f68aebfff1642be212ec/numpy-1.19.4.zip
# e.g. https://files.pythonhosted.org/packages/c5/63/a48648ebc57711348420670bb074998f79828291f68aebfff1642be212ec/numpy-1.19.4.zip
# e.g. https://files.pythonhosted.org/packages/c5/63/a48648ebc57711348420670bb074998f79828291f68aebfff1642be212ec/numpy-1.19.4.zip#sha256=141ec3a3300ab89c7f2b0775289954d193cc8edb621ea05f99db9cb181530512
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested spack create -f with all of the above URLs

# Peek inside the compressed file.
if stage.archive_file.endswith('.zip'):
if (stage.archive_file.endswith('.zip') or
'.zip#' in stage.archive_file):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The archive_file gets saved with the #sha256=... in the filename. If there is a way to fix that on the stage.py side of things, I would prefer that.

@adamjstewart adamjstewart marked this pull request as ready for review December 23, 2020 19:10
@adamjstewart adamjstewart marked this pull request as draft December 23, 2020 19:11
@adamjstewart adamjstewart marked this pull request as ready for review December 23, 2020 20:00
@adamjstewart
Copy link
Copy Markdown
Member Author

Okay, I think this is ready for a final review. I would like to update existing Python packages in a follow-up PR if possible.

@adamjstewart adamjstewart requested a review from alalazo December 28, 2020 15:08
@alalazo alalazo merged commit 05f8e08 into spack:develop Dec 29, 2020
@adamjstewart adamjstewart deleted the features/pypi-versions branch December 29, 2020 16:07
@adamjstewart
Copy link
Copy Markdown
Member Author

@cosmicexplorer feel free to submit similar PRs for CRAN and CPAN. I'm going to convert the rest of PyPI URLs to pypi in a follow-up PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature A feature is missing in Spack fetching python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spack doesn't treat pypi.io links properly Fix PyPI Downloads --- The Right Way

5 participants