
Auto-Generation of Python Packages from PyPI and Anaconda #2749

@citibeth

Description


@adamjstewart

The purpose of this issue is to explore an alternative to #2718 that goes beyond simple determination of PyPI URLs, toward automatic generation of Spack recipes by scraping the contents of the PyPI and Anaconda databases.

(Please ignore obvious formatting issues for now, e.g. proper construction of Spack class names, version specs in depends_on(), etc. Those are all fixable.)

PyPI

Enclosed is a script providing basic auto-conversion from PyPI to Spack. You can try it out, for example, with:

python pypi-to-spack.py pandas | less

You can also try it out on other projects: basemap, scipy, numpy, etc. I've learned the following lessons from experience with this script:

  1. Not all packages provide a PyPI URL at all; some are downloaded from non-PyPI sites (see basemap). For those that don't provide such a PyPI URL, there is no PyPI-provided hash; we would have to download it ourselves.
  2. Not all packages provide a real, usable URL for every version (see numpy), even when such URLs exist.
  3. Usually, all but the last PyPI version is "hidden." Things look a lot cleaner if you don't show hidden versions.
  4. Packages are frequently nearly bereft of dependencies (see basemap). Even when dependencies are provided, they are often provided sporadically; frequently not even for the latest version! And even on versions where they're provided, dependencies are often incomplete compared to what we have in Spack (see pandas).
  5. PyPI has 96,000 recipes, most of them junk. We should not attempt to convert all of them; rather, we should use PyPI as a source to mine Spack recipes for things we want. Best of all, we can recursively chase down dependencies as we go.
  6. If the PyPI URL works, there is no need for #2718: the full (messy) URL can be placed, automatically, in the generated package.py file. (And if it doesn't work, there is no PyPI URL to be had, so there isn't a problem.)

pypi-to-spack.py.txt
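The conversion in the script above can be sketched roughly as follows. This is a minimal illustration, assuming metadata shaped like the "releases" mapping returned by PyPI's JSON API (https://pypi.org/pypi/<name>/json); the helper name is made up for illustration and is not part of the attached script or of Spack.

```python
# Sketch: turn PyPI per-release file metadata into Spack version() directives.
# Keeps only versions that actually have a source tarball with a hash,
# per lessons 1 and 2 above.

def spack_versions(releases):
    """releases: {version_string: [file_dicts]} as in the PyPI JSON API."""
    lines = []
    for version, files in sorted(releases.items(), reverse=True):
        for f in files:
            # Not every version has a usable sdist; skip those that don't.
            if f.get("packagetype") == "sdist" and f.get("md5_digest"):
                lines.append("version('%s', '%s')" % (version, f["md5_digest"]))
                break
    return lines

# Tiny hand-made sample standing in for a real API response:
sample = {
    "0.19.2": [{"packagetype": "sdist", "md5_digest": "abc123"}],
    "0.19.1": [{"packagetype": "bdist_wheel"}],  # wheel only -> skipped
}
for line in spack_versions(sample):
    print(line)
```

A real converter would also have to cope with packages whose downloads live outside PyPI entirely (lesson 1), where no PyPI-provided hash exists.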

Anaconda

Anaconda also has a database of recipes, maintained on a per-version basis. See, for example, their recipe for pandas: https://github.com/ContinuumIO/anaconda-recipes/blob/master/pandas/meta.yaml

Anaconda recipes have a much more complete set of dependencies, separated into build and run sections, which map well onto Spack's dependency types. On the weaker side, they have no checksums, apparently relying on the goodwill of their download sources. In any case, it would be reasonable to scrape the dependencies out of the Anaconda recipes to augment the information drawn from PyPI above.
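The mapping from conda's build/run sections onto Spack dependency types could look something like this. A minimal sketch, assuming the requirement lists have already been read out of meta.yaml; the py- prefix convention and the function name are assumptions for illustration.

```python
# Sketch: map a conda recipe's build/run requirement lists onto Spack
# depends_on() directives.  Build-only deps get type='build'; deps in both
# sections get type=('build', 'run'); run-only deps get type='run'.

def conda_deps_to_spack(build, run):
    lines = []
    for name in sorted(set(build) - set(run)):
        lines.append("depends_on('py-%s', type='build')" % name)
    for name in sorted(run):
        dep_type = "('build', 'run')" if name in build else "'run'"
        lines.append("depends_on('py-%s', type=%s)" % (name, dep_type))
    return lines

# Shape loosely modeled on the pandas meta.yaml: numpy appears in both
# sections, setuptools only under build, python-dateutil only under run.
print("\n".join(conda_deps_to_spack(
    build=["setuptools", "numpy"],
    run=["numpy", "python-dateutil"])))
```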

Conclusions

Auto-generation is an intriguing possibility. It would be most useful as a way to automatically chase down an entire DAG of packages from PyPI and create Spack recipes for everything that has not yet been put into Spack. Imagine running one spack create command and ending up with 10 new packages...
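That one-command-many-packages idea amounts to a breadth-first walk over the declared dependency graph. A minimal sketch: fetch_deps() and have_recipe() are stand-ins for real PyPI/Anaconda queries and a Spack repo lookup, and the toy graph is invented for illustration.

```python
# Sketch of the recursive chase: starting from one package, walk the declared
# dependency graph breadth-first and collect every package Spack doesn't
# already have a recipe for.

from collections import deque

def generate_missing(root, fetch_deps, have_recipe):
    created, queue, seen = [], deque([root]), {root}
    while queue:
        name = queue.popleft()
        if not have_recipe(name):
            created.append(name)      # here the real tool would emit package.py
        for dep in fetch_deps(name):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return created

# Toy dependency graph standing in for PyPI metadata:
deps = {"pandas": ["numpy", "python-dateutil"],
        "python-dateutil": ["six"],
        "numpy": [], "six": []}
print(generate_missing("pandas", deps.__getitem__,
                       lambda n: n == "numpy"))  # pretend numpy is packaged
```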

However, there are many caveats:

  1. Some Spack recipes contain extra material that does not come from PyPI, such as environment variables that need to be set before running setup.py (see numpy/package.py). These will have to be added by hand.

  2. It would be nice to have a way to auto-update auto-generated recipes with new versions in the future. That should be possible if we're careful to make them machine-parseable.

  3. A similar technique could be used to update existing Python packages with new URLs and versions, from PyPI. That could eliminate the need for further work on a pypi fetch method, moving toward this idea instead.

  4. Info from PyPI will need to be augmented from elsewhere to provide accurate dependencies. That seems to be doable.
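Caveat 2's "machine-parseable" requirement could be met by fencing the generated region with sentinel comments and regenerating only that span, leaving hand-written additions (caveat 1) untouched. The marker strings below are a hypothetical convention, not anything Spack defines.

```python
# Sketch: keep the auto-generated part of package.py between sentinel
# comments so a tool can refresh versions without touching hand edits.

import re

MARKERS = re.compile(r"(# BEGIN AUTO-GENERATED\n).*?(# END AUTO-GENERATED)",
                     re.S)

def refresh(package_py, new_block):
    # Replace only the fenced span; a lambda avoids backslash surprises.
    return MARKERS.sub(lambda m: m.group(1) + new_block + m.group(2),
                       package_py)

old = ("# BEGIN AUTO-GENERATED\n"
       "version('0.19.1', 'old')\n"
       "# END AUTO-GENERATED\n"
       "# hand-written: env vars for setup.py stay untouched\n")
print(refresh(old, "version('0.19.2', 'new')\n"))
```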
