
Auto-Generation of Python Packages from PyPI and Anaconda #2749

@citibeth

Description


@adamjstewart

The purpose of this issue is to explore an alternative to #2718 that goes beyond simple determination of PyPI URLs, toward automatic generation of Spack recipes by scraping the contents of the PyPI and Anaconda databases.

(Please ignore obvious formatting issues for now, e.g. proper construction of Spack class names, version specs in depends_on(), etc. Those are all fixable.)

PyPI

Enclosed is a script providing basic auto-conversion from PyPI to Spack. You can try it out, for example, with:

python pypi-to-spack.py pandas | less

You can also try it out on other projects: basemap, scipy, numpy, etc. I've learned the following lessons from experience with this script:

  1. Not all packages provide a PyPI URL at all; some are downloaded from non-PyPI sites (see basemap). For those that don't provide such a PyPI URL, there is no PyPI-provided hash; we would have to download it ourselves.
  2. Not all packages provide a real, usable URL for every version (see numpy), even when such URLs exist.
  3. Usually, all but the last PyPI version is "hidden." Things look a lot cleaner if you don't show hidden versions.
  4. Packages are frequently nearly bereft of dependencies (see basemap). Even when dependencies are provided, they are often provided sporadically; frequently not even for the latest version! And even on versions where they're provided, dependencies are often incomplete compared to what we have in Spack (see pandas).
  5. PyPI has 96,000 recipes, most of them junk. We should not attempt to convert all of them; rather, we should use PyPI as a source to mine Spack recipes for things we want. Best of all, we can recursively chase down dependencies as we go.
  6. If the PyPI URL works, there is no need for #2718: the full (messy) URL can be placed, automatically, in the generated package.py file. (And if it doesn't work, there is no PyPI URL to be had, so there isn't a problem.)

pypi-to-spack.py.txt
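The conversion in the script above can be sketched roughly as follows. This is a minimal illustration, assuming metadata shaped like the "releases" mapping returned by PyPI's JSON API (https://pypi.org/pypi/<name>/json); the helper name is made up for illustration and is not part of the attached script or of Spack.

```python
# Sketch: turn PyPI per-release file metadata into Spack version() directives.
# Keeps only versions that actually have a source tarball with a hash,
# per lessons 1 and 2 above.

def spack_versions(releases):
    """releases: {version_string: [file_dicts]} as in the PyPI JSON API."""
    lines = []
    for version, files in sorted(releases.items(), reverse=True):
        for f in files:
            # Not every version has a usable sdist; skip those that don't.
            if f.get("packagetype") == "sdist" and f.get("md5_digest"):
                lines.append("version('%s', '%s')" % (version, f["md5_digest"]))
                break
    return lines

# Tiny hand-made sample standing in for a real API response:
sample = {
    "0.19.2": [{"packagetype": "sdist", "md5_digest": "abc123"}],
    "0.19.1": [{"packagetype": "bdist_wheel"}],  # wheel only -> skipped
}
for line in spack_versions(sample):
    print(line)
```

A real converter would also have to cope with packages whose downloads live outside PyPI entirely (lesson 1), where no PyPI-provided hash exists.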

Anaconda

Anaconda also has a database of recipes, maintained on a per-version basis. See, for example, their recipe for pandas: https://github.com/ContinuumIO/anaconda-recipes/blob/master/pandas/meta.yaml

Anaconda recipes have a much more complete set of dependencies, separated into build and run sections, which map well onto Spack's dependency types. On the weaker side, they have no checksums, apparently relying on the goodwill of their download sources. In any case, it would be reasonable to scrape the dependencies out of the Anaconda recipes to augment the information drawn from PyPI above.
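The mapping from conda's build/run sections onto Spack dependency types could look something like this. A minimal sketch, assuming the requirement lists have already been read out of meta.yaml; the py- prefix convention and the function name are assumptions for illustration.

```python
# Sketch: map a conda recipe's build/run requirement lists onto Spack
# depends_on() directives.  Build-only deps get type='build'; deps in both
# sections get type=('build', 'run'); run-only deps get type='run'.

def conda_deps_to_spack(build, run):
    lines = []
    for name in sorted(set(build) - set(run)):
        lines.append("depends_on('py-%s', type='build')" % name)
    for name in sorted(run):
        dep_type = "('build', 'run')" if name in build else "'run'"
        lines.append("depends_on('py-%s', type=%s)" % (name, dep_type))
    return lines

# Shape loosely modeled on the pandas meta.yaml: numpy appears in both
# sections, setuptools only under build, python-dateutil only under run.
print("\n".join(conda_deps_to_spack(
    build=["setuptools", "numpy"],
    run=["numpy", "python-dateutil"])))
```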

Conclusions

Auto-generation is an intriguing possibility. It would be most useful as a way to automatically chase down an entire DAG of packages from PyPI and create Spack recipes for everything that has not yet been put into Spack. Imagine running one spack create command and ending up with 10 new packages...
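That one-command-many-packages idea amounts to a breadth-first walk over the declared dependency graph. A minimal sketch: fetch_deps() and have_recipe() are stand-ins for real PyPI/Anaconda queries and a Spack repo lookup, and the toy graph is invented for illustration.

```python
# Sketch of the recursive chase: starting from one package, walk the declared
# dependency graph breadth-first and collect every package Spack doesn't
# already have a recipe for.

from collections import deque

def generate_missing(root, fetch_deps, have_recipe):
    created, queue, seen = [], deque([root]), {root}
    while queue:
        name = queue.popleft()
        if not have_recipe(name):
            created.append(name)      # here the real tool would emit package.py
        for dep in fetch_deps(name):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return created

# Toy dependency graph standing in for PyPI metadata:
deps = {"pandas": ["numpy", "python-dateutil"],
        "python-dateutil": ["six"],
        "numpy": [], "six": []}
print(generate_missing("pandas", deps.__getitem__,
                       lambda n: n == "numpy"))  # pretend numpy is packaged
```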

However, there are many caveats:

  1. Some Spack recipes contain extra material that does not come from PyPI, such as environment variables that need to be set before running setup.py (see numpy/package.py). These will have to be added by hand.

  2. It would be nice to have a way to auto-update auto-generated recipes with new versions in the future. That should be possible if we're careful to make them machine-parseable.

  3. A similar technique could be used to update existing Python packages with new URLs and versions, from PyPI. That could eliminate the need for further work on a pypi fetch method, moving toward this idea instead.

  4. Info from PyPI will need to be augmented from elsewhere to provide accurate dependencies. That seems to be doable.
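Caveat 2's "machine-parseable" requirement could be met by fencing the generated region with sentinel comments and regenerating only that span, leaving hand-written additions (caveat 1) untouched. The marker strings below are a hypothetical convention, not anything Spack defines.

```python
# Sketch: keep the auto-generated part of package.py between sentinel
# comments so a tool can refresh versions without touching hand edits.

import re

MARKERS = re.compile(r"(# BEGIN AUTO-GENERATED\n).*?(# END AUTO-GENERATED)",
                     re.S)

def refresh(package_py, new_block):
    # Replace only the fenced span; a lambda avoids backslash surprises.
    return MARKERS.sub(lambda m: m.group(1) + new_block + m.group(2),
                       package_py)

old = ("# BEGIN AUTO-GENERATED\n"
       "version('0.19.1', 'old')\n"
       "# END AUTO-GENERATED\n"
       "# hand-written: env vars for setup.py stay untouched\n")
print(refresh(old, "version('0.19.2', 'new')\n"))
```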
