WIP: Install python applications into virtualenvs #8364
hartzell wants to merge 2 commits into spack:develop
- able to install httpie (and its prereqs), bumpversion and py-flake into virtualenvs rooted at their prefix + /libexec.
- their prefix + /.spack dir isn't being created, so the final log writing step's commented out.

I couldn't find a good way to pass the fact that a virtual environment was/should be used to all of the moving bits of machinery, and for the sake of prototyping I just stuffed it into the environment (`VENV_PATH`). Wheee....
|
If you haven't done so already, consider adding a variant like this to (I
think) `py-virtualenv`:

    variant('copy', values=str, default='standard',
            description='Allow multiple copies of the "same" spec to be installed')

This will allow you to install as many `py-virtualenv` instances as you
like, with different stuff in each one of them.
|
|
[edits: typos and clarification]
I'm not sure what that would For Python3, the equivalent library is in the core ( But how we install the virtual-environment making tool is an implementation detail. Each application should have its own virtualenv tucked inside its prefix (which is how Homebrew does it). That virtualenv is created by It would be nice if e.g. the httpie application could say The others would continue to blithely use But, I'm afraid that there's some sort of concretization incompatibility with having different "flavors" of python dependencies (one +venv, the others not) within the same spec. |
|
I didn't have a chance to look at the implementation yet, so some of these questions might clear up once I read through it. These are the things I thought of while reading through your proposal, in no particular order:
Regarding the concretisation issue: I don't think that should stop us developing a new idea (if it is useful) @scheibelp and @tgamblin should probably chime in and say if this conflicts with the new concretiser™️ On a more general note: I mentioned this in #8360 but it also fits here. I don't think that our current implementation of the way we expose installed packages to users really fits to spack. spack's core is the DAG and intuitively I always expected a Now I don't know enough about |
|
I'm pulling some quotes out of order from @healther's comments so that I can use them to support the narrative. Thanks for the thoughts!
Don't dig too deeply into the implementation, it's a hack that I pulled off with blunt instruments to convince myself that the idea might be workable. The idea, now that might be golden. Or not....
The virtualenv docs give a nice introduction, starting with:
Imagine back in the simple olden days, there was one installation of python, things were installed with/into it, and everyone saw the same everything. If you wanted isolation, you might install another copy of python somewhere else, and when you used its Python virtual environments are a middle ground. They share nearly all of a single Python installation but when one virtualenv is "activated", anything installed by that virtual environment's python tool chain ends up inside that virtual environment. If you have two environments that use The implementation is an elegant hack, the video Reverse-engineering Ian Bicking's brain does a marvelous job of explaining how it works (worked, Python3 is different?).
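As a rough illustration of the isolation described above, here is a small sketch using Python 3's standard `venv` module (the original `virtualenv` tool for Python 2 works differently under the hood). The helper names `make_env` and `env_prefixes` are made up for this example; they are not part of this PR.

```python
import os
import subprocess
import venv


def make_env(env_dir):
    """Create a virtual environment (without pip) and return its python."""
    # Symlink the interpreter on POSIX, which is what `python -m venv` does.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


def env_prefixes(python):
    """Ask an interpreter for its sys.prefix and sys.base_prefix."""
    code = "import sys; print(sys.prefix); print(sys.base_prefix)"
    out = subprocess.check_output([python, "-c", code], universal_newlines=True)
    prefix, base_prefix = out.splitlines()
    return prefix, base_prefix
```

The environment's interpreter reports the environment directory as `sys.prefix` while `sys.base_prefix` still points at the shared installation, which is the "middle ground" in question.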
While I'd like to Fix All The Things(tm), I haven't figured out how. This proposal offers one thing, a reliable way to install python applications. The "application-ness" of a Spack python package would be declared by the package author/maintainer, although there might be a variant knob (below). The easiest thing to do would be to make it either-or. Packages that provide libraries aren't applications. Packages that provide executables are applications. Bumpversion is an application. It uses a Python package or two and is composed of its own Python bits. It doesn't offer the end user anything beyond the "binary". It probably would have been easier if it were in C/Go/..., but... I'm not entirely sure what to do with packages that offer applications and libraries. In the model that I'm proposing, peeking inside the virtualenv would be against the rules, so no peeking! As part of the ongoing Go dependency management/vendoring discussion, it's become clear that it's bad design for something to be both a library and an application. Their vendoring headache is similar to what we're hitting here. But fixing the Python ecosystem might take some time, so....
I don't think that Homebrew installs python libraries. If you're building a Python project that uses libraries, they'll leave you at the mercy of the Python (Perl, R, ...) ecosystem. They'll install "programs" though, and track the libraries they require as "resources" that are check-summed and etc... But that won't work for us without changing Spack's modus operandi. Now that you mention it though.... And to your final point, yes, this ends up reinstalling things.
Sharing leads to all kinds of complications (again, Go has a proverb: "A little copying is better than a little dependency.") I'd happily have a dozen extra packages installed to have the That said, if implementing this required a big, ugly hairball that complicated everything that lived in the same sub-directory, I don't think it would be worth it. But if it can leverage existing work in concretiz-er and string-valued variants, then it's just a nice bit of isolated machinery (he says, hopefully). |
emphatic +100 for that. Even if it ends up only being an option for the "only applications" packages I would vote for adding this (though this isn't really democracy^^) I still think There is one problem that I see right now and that is what happens when I want to load
Just for reference: Not sure how sane using this ends up being in the end though ;) (hint: I don't use it) |
How do views handle applications that want different versions of a python library? I wonder how often that occurs?
The shebang line of the They'd also be safe w.r.t. activated packages (which appear in the sub-directories of the Spack python). I'm not sure how much harm one could do with PYTHONPATH though. It would be great if one could tell Python (Perl, R) to just ignore it (I've actually used Perl's SITECUSTOMIZE to manipulate PERL5LIB...).
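For what it's worth, CPython does have a switch for this: the `-E` command-line option makes the interpreter ignore `PYTHON*` environment variables such as `PYTHONPATH` (and `-I` in Python 3.4+ goes further). The sketch below only demonstrates that behaviour; the helper name `sees_directory` is invented for the example, and whether `-E` could sensibly be baked into an application's shebang line is a separate question.

```python
import os
import subprocess
import sys


def sees_directory(directory, ignore_env):
    """Return True if a child interpreter has `directory` on its sys.path.

    PYTHONPATH is set to `directory` for the child; when `ignore_env` is
    true the child is started with -E, which ignores PYTHON* variables.
    """
    env = dict(os.environ, PYTHONPATH=directory)
    cmd = [sys.executable]
    if ignore_env:
        cmd.append("-E")
    cmd += ["-c", "import sys; print('\\n'.join(sys.path))"]
    out = subprocess.check_output(cmd, env=env, universal_newlines=True)
    target = os.path.realpath(directory)
    # Skip empty entries (the current directory) when comparing.
    return any(os.path.realpath(p) == target for p in out.splitlines() if p)
```

Calling `sees_directory(some_dir, ignore_env=False)` and then again with `ignore_env=True` shows the same `PYTHONPATH` entry appearing and disappearing.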
Well, learn something new every day. It looks like that ends up getting installed into the site packages directory of the python being used, so every user sees it and there's no opting out. They don't seem to install it into the Cellar as a separate entity (a la Spack). |
|
I still think views would be the ultimate solution™️ because they
wouldn't require redundant installations [...] and shouldn't take over this
issue.
I'm happy with loading a bunch of modules --- which is pretty equivalent to
views. Either way, I think they are preferable to, and more flexible than,
melding with virtualenv or some other language-specific environment system.
|
No they aren't, loading multiple
They don't! What they are essentially recreating is the "there is one The one big problem that remains is if you cross boundaries, i.e. either want to install things with
That would be the other solution, i.e. manipulating the search pattern of each package system manually in order to work around the "multiple |
|
I have an idea for using Python's site customization machinery to implement "rpath-ing for python packages" that could be an alternative to using virtualenvs (probably with a different set of nasty smells, but we'll see). I'll prototype something and throw it out for feedback. |
TL;DR, with this PR, in a clean tree, you should be able to install a Python package and run its script from its `prefix.bin` without setting any environment variables or ... E.g.

```console
spack install py-flake8
(module purge; /home/hartzell/tmp/spack-rpath.py/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/py-flake8-3.5.0-4gkbvq2u3si6jxsmlapdeolds4wgwzdx/bin/flake8 --help)
```

---

This is an alternative approach to solving the problems addressed in PR spack#8364. Issue spack#8343 (a flake8 failure) is an example of the problem in real life.

The key bits to this approach are Spack's DAG, Python's `site.addsitedir` function and Python's `sitecustomize.py` file. The implementation is a proof of concept hack, don't get too hung up on the code itself.

The problem, in a nutshell, is adding the various directories into which we've installed an application's python prerequisites onto its `sys.path`. The current approach is to either:

- `activate` the prereqs, which links them into the Python tree, which is on `sys.path` by default; or
- add them via `PYTHONPATH` (using modulefiles or ...).

The problem with the first approach is that only one version can be activated at a time and everyone using that Spack tree is stuck with it.

The problem with the second approach is that directories added via `PYTHONPATH` are second class citizens: the directories themselves are searched **BUT** the `*.pth` files they contain are not processed. Lesser problems with this approach include `PYTHONPATH`'s global nature, its availability for finger poking and the complexity of juggling the modulefiles (e.g. recursively loading prerequisites).

This solution parallels what `rpath` does for shared libraries, fixing from whence an application loads its libraries *at build time*, not at run time. There are two components:

1. When Python packages are installed, they install a file containing the paths to all of their python prerequisites (`.spack-rpaths`) within their `prefix`.
2. The Python package installs a `sitecustomize.py` script, which Python runs very early in the interpreter's startup. The `sitecustomize.py` code checks for a `.spack-rpaths` file. If it finds one, it uses `site.addsitedir` to add the directories it contains to `sys.path`.

Directories that are added to `sys.path` via `site.addsitedir` *do* process `*.pth` files, so the magic they contain is invoked as expected.

Potentially sticky bits include:

- The biggest roadblock to this approach is that `sitecustomize.py` is processed *so* early that `sys.argv` has not been created yet, so discovering the path to the directory in which the script lives is magical. This prototype grabs it from `/proc/self/cmdline`. I've included an alternate solution that is either really elegant or too-cute-by-half (or both...), see the comments in `sitecustomize.py` for the gory details.
- Dealing with deployments that use the system python. They might need to install our `sitecustomize.py`, they *might* be able to leverage *usercustomize*, or they might need to use one of the other techniques.
- I haven't played with Python3 yet.
- I suspect that a sufficiently determined user could break things by setting `PYTHONPATH`.

Beyond that, there's a bit of engineering to be done.

Something similar might be workable for Perl using its `sitecustomize` support. Perhaps R and ... too.
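The two components described above might look roughly like the sketch below. This is not the code from the PR: the function names `write_rpaths` and `add_rpaths` are hypothetical, the one-directory-per-line file format is an assumption, and the sketch takes the application's `prefix` as an argument, which sidesteps the hard part (discovering that prefix from inside `sitecustomize.py` before `sys.argv` exists).

```python
import os
import site

RPATHS_FILE = ".spack-rpaths"  # file name taken from the description above


def write_rpaths(prefix, dependency_dirs):
    """Install-time half: record one dependency directory per line."""
    with open(os.path.join(prefix, RPATHS_FILE), "w") as handle:
        for directory in dependency_dirs:
            handle.write(directory + "\n")


def add_rpaths(prefix):
    """Startup half: add each recorded directory with site.addsitedir.

    Returns the directories that were added. Unlike PYTHONPATH entries,
    directories added this way have their *.pth files processed.
    """
    rpaths = os.path.join(prefix, RPATHS_FILE)
    if not os.path.isfile(rpaths):
        return []
    added = []
    with open(rpaths) as handle:
        for line in handle:
            directory = line.strip()
            if directory and os.path.isdir(directory):
                site.addsitedir(directory)
                added.append(directory)
    return added
```

A real `sitecustomize.py` would call something like `add_rpaths` once it had worked out which prefix the running script belongs to.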
|
@hartzell is this PR still a WIP? Trying to close old stale PRs. |
|
#20430 describes another attempt to solve this issue. |
|
Since this PR is old and conflicts, and I never got a response from @hartzell, I'm going to close it. Feel free to reopen if it's something you still want to work on. |
|
@adamjstewart -- Thanks for the mention. I'm stuck in a place where contributing is difficult. Closing this is appropriate. #20430 seems interesting.... |
[I hit the green "submit" button too quickly and had to edit this comment to add, well, all of it...]
I'd like to explore adding the option of installing Python applications into Python virtualenvs, making their Python dependencies "build" only and giving them some of the robustness that rpath's bring to compiled applications (no need to depend on environment variables to find what they need at runtime).
I've mentioned this on the Spack google group and gotten some feedback from @healther and @citibeth, but most of what they discussed involved ways to set up the environment. I believe that installing into virtualenvs is orthogonal to the work they pointed me at.
I was recently exposed to Homebrew's use of virtualenvs when I submitted a Formula for `bumpversion`.

This PR/branch is a hack to demonstrate how it might behave. It is not a final implementation.
With that said, in a clone with this branch checked out and nothing else installed, one can (tested on CentOS 7):
The final 3 installations don't complete happily, but they finish the important POC bits. In each prefix there will be a `libexec` directory (e.g. `.../spack-virtualenv/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/py-flake8-3.5.0-yqduo7ftn2ucmnamrn3lrlwkdsx7d4a7/libexec/`) that has a `bin` subdir that contains the application. E.g.

That `flake8` will run with no special environment settings; in fact, adding that `bin` directory to the `PATH` is enough to run `spack flake8` (and then fix the uglies...).

A final implementation (stealing from Homebrew) would link the applications into `prefix/bin`, keeping the `libexec` dir private.

I had trouble finding a pleasant way to pass the fact that a virtualenv was being used from the top level item (e.g. `py-flake8`) into the other layers.

One clean idea would be for the top level app to depend on Python and specify a variant to Python that included the name of the virtualenv. I'm not sure how to get that to concretize cleanly with all of the dependencies, which are simply depending on Python. Perhaps a virtual dependency?
Alternatively, setting a boolean in the top-level application and figuring out how to pass the info down through its `do_install` and into the layers below seems to be the best bet. Perhaps there's a way to adjust the `spec`s or to pass along the extra info. It's tempting to do something with the Python package's `set_dependent_package`, but I couldn't figure it out nicely.

I would love feedback on how this might happen.
I think that python packages being installed into virtualenvs should have their prefix adjusted to point into the top level app's prefix (so that they don't clash with any real installs), and they should be recorded in the db as part of a virtualenv install (or perhaps not at all).
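To make the Homebrew-style layout mentioned earlier concrete (a private virtualenv under `prefix/libexec`, with only the application exposed in `prefix/bin`), here is a minimal sketch of the linking step. The function `link_applications` is hypothetical and is not part of this branch; it assumes the virtualenv has already been populated.

```python
import os


def link_applications(prefix, names):
    """Symlink the named scripts from prefix/libexec/bin into prefix/bin.

    Only the listed application names are exposed, so the virtualenv's
    own python, pip, etc. stay private. Returns the names that were linked.
    """
    private_bin = os.path.join(prefix, "libexec", "bin")
    public_bin = os.path.join(prefix, "bin")
    os.makedirs(public_bin, exist_ok=True)
    linked = []
    for name in names:
        source = os.path.join(private_bin, name)
        target = os.path.join(public_bin, name)
        if not os.path.isfile(source) or os.path.lexists(target):
            continue  # nothing to link, or something is already there
        # Use a relative link so the link does not embed the absolute prefix.
        os.symlink(os.path.relpath(source, public_bin), target)
        linked.append(name)
    return linked
```

Because scripts installed into a virtualenv normally carry a shebang line that points at the virtualenv's own interpreter, a linked `flake8` should still run against the private environment.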
In conclusion: if this works, then applications can be self-contained and more reliable (avoiding e.g. the py-flake8 module loading issue). The resulting packages will play nicely with Environments and etc.... It's complementary to Spack's other methods for handling add-on packages (environment modification, activation, views) and would still use Spack's package definitions, preserving reproducibility and etc...
Feedback?