rationale for process sandboxing on linux, then a proposal for how to emulate sandboxing on all the other platforms #20260
In #20256, we whittled down some vague ideas into a few concrete, 100% backwards-compatible proposed changes to spec syntax (see #20256 (comment)). That discussion branched off into this issue.
Motivation
Bootstrapping the clingo solver sometimes requires doubly-bootstrapping python (#20159)
While testing last week for #20159 whether clingo could bootstrap from a known-to-be-very-old environment like spack/centos6, I found that spack install python would fail if I installed a python of the same major and minor version as the currently-selected python. The error was something like libpython2.6.so.1.0 having zero bytes. A working python is needed to install python, and I think simply moving the library made it impossible to run python at all. I resolved the issue by installing a version of python 2.7 through spack, running spack load on it, and then rebuilding the same [email protected]. The rebuild was needed because CMake (needed for clingo) assumes you have specific python header files installed in specific places and just errors without much info if it's not configured nicely.
In contrast to the well-commented bootstrap script that's currently necessary:
spack/share/spack/qa/run-clingo-py2-bootstrap (lines 38 to 40 in 6c81808):

    # Game plan: install the same python, with a much less broken installation
    # (headers and libraries where clingo can find them). But first, we'll need to
    # install yet another python, then work our way back.

we and our users would probably be able to use a single spack install ... command to do this bootstrapping instead, if we could easily chroot the python install process!
Other Examples: Git (#18895) / GPG (#18454)
A much more common type of failure was described by @opadron in #18895, wherein running git in a spack test will sometimes pull in other user and site configs that are not intended and cause errors. @adamjstewart noted again in #18454 the importance of shielding user config files specifically from the spack invocation (unless spack brings them in intentionally through spack external find) with regard to signing gpg commits.
In particular, I know git lets you configure things per-repo, but I personally use gpg in just a single directory, with plenty of very host-specific settings. And I think it would be a bad thing if people were forced to do hacky things in spack or with their gpg install (like temporarily moving the gpg directory, which I've done before, not with spack) because of behaviors spack invokes by accident like this. It seems like both git and gpg would be really great integration tests for this sort of thing.
Description
Goals:
1. Expose fakechroot as a high-quality, well-documented spack package which can be used in any configure/build/install phase.
   - use fakechroot or another technique to fully sandbox linux invocations of git (from Spack runs git without isolation #18895) and gpg (from spack test: no gpg sign #18454).
   - verify with the affected users that the issues are resolved.
2. Expose a virtual package named user-chroot which allows alternative implementations, and implement a stub of the virtual package for OpenBSD.
3. (In the nearish future): use fakechroot, the user-chroot virtual dependency, and an OpenBSD implementation as motivating examples for the overhauling of dependency types in a proposal for how to represent virtual dependencies in the spec syntax #20256 (comment).
4. Gradually establish a canonical feedback loop for spack users to report and to get help with isolation breach errors, especially on platforms where we don't have a good sandboxing implementation.
MVP: Demonstrate that fakechroot enables obvious, clear fixes for the git and gpg config issues
- In a proposal for how to represent virtual dependencies in the spec syntax #20256 (comment), I proposed bringing in fakechroot to do performant and conceptually simple filesystem sandboxing (which I believe should be 100% sufficient for most use cases in spack).
- fakechroot appears to be well-maintained and difficult to use incorrectly. Let's demonstrate that by shipping this.
  - Since most of spack CI tests on linux and it's easy to create a new virtual linux environment to reproduce bugs in sandboxing, this is not expected to represent tech debt, for example.
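As a rough illustration of what wrapping a tool invocation in fakechroot could look like from Python, here is a minimal sketch. The helper names fakechroot_argv and fakechroot_available are hypothetical and are not part of spack; the sketch only assumes the usual `fakechroot chroot <root> <command>` invocation and that the root directory already contains everything the command needs.

```python
import shutil
from typing import List, Sequence


def fakechroot_available() -> bool:
    """Return True if a `fakechroot` executable is found on PATH."""
    return shutil.which("fakechroot") is not None


def fakechroot_argv(root: str, command: Sequence[str]) -> List[str]:
    """Build an argv that runs `command` with `root` as its apparent filesystem root.

    Hypothetical helper: it only assembles the command line and does not
    populate `root` with the binaries and libraries the command needs.
    """
    if not command:
        raise ValueError("command must not be empty")
    # `fakechroot chroot <root> <cmd...>`: fakechroot preloads a wrapper library
    # so that an unprivileged `chroot` call appears to succeed.
    return ["fakechroot", "chroot", root, *command]
```

The resulting list could be handed to subprocess.run once the root directory has been prepared; how spack would populate that root is left open here.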
Extension to Non-Linux platforms
While many non-linux systems (especially BSDs) may have extremely mature sandboxing abilities, the interface will not be 100% compatible with fakechroot -- this is fine, but it suddenly raises a few implementation questions:
- While fakechroot can hopefully address our user needs here on linux entirely (crossing fingers), how much comparative effort would it take to implement a fakechroot-compatible interface, for spack developers and/or spack users, across all our supported platforms?
- What kind of documentation and/or tooling would be necessary for spack users to contribute to the "feedback loop" described in goal (4) above?
- And in particular, might the output of syscall tracing mechanisms like linux's strace be a good platform-independent avenue for (at least) identifying exactly which files are incorrectly being consumed by something spack is trying to install?
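To make the strace question above more concrete, here is a minimal sketch of scanning trace output for successful open/openat calls that land outside a set of allowed prefixes. The function name breached_paths and the idea of an "allowed prefix" list are assumptions for illustration, not an existing spack feature. The trace is assumed to come from something like `strace -f -e trace=open,openat -o trace.log <command>`.

```python
import re
from typing import Iterable, List, Sequence

# Matches lines such as:
#   openat(AT_FDCWD, "/lib64/libpython2.6.so.1.0", O_RDONLY|O_CLOEXEC) = 3
#   open("/etc/gitconfig", O_RDONLY) = -1 ENOENT (No such file or directory)
_OPEN_RE = re.compile(
    r'\b(?:open|openat)\((?:[^,"]+,\s*)?"(?P<path>[^"]*)".*\)\s*=\s*(?P<ret>-?\d+)'
)


def breached_paths(trace_lines: Iterable[str], allowed_prefixes: Sequence[str]) -> List[str]:
    """Return absolute paths that were opened successfully outside `allowed_prefixes`.

    Hypothetical helper: paths are reported once each, in first-seen order.
    """
    seen: List[str] = []
    for line in trace_lines:
        match = _OPEN_RE.search(line)
        if match is None or int(match.group("ret")) < 0:
            continue  # not an open call, or the open failed
        path = match.group("path")
        if not path.startswith("/"):
            continue  # relative paths need the cwd to resolve; skipped in this sketch
        if any(path == p or path.startswith(p.rstrip("/") + "/") for p in allowed_prefixes):
            continue  # inside a directory the build is allowed to read
        if path not in seen:
            seen.append(path)
    return seen
```

A report built this way only identifies which files were read; deciding whether each read is actually a breach still needs a human (or a per-package allowlist).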
1. Background: OSX sandboxing on Nix (NixOS/nix#434, NixOS/nixpkgs#18506)
The reason I believe we might want to start forming an opinion about OSX sandboxing now comes from some web searches the other day for "OSX fakechroot". x86 OSX is the other operating system I personally frequent besides arch linux. As per usual, there's a very relevant issue from the nix repo (NixOS/nix#434) (from 2015), which describes the supposedly-similar but of course minimally-documented sandbox-exec tool.
That functionality was apparently working on nix around 2015-2016 and broke sometime in 2016, which produced another issue and led to this very concise, mature, and thoughtful summary from @copumpkin (NixOS/nixpkgs#18506 (comment)):
We did have sandboxed builds working well for a long time (it was all I used for a
while, and I did the initial work to make them work) but they broke and I
have far less time now than I did when I first put this whole thing
together. In the absence of me doing all of the above, who makes it happen
and how? Documentation certainly helps but if I had eager Darwin developers
breathing down my neck aching to help fix the stdenv I might allocate more
of my free time to doing that instead of fixing the things myself.
We should evaluate how the ecosystem has evolved since that 2016 thread, but seeing that a major package manager/operating system with lots of deep knowledge of their environment had such difficulty with this exact problem, and seeing how they described part of the difficulty as coming from the lack of a really strong connection with enthusiastic OSX users, I think that we may want to consider the OSX sandboxing problem as "too hard for right now, but we should definitely put it back on the table if we ever need to scale up our connection with our enthusiastic OSX user base".
2. Back to the drawing board: what is fakechroot doing for us that doesn't require linux namespaces?
- Avoid ever needing to perform a double bootstrap process for a tool, as is currently done for clingo in clean up how spack builds clingo #20159 in our spack/centos6 CI shard.
  - So: avoid ever pulling in anything from the runtime library dir currently on disk while we try to configure/build/install a spack version of that same thing.
- In the current reference tickets Spack runs git without isolation #18895 and spack test: no gpg sign #18454, instead of binary files, our goal is "just" to:
  - Determine a robust way (specifically for the git and gpg tools right now) to configure those tools to avoid looking at the global user config, as well as any site-specific config which those tools may define.
  - The use case in these two tickets seems pretty clearly to be "just": "we want to allow spack to invoke those tools internally in any possible way which would avoid reading some or all user/site config."
    - This means we can consider e.g. patching the source to implement this, if that's the easiest way. We can gate it behind a default-False variant named e.g. +noreadconfig.
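For git and gpg specifically, one option that needs neither a sandbox nor a source patch is to point the tools at a scratch home directory through environment variables. The sketch below is an assumption about how that could look; the function name isolated_tool_env is hypothetical. It relies on HOME and XDG_CONFIG_HOME (where git looks for per-user config), GIT_CONFIG_NOSYSTEM (git skips the system-wide config file when it is set), and GNUPGHOME (gpg uses it in place of ~/.gnupg).

```python
import os
from typing import Dict, Mapping, Optional


def isolated_tool_env(
    scratch_home: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment that hides user and site config from git and gpg.

    Hypothetical helper: `scratch_home` is assumed to be an existing, empty
    directory owned by the caller. The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = scratch_home  # hides ~/.gitconfig and ~/.gnupg
    env["XDG_CONFIG_HOME"] = os.path.join(scratch_home, ".config")  # hides XDG git config
    env["GIT_CONFIG_NOSYSTEM"] = "1"  # git ignores the system-wide gitconfig
    env["GNUPGHOME"] = os.path.join(scratch_home, ".gnupg")  # gpg home inside the scratch dir
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when invoking git or gpg. Whether this covers every config source those tools consult on every platform is exactly the kind of thing the feedback loop in goal 4 would need to confirm.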
3. Proposal: Use syscall tracing to find when, why, and how files are being accessed
To reiterate, the reason I'm creating this ticket at all is that, like @adamjstewart expressed in #18895, I believe this is a really generalizable problem which will pop up again in the future. Further motivating me was the primary source from nix quoted above, confirming that we can't expect sandboxing to "just work" on OSX. So we'd like to start a discussion on other methods of achieving the above without relying on linux namespaces (which is how docker is implemented; fakechroot instead intercepts libc calls through LD_PRELOAD).
My thought process here is that we likely can't expect:
- To have a highly robust ("spack-quality") filesystem isolation mechanism (like fakechroot) that will even have the same command-line API across even a plurality of our supported platforms!
  - Note that for right now, we are pretty sure fakechroot is sufficient for all of spack's current purposes (let's call it the platonic ideal).
- To have a directly comparable syscall tracing mechanism across platforms. This is expected to differ widely across platforms and even hosts, and I believe we should assume that our users may have good reasons to use their own syscall tracer too.
  - This is the kind of thing Spack makes easy to describe, though.
However, my reach goal here would be to answer:
Is there a middle ground between fakechroot and syscall tracing which goes further than strace in not just diagnosing the isolation breaches, but providing a way (even a hacky way) to fix them without patching the package sources?
The correct answer to this may be: "not right now, but we'll end up building it along the way."
Here's one way to architect the "canonical feedback loop" in goal 4 from above:
4. Proposal: Canonical feedback loop for responding to spack user reports of isolation breaches
Ask 3 questions:
- What file paths are being fetched (or: "what resources/caches are being read/written") that cause a specific configure/build/install to fail? This should be answered as a best-guess response to "what was the package developer trying to achieve"? Keep in mind your best guess too as to how much control the package developer themselves may have over the error (e.g. if it's happening in CMake).
  - example: "CMake expects the Python installation to contain several specific header files, without comments describing why CMake expects/needs those to be there. The git blame turned up these possible explanations, but nothing conclusive."
  - example: "Python's generated (?) install script manually searches /lib64, finds the shared library for the current python 2.6.6, and somehow (not sure yet) prefers that over the shared library that spack just built."
- short-term: Can this be worked around via patch()/etc? long-term: Who do we think is most responsible for this specific build failure, and how active are their contributions?
  - example: "This CMake FindPython module has been broken on centos6 since time immemorial: <try to explain why if possible>"
  - example: "There's a known race condition in this one tool that's only on github and the Arch User Repositories, which has had an issue open for just a couple weeks. We've left a (linked) comment."
  - example: "This codebase uses a custom fork of the gnu autotools which is difficult to debug: <explain why>"
- Can the resource the developer was trying to obtain feasibly and usefully be expressed as a platform-generic virtual dependency as per section (3) of a proposal for how to represent virtual dependencies in the spec syntax #20256 (comment)?
  - example: "We could not find any reason stated for CMake to have assumed this specific subset of files was going to be in the python config dir."
  - example: "The python install script just requires a working python installation to run successfully -- during install, it doesn't actually need to have LD_LIBRARY_PATH set to anything besides its own just-built python, right?"
    - example response: "Ah! So we need to require (by editing the ComplexResource definition for %py) that all python-providing packages expose a uniform way to set only PATH and not LD_LIBRARY_PATH, for some build phase?"
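The "set only PATH and not LD_LIBRARY_PATH" idea from the last example response can be sketched as a small environment transformation. This is one possible reading, not a description of how ComplexResource would behave: the helper name path_only_env is hypothetical, and dropping LD_LIBRARY_PATH entirely (rather than leaving it untouched) is an assumption made for illustration.

```python
import os
from typing import Dict, Mapping


def path_only_env(prefix: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Expose the tool installed under `prefix` through PATH alone.

    Hypothetical helper: returns a new mapping and leaves `base_env` unchanged.
    """
    env = dict(base_env)
    bin_dir = os.path.join(prefix, "bin")
    old_path = env.get("PATH", "")
    # Prepend so the helper python wins over any system python of the same name.
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    # Assumption: no inherited library search path, so the dynamic loader is not
    # steered toward a library directory belonging to a different python.
    env.pop("LD_LIBRARY_PATH", None)
    return env
```

A build phase could then run the install script with this environment, so the helper python is found by name without its libraries shadowing the ones just built.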
Appendix
Consider investigating if any "container runtimes" can be converted into something like fakechroot.
- Docker is a very widely used container runtime, which spack already supports and has CI running for.
  - It would be very good to not become dependent on the docker tool itself to enforce any separation, or to use Dockerfiles instead of spack environment files.
    - All we're looking for is a fake root/chroot filesystem -- we want to make it as easy and reliable as possible for spack to achieve this.
  - It's not currently clear to me whether container runtimes can be expected to provide any useful sandboxing guarantees without a lot of work, especially across platforms. Let's take some time to research that at some point.
Apply the Spack User Survey
- Make use of data from the 2020 spack user survey on what kinds of sandboxing or system/app tracing needs users want support for, on each of our supported platforms.