rationale for process sandboxing on linux, then a proposal for how to emulate sandboxing on all the other platforms

In #20256, we whittled some vague ideas down into a few concrete, 100% backwards-compatible proposed changes to spec syntax (https://github.com/spack/spack/issues/20256#issuecomment-739434724). That discussion branched off into this issue.

# Motivation

## Bootstrapping the clingo solver sometimes requires doubly-bootstrapping python (#20159)
Last week, while testing for #20159 whether `clingo` could bootstrap from a known-to-be-very-old environment like `spack/centos6`, I found that `spack install python` would fail whenever the python being installed had the same major and minor version as the currently-selected python. The error was something like `libpython2.6.so.1.0 had zero bytes`. Since installing python requires a working `python`, and I think simply moving the library made it impossible to run python at all, I resolved the issue by installing python 2.7 through spack, `spack load`ing it, and then rebuilding the same `python@2.6.6`. The rebuild was needed because CMake (required for clingo) assumes specific python header files are installed in specific places, and it errors out without much information if python isn't configured nicely.

In contrast to the well-commented bootstrap script that's currently necessary:
https://github.com/spack/spack/blob/6c81808bbdcfccbcda9c580ed95d920e2d75570a/share/spack/qa/run-clingo-py2-bootstrap#L38-L40

we and our users would probably be able to do this bootstrapping with a *single* `spack install ...` command instead, if we could easily chroot the python install process!

## Other Examples: Git (#18895) / GPG (#18454)

A much more common type of failure was described by @opadron in #18895: running `git` in a spack test will sometimes pull in unintended user and site configs, which then cause errors. @adamjstewart noted again in #18454, with regard to signing gpg commits, how important it is to shield user config files in particular from the spack invocation (unless spack brings them in intentionally through `spack external find`).

In particular, I know *git* lets you configure things per-repo, but I personally use *gpg* in just a single directory, with plenty of very host-specific settings. I think it would be bad if people were forced into hacks in spack or in their gpg setup (like temporarily moving the gpg directory, which I've done before, though not with spack) because of behavior that spack triggers by accident. Both git and gpg seem like great integration tests for this sort of thing.



------------------

# Description

Goals:
1. [ ] Expose `fakechroot` as a high-quality, well-documented spack package which can be used in any configure/build/install phase.
    - use `fakechroot` or another technique to fully sandbox linux invocations of `git` (from #18895) and `gpg` (from #18454).
    - verify with the affected users that the issues are resolved. 
2. [ ] Expose a *virtual package* named `user-chroot` which allows alternative implementations, and implement a stub of the virtual package for OpenBSD.
3. [ ] *(In the nearish future):* use `fakechroot`, the `user-chroot` virtual dependency, and an OpenBSD implementation as motivating examples for overhauling dependency types as proposed in https://github.com/spack/spack/issues/20256#issuecomment-739434724.
4. [ ] Gradually establish a canonical feedback loop for spack users to report and to get help with isolation breach errors, especially on platforms where we don't have a good sandboxing implementation.
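To make goal 2 a bit more concrete, here is a minimal Python sketch of what a `user-chroot` virtual could look like from a caller's point of view: one interface, a Linux provider that uses the documented `fakechroot chroot ROOT COMMAND...` invocation, and an OpenBSD stub. All class and function names are hypothetical; in spack itself the virtual would be declared with `provides(...)` in package recipes rather than with Python classes like these.

```python
import sys
from abc import ABC, abstractmethod
from typing import List, Sequence


class UserChroot(ABC):
    """Hypothetical provider interface for a `user-chroot` virtual."""

    @abstractmethod
    def wrap(self, root: str, argv: Sequence[str]) -> List[str]:
        """Return a command line that runs `argv` with `root` as its apparent '/'."""


class FakechrootUserChroot(UserChroot):
    """Linux provider built on `fakechroot chroot ROOT COMMAND...`."""

    def __init__(self, fakechroot: str = "fakechroot", chroot: str = "chroot") -> None:
        self.fakechroot = fakechroot
        self.chroot = chroot

    def wrap(self, root: str, argv: Sequence[str]) -> List[str]:
        # fakechroot preloads a library that intercepts chroot() and path
        # lookups, so the unprivileged `chroot` binary appears to succeed.
        return [self.fakechroot, self.chroot, root, *argv]


class OpenBSDUserChrootStub(UserChroot):
    """Placeholder for goal 2: no OpenBSD implementation exists yet."""

    def wrap(self, root: str, argv: Sequence[str]) -> List[str]:
        raise NotImplementedError("user-chroot has no OpenBSD implementation yet")


def select_provider(platform: str = sys.platform) -> UserChroot:
    """Pick a provider by `sys.platform` prefix (an assumed selection rule)."""
    if platform.startswith("linux"):
        return FakechrootUserChroot()
    if platform.startswith("openbsd"):
        return OpenBSDUserChrootStub()
    raise NotImplementedError(f"no user-chroot provider for {platform!r}")
```

For example, `select_provider("linux").wrap("/tmp/root", ["git", "--version"])` returns `["fakechroot", "chroot", "/tmp/root", "git", "--version"]`. The root directory still has to contain `git` and everything it loads; the sketch only fixes the calling convention that alternative providers would share.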

## MVP: Demonstrate that `fakechroot` enables obvious, clear fixes for the git and gpg config issues  
- In https://github.com/spack/spack/issues/20256#issuecomment-739434724, I proposed bringing in `fakechroot` to do performant and conceptually simple filesystem sandboxing (which I *believe* should be 100% sufficient for most use cases in spack).
- `fakechroot` appears to be well-maintained and difficult to use incorrectly. **Let's demonstrate that by shipping this.**
    - Since most of spack's CI runs on linux, and it's easy to create a new virtual linux environment to reproduce sandboxing bugs, this is not expected to add tech debt.

## Extension to Non-Linux platforms
While many non-linux systems (especially BSDs) may have extremely mature sandboxing abilities, their interfaces will not be 100% compatible with `fakechroot`. That is fine, but it raises a few implementation questions:

- While `fakechroot` can hopefully address our users' needs on linux entirely (fingers crossed), how much effort would it take, for spack developers and/or spack users, to implement a `fakechroot`-compatible interface across all our supported platforms?
- What kind of documentation and/or tooling would be necessary for spack users to contribute to the "feedback loop" described in goal (4) above?
- **And in particular, could we use the output of syscall tracing mechanisms like linux's `strace` as a platform-independent way of (at least) identifying exactly which files are incorrectly being consumed by something spack is trying to install?**

### 1. Background: OSX sandboxing on Nix (NixOS/nix#434, NixOS/nixpkgs#18506)
My reason for having any opinion about OSX sandboxing at this point comes from some web searches the other day for "OSX `fakechroot`". x86 OSX is the other operating system I personally frequent besides arch linux. As usual, there's a very relevant issue from the nix repo (NixOS/nix#434) **(from 2015)**, which describes the supposedly-similar but, of course, minimally-documented `sandbox-exec` tool.

That functionality apparently worked on nix around 2015-2016 and broke sometime in 2016. The breakage produced another issue, which led to this very concise, mature, and thoughtful summary from @copumpkin (https://github.com/NixOS/nixpkgs/issues/18506#issuecomment-246198736):
> We did have sandboxed builds working well for a long time (it was all I used for a
> while, and I did the initial work to make them work) but they broke and I
> have far less time now than I did when I first put this whole thing
> together. In the absence of me doing all of the above, who makes it happen
> and how? Documentation certainly helps but if I had eager Darwin developers
> breathing down my neck aching to help fix the stdenv I might allocate more
> of my free time to doing that instead of fixing the things myself.

**We should evaluate how the ecosystem has evolved since that 2016 thread.** Still, a major package manager/operating system with deep knowledge of its own environment struggled with this exact problem, and attributed part of that struggle to lacking a strong connection with enthusiastic OSX users. So I think we may want to **consider the OSX sandboxing problem as "too hard for right now, but we should definitely put it back on the table if we ever need to scale up our connection with our enthusiastic OSX user base".**

### 2. Back to the drawing board: what is `fakechroot` doing for us that doesn't require linux namespaces? 
- Avoid ever needing to perform a double bootstrap process for a tool as is currently done for clingo in #20159 in our `spack/centos6` CI shard.
    - [ ] *So:* **To avoid ever pulling in anything from the runtime library dir currently on disk while we try to configure/build/install a spack version of that same thing.**
- In the current reference tickets #18895 and #18454, instead of binary files, our goal is "just" to:
    - [ ] **Determine a robust way (specifically for the `git` and `gpg` tools right now) to configure those tools so they avoid reading the global user config, as well as any site-specific config those tools may define.**
    - The use case in these two tickets seems pretty clearly to be "just": "we want to allow spack to invoke those tools internally in any possible way which would avoid reading some or all user/site config."
        - [ ] This means we can consider e.g. patching the source to implement this, if that's the easiest way. We can gate it behind a default-`False` variant named e.g. `+noreadconfig`.
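Before reaching for a source patch, it may be enough to point both tools at empty configuration through environment variables they already document: `GIT_CONFIG_NOSYSTEM`, `GIT_CONFIG_GLOBAL` (git 2.32 or newer), `HOME`/`XDG_CONFIG_HOME`, and `GNUPGHOME`. The sketch below assumes that this is sufficient for the cases in #18895 and #18454, which would still need to be verified with the affected users; the helper name `isolated_env` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def isolated_env(scratch_home: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: an environment in which git and gpg skip user/site config."""
    env = dict(os.environ if base is None else base)
    # Point HOME and XDG at an empty scratch dir, so ~/.gitconfig,
    # ~/.config/git/config and ~/.gnupg are not found.
    env["HOME"] = scratch_home
    env["XDG_CONFIG_HOME"] = os.path.join(scratch_home, ".config")
    # git: do not read the system-wide config (usually /etc/gitconfig).
    env["GIT_CONFIG_NOSYSTEM"] = "1"
    # git >= 2.32: read the "global" level from an empty file instead of
    # $HOME/.gitconfig or $XDG_CONFIG_HOME/git/config.
    env["GIT_CONFIG_GLOBAL"] = os.devnull
    # gpg: use a private home directory for keyrings and gpg.conf.
    env["GNUPGHOME"] = os.path.join(scratch_home, ".gnupg")
    return env
```

A caller would create the scratch directory and its `.gnupg` subdirectory (mode `0700`, since gpg warns about unsafe homedir permissions), then pass the result as `env=` to `subprocess.run`. Overriding `HOME` hides every other dotfile too, which is the point here, but it may surprise tools that legitimately need one.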

### 3. Proposal: Use syscall tracing to find when, why, and how files are being accessed
To reiterate, I'm creating this ticket at all because, as @adamjstewart expressed in #18895, I believe this is a very generalizable problem that will pop up again in the future. I was further motivated by the primary source from nix quoted above, which confirms that we can't expect sandboxing to "just work" on OSX. So we'd like to start a discussion about other ways of achieving the above *without* relying on linux namespaces (docker is built on linux namespaces; `fakechroot`, by contrast, intercepts libc filesystem calls via `LD_PRELOAD`).

My thought process here is that **we likely *can't* expect**:
- To have a highly robust ("spack-quality") *filesystem isolation* mechanism (like `fakechroot`) with the same command-line API across even a *plurality* of our supported platforms!
    - Note that **for right now, we are pretty sure `fakechroot` is sufficient for all of spack's current purposes** (let's call it the platonic ideal).
- To have a directly comparable syscall tracing mechanism across platforms. This is expected to differ widely across platforms and even hosts, and I *believe* we should assume that our users may have good reasons to use their own syscall tracer too.
    - This kind of per-platform and per-host variation is exactly what Spack makes easy to describe, though.
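As a starting point for the diagnosis half of this, here is a minimal Python sketch that reads a log from `strace -f -e trace=file -o build.trace <command>` and lists absolute paths touched outside a set of allowed prefixes (e.g. the spack install tree and the build stage). It only understands the common one-line `syscall(..., "path", ...) = result` form; unfinished/resumed lines, escaped quotes, and other platforms' tracers (e.g. `dtruss` on OSX, `ktrace` on the BSDs) would each need their own parser. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Optional PID prefix ("1234 " with -o, "[pid 1234] " without), syscall name,
# first quoted argument (the path), and the integer return value.
_LINE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'(?P<call>\w+)\([^"]*"(?P<path>[^"]*)".*?\)\s+=\s+(?P<ret>-?\d+)'
)


def parse_accesses(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (syscall, path, return value) for every line the regex recognizes."""
    accesses = []
    for line in lines:
        m = _LINE.match(line)
        if m:
            accesses.append((m["call"], m["path"], int(m["ret"])))
    return accesses


def breaches(accesses: Iterable[Tuple[str, str, int]],
             allowed_prefixes: Sequence[str]) -> List[str]:
    """Absolute paths outside every allowed prefix, in first-seen order.

    Failed lookups (return value -1) are kept on purpose: probing for a
    user config is a breach even if the file is absent on this host.
    """
    found: List[str] = []
    for _call, path, _ret in accesses:
        if not path.startswith("/"):
            continue  # relative paths would need the tracee's cwd to resolve
        if any(path == p or path.startswith(p.rstrip("/") + "/") for p in allowed_prefixes):
            continue
        if path not in found:
            found.append(path)
    return found
```

For instance, a trace of `git config --list` run from a spack-installed git would surface `/etc/gitconfig` and `~/.gitconfig` lookups, which is exactly the evidence question 1 of the feedback loop asks for.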

----------

However, my reach goal here would be to answer:

**Is there a middle ground between `fakechroot` and syscall tracing which goes further than `strace` in not just *diagnosing* the isolation breaches, but providing a way (even a hacky way) to *fix* them *without* patching the package sources?**

_The correct answer to this *may* be: "not right now, but we'll end up building it along the way."_

Here's one way to architect the "canonical feedback loop" in goal 4 from above:

### 4. Proposal: Canonical feedback loop for responding to spack user reports of isolation breaches
Ask 3 questions:

1. [ ] **What file paths are being fetched (or: "what resources/caches are being read/written") that cause a specific configure/build/install to fail?** Answer this as a best guess at what the package developer was *trying to achieve*. Also record your best guess at how much control the package developer has over the error (e.g. if it's happening inside CMake).
    - *example:* "CMake expects the Python installation to contain several specific header files, without comments describing why CMake expects/needs those to be there. The git blame turned up these possible explanations, but nothing conclusive."
    - *example:* "Python's generated (?) install script manually searches /lib64, finds the shared library for the current python 2.6.6, and *somehow* (not sure yet) prefers that over the shared library that spack just built."
2. [ ] *short-term:* **Can this be worked around via patch()/etc? *long-term:* Who do we think is *most* responsible for this specific build failure, and how active are their contributions?** 
    - *example:* "This CMake FindPython module has been broken on centos6 since time immemorial: \<try to explain why if possible\>."
    - *example:* "There's a known race condition in this one tool that's only on github and the Arch User Repositories, which has had an issue open for just a couple weeks. We've left a (linked) comment."
    - *example:* "This codebase uses a custom fork of the gnu autotools which is difficult to debug: \<explain why\>" 
3. [ ] **Can the resource the developer was *trying to achieve* feasibly and usefully be expressed as a *platform-generic virtual dependency* as per section (3) of https://github.com/spack/spack/issues/20256#issuecomment-739434724**?
    - *example:* "We could not find any reason stated for CMake to have assumed *this specific subset of files* was going to be in the python config dir."
    - *example:* "The python install script just requires a working python installation to run successfully -- during `install`, it doesn't actually need to have `LD_LIBRARY_PATH` set to anything besides its own just-built python, right?"
        - *example response:* "Ah! So we need to require (by editing the `ComplexResource` definition for `%py`) that all python-providing packages expose a uniform way to set *only* `PATH` and *not* `LD_LIBRARY_PATH`, for some build phase?"
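To make the three questions above easy to file and track, a report could be kept as a small structured record. The following Python sketch is purely illustrative (the class and field names are hypothetical, not a spack API): it mirrors questions 1-3 and renders a GitHub-style checklist in which unanswered questions stay unchecked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IsolationBreachReport:
    """Hypothetical record for one isolation-breach report (goal 4)."""

    spec: str                          # free-text spec, e.g. "python@2.6.6"
    phase: str                         # configure / build / install
    breached_paths: List[str] = field(default_factory=list)
    intent: str = ""                   # Q1: what was the developer trying to achieve?
    workaround_or_owner: str = ""      # Q2: patch()/workaround, and who is responsible?
    virtual_dependency: str = ""       # Q3: expressible as a platform-generic virtual?

    def unanswered(self) -> List[str]:
        """Names of the questions that still need an answer."""
        answers = {
            "intent": self.intent,
            "workaround_or_owner": self.workaround_or_owner,
            "virtual_dependency": self.virtual_dependency,
        }
        return [name for name, text in answers.items() if not text.strip()]

    def to_markdown(self) -> str:
        """Render a GitHub-style checklist; unanswered questions stay unchecked."""
        questions = [
            ("What was the developer trying to achieve?", self.intent),
            ("Workaround / who is responsible?", self.workaround_or_owner),
            ("Platform-generic virtual dependency?", self.virtual_dependency),
        ]
        lines = [f"Isolation breach in `{self.spec}` during {self.phase}:"]
        lines += [f"- accessed `{path}`" for path in self.breached_paths]
        for question, answer in questions:
            box = "x" if answer.strip() else " "
            lines.append(f"1. [{box}] {question} {answer.strip() or '(unanswered)'}")
        return "\n".join(lines)
```

Keeping the questions as separate fields (rather than one free-text comment) would let us see at a glance which reports are stuck on question 2 versus question 3.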

# Appendix
## Consider investigating if any "container runtimes" can be converted into something like `fakechroot`. 
- [Docker](https://www.docker.com) is a very widely used container runtime, which spack already supports and [has CI running for](https://github.com/spack/spack/blob/3843f43e6983d9c513218a2c0e25c27e5be76994/lib/spack/spack/test/container/docker.py#L62).
    - It would be very good not to become dependent on the `docker` tool itself to enforce any separation, or on Dockerfiles in place of spack environment files.
        - All we're looking for is a fake root/chroot filesystem -- we want to make it as easy and reliable as possible for spack to achieve this. 
    - It's not currently clear to me whether container runtimes can be expected to provide *any* useful sandboxing guarantees without a lot of work, especially across platforms. Let's take some time to research that at some point. 

## Apply the Spack User Survey
- Make use of data from [the 2020 spack user survey](https://spack.io/spack-user-survey-2020/) on what kinds of sandboxing or system/app tracing needs users want support for, on each of our supported platforms.

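## Excerpt from the linked bootstrap script
The lines of `share/spack/qa/run-clingo-py2-bootstrap` referenced in the Motivation section:

```
# Game plan: install the same python, with a much less broken installation
# (headers and libraries where clingo can find them). But first, we'll need to
# install yet another python, then work our way back.
```
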