Experimental binding of shared ELF libraries#31948
Conversation
c5a0d40 to
809ff91
Compare
|
Yup, @haampie and I talked about this, brought it up in the packaging taxonomy channel too. We need to do some testing to see what the effect is, and if it would let us get away with removing any of what we are doing with rewriting currently, but it could be a really nice way to deal with some problems, potentially even get musl working with shrinkwrap if we can touch the dependencies. |
|
Very cool — I remember us discussing this. With @trws I keep discussing how store-based systems should go deeper. Why does the code do so much of the ELF checking itself but also relies on patchelf? |
809ff91 to
ef95a9e
Compare
I wanted to prevent weird situations where executables get sonames, patchelf happily does that. And I did not want to rely too much on conventions such as executables go in However, I've found that Maybe a better answer to the Q "is this a shared object" is: yes if it is |
ef95a9e to
b669f52
Compare
|
b669f52 to
b1bae2a
Compare
020d358 to
a16cf86
Compare
a16cf86 to
13345c5
Compare
|
@tgamblin this seems to work just fine ™️ The only build failure is now mfem on aarch64 amazon linux, but that failure is legit since cuda toolkit requires a newer glibc. See #32206 and attempt 1 and attempt 2, 2 attempts because gitlab ci has tons of spurious kubernetes isues. Note that I tweaked CI to have a recent patchelf, unclear if ultimately it was necessary or not. How to proceed from here? Enable this universally? Hide it behind a config flag? (I would prefer not, since the config value is not reflected in the hash -- different sonames under same hash doesn't sound right). Document it? Announce it? |
|
I like this approach a lot (it could be ported to EasyBuild as well as an alternative to its rpath support), as it eliminates both LD_LIBRARY_PATH and the somewhat unreliable LD_RUN_PATH (alleviated by using a wrapper for One thing I'm not sure about are the legal issues though, say you patch the soname of CUDA or MKL libraries, is that copyright or EULA infringement? Or only if you distribute said patched libraries? |
|
I'm not sure if But yeah, it's some form of binding/fixing/pinning, to I just went with |
|
@tgamblin this is ready for review again, I've also added documentation (https://spack--31948.org.readthedocs.build/en/31948/config_yaml.html#shared-linking-bind) |
Right now, Also I'm curious: I'd expect that our relocation logic would need to be updated, or is the string replacement it does "generic enough" to cover this also? |
It's still a heuristic; spack doesn't exactly know what to look for, only the build system knows what it needs, and only the linker knows what's linked.
It doesn't, it's covered by generic binary "text" replacement |
I see that if the build system found all the needed libraries, then at that point the needed libraries are mentioned in the ELF data (assuming that there are not equivalently-named libraries with different directories in the dependencies) and there is no need for per-package logic to locate those libraries. That being said, sometimes we use Generally though this puts less pressure on |
|
I've moved elf parsing to #33628, so this PR can focus just on changing sonames. |
Replace sonames of ELF shared libraries with their absolute path, so the linker copies a path, and the dynamic linker loads this path directly instead of performing a search. This hook can be enabled by setting `config:shared_linking:bind:true`.
6506af9 to
22d063c
Compare
|
Rebased now that #33628 is merged |
tgamblin
left a comment
There was a problem hiding this comment.
LGTM -- minor backward compatibility requests.
Also, is it possible to get more coverage on this?
| shared_linking: 'rpath' | ||
| # Control how shared libraries are located at runtime on Linux. See the | ||
| # the Spack documentation for details. | ||
| shared_linking: |
There was a problem hiding this comment.
ensure that shared_linking: <string> still works.
There was a problem hiding this comment.
I've added a test for that
| # Since the order is important, we just fix it here, it's not like | ||
| # hooks get added every other day. |
There was a problem hiding this comment.
Is this comment still relevant? I think it's not since you now use ensure_last
5a6fc42 to
f010311
Compare
|
CI is failing with the known |
Adds another post install hook that loops over the install prefix, looking for shared libraries type of ELF files, and sets the soname to their own absolute paths. The idea being, whenever somebody links against those libraries, the linker copies the soname (which is the absolute path to the library) as a "needed" library, so that at runtime the dynamic loader realizes the needed library is a path which should be loaded directly without searching. As a result: 1. rpaths are not used for the fixed/static list of needed libraries in the dynamic section (only for _actually_ dynamically loaded libraries through `dlopen`), which largely solves the issue that Spack's rpaths are a heuristic (`<prefix>/lib` and `<prefix>/lib64` might not be where libraries really are...) 2. improved startup times (no library search required)
This PR does several things:
- [x] Allow any character to appear in the quoted values of variants and flags.
- [x] Allow easier passing of quoted flags on the command line, e.g. `cflags="-O2 -g"`.
- [x] Handle quoting better in spec output, using single quotes around double
quotes and vice versa.
- [x] Disallow spaces around `=` and `==` when parsing variants and flags.
## Motivation
This PR is motivated by the issues above and by ORNL's
[tips for launching at scale on Frontier](https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#tips-for-launching-at-scale).
ORNL recommends using `sbcast --send-libs` to broadcast executables and their
libraries to compute nodes when running large jobs (e.g., 80k ranks). For an
executable named `exe`, `sbcast --send-libs` stores the needed libraries in a
directory alongside the executable called `exe_libs`. ORNL recommends pointing
`LD_LIBRARY_PATH` at that directory so that `exe` will find the local libraries and
not overwhelm the filesystem.
There are other ways to mitigate this problem:
* You could build with `RUNPATH` using `spack config add config:shared_linking:type:runpath`,
which would make `LD_LIBRARY_PATH` take precedence over Spack's `RUNPATHs`.
I don't recommend this one because `RUNPATH` can cause many other things to go wrong.
* You could use `spack config add config:shared_linking:bind:true`, added in #31948, which
will greatly reduce the filesystem load for large jobs by pointing `DT_NEEDED` entries in
ELF *directly* at the needed `.so` files instead of relying on `RPATH` search via soname.
I have not experimented with this at 80,000 ranks, but it should help quite a bit.
* You could use [Spindle](https://github.com/hpc/Spindle) (as LLNL does on its machines)
which should transparently fix this without any changes to your executable and without
any need to use `sbcast` or other tools.
But we want to support the `sbcast` use case as well.
## `sbcast` and Spack
Spack's `RPATHs` break the `sbcast` fix because they're considered with higher precedence
than `LD_LIBRARY_PATH`. So Spack applications will still end up hitting the shared filesystem
when searching for libraries. We can avoid this by injecting some `ldflags` in to the build, e.g.,
if were were going to launch, say, `LAMMPS` at scale, we could add another `RPATH`
specifically for use with `sbcast`:
spack install lammps ldflags='-Wl,-rpath=$ORIGIN/lmp_libs'
This will put the `lmp_libs` directory alongside `LAMMPS`'s `lmp` executable first in the
`RPATH`, so it will be searched before any directories on the shared filesystem.
## Issues with quoting
Before this PR, the command above would've errored out for two reasons:
1. `$` wasn't an allowed character in our spec parser.
2. You would've had to double quote the flags to get them to pass through correctly:
spack install lammps ldflags='"-Wl,-rpath=$ORIGIN/lmp_libs"'
This is ugly and I don't think many users will easily figure it out. The behavior was added in
#29282, and it improved parsing of specs passed as a single string, e.g.:
spack install 'lammps ldflags="-Wl,-rpath=$ORIGIN/lmp_libs"'
but a lot of users are naturally going to try to quote arguments *directly* on the command
line, without quoting their entire spec. #29282 used a heuristic to detect unquoted flags
and warn the user, but the warning could be confusing. In particular, if you wrote
`cflags="-O2 -g"` on the command line, it would break the flags up, warn, and tell you
that you could fix the issue by writing `cflags="-O2 -g"` even though you just wrote
that. It's telling you to *quote* that value, but the user has to know to double quote.
## New heuristic for quoted arguments from the CLI
There are only two places where we allow arbitrary quoted strings in specs: flags and
variant values, so this PR adds a simpler heuristic to the CLI parser: if an argument in
`sys.argv` starts with `name=...`, then we assume the whole argument is quoted.
This means you can write:
spack install bzip2 cflags="-O2 -g"
directly on the command line, without multiple levels of quoting. This also works:
spack install 'bzip2 cflags="-O2 -g"'
The only place where this heuristic runs into ambiguity is if you attempt to pass
anonymous specs that start with `name=...` as one large string. e.g., this will be
interpreted as one large flag value:
spack find 'cflags="-O2 -g" ~bar +baz'
This sets `cflags` to `"-O2 -g" ~bar +baz`, which is likely not what you wanted. You
can fix this easily by either removing the quotes:
spack find cflags="-O2 -g" ~bar +baz
Or by adding a space at the start, which has the same effect:
spack find ' cflags="-O2 -g" ~bar +baz'
You may wonder why we don't just look for quotes inside of flag arguments, and the
reason is that you *might* want them there. If you are passing arguments like:
spack install zlib cppflags="-D DEBUG_MSG1='quick fox' -D DEBUG_MSG2='lazy dog'"
You *need* the quotes there. So we've opted for one potentially confusing, but easily
fixed outcome vs. limiting what you can put in your quoted strings.
## Quotes in formatted spec output
In addition to being more lenient about characters accepted in quoted strings, this PR fixes
up spec formatting a bit. We now format quoted strings in specs with single quotes, unless
the string has a single quote in it, in which case we JSON-escape the string (i.e., we add
`\` before `"` and `\`).
zlib cflags='-D FOO="bar"'
zlib cflags="-D FOO='bar'"
zlib cflags="-D FOO='bar' BAR=\"baz\""
This PR does several things:
- [x] Allow any character to appear in the quoted values of variants and flags.
- [x] Allow easier passing of quoted flags on the command line, e.g. `cflags="-O2 -g"`.
- [x] Handle quoting better in spec output, using single quotes around double
quotes and vice versa.
- [x] Disallow spaces around `=` and `==` when parsing variants and flags.
## Motivation
This PR is motivated by the issues above and by ORNL's
[tips for launching at scale on Frontier](https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#tips-for-launching-at-scale).
ORNL recommends using `sbcast --send-libs` to broadcast executables and their
libraries to compute nodes when running large jobs (e.g., 80k ranks). For an
executable named `exe`, `sbcast --send-libs` stores the needed libraries in a
directory alongside the executable called `exe_libs`. ORNL recommends pointing
`LD_LIBRARY_PATH` at that directory so that `exe` will find the local libraries and
not overwhelm the filesystem.
There are other ways to mitigate this problem:
* You could build with `RUNPATH` using `spack config add config:shared_linking:type:runpath`,
which would make `LD_LIBRARY_PATH` take precedence over Spack's `RUNPATHs`.
I don't recommend this one because `RUNPATH` can cause many other things to go wrong.
* You could use `spack config add config:shared_linking:bind:true`, added in spack#31948, which
will greatly reduce the filesystem load for large jobs by pointing `DT_NEEDED` entries in
ELF *directly* at the needed `.so` files instead of relying on `RPATH` search via soname.
I have not experimented with this at 80,000 ranks, but it should help quite a bit.
* You could use [Spindle](https://github.com/hpc/Spindle) (as LLNL does on its machines)
which should transparently fix this without any changes to your executable and without
any need to use `sbcast` or other tools.
But we want to support the `sbcast` use case as well.
## `sbcast` and Spack
Spack's `RPATHs` break the `sbcast` fix because they're considered with higher precedence
than `LD_LIBRARY_PATH`. So Spack applications will still end up hitting the shared filesystem
when searching for libraries. We can avoid this by injecting some `ldflags` in to the build, e.g.,
if were were going to launch, say, `LAMMPS` at scale, we could add another `RPATH`
specifically for use with `sbcast`:
spack install lammps ldflags='-Wl,-rpath=$ORIGIN/lmp_libs'
This will put the `lmp_libs` directory alongside `LAMMPS`'s `lmp` executable first in the
`RPATH`, so it will be searched before any directories on the shared filesystem.
## Issues with quoting
Before this PR, the command above would've errored out for two reasons:
1. `$` wasn't an allowed character in our spec parser.
2. You would've had to double quote the flags to get them to pass through correctly:
spack install lammps ldflags='"-Wl,-rpath=$ORIGIN/lmp_libs"'
This is ugly and I don't think many users will easily figure it out. The behavior was added in
spack#29282, and it improved parsing of specs passed as a single string, e.g.:
spack install 'lammps ldflags="-Wl,-rpath=$ORIGIN/lmp_libs"'
but a lot of users are naturally going to try to quote arguments *directly* on the command
line, without quoting their entire spec. spack#29282 used a heuristic to detect unquoted flags
and warn the user, but the warning could be confusing. In particular, if you wrote
`cflags="-O2 -g"` on the command line, it would break the flags up, warn, and tell you
that you could fix the issue by writing `cflags="-O2 -g"` even though you just wrote
that. It's telling you to *quote* that value, but the user has to know to double quote.
## New heuristic for quoted arguments from the CLI
There are only two places where we allow arbitrary quoted strings in specs: flags and
variant values, so this PR adds a simpler heuristic to the CLI parser: if an argument in
`sys.argv` starts with `name=...`, then we assume the whole argument is quoted.
This means you can write:
spack install bzip2 cflags="-O2 -g"
directly on the command line, without multiple levels of quoting. This also works:
spack install 'bzip2 cflags="-O2 -g"'
The only place where this heuristic runs into ambiguity is if you attempt to pass
anonymous specs that start with `name=...` as one large string. e.g., this will be
interpreted as one large flag value:
spack find 'cflags="-O2 -g" ~bar +baz'
This sets `cflags` to `"-O2 -g" ~bar +baz`, which is likely not what you wanted. You
can fix this easily by either removing the quotes:
spack find cflags="-O2 -g" ~bar +baz
Or by adding a space at the start, which has the same effect:
spack find ' cflags="-O2 -g" ~bar +baz'
You may wonder why we don't just look for quotes inside of flag arguments, and the
reason is that you *might* want them there. If you are passing arguments like:
spack install zlib cppflags="-D DEBUG_MSG1='quick fox' -D DEBUG_MSG2='lazy dog'"
You *need* the quotes there. So we've opted for one potentially confusing, but easily
fixed outcome vs. limiting what you can put in your quoted strings.
## Quotes in formatted spec output
In addition to being more lenient about characters accepted in quoted strings, this PR fixes
up spec formatting a bit. We now format quoted strings in specs with single quotes, unless
the string has a single quote in it, in which case we JSON-escape the string (i.e., we add
`\` before `"` and `\`).
zlib cflags='-D FOO="bar"'
zlib cflags="-D FOO='bar'"
zlib cflags="-D FOO='bar' BAR=\"baz\""
This PR does several things:
- [x] Allow any character to appear in the quoted values of variants and flags.
- [x] Allow easier passing of quoted flags on the command line, e.g. `cflags="-O2 -g"`.
- [x] Handle quoting better in spec output, using single quotes around double
quotes and vice versa.
- [x] Disallow spaces around `=` and `==` when parsing variants and flags.
## Motivation
This PR is motivated by the issues above and by ORNL's
[tips for launching at scale on Frontier](https://docs.olcf.ornl.gov/systems/frontier_user_guide.html#tips-for-launching-at-scale).
ORNL recommends using `sbcast --send-libs` to broadcast executables and their
libraries to compute nodes when running large jobs (e.g., 80k ranks). For an
executable named `exe`, `sbcast --send-libs` stores the needed libraries in a
directory alongside the executable called `exe_libs`. ORNL recommends pointing
`LD_LIBRARY_PATH` at that directory so that `exe` will find the local libraries and
not overwhelm the filesystem.
There are other ways to mitigate this problem:
* You could build with `RUNPATH` using `spack config add config:shared_linking:type:runpath`,
which would make `LD_LIBRARY_PATH` take precedence over Spack's `RUNPATHs`.
I don't recommend this one because `RUNPATH` can cause many other things to go wrong.
* You could use `spack config add config:shared_linking:bind:true`, added in spack#31948, which
will greatly reduce the filesystem load for large jobs by pointing `DT_NEEDED` entries in
ELF *directly* at the needed `.so` files instead of relying on `RPATH` search via soname.
I have not experimented with this at 80,000 ranks, but it should help quite a bit.
* You could use [Spindle](https://github.com/hpc/Spindle) (as LLNL does on its machines)
which should transparently fix this without any changes to your executable and without
any need to use `sbcast` or other tools.
But we want to support the `sbcast` use case as well.
## `sbcast` and Spack
Spack's `RPATHs` break the `sbcast` fix because they're considered with higher precedence
than `LD_LIBRARY_PATH`. So Spack applications will still end up hitting the shared filesystem
when searching for libraries. We can avoid this by injecting some `ldflags` in to the build, e.g.,
if were were going to launch, say, `LAMMPS` at scale, we could add another `RPATH`
specifically for use with `sbcast`:
spack install lammps ldflags='-Wl,-rpath=$ORIGIN/lmp_libs'
This will put the `lmp_libs` directory alongside `LAMMPS`'s `lmp` executable first in the
`RPATH`, so it will be searched before any directories on the shared filesystem.
## Issues with quoting
Before this PR, the command above would've errored out for two reasons:
1. `$` wasn't an allowed character in our spec parser.
2. You would've had to double quote the flags to get them to pass through correctly:
spack install lammps ldflags='"-Wl,-rpath=$ORIGIN/lmp_libs"'
This is ugly and I don't think many users will easily figure it out. The behavior was added in
spack#29282, and it improved parsing of specs passed as a single string, e.g.:
spack install 'lammps ldflags="-Wl,-rpath=$ORIGIN/lmp_libs"'
but a lot of users are naturally going to try to quote arguments *directly* on the command
line, without quoting their entire spec. spack#29282 used a heuristic to detect unquoted flags
and warn the user, but the warning could be confusing. In particular, if you wrote
`cflags="-O2 -g"` on the command line, it would break the flags up, warn, and tell you
that you could fix the issue by writing `cflags="-O2 -g"` even though you just wrote
that. It's telling you to *quote* that value, but the user has to know to double quote.
## New heuristic for quoted arguments from the CLI
There are only two places where we allow arbitrary quoted strings in specs: flags and
variant values, so this PR adds a simpler heuristic to the CLI parser: if an argument in
`sys.argv` starts with `name=...`, then we assume the whole argument is quoted.
This means you can write:
spack install bzip2 cflags="-O2 -g"
directly on the command line, without multiple levels of quoting. This also works:
spack install 'bzip2 cflags="-O2 -g"'
The only place where this heuristic runs into ambiguity is if you attempt to pass
anonymous specs that start with `name=...` as one large string. e.g., this will be
interpreted as one large flag value:
spack find 'cflags="-O2 -g" ~bar +baz'
This sets `cflags` to `"-O2 -g" ~bar +baz`, which is likely not what you wanted. You
can fix this easily by either removing the quotes:
spack find cflags="-O2 -g" ~bar +baz
Or by adding a space at the start, which has the same effect:
spack find ' cflags="-O2 -g" ~bar +baz'
You may wonder why we don't just look for quotes inside of flag arguments, and the
reason is that you *might* want them there. If you are passing arguments like:
spack install zlib cppflags="-D DEBUG_MSG1='quick fox' -D DEBUG_MSG2='lazy dog'"
You *need* the quotes there. So we've opted for one potentially confusing, but easily
fixed outcome vs. limiting what you can put in your quoted strings.
## Quotes in formatted spec output
In addition to being more lenient about characters accepted in quoted strings, this PR fixes
up spec formatting a bit. We now format quoted strings in specs with single quotes, unless
the string has a single quote in it, in which case we JSON-escape the string (i.e., we add
`\` before `"` and `\`).
zlib cflags='-D FOO="bar"'
zlib cflags="-D FOO='bar'"
zlib cflags="-D FOO='bar' BAR=\"baz\""
Basically implements this idea: https://stoppels.ch/2022/08/04/stop-searching-for-shared-libraries.html (inspired by the macOS
install_nametricks)In short:
Adds another post install hook that loops over the install prefix, looking for shared libraries type of ELF files, and sets the soname to their own absolute paths.
The idea being, whenever somebody links against those libraries, the linker copies the soname (which is the absolute path to the library) as a "needed" library, so that at runtime the dynamic loader realizes the needed library is a path which should be loaded directly without searching.
As a result:
dlopen), which largely solves the issue that Spack's rpaths are a heuristic (<prefix>/liband<prefix>/lib64might not be where libraries really are...)To enable, run