ELF: verify needed libraries are resolvable #28109
haampie wants to merge 5 commits into spack:develop
Conversation
(force-pushed 0579a95 to b824425)
(force-pushed 1535df1 to 7397928)
scheibelp left a comment:
I haven't thoroughly reviewed this yet but have some high-level questions about the implementation:
- I'm wondering if we can avoid maintaining a whitelist of libraries which can be missing
- I think this PR includes some optimization of RPATH rewriting which could be moved to another PR
The other comments are minor in comparison.
```python
    'libstdc++',
    'libtsan',
    'libubsan',
])
```
Curating this list will likely be an ongoing (and initially-frequent) process. I'm wondering if we can circumvent this curation by searching for libraries that meet the following criteria:
- The library is present in a dependency
- We are getting it from somewhere else
This would involve perhaps creating a DB of all library files in all dependencies (which would be useful for other purposes e.g. better error messages when an installation fails because of missing symbols).
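The proposed "DB of all library files in all dependencies" could be sketched roughly as follows. This is a hypothetical helper, not Spack's actual API: it maps each shared-library filename to the directories that provide it, so a checker can ask which dependency should supply `libfoo.so`.

```python
# Hypothetical sketch of the proposed dependency-library index.
import os
from collections import defaultdict

def index_dependency_libraries(prefixes):
    """Map library filename -> directories containing it.

    `prefixes` is a list of dependency install prefixes to scan."""
    index = defaultdict(list)
    for prefix in prefixes:
        for root, _dirs, files in os.walk(prefix):
            for fname in files:
                # Catch both plain and versioned sonames (libz.so, libz.so.1).
                if ".so" in fname or fname.endswith(".dylib"):
                    index[fname].append(root)
    return dict(index)
```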
Makes sense, but it does not help with errors like #28114: if multiple libraries are supported and the build links to the first it finds on the system (e.g. berkeley-db/intel-mkl on the system, whereas gdbm/openblas was the spack dependency).
I'm also thinking that something stateless is better than yet another cache
Also we have to deal with stuff like MKL which:
- installs libs with the same filename, soname, arch in different directories depending on what interface the user needs (32 vs 64 bit integers on a 64 bit arch). There is actually no way of knowing the required library.
- installs a copy of Qt for their updater/installer
So "the library is present in a dependency" is not a guarantee that it's meant to be exposed to the user, which is the reason I went for `spec[x].libs.directories` as candidate search paths.
Notice that the MKL issue also means we can't just randomly inject rpaths to MKL in the first place.
And... turns out `.libs` can't be trusted either. For mkl it returns glibc libs 😩. I don't see how to make any progress here.
> Makes sense, but it does not help with errors like #28114: if multiple libraries are supported and the build links to the first it finds on the system (e.g. berkeley-db/intel-mkl on the system, whereas gdbm/openblas was the spack dependency).
Sorry, I don't think I understand: wouldn't running `ldd` for example give the locations of all found dependency libraries? In this case for example I think the error we would detect is:
- `berkeley-db` installed by Spack has a library `X`
- `ldd` reports finding `X`, but in a system path
- Now we know there's an issue, because we should be getting `X` from the Spack package
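The detection described above could be sketched like so. This is a hypothetical function (not code from the PR): it parses `name => path` lines from `ldd` output and flags libraries that a Spack dependency is known to provide but that the dynamic linker resolved from outside the allowed prefixes.

```python
# Hypothetical sketch: flag Spack-provided libraries resolved from system paths.
def find_shadowed_libs(ldd_output, spack_provided, spack_prefixes):
    """`ldd_output`: text from `ldd <binary>`;
    `spack_provided`: set of library filenames some dependency installs;
    `spack_prefixes`: prefixes where resolution is considered OK."""
    problems = []
    for line in ldd_output.splitlines():
        parts = line.split("=>")
        if len(parts) != 2:
            continue
        name = parts[0].strip()
        resolved = parts[1].strip().split(" ")[0]
        if name in spack_provided and not any(
            resolved.startswith(p) for p in spack_prefixes
        ):
            problems.append((name, resolved))
    return problems
```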
One place where my suggestion wouldn't work is if Spack's Python package did not mention berkeley-db as a transitive dependency (in that case Spack may not install it, and hence may not know that X is provided). It looks like that might happen for example if one uses an external openssl.
> So "the library is present in a dependency" is not a guarantee that it's meant to be exposed to the user,
`.libs` should not return such libraries, so we can also harvest the names from that (`.libs` is a `LibraryList` object which can give you `.directories` or `.filenames` — in the latter case the individual library files).
> I'm also thinking that something stateless is better than yet another cache
I agree that caches have plenty of issues, although IMO in this case the benefits are worthwhile. If we can find a way to limit the complaints that end users get, then this solution could be the right way to go. I think curating `skip_list` will take significant work:
- I assume each different compiler will have libraries (right now you have ones associated with GCC, but I assume there will need to be at least one entry for Intel, `nvhpc`, `go`, etc.)
- Also, whether or not it's "OK" to find some of these on the system depends on whether we've installed a compiler with Spack
- I assume there may be minor differences for different operating systems (MacOS, and perhaps other Linux distributions)
That doesn't actually seem so bad to manage, now that I list it out. I wonder though if I'm missing something.
Perhaps this can be mitigated by only optionally running this post-install hook. I think that for example it would always make sense to run this for all CI jobs on all systems.
```python
if s.external:
    continue
try:
    dirs = spec[s.name].libs.directories
```
I think this is causing a CI failure:

```
spack.build_environment.ChildError: AttributeError: Query of package 'mpileaks' for 'libs' failed
```

- this is now calling `.libs` for every installed package
- the default implementation fails if no libs are found
- the mock packages don't have libraries

So one solution for example would be to define `.libs` for the mock mpileaks package.
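The suggested fix could look like the stand-alone sketch below. `MockPackageBase` is a stand-in for Spack's real package base class (whose default `libs` query fails when a package installs no libraries); the actual change would override `libs` on the mock `mpileaks` package instead.

```python
# Stand-alone sketch of the proposed fix (names are stand-ins, not Spack's).
class MockPackageBase:
    name = "generic"

    @property
    def libs(self):
        # Mimic Spack's default behaviour: fail the query if nothing is found.
        raise RuntimeError("Query of package %r for 'libs' failed" % self.name)

class MockMpileaks(MockPackageBase):
    name = "mpileaks"

    @property
    def libs(self):
        # An explicit empty result lets the post-install hook iterate safely.
        return []
```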
I think this hook should not run in tests, only maybe in an integration test.
Generally the build tests and CI jobs are sufficient to exercise this so I mostly agree. I think at least one unit test should run the hook though (so e.g. if something about the API of .libs changes, it is easy to see that the hook also needs to be updated).
(force-pushed ff74354 to 8125f23)
(force-pushed 8125f23 to 1eeacbb)
I've now resurrected and rebased this PR to basically just check if dependencies can be resolved.
Another "improvement" could be checking that the resolved paths are actually inside the prefixes of the non-external direct link deps, and if not, warn about that. So verify that

```python
any(
    realpath(resolved_lib).startswith(realpath(s.prefix))
    for s in [spec] + spec.dependencies(deptype="link")
)
```

Or maybe include transitive link deps, since maybe the direct dep is linked statically and pulls in transitive deps that are doing shared linking.
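That check could be factored into a small helper along these lines (a hypothetical name, not code from the PR; appending `os.sep` avoids accidentally matching a sibling prefix such as `/opt/foo` vs. `/opt/foobar`):

```python
# Hypothetical helper restating the prefix check above.
import os

def resolved_inside_allowed_prefixes(resolved_lib, prefixes):
    """True if the resolved library's real path lies under any allowed prefix."""
    real = os.path.realpath(resolved_lib)
    return any(real.startswith(os.path.realpath(p) + os.sep) for p in prefixes)
```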
Does this close #1304?
No, it doesn't resolve symbols.
(force-pushed 09677b5 to d96d107)
@spackbot rebuild everything
I've started that pipeline for you!
Detected using spack#28109:

```
bin/ccache-swig
    libz.so.1 => not found
```
This commit adds a post-install hook that walks over the prefix and verifies that each ELF executable/library that takes part in shared linking can in fact resolve its direct dependencies. This is slightly simpler than running `ldd` in the sense that it is not recursive (we only care about the package) and its verification logic is very minimal (we require ET_DYN type ELF files for libs, and they should match the class / data / ISA of the parent). Also it is platform independent, so we don't have to worry about calling the correct `ldd`. This check does not *fail* the install, it just prints it as an error message.
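The "very minimal" verification logic described here could look roughly like the sketch below. This is not the PR's actual implementation, just an illustration of the idea: parse the fixed-offset fields of the ELF identification header and require that a candidate library is ET_DYN with the same class, data encoding, and machine as the parent (constants are from the ELF specification).

```python
# Rough sketch of minimal ELF compatibility checks (not the PR's code).
import struct

ET_DYN = 3  # shared object file

def elf_header(data):
    """Parse class, data encoding, type, and machine from an ELF header."""
    assert data[:4] == b"\x7fELF", "not an ELF file"
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, LSB/MSB
    # e_type and e_machine are 16-bit ints right after the 16-byte e_ident.
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack(endian + "HH", data[16:20])
    return {"class": ei_class, "data": ei_data,
            "type": e_type, "machine": e_machine}

def is_compatible_dependency(parent, candidate):
    """True if `candidate` could satisfy a DT_NEEDED entry of `parent`."""
    return (candidate["type"] == ET_DYN
            and candidate["class"] == parent["class"]
            and candidate["data"] == parent["data"]
            and candidate["machine"] == parent["machine"])
```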
(force-pushed 3fe8cb2 to f6fbd42)
If we add a "strict" mode for CI, what do we do with stuff like [...]? Do we make per-package exceptions in [...]?
Closing this in favor of #47365.

This commit adds a post-install hook that walks over the prefix and verifies that each ELF executable/library that takes part in shared linking can in fact resolve its direct dependencies.

This is slightly simpler than running `ldd` in the sense that it is not recursive (we only care about the package) and its verification logic is very minimal (we require ET_DYN type ELF files for libs, and they should match the class / data / ISA of the parent).

Also it is platform independent, so we don't have to worry about calling the correct `ldd`.

This check does not *fail* the install, it just prints it as an error message.
Examples
- `llvm`
- `python`
- `perl`

Performance
For me it takes 97.2 ms to process the LLVM prefix (6.8 GB).