ELF: verify needed libraries are resolvable#28109

Closed
haampie wants to merge 5 commits into spack:develop from haampie:feature/check-needed-libs-elf-files

Conversation

@haampie
Member

@haampie haampie commented Dec 21, 2021

This commit adds a post-install hook that walks over the prefix and
verifies that each ELF executable/library that takes part in shared
linking can in fact resolve its direct dependencies.

This is slightly simpler than running ldd in the sense that it is not
recursive (we only care about the package) and its verification logic is
very minimal (we require ET_DYN type ELF files for libs, and they should
match the class / data / ISA of the parent).

Also it is platform independent, so we don't have to worry about calling
the correct ldd.

This check does not fail the install; it just prints an error
message.
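The header comparison described above can be sketched as follows. This is a standalone illustration of the idea, not the hook's actual code; the field offsets and constants come from the ELF specification, and the function names are made up for this example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_DYN = 3  # shared object file


def parse_elf_header(data):
    """Extract (ei_class, ei_data, e_type, e_machine) from raw ELF bytes."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 32/64-bit class, LSB/MSB encoding
    # e_type and e_machine are 16-bit fields right after the 16-byte e_ident
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return ei_class, ei_data, e_type, e_machine


def is_compatible_lib(parent_hdr, lib_hdr):
    """A needed library must be ET_DYN and match the parent's class/data/ISA."""
    p_class, p_data, _, p_machine = parent_hdr
    l_class, l_data, l_type, l_machine = lib_hdr
    return (l_type == ET_DYN and l_class == p_class
            and l_data == p_data and l_machine == p_machine)
```

Because only the first 20 bytes of each candidate library need to be read, this stays cheap even for large prefixes.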

Examples

llvm

```
In [1]: import spack.hooks.check_dynamic_linking_elf

In [2]: spec = spack.store.db.query("llvm")[0]

In [3]: spack.hooks.check_dynamic_linking_elf.post_install(spec)

==> Error: Not all executables/libraries can resolve their dependencies:
bin/llvm-omp-device-info
        libomp.so => /home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t/bin/../lib/libomp.so
        libomptarget.so.15 => /home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t/bin/../lib/libomptarget.so.15
        libhwloc.so.15 => not found

lib/libomp.so
        libhwloc.so.15 => not found

lib/libompd.so
        libomp.so => not found
        libhwloc.so.15 => not found

lib/libomptarget.rtl.amdgpu.so.15
        libelf.so.1 => not found
        libz.so.1 => not found
        libtinfo.so.6 => not found

lib/libomptarget.rtl.cuda.so.15
        libelf.so.1 => not found
        libz.so.1 => not found
        libtinfo.so.6 => not found

lib/libomptarget.rtl.x86_64.so.15
        libffi.so.8 => not found
        libelf.so.1 => not found
        libz.so.1 => not found
        libtinfo.so.6 => not found

lib/libomptarget.so.15
        libz.so.1 => not found
        libtinfo.so.6 => not found

libexec/llvm/llvm-omp-device-info
        libomp.so => not found
        libomptarget.so.15 => not found
        libhwloc.so.15 => not found
```

python

```
lib/python3.9/lib-dynload/_crypt.cpython-39-x86_64-linux-gnu.so
        libcrypt.so.1 => not found

lib/python3.9/lib-dynload/nis.cpython-39-x86_64-linux-gnu.so
        libtirpc.so.3 => not found
```

perl

```
lib/5.36.0/x86_64-linux-thread-multi/CORE/libperl.so
        libcrypt.so.1 => not found
```

Performance

For me it takes 97.2 ms to process the LLVM prefix (6.8 GB).

@spackbot-app spackbot-app bot added the tests (General test capability(ies)) label Dec 21, 2021
@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch from 0579a95 to b824425 Compare December 21, 2021 13:50
@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch 2 times, most recently from 1535df1 to 7397928 Compare December 21, 2021 14:15
@haampie
Member Author

haampie commented Dec 21, 2021

Notice that the above shows a legitimate problem with python picking up a system lib (libdb-5.3.so is not found in an rpath, and should likely have been libgdbm.so, which is a lib installed by spack):

(Screenshot from 2021-12-21 15-39-42 showing libdb-5.3.so not being found in an rpath)

Which proves this is useful

Member

@scheibelp scheibelp left a comment


I haven't thoroughly reviewed this yet but have some high-level questions about the implementation:

  • I'm wondering if we can avoid maintaining a whitelist of libraries which can be missing
  • I think this PR includes some optimization of RPATH rewriting which could be moved to another PR

The other comments are minor in comparison.

```
    'libstdc++',
    'libtsan',
    'libubsan',
])
```
Member


Curating this list will likely be an ongoing (and initially-frequent) process. I'm wondering if we can circumvent this curation by searching for libraries that meet the following criteria:

  • The library is present in a dependency
  • We are getting it from somewhere else

This would involve perhaps creating a DB of all library files in all dependencies (which would be useful for other purposes e.g. better error messages when an installation fails because of missing symbols).
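The "DB of all library files in all dependencies" idea could be prototyped with a simple index. This is a sketch with a hypothetical helper, not Spack's actual database layer; it just walks each dependency prefix and records which prefix provides which library filename.

```python
import os


def library_index(dependency_prefixes):
    """Map library filename -> full path of the first provider found,
    by walking each dependency's install prefix."""
    index = {}
    for prefix in dependency_prefixes:
        for root, _, files in os.walk(prefix):
            for f in files:
                # crude shared-library filter: libfoo.so, libfoo.so.1, libfoo.dylib
                if ".so" in f or f.endswith(".dylib"):
                    index.setdefault(f, os.path.join(root, f))
    return index
```

With such an index, a "not found" (or system-resolved) DT_NEEDED entry whose filename appears in the index points at exactly the kind of mislink discussed above.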

Member Author


Makes sense, but it does not help with errors like #28114: if multiple libraries are supported and the build links to the first it finds on the system (e.g. berkeley-db/intel-mkl on the system, whereas gdbm/openblas was the Spack dependency).

Member Author


I'm also thinking that something stateless is better than yet another cache

Member Author

@haampie haampie Dec 22, 2021


Also we have to deal with stuff like MKL which:

  • installs libs with the same filename, soname, arch in different directories depending on what interface the user needs (32 vs 64 bit integers on a 64 bit arch). There is actually no way of knowing the required library.
  • installs a copy of Qt for their updater/installer

So "the library is present in a dependency" is not a guarantee that it's meant to be exposed to the user, which is the reason I went for spec[x].libs.directories as candidate search paths.
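Using `spec[x].libs.directories` as candidate search paths could look roughly like this. The helper name is made up for illustration; it just resolves each DT_NEEDED entry against the candidate directories and marks the rest as unresolved.

```python
import os


def resolve_needed(needed_libs, candidate_dirs):
    """Resolve each needed library name against candidate search
    directories (e.g. spec[x].libs.directories); unresolved entries
    map to None."""
    resolution = {}
    for lib in needed_libs:
        resolution[lib] = next(
            (os.path.join(d, lib) for d in candidate_dirs
             if os.path.exists(os.path.join(d, lib))),
            None)
    return resolution
```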

Member Author


Notice that the MKL issue also means we can't just randomly inject rpaths to MKL in the first place.

Member Author


And... turns out .libs can't be trusted either. For mkl it returns glibc libs 😩. I don't see how to make any progress here.

Member


> Makes sense, but it does not help with errors like #28114: if multiple libraries are supported and the build links to the first it finds on the system (e.g. berkeley-db/intel-mkl on the system, whereas gdbm/openblas was the Spack dependency).

Sorry I don't think I understand: wouldn't running ldd for example give the locations of all found dependency libraries? In this case for example I think the error we would detect is:

  • berkeley-db installed by Spack has a library X
  • ldd reports finding X, but in a system path
  • Now we know there's an issue, because we should be getting X from the Spack package

One place where my suggestion wouldn't work is if Spack's Python package did not mention berkeley-db as a transitive dependency (in that case Spack may not install it, and hence may not know that X is provided). It looks like that might happen for example if one uses an external openssl.

> So "the library is present in a dependency" is not a guarantee that it's meant to be exposed to the user,

.libs should not return such libraries, so we can also harvest the names from that (.libs is a LibraryList object which can give you .directories or .filenames - in the latter case the individual library files).
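Harvesting the names from `.libs` might look like the sketch below, written with stand-in objects since the real `LibraryList` lives inside Spack; only the `.filenames` attribute used here is from the source.

```python
def expected_library_names(specs):
    """Collect the library filenames each spec's .libs (a LibraryList-like
    object exposing .filenames) says it provides."""
    names = set()
    for spec in specs:
        names.update(spec.libs.filenames)
    return names
```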

> I'm also thinking that something stateless is better than yet another cache

I agree that caches have plenty of issues, although IMO in this case the benefits are worthwhile. If we can find a way to limit the complaints that end users get then this solution could be the right way to go. I think curating skip_list will take significant work:

  • I assume each different compiler will have libraries (right now you have ones associated with GCC, but I assume there will need to be at least one entry for Intel, nvhpc, go, etc.)
    • Also, whether or not it's "OK" to find some of these on the system depends on whether we've installed a compiler with Spack
  • I assume there may be minor differences for different operating systems (macOS, and perhaps other Linux distributions)

That doesn't actually seem so bad to manage, now that I list it out. I wonder though if I'm missing something.

Perhaps this can be mitigated by only optionally running this post-install hook. I think that for example it would always make sense to run this for all CI jobs on all systems.

```
if s.external:
    continue
try:
    dirs = spec[s.name].libs.directories
```
Member


I think this is causing a CI failure:

spack.build_environment.ChildError: AttributeError: Query of package 'mpileaks' for 'libs' failed

(at https://github.com/spack/spack/runs/4598014455?check_suite_focus=true):

  • this is now calling .libs for every installed package
  • the default implementation fails if no libs are found
  • the mock packages don't have libraries

So one solution for example would be to define .libs for the mock mpileaks package.

Member Author

@haampie haampie Dec 22, 2021


I think this hook should not run in tests, only maybe in an integration test.

Member


Generally the build tests and CI jobs are sufficient to exercise this so I mostly agree. I think at least one unit test should run the hook though (so e.g. if something about the API of .libs changes, it is easy to see that the hook also needs to be updated).

@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch from ff74354 to 8125f23 Compare November 8, 2022 13:43
@spackbot-app spackbot-app bot added the core (PR affects Spack core functionality) label Nov 8, 2022
@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch from 8125f23 to 1eeacbb Compare November 8, 2022 13:44
@haampie haampie changed the title Warn when elf files with dt needed section can't locate libraries in direct link type deps ELF: verify needed libraries are resolvable Nov 8, 2022
@haampie
Member Author

haampie commented Nov 8, 2022

I've now resurrected and rebased this PR to basically just check if dependencies can be resolved.

@haampie haampie requested a review from trws November 8, 2022 13:48
@haampie
Member Author

haampie commented Nov 8, 2022

Another "improvement" could be checking that the resolved paths are actually inside the prefixes of the non-external direct link deps, and if not, warn about that. So verify that

```
any(realpath(resolved_lib).startswith(realpath(s.prefix))
    for s in [spec] + spec.dependencies(deptype="link"))
```

Or maybe include transitive link deps, since maybe the direct dep is linked statically and pulls in transitive deps that are doing shared linking.
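Expanded into a small function, that prefix check might read as follows. This is a sketch under the assumptions stated above (the function name is hypothetical); paths are compared via realpath so symlinked store layouts still match.

```python
import os


def resolved_in_allowed_prefixes(resolved_lib, prefixes):
    """True if the resolved library's real path lies inside one of the
    allowed install prefixes (the spec itself plus its link-type deps)."""
    real = os.path.realpath(resolved_lib)
    return any(real.startswith(os.path.realpath(p).rstrip(os.sep) + os.sep)
               for p in prefixes)
```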

haampie added a commit to haampie/spack that referenced this pull request Nov 11, 2022
@adamjstewart
Member

Does this close #1304?

@haampie
Member Author

haampie commented Nov 12, 2022

No, this doesn't resolve symbols.

@spackbot-app spackbot-app bot added the defaults and gitlab (Issues related to gitlab integration) labels Nov 14, 2022
@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch from 09677b5 to d96d107 Compare November 14, 2022 10:49
@haampie
Member Author

haampie commented Nov 14, 2022

@spackbot rebuild everything

@spackbot-app

spackbot-app bot commented Nov 14, 2022

I've started that pipeline for you!

@haampie haampie mentioned this pull request Nov 14, 2022
haampie added a commit to haampie/spack that referenced this pull request Nov 14, 2022
Detected using spack#28109

```
bin/ccache-swig
        libz.so.1 => not found
```
This commit adds a post-install hook that walks over the prefix and
verifies that each ELF executable/library that takes part in shared
linking can in fact resolve its direct dependencies.

This is slightly simpler than running `ldd` in the sense that it is not
recursive (we only care about the package) and its verification logic is
very minimal (we require ET_DYN type ELF files for libs, and they should
match the class / data / ISA of the parent).

Also it is platform independent, so we don't have to worry about calling
the correct `ldd`.

This check does not *fail* the install, it just prints it as an error
message.
@haampie haampie force-pushed the feature/check-needed-libs-elf-files branch from 3fe8cb2 to f6fbd42 Compare November 14, 2022 15:35
@tgamblin tgamblin self-requested a review November 14, 2022 15:53
@haampie
Member Author

haampie commented Nov 14, 2022

If we add a "strict" mode for CI, what do we do with stuff like cuda / intel-mkl / nvhpc that assume zlib to be a system dependency, or worse, have dependencies on internal deps like Qt5 and fail to set $ORIGIN type of rpaths?

Do we make per-package exceptions in packages.yaml? Do we use patchelf to fix up the binaries? Do we add a per-package ignore list for directories/files?

alalazo pushed a commit that referenced this pull request Nov 14, 2022
charmoniumQ pushed a commit to charmoniumQ/spack that referenced this pull request Nov 19, 2022
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
@haampie
Member Author

haampie commented Nov 1, 2024

Closing this in favor of #47365

@haampie haampie closed this Nov 1, 2024

Labels

core (PR affects Spack core functionality), defaults, gitlab (Issues related to gitlab integration), libraries, tests (General test capability(ies)), update-package
