Speed up relocation and fix bug#33608
Conversation
| paths_to_relocate = [old_prefix, old_layout_root] | ||
| paths_to_relocate.extend(prefix_to_hash.keys()) | ||
| files_to_relocate = list( | ||
| filter( | ||
| lambda pathname: not relocate.file_is_relocatable( | ||
| pathname, paths_to_relocate=paths_to_relocate | ||
| ), | ||
| map( | ||
| lambda filename: os.path.join(workdir, filename), | ||
| buildinfo["relocate_binaries"], | ||
| ), | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Before we were filtering buildinfo["relocate_binaries"], so the semantic is different (not saying it was correct before). Why are we doing a lot of extra processing in relocate.file_is_relocatable? Does that processing cover some edge cases or is it plain wrong or unneeded?
There was a problem hiding this comment.
This can safely be dropped, currently it's horrible, and removing it makes Spack only better.
The old logic is as follows: take all strings minus the rpath from a binary (since patchelf can handle that). If among this list you find old_prefix or old_layout_root, it means this binary needs to be patched.
This is ridiculous because:
a. it's testing only two of many strings searched from the prefix_to_prefix_bin map, so it's simply incorrect;
b. it's sequential;
c. it adds a dependency on binutil's strings executable;
d. currently we only ever call binary text replacement after rpath substitution, so there's no need to special case rpaths.
Maybe in the past this filter was useful for performance reasons (calling strings is faster than scanning for strings in a binary using Python?), but clearly that's no longer true after #33590
Currently there's a slow sequential step in binary relocation where all strings of a binary are collected, with rpaths removed, and then filtered for the old install root. This is completely unnecessary, and also incorrect, since we replace more than just the old install root in the prefix to prefix mapping. And in fact the prefix to prefix mapping is parallel, and a single pass. So even as an optimization, this filter makes no sense anymore. Therefor we remove it
Currently there's a slow sequential step in binary relocation where all strings of a binary are collected, with rpaths removed, and then filtered for the old install root. This is completely unnecessary, and also incorrect, since we replace more than just the old install root in the prefix to prefix mapping. And in fact the prefix to prefix mapping is parallel, and a single pass. So even as an optimization, this filter makes no sense anymore. Therefor we remove it
|
@haampie thank you for fixing my bone-headed implementation. |
|
If I recall correctly I was trying to preserve backwards compatibility. |
relocate_linkswas skipped for macOSLLVM before:
LLVM after:
All in all llvm installs in < 1 min, the bottleneck is decompressing the tarball now.