_replace_prefix_bin performance improvements#33590
Conversation
- single pass over the binary data matching all prefixes - collect offsets and replacement strings - do in-place updates with `fseek` / `fwrite`, since typically our replacement touch O(few bytes) while the file is O(many megabytes) - be nice: leave the file untouched if some string can't be replaced
|
This LGTM. Do we have tests already to catch regressions on binary replacement? Also, I think we can extract this as a generic function somewhere else ( |
|
There are tests in relocate.py covering this code. However, I don't immediately see where the order of replacement is tested. At the same time, I believe the order is currently irrelevant, because spack's install_tree root is added before more specific individual package prefixes in the list of paths to be replaced. |
|
Performance improvements on a larger file such as using this benchmark: import spack.relocate
import collections
paths = collections.OrderedDict([
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/binutils-2.38-mskm4arih4xwrw2d2lghyhl2jl2nks3e", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/binutils-2.38-mskm4arih4xwrw2d2lghyhl2jl2nks3e"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/diffutils-3.8-ule36qgztu5bimfjcocoryg5avs4rwgy", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/diffutils-3.8-ule36qgztu5bimfjcocoryg5avs4rwgy"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libiconv-1.16-km7kydpek6i7iibmpspaohrgioblmglm", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libiconv-1.16-km7kydpek6i7iibmpspaohrgioblmglm"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gettext-0.21-jkli62ejsoo3azoyghawslzjvmb4ewaz", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gettext-0.21-jkli62ejsoo3azoyghawslzjvmb4ewaz"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/bzip2-1.0.8-h42dtjuvxugmdpuo5rlk6sjg2duyg5w5", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/bzip2-1.0.8-h42dtjuvxugmdpuo5rlk6sjg2duyg5w5"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libxml2-2.10.1-2vshpol4c6v5mlbgvxgdiaq7dy3vdzir", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libxml2-2.10.1-2vshpol4c6v5mlbgvxgdiaq7dy3vdzir"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pkgconf-1.8.0-iklqdkh3mbfffezn6lyggdedvtyibyxx", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pkgconf-1.8.0-iklqdkh3mbfffezn6lyggdedvtyibyxx"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/xz-5.2.5-orcksoaevfyzqy7kuxc66ialqzgrwns7", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/xz-5.2.5-orcksoaevfyzqy7kuxc66ialqzgrwns7"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zlib-1.2.12-2ujwj4a64tabxejmzahaomaldrr5dmnq", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zlib-1.2.12-2ujwj4a64tabxejmzahaomaldrr5dmnq"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ncurses-6.3-azrbhhcayypnn2kowjogobujyemr2w7m", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ncurses-6.3-azrbhhcayypnn2kowjogobujyemr2w7m"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/tar-1.34-6t622t5aivlp2ygex27xwdwnxnu45yqr", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/tar-1.34-6t622t5aivlp2ygex27xwdwnxnu45yqr"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pigz-2.7-y42zhvfcw2dauecwgl3lbngaevy6xobc", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pigz-2.7-y42zhvfcw2dauecwgl3lbngaevy6xobc"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zstd-1.5.2-fxri5kl4yfnpplt4sktpdn76lqw37h6q", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zstd-1.5.2-fxri5kl4yfnpplt4sktpdn76lqw37h6q"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/cmake-3.24.2-3gknjyutdbhqsmbdjfdocxyktee4252h", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/cmake-3.24.2-3gknjyutdbhqsmbdjfdocxyktee4252h"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/openssl-1.1.1q-eflbj65ngbwc7mqeth3d2yiipsxhkcax", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/openssl-1.1.1q-eflbj65ngbwc7mqeth3d2yiipsxhkcax"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ca-certificates-mozilla-2022-07-19-hratelo734mvaqops3jndppjtutq6iue", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ca-certificates-mozilla-2022-07-19-hratelo734mvaqops3jndppjtutq6iue"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-5.34.1-edydndkp7xawnv6c4r2epad5n6caqlxw", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-5.34.1-edydndkp7xawnv6c4r2epad5n6caqlxw"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/berkeley-db-18.1.40-u4wfrzfc4djyyzct27gvegbgihxbumnk", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/berkeley-db-18.1.40-u4wfrzfc4djyyzct27gvegbgihxbumnk"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gdbm-1.19-g2jotp3ow6kfv5g4qux5nq66bntwsz2w", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gdbm-1.19-g2jotp3ow6kfv5g4qux5nq66bntwsz2w"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/readline-8.1.2-agjouwu3npgqyrvzfjb75ph4pms3c2t3", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/readline-8.1.2-agjouwu3npgqyrvzfjb75ph4pms3c2t3"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/hwloc-2.8.0-jpao3xm3c24a3y6wj7333fo3ogqn3iva", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/hwloc-2.8.0-jpao3xm3c24a3y6wj7333fo3ogqn3iva"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libpciaccess-0.16-owvxfxjjpl36pcbfl75tzwdiqjy7m7ej", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libpciaccess-0.16-owvxfxjjpl36pcbfl75tzwdiqjy7m7ej"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libtool-2.4.7-bkk7x63dajsidgrargwlq2uz54vfuwi6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libtool-2.4.7-bkk7x63dajsidgrargwlq2uz54vfuwi6"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/m4-1.4.19-hytjnglgcfs2yrtzqpx62yav6tiqngcb", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/m4-1.4.19-hytjnglgcfs2yrtzqpx62yav6tiqngcb"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libsigsegv-2.13-pvj57oryejtweco7q4om7th7pvydos5b", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libsigsegv-2.13-pvj57oryejtweco7q4om7th7pvydos5b"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-macros-1.19.3-mbfsvholbz2cxlj5a7vufhp32f7h65iw", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-macros-1.19.3-mbfsvholbz2cxlj5a7vufhp32f7h65iw"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libedit-3.1-20210216-suu52efjcke6xcgwzhybltph2t7dr45k", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libedit-3.1-20210216-suu52efjcke6xcgwzhybltph2t7dr45k"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ninja-1.11.0-o27z37vl2xbqx2a5a4bywnf2kav47fn6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ninja-1.11.0-o27z37vl2xbqx2a5a4bywnf2kav47fn6"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/python-3.9.13-b3r5vkv222ccjmksol4g25qlzz5rkv4n", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/python-3.9.13-b3r5vkv222ccjmksol4g25qlzz5rkv4n"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/expat-2.4.8-j2r2ooee6vvfm7uhjwdt6awcpejkppk6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/expat-2.4.8-j2r2ooee6vvfm7uhjwdt6awcpejkppk6"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libbsd-0.11.5-uuy3kkqg52xoks5vzrweob46sv3yv3tn", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libbsd-0.11.5-uuy3kkqg52xoks5vzrweob46sv3yv3tn"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libmd-1.0.4-2e5ue6hyza2wvsgvw4hdwbs4hhxrgpqd", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libmd-1.0.4-2e5ue6hyza2wvsgvw4hdwbs4hhxrgpqd"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libffi-3.4.2-v26hdizegf47qvxiuql6hjzhm36iinqc", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libffi-3.4.2-v26hdizegf47qvxiuql6hjzhm36iinqc"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/sqlite-3.39.2-3wc4melozoungmncx2mmyiuj24fp6a6k", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/sqlite-3.39.2-3wc4melozoungmncx2mmyiuj24fp6a6k"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-linux-uuid-2.37.4-q2jt4jcwb5husdprtgm4pqfh3s42vnnx", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-linux-uuid-2.37.4-q2jt4jcwb5husdprtgm4pqfh3s42vnnx"),
(b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-data-dumper-2.173-lka4uq6p72r2rcwiwm5fm4tfifbtqrvg", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-data-dumper-2.173-lka4uq6p72r2rcwiwm5fm4tfifbtqrvg"),
])
%timeit spack.relocate._replace_prefix_bin("libclang-cpp.so.15", paths)Goes from to So about 4x faster. This benchmark is not really relocating, since it's applying the relocation logic on the same file over and over. But I expect actually replacing prefixes to be faster anyways, because it's doesn't need to write 152MB but only the prefixes (a few KB max). |
|
@trws does it make sense to merge this before your changes? I think it's a net improvement in performance and adds a test for correctness too. If I understand correctly your change would affect the replacement value but not necessarily how it is replaced in the file, right? It remains an in-place update |
alalazo
left a comment
There was a problem hiding this comment.
I'l trigger a rebuild everything to test drive this and then merge
|
@spackbot rebuild everything |
|
I've started that pipeline for you! |
|
Tested a fair amount of rebuilds, and they succeed. I canceled the long ones, since we don't want to wait hours on rebuilding indexes or |
- single pass over the binary data matching all prefixes - collect offsets and replacement strings - do in-place updates with `fseek` / `fwrite`, since typically our replacement touch O(few bytes) while the file is O(many megabytes) - be nice: leave the file untouched if some string can't be replaced
- single pass over the binary data matching all prefixes - collect offsets and replacement strings - do in-place updates with `fseek` / `fwrite`, since typically our replacement touch O(few bytes) while the file is O(many megabytes) - be nice: leave the file untouched if some string can't be replaced
fseek/fwrite, since typically ourreplacement touch O(few bytes) while the file is O(many megabytes)
replaced
Note: the order of ordered dict is preserved, since the regex will greedily match the left-hand side of
|before backtracking and matching the rhs.