Skip to content

_replace_prefix_bin performance improvements#33590

Merged
alalazo merged 5 commits intospack:developfrom
haampie:fix/faster-binary-string-replacement
Oct 31, 2022
Merged

_replace_prefix_bin performance improvements#33590
alalazo merged 5 commits intospack:developfrom
haampie:fix/faster-binary-string-replacement

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Oct 28, 2022

  • single pass over the binary data matching all prefixes
  • collect offsets and replacement strings
  • do in-place updates with fseek / fwrite, since typically our
    replacement touch O(few bytes) while the file is O(many megabytes)
  • be nice: leave the file untouched if some string can't be
    replaced

Note: the order of ordered dict is preserved, since the regex will greedily match the left-hand side of | before backtracking and matching the rhs.

- single pass over the binary data matching all prefixes
- collect offsets and replacement strings
- do in-place updates with `fseek` / `fwrite`, since typically our
  replacement touch O(few bytes) while the file is O(many megabytes)
- be nice: leave the file untouched if some string can't be
  replaced
@spackbot-app spackbot-app bot added the core PR affects Spack core functionality label Oct 28, 2022
@alalazo alalazo requested a review from trws October 29, 2022 04:42
@alalazo alalazo self-assigned this Oct 29, 2022
@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 29, 2022

This LGTM. Do we have tests already to catch regressions on binary replacement? Also, I think we can extract this as a generic function somewhere else (replace_binary_strings?).

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 29, 2022

There are tests in relocate.py covering this code.

However, I don't immediately see where the order of replacement is tested. At the same time, I believe the order is currently irrelevant, because spack's install_tree root is added before more specific individual package prefixes in the list of paths to be replaced.

@spackbot-app spackbot-app bot added the tests General test capability(ies) label Oct 29, 2022
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 29, 2022

Performance improvements on a larger file such as

152M	libclang-cpp.so.15

using this benchmark:

import spack.relocate
import collections

paths = collections.OrderedDict([
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/llvm-15.0.0-neuno6erto7nzlbttmzuepmgujk3c76t"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/binutils-2.38-mskm4arih4xwrw2d2lghyhl2jl2nks3e", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/binutils-2.38-mskm4arih4xwrw2d2lghyhl2jl2nks3e"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/diffutils-3.8-ule36qgztu5bimfjcocoryg5avs4rwgy", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/diffutils-3.8-ule36qgztu5bimfjcocoryg5avs4rwgy"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libiconv-1.16-km7kydpek6i7iibmpspaohrgioblmglm", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libiconv-1.16-km7kydpek6i7iibmpspaohrgioblmglm"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gettext-0.21-jkli62ejsoo3azoyghawslzjvmb4ewaz", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gettext-0.21-jkli62ejsoo3azoyghawslzjvmb4ewaz"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/bzip2-1.0.8-h42dtjuvxugmdpuo5rlk6sjg2duyg5w5", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/bzip2-1.0.8-h42dtjuvxugmdpuo5rlk6sjg2duyg5w5"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libxml2-2.10.1-2vshpol4c6v5mlbgvxgdiaq7dy3vdzir", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libxml2-2.10.1-2vshpol4c6v5mlbgvxgdiaq7dy3vdzir"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pkgconf-1.8.0-iklqdkh3mbfffezn6lyggdedvtyibyxx", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pkgconf-1.8.0-iklqdkh3mbfffezn6lyggdedvtyibyxx"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/xz-5.2.5-orcksoaevfyzqy7kuxc66ialqzgrwns7", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/xz-5.2.5-orcksoaevfyzqy7kuxc66ialqzgrwns7"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zlib-1.2.12-2ujwj4a64tabxejmzahaomaldrr5dmnq", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zlib-1.2.12-2ujwj4a64tabxejmzahaomaldrr5dmnq"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ncurses-6.3-azrbhhcayypnn2kowjogobujyemr2w7m", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ncurses-6.3-azrbhhcayypnn2kowjogobujyemr2w7m"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/tar-1.34-6t622t5aivlp2ygex27xwdwnxnu45yqr", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/tar-1.34-6t622t5aivlp2ygex27xwdwnxnu45yqr"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pigz-2.7-y42zhvfcw2dauecwgl3lbngaevy6xobc", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/pigz-2.7-y42zhvfcw2dauecwgl3lbngaevy6xobc"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zstd-1.5.2-fxri5kl4yfnpplt4sktpdn76lqw37h6q", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/zstd-1.5.2-fxri5kl4yfnpplt4sktpdn76lqw37h6q"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/cmake-3.24.2-3gknjyutdbhqsmbdjfdocxyktee4252h", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/cmake-3.24.2-3gknjyutdbhqsmbdjfdocxyktee4252h"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/openssl-1.1.1q-eflbj65ngbwc7mqeth3d2yiipsxhkcax", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/openssl-1.1.1q-eflbj65ngbwc7mqeth3d2yiipsxhkcax"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ca-certificates-mozilla-2022-07-19-hratelo734mvaqops3jndppjtutq6iue", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ca-certificates-mozilla-2022-07-19-hratelo734mvaqops3jndppjtutq6iue"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-5.34.1-edydndkp7xawnv6c4r2epad5n6caqlxw", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-5.34.1-edydndkp7xawnv6c4r2epad5n6caqlxw"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/berkeley-db-18.1.40-u4wfrzfc4djyyzct27gvegbgihxbumnk", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/berkeley-db-18.1.40-u4wfrzfc4djyyzct27gvegbgihxbumnk"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gdbm-1.19-g2jotp3ow6kfv5g4qux5nq66bntwsz2w", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/gdbm-1.19-g2jotp3ow6kfv5g4qux5nq66bntwsz2w"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/readline-8.1.2-agjouwu3npgqyrvzfjb75ph4pms3c2t3", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/readline-8.1.2-agjouwu3npgqyrvzfjb75ph4pms3c2t3"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/hwloc-2.8.0-jpao3xm3c24a3y6wj7333fo3ogqn3iva", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/hwloc-2.8.0-jpao3xm3c24a3y6wj7333fo3ogqn3iva"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libpciaccess-0.16-owvxfxjjpl36pcbfl75tzwdiqjy7m7ej", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libpciaccess-0.16-owvxfxjjpl36pcbfl75tzwdiqjy7m7ej"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libtool-2.4.7-bkk7x63dajsidgrargwlq2uz54vfuwi6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libtool-2.4.7-bkk7x63dajsidgrargwlq2uz54vfuwi6"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/m4-1.4.19-hytjnglgcfs2yrtzqpx62yav6tiqngcb", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/m4-1.4.19-hytjnglgcfs2yrtzqpx62yav6tiqngcb"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libsigsegv-2.13-pvj57oryejtweco7q4om7th7pvydos5b", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libsigsegv-2.13-pvj57oryejtweco7q4om7th7pvydos5b"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-macros-1.19.3-mbfsvholbz2cxlj5a7vufhp32f7h65iw", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-macros-1.19.3-mbfsvholbz2cxlj5a7vufhp32f7h65iw"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libedit-3.1-20210216-suu52efjcke6xcgwzhybltph2t7dr45k", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libedit-3.1-20210216-suu52efjcke6xcgwzhybltph2t7dr45k"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ninja-1.11.0-o27z37vl2xbqx2a5a4bywnf2kav47fn6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/ninja-1.11.0-o27z37vl2xbqx2a5a4bywnf2kav47fn6"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/python-3.9.13-b3r5vkv222ccjmksol4g25qlzz5rkv4n", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/python-3.9.13-b3r5vkv222ccjmksol4g25qlzz5rkv4n"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/expat-2.4.8-j2r2ooee6vvfm7uhjwdt6awcpejkppk6", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/expat-2.4.8-j2r2ooee6vvfm7uhjwdt6awcpejkppk6"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libbsd-0.11.5-uuy3kkqg52xoks5vzrweob46sv3yv3tn", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libbsd-0.11.5-uuy3kkqg52xoks5vzrweob46sv3yv3tn"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libmd-1.0.4-2e5ue6hyza2wvsgvw4hdwbs4hhxrgpqd", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libmd-1.0.4-2e5ue6hyza2wvsgvw4hdwbs4hhxrgpqd"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libffi-3.4.2-v26hdizegf47qvxiuql6hjzhm36iinqc", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/libffi-3.4.2-v26hdizegf47qvxiuql6hjzhm36iinqc"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/sqlite-3.39.2-3wc4melozoungmncx2mmyiuj24fp6a6k", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/sqlite-3.39.2-3wc4melozoungmncx2mmyiuj24fp6a6k"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-linux-uuid-2.37.4-q2jt4jcwb5husdprtgm4pqfh3s42vnnx", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/util-linux-uuid-2.37.4-q2jt4jcwb5husdprtgm4pqfh3s42vnnx"),
    (b"/home/harmen/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-data-dumper-2.173-lka4uq6p72r2rcwiwm5fm4tfifbtqrvg", b"/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.2.0/perl-data-dumper-2.173-lka4uq6p72r2rcwiwm5fm4tfifbtqrvg"),
])

%timeit spack.relocate._replace_prefix_bin("libclang-cpp.so.15", paths)

Goes from

586 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

to

136 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So about 4x faster.

This benchmark is not really relocating, since it's applying the relocation logic on the same file over and over. But I expect actually replacing prefixes to be faster anyways, because it's doesn't need to write 152MB but only the prefixes (a few KB max).

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Oct 29, 2022

@trws does it make sense to merge this before your changes? I think it's a net improvement in performance and adds a test for correctness too. If I understand correctly your change would affect the replacement value but not necessarily how it is replaced in the file, right? It remains an in-place update

@trws
Copy link
Copy Markdown
Contributor

trws commented Oct 30, 2022 via email

Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'l trigger a rebuild everything to test drive this and then merge

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 31, 2022

@spackbot rebuild everything

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Oct 31, 2022

I've started that pipeline for you!

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Oct 31, 2022

Tested a fair amount of rebuilds, and they succeed. I canceled the long ones, since we don't want to wait hours on rebuilding indexes or openfoam / paraview.

@alalazo alalazo merged commit 616d5a8 into spack:develop Oct 31, 2022
@haampie haampie deleted the fix/faster-binary-string-replacement branch October 31, 2022 09:09
becker33 pushed a commit to RikkiButler20/spack that referenced this pull request Nov 2, 2022
- single pass over the binary data matching all prefixes
- collect offsets and replacement strings
- do in-place updates with `fseek` / `fwrite`, since typically our
  replacement touch O(few bytes) while the file is O(many megabytes)
- be nice: leave the file untouched if some string can't be
  replaced
charmoniumQ pushed a commit to charmoniumQ/spack that referenced this pull request Nov 19, 2022
- single pass over the binary data matching all prefixes
- collect offsets and replacement strings
- do in-place updates with `fseek` / `fwrite`, since typically our
  replacement touch O(few bytes) while the file is O(many megabytes)
- be nice: leave the file untouched if some string can't be
  replaced
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core PR affects Spack core functionality tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants