perf: various performance optimizations to hot spots (8.75x speedup)#253
Open
Rongronggg9 wants to merge 4 commits intoilikenwf:masterfrom
Open
perf: various performance optimizations to hot spots (8.75x speedup)#253Rongronggg9 wants to merge 4 commits intoilikenwf:masterfrom
Rongronggg9 wants to merge 4 commits intoilikenwf:masterfrom
Conversation
This is a follow-up of dc4ef45 ("perf: speed up (>3x) get_uris, display_downloadfile with bash builtin"). In that patch, I left the calculation of file size unchanged. It turns out that it is the severest performance hot spot remaining. The patch utilizes numfmt's cut-style formatting (--field=N) so that we no longer need to convert file sizes into human-readable forms repeatedly while parsing uris_full. This eliminates a lot of overheads of subshell and piping, which consist of expensive fork()s and execve()s. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 2.06x (5.86/2.85). This is done by executing the following: $ apt -y --print-uris dist-upgrade | tee test-data | wc -l 3347 $ grep -E "^'(http(s|)|(s|)ftp)://" test-data | wc -l 1659 $ echo $'#!/bin/sh\nexec cat "test-data"' >apt && chmod +x apt $ sudo PATH="$PWD:$PATH" APT_FAST_TIMEOUT=0 time apt-fast upgrade Before: 5.20user 1.32system 0:05.86elapsed 111%CPU (0avgtext+0avgdata 12380maxresident)k 0inputs+0outputs (0major+1356629minor)pagefaults 0swaps After: 2.20user 0.76system 0:02.85elapsed 103%CPU (0avgtext+0avgdata 12436maxresident)k 0inputs+0outputs (0major+616493minor)pagefaults 0swaps
The patch utilizes bash operator += to speed up the appending to DOWNLOAD_DISPLAY and the accumulation to DOWNLOAD_SIZE. This means the modification can be done in place, eliminating the overhead of template substitution and variable reassignment. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 1.28x (2.83/2.22). Before: 2.20user 0.76system 0:02.85elapsed 103%CPU (0avgtext+0avgdata 12436maxresident)k 0inputs+0outputs (0major+616493minor)pagefaults 0swaps After: 1.73user 0.60system 0:02.22elapsed 105%CPU (0avgtext+0avgdata 12300maxresident)k 0inputs+0outputs (0major+487300minor)pagefaults 0swaps
The only user of urldecode() is get_uris() and the former is only called in one place in a loop. It turns out that it is one of the severest performance hot spots remaining (the other one is get_auth(), but inlining it would hurt the readability). The patch inlines urldecode() into the only place calling it, while conditionally skipping it when the file name does not contain any urlencoded string. When a file name needs to be decoded, the inlining eliminates the overheads of subshell and function calls, which consist of an expensive fork(). When a file name doesn't need to be decoded (this is very likely, in my case, 73.8%), skipping decoding further eliminates an additional subshell to call printf. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 1.37x (2.22/1.62). Before: 1.73user 0.60system 0:02.22elapsed 105%CPU (0avgtext+0avgdata 12300maxresident)k 0inputs+0outputs (0major+487300minor)pagefaults 0swaps After: 1.25user 0.45system 0:01.62elapsed 105%CPU (0avgtext+0avgdata 12328maxresident)k 0inputs+0outputs (0major+331489minor)pagefaults 0swaps
It turns out that the severest performance hot spot remaining is get_auth(), especially when it is not explicitly disabled but no netrc is configured, which should be most users' case. In such a scenario, calling get_auth() is effectively a no-op with significant overhead. The patch disables APT_FAST_APT_AUTH when prepare_auth finds no netrc is configured and makes get_uris() only call get_auth when APT_FAST_APT_AUTH is enabled. This eliminates the overheads of subshells, empty loops, and function calls. Users needing apt_auth should be almost unaffected since the newly introduced conditional branch only has the additional cost of ~1us. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 2.42x (1.62/0.67). Before: 1.25user 0.45system 0:01.62elapsed 105%CPU (0avgtext+0avgdata 12328maxresident)k 0inputs+0outputs (0major+331489minor)pagefaults 0swaps After: 0.47user 0.23system 0:00.67elapsed 104%CPU (0avgtext+0avgdata 12344maxresident)k 0inputs+0outputs (0major+77138minor)pagefaults 0swaps
Contributor
Author
|
Fun fact: the speedup of #239 turns out to be 5.28x (30.89/5.85) while excluding the overhead of Before: After: |
zhiyuan1i
pushed a commit
to spark-store-project/spark-store
that referenced
this pull request
Mar 17, 2025
ilikenwf/apt-fast#253 Signed-off-by: shenmo <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a follow-up of dc4ef45 (#239, "perf: speed up (>3x) get_uris, display_downloadfile with bash builtin"). In that patch, I left some performance hot spots unchanged.
The patchset optimized the remaining hot spots while keeping the readability as much as possible. I used several optimization techniques, which were documented in the commit messages. Optimizations were split into several commits and were benchmarked individually. Check the commit messages out if you are confused about the mechanism of these techniques or would like to know the speedup when applying alone.
A brief test (excluding the overhead of
apt --print-uris) shows that the overall speedup is 8.75x (5.86/0.67). This is done by executing the following:Before:
After:
To maintainers: I wish my commit message in each commit (instead of only the cover letter) could be preserved when merging.