perf: various performance optimizations to hot spots (8.75x speedup) by Rongronggg9 · Pull Request #253 · ilikenwf/apt-fast

Rongronggg9 · 2025-03-15T19:33:10Z

This is a follow-up of dc4ef45 (#239, "perf: speed up (>3x) get_uris, display_downloadfile with bash builtin"). In that patch, I left some performance hot spots unchanged.

The patchset optimized the remaining hot spots while keeping the readability as much as possible. I used several optimization techniques, which were documented in the commit messages. Optimizations were split into several commits and were benchmarked individually. Check the commit messages out if you are confused about the mechanism of these techniques or would like to know the speedup when applying alone.

A brief test (excluding the overhead of apt --print-uris) shows that the overall speedup is 8.75x (5.86/0.67). This is done by executing the following:

$ apt -y --print-uris dist-upgrade | tee test-data | wc -l
3347
$ grep -E "^'(http(s|)|(s|)ftp)://" test-data | wc -l
1659
$ echo $'#!/bin/sh\nexec cat "test-data"' >apt && chmod +x apt
$ sudo PATH="$PWD:$PATH" APT_FAST_TIMEOUT=0 time apt-fast upgrade

Before:

5.20user 1.32system 0:05.86elapsed 111%CPU (0avgtext+0avgdata 12380maxresident)k
0inputs+0outputs (0major+1356629minor)pagefaults 0swaps

After:

0.47user 0.23system 0:00.67elapsed 104%CPU (0avgtext+0avgdata 12344maxresident)k
0inputs+0outputs (0major+77138minor)pagefaults 0swaps

To maintainers: I wish my commit message in each commit (instead of only the cover letter) could be preserved when merging.

This is a follow-up of dc4ef45 ("perf: speed up (>3x) get_uris, display_downloadfile with bash builtin"). In that patch, I left the calculation of file size unchanged. It turns out that it is the severest performance hot spot remaining. The patch utilizes numfmt's cut-style formatting (--field=N) so that we no longer need to convert file sizes into human-readable forms repeatedly while parsing uris_full. This eliminates a lot of overheads of subshell and piping, which consist of expensive fork()s and execve()s. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 2.06x (5.86/2.85). This is done by executing the following: $ apt -y --print-uris dist-upgrade | tee test-data | wc -l 3347 $ grep -E "^'(http(s|)|(s|)ftp)://" test-data | wc -l 1659 $ echo $'#!/bin/sh\nexec cat "test-data"' >apt && chmod +x apt $ sudo PATH="$PWD:$PATH" APT_FAST_TIMEOUT=0 time apt-fast upgrade Before: 5.20user 1.32system 0:05.86elapsed 111%CPU (0avgtext+0avgdata 12380maxresident)k 0inputs+0outputs (0major+1356629minor)pagefaults 0swaps After: 2.20user 0.76system 0:02.85elapsed 103%CPU (0avgtext+0avgdata 12436maxresident)k 0inputs+0outputs (0major+616493minor)pagefaults 0swaps

The patch utilizes bash operator += to speed up the appending to DOWNLOAD_DISPLAY and the accumulation to DOWNLOAD_SIZE. This means the modification can be done in place, eliminating the overhead of template substitution and variable reassignment. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 1.28x (2.83/2.22). Before: 2.20user 0.76system 0:02.85elapsed 103%CPU (0avgtext+0avgdata 12436maxresident)k 0inputs+0outputs (0major+616493minor)pagefaults 0swaps After: 1.73user 0.60system 0:02.22elapsed 105%CPU (0avgtext+0avgdata 12300maxresident)k 0inputs+0outputs (0major+487300minor)pagefaults 0swaps

The only user of urldecode() is get_uris() and the former is only called in one place in a loop. It turns out that it is one of the severest performance hot spots remaining (the other one is get_auth(), but inlining it would hurt the readability). The patch inlines urldecode() into the only place calling it, while conditionally skipping it when the file name does not contain any urlencoded string. When a file name needs to be decoded, the inlining eliminates the overheads of subshell and function calls, which consist of an expensive fork(). When a file name doesn't need to be decoded (this is very likely, in my case, 73.8%), skipping decoding further eliminates an additional subshell to call printf. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 1.37x (2.22/1.62). Before: 1.73user 0.60system 0:02.22elapsed 105%CPU (0avgtext+0avgdata 12300maxresident)k 0inputs+0outputs (0major+487300minor)pagefaults 0swaps After: 1.25user 0.45system 0:01.62elapsed 105%CPU (0avgtext+0avgdata 12328maxresident)k 0inputs+0outputs (0major+331489minor)pagefaults 0swaps

It turns out that the severest performance hot spot remaining is get_auth(), especially when it is not explicitly disabled but no netrc is configured, which should be most users' case. In such a scenario, calling get_auth() is effectively a no-op with significant overhead. The patch disables APT_FAST_APT_AUTH when prepare_auth finds no netrc is configured and makes get_uris() only call get_auth when APT_FAST_APT_AUTH is enabled. This eliminates the overheads of subshells, empty loops, and function calls. Users needing apt_auth should be almost unaffected since the newly introduced conditional branch only has the additional cost of ~1us. A brief test (excluding the overhead of apt --print-uris) shows that the speedup is 2.42x (1.62/0.67). Before: 1.25user 0.45system 0:01.62elapsed 105%CPU (0avgtext+0avgdata 12328maxresident)k 0inputs+0outputs (0major+331489minor)pagefaults 0swaps After: 0.47user 0.23system 0:00.67elapsed 104%CPU (0avgtext+0avgdata 12344maxresident)k 0inputs+0outputs (0major+77138minor)pagefaults 0swaps

Rongronggg9 · 2025-03-15T19:44:19Z

Fun fact: the speedup of #239 turns out to be 5.28x (30.89/5.85) while excluding the overhead of apt --print-uris.

Before:

32.24user 6.67system 0:30.89elapsed 125%CPU (0avgtext+0avgdata 12316maxresident)k
0inputs+0outputs (1major+8288603minor)pagefaults 0swaps

After:

5.40user 1.13system 0:05.85elapsed 111%CPU (0avgtext+0avgdata 12452maxresident)k
0inputs+0outputs (0major+1389815minor)pagefaults 0swaps

ilikenwf/apt-fast#253 Signed-off-by: shenmo <[email protected]>

aminya

I've been using this in production in setup-cpp, and it works without any issues. Nice speed boost!

Rongronggg9 added 4 commits March 16, 2025 03:31

zhiyuan1i pushed a commit to spark-store-project/spark-store that referenced this pull request Mar 17, 2025

* Perf: merge patches from Rongronggg9

668e5dc

ilikenwf/apt-fast#253 Signed-off-by: shenmo <[email protected]>

aminya mentioned this pull request May 21, 2025

fix: add apt-fast optimizations aminya/setup-cpp#402

Merged

aminya approved these changes May 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: various performance optimizations to hot spots (8.75x speedup)#253

perf: various performance optimizations to hot spots (8.75x speedup)#253
Rongronggg9 wants to merge 4 commits intoilikenwf:masterfrom
Rongronggg9:perf/various

Rongronggg9 commented Mar 15, 2025 •

edited

Loading

Uh oh!

Rongronggg9 commented Mar 15, 2025 •

edited

Loading

Uh oh!

aminya left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Rongronggg9 commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Rongronggg9 commented Mar 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aminya left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Rongronggg9 commented Mar 15, 2025 •

edited

Loading

Rongronggg9 commented Mar 15, 2025 •

edited

Loading