profiling: add benchmark and profiling scripts #406
Conversation
A benchmark/profiling script for local development or CI helps to catch performance regressions early and to find spots for optimization.

The benchmark setup consists of a cargo test that reuses the test infrastructure to forward localhost connections. This test is skipped by default unless an env var is set. The benchmark load comes from a fortio server and client for HTTP/gRPC req/s and latency measurement, and from an iperf server and client for TCP throughput measurement. In addition to the fortio UI for inspecting the benchmark data, the results are also stored in a summary text file, which can be used to plot the difference between the summary results of, e.g., two git branches.

The profiling setup is the same as above but also runs "perf" or "memory-profiler" to sample the call stacks, either at runtime or on heap allocation calls. This requires a special debug build with optimizations, which can be generated with a build script. The results can be inspected as interactive flamegraph SVGs in the browser.

Please follow the instructions in the profiling/README.md file on how the scripts are used.

Signed-off-by: Kai Lüke <[email protected]>
Since the profiling binary is no longer run as a Rust test, we don't need to silently skip it when the required env var is unset. Instead, we can actually print that we expected it to be set and exit early. Signed-off-by: Eliza Weisman <[email protected]>
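A rough sketch of that fail-fast behavior (the `PROFILING_SUPPORT_SERVER` variable name appears in the script logs in this thread; the function name here is hypothetical, not the actual script code):

```shell
# Hypothetical sketch: fail fast with a clear message instead of silently
# skipping when the required env var is unset.
require_profiling_server() {
  if [ -z "${PROFILING_SUPPORT_SERVER:-}" ]; then
    echo "error: PROFILING_SUPPORT_SERVER must be set (e.g. 127.0.0.1:8080)" >&2
    return 1
  fi
  echo "profiling against ${PROFILING_SUPPORT_SERVER}"
}
```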
I've done some manual testing of the profiling scripts added in this branch, and confirmed that everything seems to work — I've even generated some potentially interesting flamegraphs.
@hawkw Is it feasible to put the binary in its own crate (i.e. …)
The profiling script needs to ensure that …
Trying to invoke fortio, the profile script hangs. Our invocation appears to use a deprecated flag:
hmm, this worked for me on ares — what version of fortio do you have installed? on ares, I see `1.3.2-pre unknown go1.10.2`, which i suspect is probably old?
I'm using the most recent release from fortio:
```console
:; fortio version
1.3.1 2019-02-02 14:08 fd8f4a7177e9ea509f27105ae4e55e6c68ece6f7 go1.11.5
```
Which is a bit older than I imagined, but... okay. I'll install from source, I guess.
…On Tue, Jan 14, 2020 at 12:00 PM Eliza Weisman ***@***.***> wrote:
> Trying to invoke fortio, the profile script hangs. Our invocation appears to use a deprecated flag:
> ```console
> :; fortio load -resolve 127.0.0.1 -c=4 -qps=4000 -t=10s -payload-size=10 '-labels=eliza-benchmark-and-profiling 2020Jan14_19h28m54s Iter: 1 Dur: 10s Conns: 4 Streams: 4' -json=http1outbound_bench-4000-rps.2020Jan14_19h28m54s.json -keepalive=false -H 'Host: transparency.test.svc.cluster.local' localhost:4140
> flag provided but not defined: -resolve
> ```
> hmm, this worked for me on ares — what version of fortio do you have installed? on ares, I see:
> ```console
> ***@***.***:~$ fortio -version
> 1.3.2-pre unknown go1.10.2
> ```
> which i suspect is probably old?
--
Oliver Gould <[email protected]>
I wonder if we can dockerize the test harness. Or we could use something like Pex; but it would be great to figure out how to bundle dependencies for things like plot.py (or replace it with something else that we can bundle better).
yeah, I'm trying to figure out a nicer solution for that... i don't know a whole lot about Python & how to bundle python deps with a script, but I'm doing some research.
Another issue I've run into while testing this:
```
profiling/profiling-perf-fortio.sh: line 52: 32065 Segmentation fault
(core dumped) PROFILING_SUPPORT_SERVER="127.0.0.1:$SERVER_PORT" perf
record -F 2000 --call-graph dwarf $LINKERD_TEST_BIN &> "$LOG"
+ echo 'proxy failed'
proxy failed
+ perf script
+ inferno-collapse-perf
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
profiling/profiling-perf-fortio.sh: line 52: 32118 Segmentation fault
(core dumped) perf script
32119 Done | inferno-collapse-perf > "out_$NAME.$ID.folded"
+ killall iperf fortio
```
I think this means that the profile binary is segfaulting? I'll need to try
manually running it to verify
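One quick way to verify that suspicion (a sketch, not the actual script): a child process killed by SIGSEGV exits with status 128 + 11 = 139, so a wrapper can check for that explicitly.

```shell
# Run a command and report whether it died from SIGSEGV (exit status 139).
died_of_segfault() {
  "$@"
  [ "$?" -eq 139 ]
}
```

For example, `died_of_segfault "$LINKERD_TEST_BIN"` would distinguish the profile binary crashing from `perf` itself misbehaving.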
--
Oliver Gould <[email protected]>
That segfault was a config issue on my host. Seems to work properly after fixing it.
@olix0r which, if any, of these changes would you prefer we make on this branch, and which ones can be done in follow-up PRs? BTW, I've made some changes to the plot script in a second branch; it now produces error bars based on the (fortio-reported) standard deviation.
@hawkw Basically, I care that we know how to run these scripts on arbitrary Linux hosts. I'm still having some problems running the profiling scripts successfully. I'll need to dig in a little more to understand what's wrong. Once I've run these scripts on my laptop, I think we can merge and address feedback in followups.
Okay, great. I'm working on dockerizing the plotting script... but I don't think that will help with the profiling scripts, given that perf needs to be installed on the host...
@hawkw i suspect that we could dockerize perf/iperf etc. if we gave the container SYS_ADMIN privileges. Not a hard blocker, but I think it's worth investigating.
The first few profile exercises appear to work okay, but the third test appears to fail (note the "Terminated" line). Nothing progresses after this for me...
I'm pretty sure the "Terminated..." line is normal — when I run the benchmarks, I see that at the end of each run. I think this is just how the benchmark script kills the test server.
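That pattern — launch the server in the background, run the load, then kill it — looks roughly like this (a stand-in `sleep` plays the server here; the real script names differ). Non-interactive bash reports the killed background job with a line like "Terminated":

```shell
# Stand-in sketch: start a background "server", do some work, then kill it.
sleep 60 &            # stands in for the test server process
server_pid=$!
# ... run the benchmark load against the server here ...
kill "$server_pid"    # this is what produces the "Terminated" message
wait "$server_pid" 2>/dev/null || true
```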
Ah, the test isn't hung indefinitely. It's just that some of the …
@hawkw out of curiosity, about how long does it take you to run a profile? The actual test invocations seem pretty quick, but the … I wonder if (as a followup) we can parallelize the image generation after all of the profiles have been run?
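As a followup sketch (names here are hypothetical; in the real setup the render command would be something like inferno-flamegraph), the per-profile SVG rendering could be fanned out with background jobs:

```shell
# Render every collapsed-stack file to an SVG in parallel, then wait for all.
# $1 is a command that reads a .folded file on stdin and writes SVG to stdout
# (inferno-flamegraph in the real setup; any filter works for illustration).
render_all() {
  render_cmd="$1"
  for f in out_*.folded; do
    [ -e "$f" ] || continue
    "$render_cmd" < "$f" > "${f%.folded}.svg" &
  done
  wait
}
```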
i've dockerized the plotting script in 88e84eb and added a bash script to run it in docker.
running a profiling run on ares; i'll get back to you on how long it takes when it finishes! running perf is definitely not fast, though; the benchmark run without perf is pretty quick in my experience.
Okay, I think this is ready for another (hopefully final!) look — I've fixed a couple of last bugs with running the tests in Docker, and updated the README.
Hmmm... when I try to run either benchmark.sh or profiling-heap.sh, I don't actually see a proxy get started, and both hang. Also, is it not possible to run these scripts from the repo root? It looks like it's intended to work, but I get some errors like: …
@olix0r I believe my most recent changes have fixed the issues with proxies not starting on Linux (due to permissions for ssh keys 🙃) + the path wackiness.
Force-pushed from 1b8b870 to 2e33c75
kleimkuhler left a comment:
This looks good to me! There has been a lot of discussion already, but running the benchmark.sh and profiling-heap.sh scripts on my machine (Mac), everything works as expected when following the directions in the README.
I confirmed with @hawkw that running profiling-perf.sh is probably going to result in errors due to perf not being available.
I was able to analyze the .svgs and confirmed that everything cleans up well afterward.
profiling/README.md (Outdated)
> For example:
> ```console
> $ ITERATIONS=2 DURATION=2s CONNECTIONS=2 GRPC_STREAMS=2 HTTP_RPS="100" GRPC_RPS="100 1000" REQ_BODY_LEN="100 8000" ./benchmark-cargo-test-fortio.sh
> ```

This should now be `./benchmark.sh`
This release includes the results from continued profiling & performance analysis. In addition to modifying internals to prevent unwarranted memory growth, we've introduced new metrics to aid in debugging and diagnostics: a new `request_errors_total` metric exposes the number of requests that receive synthesized responses due to proxy errors; and a suite of `stack_*` metrics expose proxy internals that can help us identify unexpected behavior.

---

* trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426)
* Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427)
* Add the request_errors_total metric (linkerd/linkerd2-proxy#417)
* Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428)
* concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429)
* profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406)
* http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430)
* lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)
This is essentially @pothos' PR #278, but rebased & updated to work with the current master. In addition, I've changed the profiling proxy to be run as a separate bin target (run with `cargo run --bin profile`) rather than a test case that does nothing on most test runs and runs the proxy in profiling mode when an env var is set.
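Declaring a separate bin target like that is just a Cargo manifest entry; a minimal sketch (the path is an assumption, not necessarily what this PR uses):

```toml
# Cargo.toml — an extra binary alongside the library and test targets,
# invoked with `cargo run --bin profile`
[[bin]]
name = "profile"
path = "src/profile.rs"
```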
Description from the original PR:
Closes #278.
Signed-off-by: Eliza Weisman [email protected]
Co-authored-by: Kai Lüke [email protected]