
profiling: add benchmark and profiling scripts #406

Merged
hawkw merged 64 commits into master from
eliza/benchmark-and-profiling
Feb 19, 2020
Conversation

@hawkw
Contributor

@hawkw hawkw commented Jan 13, 2020

This is essentially @pothos' PR #278, but rebased and updated to work with
the current master. In addition, I've changed the profiling proxy to run
as a separate bin target (invoked with `cargo run --bin profile`) rather
than as a test case that does nothing on most test runs and only runs the
proxy in profiling mode when an env var is set.

Description from the original PR:

A benchmark/profiling script for local development
or a CI helps to catch performance regressions early
and find spots for optimization.

The benchmark setup consists of a cargo test
that reuses the test infrastructure to forward
localhost connections. This test is skipped by
default unless an env var is set.
The benchmark load comes from a fortio server
and client for HTTP/gRPC req/s latency measurement
and from an iperf server and client for TCP throughput
measurement.
In addition to the fortio UI to inspect the benchmark
data, the results are also stored to a summary text file
which can be used to plot the difference of the summary
results of, e.g., two git branches.

The profiling setup is the same as above but also
runs "perf" or "memory-profiler" to sample the
call stacks, either at runtime or on heap allocation
calls. This requires a special debug build with
optimizations, which can be generated with a build script.
The results can be inspected as interactive flamegraph
SVGs in the browser.

Please follow the instructions in the profiling/README.md
file for how to use the scripts.

Signed-off-by: Kai Lüke [email protected]

Closes #278.

Signed-off-by: Eliza Weisman [email protected]
Co-authored-by: Kai Lüke [email protected]
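The early-exit behavior described above (print why the run was refused instead of silently skipping) can be sketched as a small guard in the scripts. This is a hypothetical sketch; the variable name `PROFILE_RUN` is illustrative, not the actual variable the scripts use:

```shell
#!/usr/bin/env bash
# Sketch of the skip-unless-set gate: instead of a test that silently
# does nothing, print why we exited early. Variable name is illustrative.
require_env() {
    local name="$1"
    if [ -z "${!name:-}" ]; then
        echo "error: $name must be set to run the profiling proxy" >&2
        return 1
    fi
}

unset PROFILE_RUN
require_env PROFILE_RUN || echo "would exit early here"
PROFILE_RUN=1
require_env PROFILE_RUN && echo "proceeding with profiling run"
```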

pothos and others added 6 commits January 8, 2020 16:02
A benchmark/profiling script for local development
or a CI helps to catch performance regressions early
and find spots for optimization.

The benchmark setup consists of a cargo test
that reuses the test infrastructure to forward
localhost connections. This test is skipped by
default unless an env var is set.
The benchmark load comes from a fortio server
and client for HTTP/gRPC req/s latency measurement
and from an iperf server and client for TCP throughput
measurement.
In addition to the fortio UI to inspect the benchmark
data, the results are also stored to a summary text file
which can be used to plot the difference of the summary
results of, e.g., two git branches.

The profiling setup is the same as above but also
runs "perf" or "memory-profiler" to sample the
call stacks at either runtime or on heap allocation
calls. This requires a special debug build with
optimizations, that can be generated with a build script.
The results can be inspected as interactive flamegraph
SVGs in the browser.

Please follow the instructions in the profiling/README.md
file for how to use the scripts.

Signed-off-by: Kai Lüke <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Since the profiling binary is no longer run as a Rust test, we don't
need to silently skip it when the required env var is unset. Instead, we
can actually print that we expected it to be set and exit early.

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw requested a review from olix0r January 13, 2020 21:50
@hawkw
Contributor Author

hawkw commented Jan 13, 2020

I've done some manual testing of the profiling scripts added in this branch, and confirmed that everything seems to work — I've even generated some potentially interesting flamegraphs.

@olix0r
Member

olix0r commented Jan 14, 2020

@hawkw Is it feasible to put the binary in its own crate (i.e. linkerd/app/profile) so that we don't have to compile the binary when building the integration tests?

@olix0r
Member

olix0r commented Jan 14, 2020

The profiling script needs to ensure that iperf is available and fail if it's not. Currently, the command will just stall and emit nothing, which is difficult to debug.
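A fail-fast availability check along these lines (a sketch, not the script's actual code) would surface the missing dependency immediately instead of stalling:

```shell
#!/usr/bin/env bash
# Fail fast when a required tool is missing instead of letting the
# benchmark stall with no output. `command -v` is POSIX and returns
# non-zero when the program is not on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "error: '$1' is required but not installed" >&2
        return 1
    }
}

require_cmd sh && echo "sh found"
require_cmd definitely-not-a-real-tool 2>/dev/null || echo "missing tool detected"
```

In the benchmark script this check would run once at startup for iperf and fortio, before any servers are launched.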

@olix0r
Member

olix0r commented Jan 14, 2020

Trying to invoke fortio, the profile script hangs. Our invocation appears to use a deprecated flag:

:; fortio load -resolve 127.0.0.1 -c=4 -qps=4000 -t=10s -payload-size=10 '-labels=eliza-benchmark-and-profiling 2020Jan14_19h28m54s Iter: 1 Dur: 10s Conns: 4 Streams: 4' -json=http1outbound_bench-4000-rps.2020Jan14_19h28m54s.json -keepalive=false -H 'Host: transparency.test.svc.cluster.local' localhost:4140
flag provided but not defined: -resolve

@hawkw
Contributor Author

hawkw commented Jan 14, 2020

Trying to invoke fortio, the profile script hangs. Our invocation appears to use a deprecated flag:

:; fortio load -resolve 127.0.0.1 -c=4 -qps=4000 -t=10s -payload-size=10 '-labels=eliza-benchmark-and-profiling 2020Jan14_19h28m54s Iter: 1 Dur: 10s Conns: 4 Streams: 4' -json=http1outbound_bench-4000-rps.2020Jan14_19h28m54s.json -keepalive=false -H 'Host: transparency.test.svc.cluster.local' localhost:4140
flag provided but not defined: -resolve

hmm, this worked for me on ares — what version of fortio do you have installed? on ares, I see:

eliza@ares:~$ fortio -version
1.3.2-pre unknown go1.10.2

which i suspect is probably old?
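Since flag behavior differs across fortio versions, the scripts could guard against this kind of drift with a minimum-version check before launching anything. A sketch; the version floor shown is illustrative, and `sort -V` requires GNU coreutils:

```shell
#!/usr/bin/env bash
# Return success if version $1 >= version $2, using GNU sort's
# version ordering. Could be used to gate on a minimum fortio version
# before invoking flags that older releases don't support.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

version_ge "1.3.2" "1.0.0" && echo "fortio is new enough"
version_ge "0.9.0" "1.3.2" || echo "fortio too old; please upgrade"
```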

@olix0r
Member

olix0r commented Jan 14, 2020 via email

@olix0r
Member

olix0r commented Jan 14, 2020

I wonder if we can dockerize the test harness. Or we could use something like Pex; but it would be great to figure out how to bundle dependencies for things like plot.py (or replace with something else that we can bundle better)

Traceback (most recent call last):
  File "profiling/plot.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

@hawkw
Contributor Author

hawkw commented Jan 14, 2020

I wonder if we can dockerize the test harness. Or we could use something like Pex; but it would be great to figure out how to bundle dependencies for things like plot.py (or replace with something else that we can bundle better)

yeah, I'm trying to figure out a nicer solution for that...i don't know a whole lot about Python & how to bundle python deps with a script, but I'm doing some research.
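Short of bundling the dependencies, a pre-flight check could at least turn the raw traceback into an actionable message. A sketch, assuming `python3` is on the PATH (the fallback suggestion in the message is hypothetical):

```shell
#!/usr/bin/env bash
# Check that a Python module is importable before running plot.py,
# so the user gets a hint instead of a ModuleNotFoundError traceback.
py_has_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

py_has_module sys && echo "python3 stdlib import works"
py_has_module definitely_not_a_real_module \
    || echo "missing dependency; install it or run plot.py via a container"
```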

@olix0r
Member

olix0r commented Jan 14, 2020 via email

@olix0r
Member

olix0r commented Jan 14, 2020

That segfault was a config issue on my host. Seems to work properly after fixing it.

@hawkw
Contributor Author

hawkw commented Jan 14, 2020

@olix0r which, if any, of these changes would you prefer we make on this branch, and which ones can be done in follow-up PRs?

BTW, I've made some changes to the plot script in a second branch, it now produces error bars based on the (fortio-reported) standard deviation.

@olix0r
Member

olix0r commented Jan 14, 2020

@hawkw Basically, I care that we know how to run these scripts on arbitrary linux hosts. I'm still having some problems running the profiling scripts successfully. I'll need to dig in a little more to understand what's wrong. Once I've run these scripts on my laptop, I think we can merge and address feedback in followups.

@hawkw
Contributor Author

hawkw commented Jan 14, 2020

Okay, great. I'm working on dockerizing the plotting script...but I don't think that will help with the profiling scripts, given that perf needs to be installed on the host...

@olix0r
Member

olix0r commented Jan 14, 2020

@hawkw i suspect that we could dockerize perf/iperf etc if we gave the container SYS_ADMIN privileges. Not a hard blocker but I think it's worth investigating

@olix0r
Member

olix0r commented Jan 14, 2020

:; HIDE=0 RUST_BACKTRACE=1 RUST_LOG=info profiling/profiling-perf-fortio.sh
File marker eliza-benchmark-and-profiling 2020Jan14_23h52m30s Iter: 1 Dur: 6s Conns: 4 Streams: 4
------------------------------------------------------------
Server listening on TCP port 8080
TCP window size:  128 KByte (default)
------------------------------------------------------------
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
support destination service listening; addr=V4(127.0.0.1:41155)
proxy running; tap=, identity=None, inbound=127.0.0.1:4143 (SO_ORIGINAL_DST=127.0.0.1:8080), outbound=127.0.0.1:4140 (SO_ORIGINAL_DST=127.0.0.1:8080), metrics=127.0.0.1:4191
INFO [     0.002500s] proxy{test=main:proxy}:inbound: linkerd2_app_inbound serving listen.addr=127.0.0.1:4143
LISTEN  0        128                0.0.0.0:8080                0.0.0.0:*
LISTEN  0        128              127.0.0.1:4140                0.0.0.0:*
TCP outbound
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 4140
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 37816 connected with 127.0.0.1 port 4140
[  4] local 127.0.0.1 port 8080 connected with 127.0.0.1 port 58680
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 6.0 sec  3.49 GBytes  5.00 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 6.0 sec  3.49 GBytes  4.99 Gbits/sec
server Listening dropped; addr=127.0.0.1:8080
server Listening dropped; addr=127.0.0.1:8080
[ perf record: Woken up 392 times to write data ]
LISTEN     0        128               0.0.0.0:8080              0.0.0.0:*
[ perf record: Captured and wrote 98.390 MB perf.data (12038 samples) ]
------------------------------------------------------------
Server listening on TCP port 8080
TCP window size:  128 KByte (default)
------------------------------------------------------------
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
support destination service listening; addr=V4(127.0.0.1:48813)
proxy running; tap=, identity=None, inbound=127.0.0.1:4143 (SO_ORIGINAL_DST=127.0.0.1:8080), outbound=127.0.0.1:4140 (SO_ORIGINAL_DST=127.0.0.1:8080), metrics=127.0.0.1:4191
INFO [     0.003139s] proxy{test=main:proxy}:inbound: linkerd2_app_inbound serving listen.addr=127.0.0.1:4143
LISTEN     0        128               0.0.0.0:8080              0.0.0.0:*
LISTEN     0        128             127.0.0.1:4143              0.0.0.0:*
TCP inbound
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 4143
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 41582 connected with 127.0.0.1 port 4143
[  4] local 127.0.0.1 port 8080 connected with 127.0.0.1 port 58694
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 6.0 sec  3.81 GBytes  5.45 Gbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 6.0 sec  3.81 GBytes  5.44 Gbits/sec
server Listening dropped; addr=127.0.0.1:8080
server Listening dropped; addr=127.0.0.1:8080
[ perf record: Woken up 393 times to write data ]
LISTEN     0        128               0.0.0.0:8080              0.0.0.0:*
[ perf record: Captured and wrote 98.621 MB perf.data (12076 samples) ]
Fortio 1.3.2-pre grpc 'ping' server listening on [::]:8079
Fortio 1.3.2-pre https redirector server listening on [::]:8081
Fortio 1.3.2-pre echo server listening on [::]:8080
23:55:40 W uihandler.go:877> Adding missing trailing / to UI path ''''
Data directory is /home/ver/b/l2-proxy/profiling
UI started - visit:
http://localhost:8080''/
(or any host/ip reachable on this server)
23:55:40 I fortio_main.go:214> All fortio 1.3.2-pre unknown go1.13.5 servers started!
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
support destination service listening; addr=V4(127.0.0.1:60897)
proxy running; tap=, identity=None, inbound=127.0.0.1:4143 (SO_ORIGINAL_DST=127.0.0.1:8080), outbound=127.0.0.1:4140 (SO_ORIGINAL_DST=127.0.0.1:8080), metrics=127.0.0.1:4191
INFO [     0.002005s] proxy{test=main:proxy}:inbound: linkerd2_app_inbound serving listen.addr=127.0.0.1:4143
LISTEN  0        128                       *:8080                     *:*
LISTEN  0        128               127.0.0.1:4140               0.0.0.0:*
HTTP outbound Iteration: 1 RPS: 4000 REQ_BODY_LEN: 200
Fortio 1.3.2-pre running at 4000 queries per second, 12->12 procs, for 6s: localhost:4140
23:55:41 I httprunner.go:82> Starting http test for localhost:4140 with 4 threads at 4000.0 qps
23:55:41 W http_client.go:143> Assuming http:// on missing scheme for 'localhost:4140'
INFO [     0.979158s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
INFO [     0.984403s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
INFO [     0.990248s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
INFO [     0.995758s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
WARN [     1.001069s] proxy{test=main:proxy}:outbound:accept{peer.addr=127.0.0.1:37836}:source{target.addr=127.0.0.1:8080}:addr{addr=transparency.test.svc.cluster.local:80}: linkerd2_proxy_http::canonicalize failed to refine transparency.test.svc.cluster.local: no record found for name: transparency.test.svc.cluster.local.buoyant.int. type: AAAA class: IN; using original name
WARN [     1.002586s] proxy{test=main:proxy}:outbound:accept{peer.addr=127.0.0.1:37836}:source{target.addr=127.0.0.1:8080}:addr{addr=transparency.test.svc.cluster.local:80}:logical{dst.logical=transparency.test.svc.cluster.local:80}: linkerd2_app_core::profiles profile stream failed: Status { code: Unavailable, message: "unit test controller has no results" }
Starting at 4000 qps with 4 thread(s) [gomax 12] for 6s : 6000 calls each (total 24000)
INFO [     4.002118s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
WARN [     4.003250s] proxy{test=main:proxy}:outbound:accept{peer.addr=127.0.0.1:37836}:source{target.addr=127.0.0.1:8080}:addr{addr=transparency.test.svc.cluster.local:80}:logical{dst.logical=transparency.test.svc.cluster.local:80}: linkerd2_app_core::profiles profile stream failed: Status { code: Unavailable, message: "unit test controller has no results" }
INFO [     4.005479s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
INFO [     4.008436s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
INFO [     4.011870s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
WARN [     7.004271s] proxy{test=main:proxy}:outbound:accept{peer.addr=127.0.0.1:37836}:source{target.addr=127.0.0.1:8080}:addr{addr=transparency.test.svc.cluster.local:80}:logical{dst.logical=transparency.test.svc.cluster.local:80}: linkerd2_app_core::profiles profile stream failed: Status { code: Unavailable, message: "unit test controller has no results" }
23:55:47 W periodic.go:494> T002 warning only did 5985 out of 6000 calls before reaching 6s
23:55:47 W periodic.go:494> T003 warning only did 5985 out of 6000 calls before reaching 6s
23:55:47 I periodic.go:543> T003 ended after 6.000046265s : 5985 calls. qps=997.4923085030579
23:55:47 I periodic.go:543> T002 ended after 6.000041915s : 5985 calls. qps=997.4930316799296
23:55:47 W periodic.go:494> T001 warning only did 5987 out of 6000 calls before reaching 6s
23:55:47 I periodic.go:543> T001 ended after 6.000548905s : 5987 calls. qps=997.7420557328164
23:55:47 W periodic.go:494> T000 warning only did 5987 out of 6000 calls before reaching 6s
23:55:47 I periodic.go:543> T000 ended after 6.0006138s : 5987 calls. qps=997.7312654248803
Ended after 6.000621849s : 23944 calls. qps=3990.3
Aggregated Sleep Time : count 23944 avg -0.0082791303 +/- 0.004504 min -0.016038223 max 0.000248903 sum -198.235495
# range, mid point, percentile, count
>= -0.0160382 <= -0.001 , -0.00851911 , 93.41, 22365
> -0.001 <= 0 , -0.0005 , 97.35, 944
> 0 <= 0.000248903 , 0.000124452 , 100.00, 635
# target 50% -0.00798856
WARNING 93.41% of sleep were falling behind
Aggregated Function Time : count 23944 avg 0.0009749639 +/- 9.514e-05 min 0.000673107 max 0.003098227 sum 23.3445356
# range, mid point, percentile, count
>= 0.000673107 <= 0.001 , 0.000836554 , 64.67, 15484
> 0.001 <= 0.002 , 0.0015 , 99.95, 8449
> 0.002 <= 0.003 , 0.0025 , 100.00, 10
> 0.003 <= 0.00309823 , 0.00304911 , 100.00, 1
# target 50% 0.000925851
# target 75% 0.00129282
# target 90% 0.00171791
# target 99% 0.00197296
# target 99.9% 0.00199847
Sockets used: 23948 (for perfect keepalive, would be 4)
Jitter: false
Code 200 : 23944 (100.0 %)
Response Header Sizes : count 23944 avg 117 +/- 0 min 117 max 117 sum 2801448
Response Body/Total Sizes : count 23944 avg 317 +/- 0 min 317 max 317 sum 7590248
All done 23944 calls (plus 4 warmup) 0.975 ms avg, 3990.3 qps
Successfully wrote 2180 bytes of Json data to http1outbound_perf-4000-rps.2020Jan14_23h52m30s.json
INFO [     7.016513s] proxy{test=main:proxy}: trust_dns_proto::xfer::dns_exchange sending message via: UDP(127.0.0.53:53)
server Listening dropped; addr=127.0.0.1:8080
server Listening dropped; addr=127.0.0.1:8080
profiling/profiling-perf-fortio.sh: line 52: 19782 Terminated              $SERVER &> "$LOG"
[ perf record: Woken up 385 times to write data ]
[ perf record: Captured and wrote 96.829 MB perf.data (12002 samples) ]

The first few profile exercises appear to work okay, but the third test appears to fail (note the Terminated line). Nothing progresses after this for me...

@hawkw
Contributor Author

hawkw commented Jan 15, 2020

I'm pretty sure the "Terminated..." line is normal — when I run the benchmarks, I see that at the end of each run. I think this is just how the benchmark script kills the test server:

kill $SPID || ( echo "test server failed"; true )
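That matches how the shell reports a SIGTERM'd background job. A minimal reproduction, using `sleep` as a stand-in for the test server:

```shell
#!/usr/bin/env bash
# A background "server" killed with SIGTERM: the shell may print a
# "Terminated" job notification, and `wait` reports 128 + 15 = 143.
sleep 30 &
SPID=$!
kill "$SPID" || echo "test server failed"
wait "$SPID"
echo "server exited with status $?"   # prints "server exited with status 143"
```

So a Terminated line with status 143 is the expected teardown path, not a crash.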

@olix0r
Member

olix0r commented Jan 15, 2020

Ah, the test isn't hung indefinitely. It's just that some of the perf script invocations take an extremely long time to execute (15+ mins)

@olix0r
Member

olix0r commented Jan 15, 2020

@hawkw out of curiosity, about how long does it take you to run a profile? The actual test invocations seem pretty quick, but the perf script invocation takes about 80 minutes!

-rw-rw-r-- 1 ver ver 3.6M Jan 15 02:07 profiling/flamegraph_grpcinbound_perf.2020Jan15_00h47m16s.svg

I wonder if (as a followup) we can parallelize the image generation after all of the profiles have been run?
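Parallelizing the SVG rendering could be a simple fan-out/join once all the perf data files exist. A sketch with a stand-in `render` function (the real step would be something like `perf script | flamegraph.pl`):

```shell
#!/usr/bin/env bash
# Render all flamegraphs concurrently after profiling finishes,
# rather than serially inside each profile run.
outdir="$(mktemp -d)"

render() {
    # Stand-in for: perf script -i "$1.perf.data" | flamegraph.pl > "$1.svg"
    echo "flamegraph for $1" > "$outdir/$1.svg"
}

for profile in tcp_outbound tcp_inbound http1_outbound; do
    render "$profile" &    # fan out one renderer per profile
done
wait                       # join before reporting results

ls "$outdir" | sort
```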

hawkw added 2 commits January 15, 2020 11:34
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw
Contributor Author

hawkw commented Jan 15, 2020

@olix0r

I wonder if we can dockerize the test harness. Or we could use something like Pex; but it would be great to figure out how to bundle dependencies for things like plot.py (or replace with something else that we can bundle better)

Traceback (most recent call last):
  File "profiling/plot.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I've dockerized the plotting script in 88e84eb and added a bash script to run it in docker.

@hawkw out of curiosity, about how long does it take you to run a profile? The actual test invocations seem pretty quick, but the perf script invocation takes about 80 minutes!

-rw-rw-r-- 1 ver ver 3.6M Jan 15 02:07 profiling/flamegraph_grpcinbound_perf.2020Jan15_00h47m16s.svg

I wonder if (as a followup) we can parallelize the image generation after all of the profiles have been run?

I'm running a profiling run on ares; I'll get back to you on how long it takes when it finishes! Running perf is definitely not fast, though; the benchmark run without perf is pretty quick in my experience.

@hawkw hawkw requested a review from adleong February 11, 2020 00:32
@hawkw
Contributor Author

hawkw commented Feb 11, 2020

Okay, I think this is ready for another (hopefully final!) look — I've fixed a couple of last bugs with running the tests in Docker, and updated the README.

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@olix0r
Member

olix0r commented Feb 14, 2020

Hmmm... when I try to run either benchmark.sh or profiling-heap.sh, I don't actually see a proxy get started, and both hang.

Also, is it not possible to run these scripts from the repo root? It looks like it's intended to work but I get some errors like:

:; profiling/benchmark.sh
    Creating ../target/profile/2020Feb14_22h10m27s
    Starting Benchmark eliza-benchmark-and-profiling 2020Feb14_22h10m27s Iter: 1 Dur: 10s Conns: 4 Streams: 4
profiling/benchmark.sh: line 17: ../target/profile/2020Feb14_22h10m27s/summary.txt: No such file or directory
     Running TCP outbound
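The usual fix for that path issue is to resolve everything relative to the script's own location rather than the caller's working directory. A sketch (the output directory name is illustrative):

```shell
#!/usr/bin/env bash
# Resolve paths relative to the script itself, so it works whether
# invoked from the repo root, from profiling/, or anywhere else.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
OUT_DIR="$SCRIPT_DIR/demo-profile-out"

mkdir -p "$OUT_DIR"
echo "summary" > "$OUT_DIR/summary.txt"
echo "wrote summary.txt under $(basename "$OUT_DIR")"
```

With this, `../target/profile/...` style paths stop depending on where the user happens to run the script from.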

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw
Contributor Author

hawkw commented Feb 14, 2020

@olix0r I believe my most recent changes have fixed the issues with proxies not starting on Linux (due to permissions for ssh keys 🙃) + the path wackiness.

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw force-pushed the eliza/benchmark-and-profiling branch from 1b8b870 to 2e33c75 Compare February 17, 2020 23:29
@olix0r olix0r requested a review from a team February 18, 2020 17:13
Contributor

@kleimkuhler kleimkuhler left a comment


This looks good to me! There has been a lot of discussion already, but running the benchmark.sh and profiling-heap.sh scripts on my machine (a Mac), everything works as expected when following the directions in the README.

I confirmed with @hawkw that running profiling-perf.sh is probably going to result in errors due to perf not being available.

I was able to analyze the .svgs and confirmed that everything cleans up well afterward.

For example:

```console
$ ITERATIONS=2 DURATION=2s CONNECTIONS=2 GRPC_STREAMS=2 HTTP_RPS="100" GRPC_RPS="100 1000" REQ_BODY_LEN="100 8000" ./benchmark-cargo-test-fortio.sh
```
Contributor


This should now be `./benchmark.sh`

@hawkw hawkw merged commit a5e168e into master Feb 19, 2020
@olix0r olix0r deleted the eliza/benchmark-and-profiling branch February 19, 2020 21:11
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 19, 2020
This release includes the results from continued profiling & performance
analysis. In addition to modifying internals to prevent unwarranted
memory growth, we've introduced new metrics to aid in debugging and
diagnostics: a new `request_errors_total` metric exposes the number of
requests that receive synthesized responses due to proxy errors; and a
suite of `stack_*` metrics expose proxy internals that can help us
identify unexpected behavior.

---

* trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426)
* Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427)
* Add the request_errors_total metric (linkerd/linkerd2-proxy#417)
* Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428)
* concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429)
* profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406)
* http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430)
* lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)