
Extended redis-benchmark instant metrics and overall latency report#7600

Merged
itamarhaber merged 4 commits into redis:unstable from filipecosta90:redisbenchmark.hdr_histogram
Aug 25, 2020

Conversation

@filipecosta90
Contributor

@filipecosta90 filipecosta90 commented Jul 31, 2020

Fixes #3535

This PR is the first step toward a consistent full-percentile analysis of query latency, so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics and the csv output format.
Apart from extending the reported metrics, it should also reduce the overhead of keeping all latency samples in an array. We are moving from keeping track of all data points to keeping track of a data sketch, by using an HDR histogram. We've also micro-benchmarked insertions to ensure that this will not impact the benchmark itself (adding a data point takes 6 ns; results here).
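The data-sketch idea can be illustrated with a minimal Python sketch (this is not the hdr_histogram C API the PR uses; the class, its precision rule, and the sample values are simplified stand-ins): instead of appending every raw sample to an array, we count samples in fixed-precision buckets.

```python
from collections import Counter

class LatencySketch:
    """Illustrative stand-in for an HDR histogram: keep bucket counts
    at a fixed precision instead of storing every raw sample."""

    def __init__(self, sig_digits=3):
        self.sig_digits = sig_digits
        self.buckets = Counter()
        self.total = 0

    def _bucket(self, usec):
        # Collapse values to `sig_digits` significant figures, so memory
        # is bounded by the number of distinct buckets, not by samples.
        if usec < 10 ** self.sig_digits:
            return usec
        magnitude = 10 ** (len(str(usec)) - self.sig_digits)
        return usec // magnitude * magnitude

    def record(self, usec):
        self.buckets[self._bucket(usec)] += 1  # O(1) per data point
        self.total += 1

    def percentile(self, pct):
        # Walk buckets in value order, accumulating counts until the
        # requested quantile is covered.
        threshold = pct / 100.0 * self.total
        seen = 0
        for value in sorted(self.buckets):
            seen += self.buckets[value]
            if seen >= threshold:
                return value
        return max(self.buckets)

sketch = LatencySketch()
for usec in (112, 175, 191, 215, 295, 447, 3095):
    sketch.record(usec)
```

A real HDR histogram organizes its buckets far more cleverly, but the point here is only that recording is constant-time and raw samples are never retained.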

1) changes to benchmark summary

Currently redis-benchmark outputs variable-precision latency distributions that make it hard to compare multiple runs, given that there is no guarantee that we can retrieve fixed quantile information (we iterate over latency values rather than quantiles, so q50, q95, q99, etc. are sometimes not presented).
Here is an example of the current summary output:

1.1) Current latency report

====== SET ======
  100000 requests completed in 1.06 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 
  host configuration "appendonly": no
  multi-thread: no

0.00% <= 0.1 milliseconds
0.00% <= 0.2 milliseconds
92.27% <= 0.3 milliseconds
99.05% <= 0.4 milliseconds
99.77% <= 0.5 milliseconds
99.96% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.7 milliseconds
93984.96 requests per second

1.2) Proposed latency report

  • output a logarithmic percentile spectrum (focusing on the high percentiles)
  • output a one-liner with the minimum (p0), p50, p95, p99, max (p100), and avg (the last just to be backwards compatible)
  • together with the latency histogram, output the cumulative count of samples so we can quickly relate the two

====== SET ======                                                     
  100000 requests completed in 0.75 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.119 milliseconds (cumulative count 1)
50.000% <= 0.175 milliseconds (cumulative count 57365)
75.000% <= 0.191 milliseconds (cumulative count 80250)
87.500% <= 0.215 milliseconds (cumulative count 88194)
93.750% <= 0.295 milliseconds (cumulative count 93964)
96.875% <= 0.447 milliseconds (cumulative count 96879)
98.438% <= 0.567 milliseconds (cumulative count 98465)
99.219% <= 0.679 milliseconds (cumulative count 99222)
99.609% <= 0.863 milliseconds (cumulative count 99617)
99.805% <= 0.999 milliseconds (cumulative count 99808)
99.902% <= 1.103 milliseconds (cumulative count 99910)
99.951% <= 2.135 milliseconds (cumulative count 99952)
99.976% <= 2.615 milliseconds (cumulative count 99976)
99.988% <= 2.839 milliseconds (cumulative count 99988)
99.994% <= 2.975 milliseconds (cumulative count 99994)
99.997% <= 3.031 milliseconds (cumulative count 99997)
99.998% <= 3.079 milliseconds (cumulative count 99999)
99.999% <= 3.095 milliseconds (cumulative count 100000)
100.000% <= 3.095 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
86.348% <= 0.207 milliseconds (cumulative count 86348)
94.222% <= 0.303 milliseconds (cumulative count 94222)
96.402% <= 0.407 milliseconds (cumulative count 96402)
97.752% <= 0.503 milliseconds (cumulative count 97752)
98.831% <= 0.607 milliseconds (cumulative count 98831)
99.324% <= 0.703 milliseconds (cumulative count 99324)
99.504% <= 0.807 milliseconds (cumulative count 99504)
99.689% <= 0.903 milliseconds (cumulative count 99689)
99.816% <= 1.007 milliseconds (cumulative count 99816)
99.910% <= 1.103 milliseconds (cumulative count 99910)
99.942% <= 1.207 milliseconds (cumulative count 99942)
99.948% <= 1.303 milliseconds (cumulative count 99948)
99.949% <= 1.407 milliseconds (cumulative count 99949)
99.950% <= 1.503 milliseconds (cumulative count 99950)
99.951% <= 2.103 milliseconds (cumulative count 99951)
100.000% <= 3.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 132802.12 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.195     0.112     0.175     0.343     0.631     3.095
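The percentile ladder in the report above follows a halving rule: each step covers half of the remaining tail, so resolution concentrates at the high percentiles. A small Python sketch of it (function name is mine, not redis-benchmark's):

```python
def percentile_ladder(halvings):
    """Logarithmic percentile spectrum: each step halves the remaining
    tail (50, 75, 87.5, ...), concentrating resolution at the top."""
    ladder, remaining = [0.0], 50.0
    for _ in range(halvings):
        ladder.append(100.0 - remaining)
        remaining /= 2.0
    ladder.append(100.0)
    return ladder

steps = percentile_ladder(8)
# e.g. [0.0, 50.0, 75.0, 87.5, 93.75, 96.875, 98.4375, ...]
```

These are exactly the percentiles (rounded to three decimals) printed in the "Latency by percentile distribution" block above.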

2) changes to the reported instant metrics

The current output while running a benchmark can be misleading, given that it presents only the overall average ops/sec. We now output the instantaneous ops/sec right beside the overall ops/sec. In addition, we also output the average overall and instantaneous latency.

2.1) Current output while running benchmark

SET: 95375.08

2.2) Proposed output while running benchmark

SET: rps=154504.0 (overall: 152518.7) avg_msec=0.159 (overall: 0.159)
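A sketch of how such a progress line can be computed (hypothetical counters and made-up numbers, not redis-benchmark's internals): instantaneous values are derived from the most recent reporting window only, overall values from the whole run.

```python
def progress_line(cmd, elapsed_s, total_reqs, total_lat_ms,
                  window_s, window_reqs, window_lat_ms):
    # Overall metrics cover the whole run; instantaneous ("rps",
    # "avg_msec") cover only the most recent reporting window.
    overall_rps = total_reqs / elapsed_s
    inst_rps = window_reqs / window_s
    overall_avg = total_lat_ms / total_reqs
    inst_avg = window_lat_ms / window_reqs
    return ("%s: rps=%.1f (overall: %.1f) avg_msec=%.3f (overall: %.3f)"
            % (cmd, inst_rps, overall_rps, inst_avg, overall_avg))

# Made-up counters: 152000 requests in 1 s overall, 38626 in the
# latest 0.25 s window.
line = progress_line("SET", 1.0, 152000, 24320.0, 0.25, 38626, 6141.5)
```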

3) changes to the csv output

Following the proposed changes to the overall latency report, we've added the latency metrics (and, consequently, a header row, given that the csv now contains more than just ops/sec).

3.1) Current csv output

redis-benchmark --csv
"PING_INLINE","94161.95"
"PING_BULK","93632.96"
"SET","93720.71"
"GET","94339.62"
"INCR","94339.62"
"LPUSH","94073.38"
"RPUSH","94786.73"
"LPOP","94696.97"
"RPOP","94339.62"
"SADD","94517.96"
"HSET","94161.95"
"SPOP","94250.71"
"LPUSH (needed to benchmark LRANGE)","95328.88"
"LRANGE_100 (first 100 elements)","60753.34"
"LRANGE_300 (first 300 elements)","27314.94"
"LRANGE_500 (first 450 elements)","20820.32"
"LRANGE_600 (first 600 elements)","16716.82"
"MSET (10 keys)","104058.27"

3.2) Proposed csv output

redis-benchmark --csv
"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms","p95_latency_ms","p99_latency_ms","max_latency_ms"
"PING_INLINE","131752.31","0.196","0.112","0.183","0.279","0.639","1.863"
"PING_BULK","125313.29","0.208","0.080","0.199","0.295","0.575","5.367"
"SET","135501.36","0.191","0.104","0.191","0.247","0.359","1.655"
"GET","141242.94","0.182","0.112","0.175","0.231","0.375","1.207"
"INCR","129701.68","0.199","0.104","0.183","0.335","0.423","3.351"
"LPUSH","135135.14","0.194","0.112","0.191","0.247","0.431","2.695"
"RPUSH","136798.91","0.190","0.088","0.183","0.247","0.367","1.567"
"LPOP","159489.64","0.171","0.080","0.159","0.271","0.551","1.583"
"RPOP","139470.02","0.186","0.112","0.183","0.239","0.359","1.503"
"SADD","134770.89","0.192","0.104","0.191","0.231","0.343","2.311"
"HSET","138504.16","0.188","0.088","0.183","0.231","0.327","1.303"
"SPOP","142247.52","0.183","0.096","0.175","0.239","0.383","1.183"
"LPUSH (needed to benchmark LRANGE)","129032.27","0.201","0.096","0.199","0.247","0.367","1.871"
"LRANGE_100 (first 100 elements)","66844.91","0.384","0.248","0.367","0.487","0.751","5.255"
"LRANGE_300 (first 300 elements)","23218.02","1.093","0.320","0.975","1.807","2.631","8.103"
"LRANGE_500 (first 450 elements)","12818.87","1.968","0.520","1.727","3.511","4.519","13.839"
"LRANGE_600 (first 600 elements)","14015.42","1.784","0.304","1.703","2.671","3.199","14.167"
"MSET (10 keys)","121065.38","0.339","0.096","0.327","0.495","0.631","1.055"
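Since the proposed format carries a header row, it can be consumed directly with standard csv tooling; for example, in Python (the sample row is copied from the table above):

```python
import csv
import io

# Hypothetical capture of the proposed output: the header plus one
# test row, copied from the table above.
sample = (
    '"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms",'
    '"p95_latency_ms","p99_latency_ms","max_latency_ms"\n'
    '"SET","135501.36","0.191","0.104","0.191","0.247","0.359","1.655"\n'
)

# DictReader keys each field by the header, so scripts no longer need
# to rely on column positions.
rows = {row["test"]: row for row in csv.DictReader(io.StringIO(sample))}
set_p99 = float(rows["SET"]["p99_latency_ms"])
```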

@oranagra
Member

oranagra commented Aug 1, 2020

thanks @filipecosta90
LGTM. (failed test is timing issue of a new test i added, fix in #7604)

@redis/core-team this PR adds another (small) dependency, solely for redis-benchmark. I don't like to bloat the project just for the benchmark, but on the other hand the benchmark has little value if it's not accurate or doesn't provide enough info, so it should either be extracted to a different repo, or improved inside redis.
anyway, while improving it, there's no sense in writing everything from scratch if there's something that fits just what we need and doesn't include much extra.

the other concern is backwards compatibility of the output, but i guess that's not a big concern.
if someone has a script that relies on the output, they'll need to adjust.
An alternative would be to keep supporting the old output behind a switch; i suppose we can do that with the new implementation, which would just emit a format similar to the old one, but i'm not sure it's worth it.

please share your thoughts.

@madolson
Contributor

madolson commented Aug 4, 2020

I don't think redis-benchmark is generally considered to be on the same level as something like memtier, which is a much better benchmarking tool. I would not be inclined to accept this if we intend to continue using redis-benchmark as just an easy ad hoc tool that comes out of the box. If we want to start using it more for validating the performance of Redis, I think something like this would be justified.

Also, anecdotally, I have known users who have "latency goals", and so they care less about the percentile response you are proposing and more about what percentage of requests will hit them. So I think we should definitely include a backwards-compatible version too.

@filipecosta90
Contributor Author

I don't think redis-benchmark is generally considered to be on the same level as something like memtier, which is a much better benchmarking tool. I would not be inclined to accept this if we intend to continue using redis-benchmark as just an easy ad hoc tool that comes out of the box. If we want to start using it more for validating the performance of Redis, I think something like this would be justified.

@madolson I believe both tools have different strong points. IMHO redis-benchmark wins for its ease of use, out-of-the-box tests for more than sets/gets, and of course being available when users clone redis. Memtier is more of a full-blown benchmarking tool, but I believe they can complement each other. That said, I believe they should both present extended latency metrics.

Also, anecdotally, I have known users who have "latency goals", and so they care less about the percentile response you are proposing and more about what percentage of requests will hit them. So I think we should definitely include a backwards-compatible version too.

Regarding the latency goals, I think it's a valid point that users rely on them; that view is basically the inverse of the percentile view (the CDF), and it enables answering questions like "What percentage of requests are served within X ms?". I would keep both:

  • "Cumulative distribution of latencies" and
  • "Latency by percentile distribution"

WDYT?
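The two reports are inverse queries of the same distribution; a small Python sketch over a hypothetical sample set makes the duality concrete:

```python
import bisect

# Hypothetical sorted latency samples, in milliseconds.
samples = [0.112, 0.175, 0.191, 0.215, 0.295, 0.447, 3.095]

def value_at_percentile(pct):
    # "Latency by percentile distribution": which latency bounds the
    # fastest pct% of requests?
    idx = min(len(samples) - 1, int(pct / 100.0 * len(samples)))
    return samples[idx]

def percent_within(msec):
    # "Cumulative distribution of latencies" (the CDF): what share of
    # requests finished within msec? This answers a latency-goal
    # question such as "how many requests met my 0.5 ms budget?".
    return 100.0 * bisect.bisect_right(samples, msec) / len(samples)
```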

@madolson
Contributor

madolson commented Aug 7, 2020

@filipecosta90 I think that makes sense. So it would be great if you can update the PR to be able to produce both options. I'll take a pass through and leave some comments. Would like to hear other thoughts from @redis/core-team as well.

@yossigo
Collaborator

yossigo commented Aug 9, 2020

I generally agree there's good justification for both tools. Ideally I think memtier_benchmark should be the superset: support everything that redis-benchmark can do and be the target for all complex/powerful stuff.

For example, redis-benchmark currently doesn't handle TLS/SSL. Last time I had a look at it, it seemed a pretty tedious task, so I'm OK with avoiding it.

As for this PR, I think the main justification to accept it is the fact that memtier_benchmark is not yet a full superset and users still need to resort to redis-benchmark.

@filipecosta90
Contributor Author

@filipecosta90 I think that makes sense. So it would be great if you can update the PR to be able to produce both options. I'll take a pass through and leave some comments. Would like to hear other thoughts from @redis/core-team as well.

@madolson wdyt about the following output? Mimicking the previous behavior, in the cumulative distribution of latencies we iterate in steps of 0.1 ms up to 2 ms and in steps of 1 ms afterwards:

====== SET ======                                                     
  100000 requests completed in 0.99 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.111 milliseconds (cumulative count 1)
50.000% <= 0.303 milliseconds (cumulative count 51727)
75.000% <= 0.359 milliseconds (cumulative count 75435)
87.500% <= 0.415 milliseconds (cumulative count 87716)
93.750% <= 0.471 milliseconds (cumulative count 94099)
96.875% <= 0.527 milliseconds (cumulative count 96875)
98.438% <= 0.615 milliseconds (cumulative count 98464)
99.219% <= 0.799 milliseconds (cumulative count 99223)
99.609% <= 1.319 milliseconds (cumulative count 99614)
99.805% <= 1.911 milliseconds (cumulative count 99807)
99.902% <= 2.375 milliseconds (cumulative count 99903)
99.951% <= 2.983 milliseconds (cumulative count 99952)
99.976% <= 3.535 milliseconds (cumulative count 99976)
99.988% <= 3.823 milliseconds (cumulative count 99988)
99.994% <= 3.999 milliseconds (cumulative count 99994)
99.997% <= 4.087 milliseconds (cumulative count 99997)
99.998% <= 4.215 milliseconds (cumulative count 99999)
99.999% <= 4.327 milliseconds (cumulative count 100000)
100.000% <= 4.327 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
3.355% <= 0.207 milliseconds (cumulative count 3355)
51.727% <= 0.303 milliseconds (cumulative count 51727)
86.394% <= 0.407 milliseconds (cumulative count 86394)
95.979% <= 0.503 milliseconds (cumulative count 95979)
98.373% <= 0.607 milliseconds (cumulative count 98373)
98.944% <= 0.703 milliseconds (cumulative count 98944)
99.237% <= 0.807 milliseconds (cumulative count 99237)
99.366% <= 0.903 milliseconds (cumulative count 99366)
99.452% <= 1.007 milliseconds (cumulative count 99452)
99.512% <= 1.103 milliseconds (cumulative count 99512)
99.546% <= 1.207 milliseconds (cumulative count 99546)
99.599% <= 1.303 milliseconds (cumulative count 99599)
99.666% <= 1.407 milliseconds (cumulative count 99666)
99.695% <= 1.503 milliseconds (cumulative count 99695)
99.726% <= 1.607 milliseconds (cumulative count 99726)
99.751% <= 1.703 milliseconds (cumulative count 99751)
99.776% <= 1.807 milliseconds (cumulative count 99776)
99.802% <= 1.903 milliseconds (cumulative count 99802)
99.843% <= 2.007 milliseconds (cumulative count 99843)
99.868% <= 2.103 milliseconds (cumulative count 99868)
99.958% <= 3.103 milliseconds (cumulative count 99958)
99.997% <= 4.103 milliseconds (cumulative count 99997)
100.000% <= 5.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 101214.58 requests per second
  latency summary: min=0.104 msec, q50=0.303 msec, q95=0.487 msec, q99=0.719 msec, max=4.327 msec, avg=0.327 msec
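The stepping just described can be sketched as follows (the function name is mine; the actual report prints slightly offset values such as 0.103 rather than 0.1, presumably because ticks land on histogram bucket boundaries):

```python
def latency_ticks(max_msec):
    """Reporting ticks mimicking the old output: 0.1 ms steps up to
    2 ms, then 1 ms steps, always ending at the observed maximum."""
    ticks = [i / 10.0 for i in range(1, 20) if i / 10.0 < max_msec]
    t = 2.0
    while t < max_msec:
        ticks.append(t)
        t += 1.0
    ticks.append(max_msec)  # close the distribution at 100%
    return ticks

# Maximum observed latency from the example run above.
ticks = latency_ticks(4.327)
```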

@itamarhaber
Member

Let's try to move ahead with merging this - my only "discomfort" is the INFO-like latency summary line, which sticks out like a sore thumb in the overall stdout format. Maybe I'm just too old - WDUT @filipecosta90 (about the format, not my age)?

Bump @madolson - any additional comments before I approve this?

@filipecosta90
Contributor Author

filipecosta90 commented Aug 24, 2020

WDUT @filipecosta90 (about the format, not my age)?

@itamarhaber do you think a multi-line summary would fit best? Like the following:

(...)
Summary:
  throughput summary: 101214.58 requests per second
  latency summary: 
    - min: 0.104 msec
    - p50: 0.303 msec
    - p95: 0.487 msec
    - p99: 0.719 msec
    - max: 4.327 msec
    - avg: 0.327 msec

WDYT?

@yossigo
Collaborator

yossigo commented Aug 24, 2020

@filipecosta90 @itamarhaber Not that I want to turn this into a huge issue... but to me a good compromise would be columns:

  latency summary (msec):
	min	q50	q95	q99	max	avg
	0.104	0.303	0.487	0.99	4.327	0.327

At least to me it looks UNIX-ly familiar and as a bonus it's easier to parse.

@itamarhaber
Member

My comment was to enhance readability and parse-ability (for automation) - I'm good with either option (rows or columns).

@filipecosta90
Contributor Author

@filipecosta90 @itamarhaber Not that I want to turn this into a huge issue... but to me a good compromise would be columns:

  latency summary (msec):
	min	q50	q95	q99	max	avg
	0.104	0.303	0.487	0.99	4.327	0.327

At least to me it looks UNIX-ly familiar and as a bonus it's easier to parse.

will push the changes with the column format :)

@filipecosta90
Contributor Author

@yossigo and @itamarhaber added changes per the PR review:

Summary:
  throughput summary: 132802.12 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.195     0.112     0.175     0.343     0.631     3.095

Member

@itamarhaber itamarhaber left a comment


LGTM - merging & thank you.

@itamarhaber itamarhaber merged commit 21784de into redis:unstable Aug 25, 2020
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 2, 2020
… (redis#7600)

A first step to enable a consistent full percentile analysis on query latency so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics, and the csv output format.
@filipecosta90 filipecosta90 deleted the redisbenchmark.hdr_histogram branch October 15, 2020 19:35

Development

Successfully merging this pull request may close these issues.

Add HdrHistogram support to redis-benchmark
