
Extended redis-benchmark instant metrics and overall latency report#7600

Merged
itamarhaber merged 4 commits into redis:unstable from filipecosta90:redisbenchmark.hdr_histogram
Aug 25, 2020

Conversation

@filipecosta90
Contributor

@filipecosta90 filipecosta90 commented Jul 31, 2020

Fixes #3535

This PR is the first step toward a consistent full-percentile analysis of query latency, so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics and the csv output format.
Apart from extending the reported metrics, it should also reduce the overhead of keeping all latency samples in an array. We are moving from keeping track of all data points to keeping track of a data sketch, by using an HDR histogram. We've also micro-benchmarked insertions to ensure that this will not impact the benchmark itself (adding a data point takes 6 ns; results here).
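The data-sketch idea can be illustrated with a minimal Python sketch (this is not the hdr_histogram C API the PR uses; the class, its precision rule, and the sample values are simplified stand-ins): instead of appending every raw sample to an array, we count samples in fixed-precision buckets.

```python
from collections import Counter

class LatencySketch:
    """Illustrative stand-in for an HDR histogram: keep bucket counts
    at a fixed precision instead of storing every raw sample."""

    def __init__(self, sig_digits=3):
        self.sig_digits = sig_digits
        self.buckets = Counter()
        self.total = 0

    def _bucket(self, usec):
        # Collapse values to `sig_digits` significant figures, so memory
        # is bounded by the number of distinct buckets, not by samples.
        if usec < 10 ** self.sig_digits:
            return usec
        magnitude = 10 ** (len(str(usec)) - self.sig_digits)
        return usec // magnitude * magnitude

    def record(self, usec):
        self.buckets[self._bucket(usec)] += 1  # O(1) per data point
        self.total += 1

    def percentile(self, pct):
        # Walk buckets in value order, accumulating counts until the
        # requested quantile is covered.
        threshold = pct / 100.0 * self.total
        seen = 0
        for value in sorted(self.buckets):
            seen += self.buckets[value]
            if seen >= threshold:
                return value
        return max(self.buckets)

sketch = LatencySketch()
for usec in (112, 175, 191, 215, 295, 447, 3095):
    sketch.record(usec)
```

A real HDR histogram organizes its buckets far more cleverly, but the point here is only that recording is constant-time and raw samples are never retained.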

1) changes to benchmark summary

Currently redis-benchmark outputs variable-precision latency distributions that make it hard to compare multiple runs, given that there is no guarantee that we can retrieve fixed quantile information (we iterate over latency values rather than quantiles, so q50, q95, q99, etc. are sometimes not presented).
Here is an example of the current summary output:

1.1) Current latency report

====== SET ======
  100000 requests completed in 1.06 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 
  host configuration "appendonly": no
  multi-thread: no

0.00% <= 0.1 milliseconds
0.00% <= 0.2 milliseconds
92.27% <= 0.3 milliseconds
99.05% <= 0.4 milliseconds
99.77% <= 0.5 milliseconds
99.96% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.7 milliseconds
93984.96 requests per second

1.2) Proposed latency report

  • output a logarithmic percentile spectrum (focusing on the high percentiles)
  • output a one-liner with the minimum (p0), p50, p95, p99, max (p100), and avg (the last just to be backwards compatible)
  • together with the latency histogram, output the cumulative count of samples so we can quickly relate the two

====== SET ======                                                     
  100000 requests completed in 0.75 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.119 milliseconds (cumulative count 1)
50.000% <= 0.175 milliseconds (cumulative count 57365)
75.000% <= 0.191 milliseconds (cumulative count 80250)
87.500% <= 0.215 milliseconds (cumulative count 88194)
93.750% <= 0.295 milliseconds (cumulative count 93964)
96.875% <= 0.447 milliseconds (cumulative count 96879)
98.438% <= 0.567 milliseconds (cumulative count 98465)
99.219% <= 0.679 milliseconds (cumulative count 99222)
99.609% <= 0.863 milliseconds (cumulative count 99617)
99.805% <= 0.999 milliseconds (cumulative count 99808)
99.902% <= 1.103 milliseconds (cumulative count 99910)
99.951% <= 2.135 milliseconds (cumulative count 99952)
99.976% <= 2.615 milliseconds (cumulative count 99976)
99.988% <= 2.839 milliseconds (cumulative count 99988)
99.994% <= 2.975 milliseconds (cumulative count 99994)
99.997% <= 3.031 milliseconds (cumulative count 99997)
99.998% <= 3.079 milliseconds (cumulative count 99999)
99.999% <= 3.095 milliseconds (cumulative count 100000)
100.000% <= 3.095 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
86.348% <= 0.207 milliseconds (cumulative count 86348)
94.222% <= 0.303 milliseconds (cumulative count 94222)
96.402% <= 0.407 milliseconds (cumulative count 96402)
97.752% <= 0.503 milliseconds (cumulative count 97752)
98.831% <= 0.607 milliseconds (cumulative count 98831)
99.324% <= 0.703 milliseconds (cumulative count 99324)
99.504% <= 0.807 milliseconds (cumulative count 99504)
99.689% <= 0.903 milliseconds (cumulative count 99689)
99.816% <= 1.007 milliseconds (cumulative count 99816)
99.910% <= 1.103 milliseconds (cumulative count 99910)
99.942% <= 1.207 milliseconds (cumulative count 99942)
99.948% <= 1.303 milliseconds (cumulative count 99948)
99.949% <= 1.407 milliseconds (cumulative count 99949)
99.950% <= 1.503 milliseconds (cumulative count 99950)
99.951% <= 2.103 milliseconds (cumulative count 99951)
100.000% <= 3.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 132802.12 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.195     0.112     0.175     0.343     0.631     3.095
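The percentile ladder in the report above follows a halving rule: each step covers half of the remaining tail, so resolution concentrates at the high percentiles. A small Python sketch of it (function name is mine, not redis-benchmark's):

```python
def percentile_ladder(halvings):
    """Logarithmic percentile spectrum: each step halves the remaining
    tail (50, 75, 87.5, ...), concentrating resolution at the top."""
    ladder, remaining = [0.0], 50.0
    for _ in range(halvings):
        ladder.append(100.0 - remaining)
        remaining /= 2.0
    ladder.append(100.0)
    return ladder

steps = percentile_ladder(8)
# e.g. [0.0, 50.0, 75.0, 87.5, 93.75, 96.875, 98.4375, ...]
```

These are exactly the percentiles (rounded to three decimals) printed in the "Latency by percentile distribution" block above.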

2) changes to the reported instant metrics

The current output while running a benchmark can be misleading, given that it presents only the overall average ops/sec. We now output the instantaneous ops/sec right beside the overall ops/sec. In addition, we also output the average overall and instantaneous latency.

2.1) Current output while running benchmark

SET: 95375.08

2.2) Proposed output while running benchmark

SET: rps=154504.0 (overall: 152518.7) avg_msec=0.159 (overall: 0.159)
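A sketch of how such a progress line can be computed (hypothetical counters and made-up numbers, not redis-benchmark's internals): instantaneous values are derived from the most recent reporting window only, overall values from the whole run.

```python
def progress_line(cmd, elapsed_s, total_reqs, total_lat_ms,
                  window_s, window_reqs, window_lat_ms):
    # Overall metrics cover the whole run; instantaneous ("rps",
    # "avg_msec") cover only the most recent reporting window.
    overall_rps = total_reqs / elapsed_s
    inst_rps = window_reqs / window_s
    overall_avg = total_lat_ms / total_reqs
    inst_avg = window_lat_ms / window_reqs
    return ("%s: rps=%.1f (overall: %.1f) avg_msec=%.3f (overall: %.3f)"
            % (cmd, inst_rps, overall_rps, inst_avg, overall_avg))

# Made-up counters: 152000 requests in 1 s overall, 38626 in the
# latest 0.25 s window.
line = progress_line("SET", 1.0, 152000, 24320.0, 0.25, 38626, 6141.5)
```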

3) changes to the csv output

Following the proposed changes to the overall latency report, we've added the latency metrics (and, consequently, a header row, given that the csv now contains more than just ops/sec).

3.1) Current csv output

redis-benchmark --csv
"PING_INLINE","94161.95"
"PING_BULK","93632.96"
"SET","93720.71"
"GET","94339.62"
"INCR","94339.62"
"LPUSH","94073.38"
"RPUSH","94786.73"
"LPOP","94696.97"
"RPOP","94339.62"
"SADD","94517.96"
"HSET","94161.95"
"SPOP","94250.71"
"LPUSH (needed to benchmark LRANGE)","95328.88"
"LRANGE_100 (first 100 elements)","60753.34"
"LRANGE_300 (first 300 elements)","27314.94"
"LRANGE_500 (first 450 elements)","20820.32"
"LRANGE_600 (first 600 elements)","16716.82"
"MSET (10 keys)","104058.27"

3.2) Proposed csv output

redis-benchmark --csv
"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms","p95_latency_ms","p99_latency_ms","max_latency_ms"
"PING_INLINE","131752.31","0.196","0.112","0.183","0.279","0.639","1.863"
"PING_BULK","125313.29","0.208","0.080","0.199","0.295","0.575","5.367"
"SET","135501.36","0.191","0.104","0.191","0.247","0.359","1.655"
"GET","141242.94","0.182","0.112","0.175","0.231","0.375","1.207"
"INCR","129701.68","0.199","0.104","0.183","0.335","0.423","3.351"
"LPUSH","135135.14","0.194","0.112","0.191","0.247","0.431","2.695"
"RPUSH","136798.91","0.190","0.088","0.183","0.247","0.367","1.567"
"LPOP","159489.64","0.171","0.080","0.159","0.271","0.551","1.583"
"RPOP","139470.02","0.186","0.112","0.183","0.239","0.359","1.503"
"SADD","134770.89","0.192","0.104","0.191","0.231","0.343","2.311"
"HSET","138504.16","0.188","0.088","0.183","0.231","0.327","1.303"
"SPOP","142247.52","0.183","0.096","0.175","0.239","0.383","1.183"
"LPUSH (needed to benchmark LRANGE)","129032.27","0.201","0.096","0.199","0.247","0.367","1.871"
"LRANGE_100 (first 100 elements)","66844.91","0.384","0.248","0.367","0.487","0.751","5.255"
"LRANGE_300 (first 300 elements)","23218.02","1.093","0.320","0.975","1.807","2.631","8.103"
"LRANGE_500 (first 450 elements)","12818.87","1.968","0.520","1.727","3.511","4.519","13.839"
"LRANGE_600 (first 600 elements)","14015.42","1.784","0.304","1.703","2.671","3.199","14.167"
"MSET (10 keys)","121065.38","0.339","0.096","0.327","0.495","0.631","1.055"
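Since the proposed format carries a header row, it can be consumed directly with standard csv tooling; for example, in Python (the sample row is copied from the table above):

```python
import csv
import io

# Hypothetical capture of the proposed output: the header plus one
# test row, copied from the table above.
sample = (
    '"test","rps","avg_latency_ms","min_latency_ms","p50_latency_ms",'
    '"p95_latency_ms","p99_latency_ms","max_latency_ms"\n'
    '"SET","135501.36","0.191","0.104","0.191","0.247","0.359","1.655"\n'
)

# DictReader keys each field by the header, so scripts no longer need
# to rely on column positions.
rows = {row["test"]: row for row in csv.DictReader(io.StringIO(sample))}
set_p99 = float(rows["SET"]["p99_latency_ms"])
```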

@oranagra
Member

oranagra commented Aug 1, 2020

thanks @filipecosta90
LGTM. (failed test is timing issue of a new test i added, fix in #7604)

@redis/core-team this PR adds another (small) dependency, solely for redis-benchmark. I don't like to bloat the project just for the benchmark, but on the other hand the benchmark has little value if it's not accurate or doesn't provide enough info, so it should either be extracted to a different repo, or improved inside redis.
anyway, while improving it, there's no sense in writing everything from scratch if there's something that fits just what we need and doesn't include much extra.

the other concern is backwards compatibility of the output, but i guess that's not a big concern.
if someone has a script that relies on the output, they'll need to adjust.
An alternative would be to keep supporting the old output behind a switch; i suppose we can do that with the new implementation, which would just emit a format similar to the old one, but i'm not sure it's worth it.

please share your thoughts.

@madolson
Contributor

madolson commented Aug 4, 2020

I don't think redis-benchmark is generally considered to be on the same level as something like memtier, which is a much better benchmarking tool. I would not be inclined to accept this if we intend to continue using redis-benchmark as just an easy ad hoc tool that comes out of the box. If we want to start using it more for validating the performance of Redis, I think something like this would be justified.

Also, anecdotally, I have known users who have "latency goals", and so they care less about the percentile response you are proposing and more about what percentage of requests will hit them. So I think we should definitely include a backwards-compatible version too.

@filipecosta90
Contributor Author

I don't think redis-benchmark is generally considered to be on the same level as something like memtier, which is a much better benchmarking tool. I would not be inclined to accept this if we intend to continue using redis-benchmark as just an easy ad hoc tool that comes out of the box. If we want to start using it more for validating the performance of Redis, I think something like this would be justified.

@madolson I believe both tools have different strong points. IMHO redis-benchmark wins for its ease of use, out-of-the-box tests for more than sets/gets, and of course being available when users clone redis. Memtier is more of a full-blown benchmarking tool, but I believe they can complement each other. That said, I believe they should both present extended latency metrics.

Also, anecdotally, I have known users who have "latency goals", and so they care less about the percentile response you are proposing and more about what percentage of requests will hit them. So I think we should definitely include a backwards-compatible version too.

Regarding the latency goals, I think it's a valid point that users rely on them; that view is basically the inverse of the percentile view (the CDF), and it enables answering questions like "What percentage of requests are served within X ms?". I would keep both:

  • "Cumulative distribution of latencies" and
  • "Latency by percentile distribution"

WDYT?
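The two reports are inverse queries of the same distribution; a small Python sketch over a hypothetical sample set makes the duality concrete:

```python
import bisect

# Hypothetical sorted latency samples, in milliseconds.
samples = [0.112, 0.175, 0.191, 0.215, 0.295, 0.447, 3.095]

def value_at_percentile(pct):
    # "Latency by percentile distribution": which latency bounds the
    # fastest pct% of requests?
    idx = min(len(samples) - 1, int(pct / 100.0 * len(samples)))
    return samples[idx]

def percent_within(msec):
    # "Cumulative distribution of latencies" (the CDF): what share of
    # requests finished within msec? This answers a latency-goal
    # question such as "how many requests met my 0.5 ms budget?".
    return 100.0 * bisect.bisect_right(samples, msec) / len(samples)
```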

@madolson
Contributor

madolson commented Aug 7, 2020

@filipecosta90 I think that makes sense. So it would be great if you can update the PR to be able to produce both options. I'll take a pass through and leave some comments. Would like to hear other thoughts from @redis/core-team as well.

@yossigo
Collaborator

yossigo commented Aug 9, 2020

I generally agree there's good justification for both tools. Ideally I think memtier_benchmark should be the superset: support everything that redis-benchmark can do and be the target for all complex/powerful stuff.

For example, redis-benchmark currently doesn't handle TLS/SSL. Last time I had a look at it, it seemed a pretty tedious task, so I'm OK with avoiding it.

As for this PR, I think the main justification to accept it is the fact that memtier_benchmark is not yet a full superset and users still need to resort to redis-benchmark.

@filipecosta90
Contributor Author

@filipecosta90 I think that makes sense. So it would be great if you can update the PR to be able to produce both options. I'll take a pass through and leave some comments. Would like to hear other thoughts from @redis/core-team as well.

@madolson wdyt about the following output? Mimicking the previous behavior, in the cumulative distribution of latencies we iterate in steps of 0.1 ms up to 2 ms and in steps of 1 ms afterwards:

====== SET ======                                                     
  100000 requests completed in 0.99 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 
  host configuration "appendonly": no
  multi-thread: no

Latency by percentile distribution:
0.000% <= 0.111 milliseconds (cumulative count 1)
50.000% <= 0.303 milliseconds (cumulative count 51727)
75.000% <= 0.359 milliseconds (cumulative count 75435)
87.500% <= 0.415 milliseconds (cumulative count 87716)
93.750% <= 0.471 milliseconds (cumulative count 94099)
96.875% <= 0.527 milliseconds (cumulative count 96875)
98.438% <= 0.615 milliseconds (cumulative count 98464)
99.219% <= 0.799 milliseconds (cumulative count 99223)
99.609% <= 1.319 milliseconds (cumulative count 99614)
99.805% <= 1.911 milliseconds (cumulative count 99807)
99.902% <= 2.375 milliseconds (cumulative count 99903)
99.951% <= 2.983 milliseconds (cumulative count 99952)
99.976% <= 3.535 milliseconds (cumulative count 99976)
99.988% <= 3.823 milliseconds (cumulative count 99988)
99.994% <= 3.999 milliseconds (cumulative count 99994)
99.997% <= 4.087 milliseconds (cumulative count 99997)
99.998% <= 4.215 milliseconds (cumulative count 99999)
99.999% <= 4.327 milliseconds (cumulative count 100000)
100.000% <= 4.327 milliseconds (cumulative count 100000)

Cumulative distribution of latencies:
0.000% <= 0.103 milliseconds (cumulative count 0)
3.355% <= 0.207 milliseconds (cumulative count 3355)
51.727% <= 0.303 milliseconds (cumulative count 51727)
86.394% <= 0.407 milliseconds (cumulative count 86394)
95.979% <= 0.503 milliseconds (cumulative count 95979)
98.373% <= 0.607 milliseconds (cumulative count 98373)
98.944% <= 0.703 milliseconds (cumulative count 98944)
99.237% <= 0.807 milliseconds (cumulative count 99237)
99.366% <= 0.903 milliseconds (cumulative count 99366)
99.452% <= 1.007 milliseconds (cumulative count 99452)
99.512% <= 1.103 milliseconds (cumulative count 99512)
99.546% <= 1.207 milliseconds (cumulative count 99546)
99.599% <= 1.303 milliseconds (cumulative count 99599)
99.666% <= 1.407 milliseconds (cumulative count 99666)
99.695% <= 1.503 milliseconds (cumulative count 99695)
99.726% <= 1.607 milliseconds (cumulative count 99726)
99.751% <= 1.703 milliseconds (cumulative count 99751)
99.776% <= 1.807 milliseconds (cumulative count 99776)
99.802% <= 1.903 milliseconds (cumulative count 99802)
99.843% <= 2.007 milliseconds (cumulative count 99843)
99.868% <= 2.103 milliseconds (cumulative count 99868)
99.958% <= 3.103 milliseconds (cumulative count 99958)
99.997% <= 4.103 milliseconds (cumulative count 99997)
100.000% <= 5.103 milliseconds (cumulative count 100000)

Summary:
  throughput summary: 101214.58 requests per second
  latency summary: min=0.104 msec, q50=0.303 msec, q95=0.487 msec, q99=0.719 msec, max=4.327 msec, avg=0.327 msec
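The stepping just described can be sketched as follows (the function name is mine; the actual report prints slightly offset values such as 0.103 rather than 0.1, presumably because ticks land on histogram bucket boundaries):

```python
def latency_ticks(max_msec):
    """Reporting ticks mimicking the old output: 0.1 ms steps up to
    2 ms, then 1 ms steps, always ending at the observed maximum."""
    ticks = [i / 10.0 for i in range(1, 20) if i / 10.0 < max_msec]
    t = 2.0
    while t < max_msec:
        ticks.append(t)
        t += 1.0
    ticks.append(max_msec)  # close the distribution at 100%
    return ticks

# Maximum observed latency from the example run above.
ticks = latency_ticks(4.327)
```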

@itamarhaber
Member

Let's try to move ahead with merging this - my only "discomfort" is the INFO-like latency summary line, which sticks out like a sore thumb in the overall stdout format. Maybe I'm just too old - WDUT @filipecosta90 (about the format, not my age)?

Bump @madolson - any additional comments before I approve this?

@filipecosta90
Contributor Author

filipecosta90 commented Aug 24, 2020

WDUT @filipecosta90 (about the format, not my age)?

@itamarhaber do you think a multi-line summary would fit best? Like the following:

(...)
Summary:
  throughput summary: 101214.58 requests per second
  latency summary: 
    - min: 0.104 msec
    - p50: 0.303 msec
    - p95: 0.487 msec
    - p99: 0.719 msec
    - max: 4.327 msec
    - avg: 0.327 msec

WDYT?

@yossigo
Collaborator

yossigo commented Aug 24, 2020

@filipecosta90 @itamarhaber Not that I want to turn this into a huge issue... but to me a good compromise would be columns:

  latency summary (msec):
	min	q50	q95	q99	max	avg
	0.104	0.303	0.487	0.99	4.327	0.327

At least to me it looks UNIX-ly familiar and as a bonus it's easier to parse.

@itamarhaber
Member

My comment was to enhance readability and parse-ability (for automation) - I'm good with either option (rows or columns).

@filipecosta90
Contributor Author

@filipecosta90 @itamarhaber Not that I want to turn this into a huge issue... but to me a good compromise would be columns:

  latency summary (msec):
	min	q50	q95	q99	max	avg
	0.104	0.303	0.487	0.99	4.327	0.327

At least to me it looks UNIX-ly familiar and as a bonus it's easier to parse.

will push the changes with the column format :)

@filipecosta90
Contributor Author

@yossigo and @itamarhaber added changes per the PR review:

Summary:
  throughput summary: 132802.12 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.195     0.112     0.175     0.343     0.631     3.095

Member

@itamarhaber itamarhaber left a comment


LGTM - merging & thank you.

@itamarhaber itamarhaber merged commit 21784de into redis:unstable Aug 25, 2020
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 2, 2020
… (redis#7600)

A first step to enable a consistent full percentile analysis on query latency so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics, and the csv output format.
@filipecosta90 filipecosta90 deleted the redisbenchmark.hdr_histogram branch October 15, 2020 19:35

Development

Successfully merging this pull request may close these issues.

Add HdrHistogram support to redis-benchmark
