Extended redis-benchmark instant metrics and overall latency report #7600
Conversation
Thanks @filipecosta90. @redis/core-team, this PR adds another (small) dependency, solely for redis-benchmark. I don't like to bloat the project just for the benchmark, but on the other hand the benchmark has little value if it's not accurate or doesn't provide enough info, so it should either be extracted to a different repo or improved inside redis. The other concern is backwards compatibility of the output, but I guess that's not a big concern. Please share your thoughts.
I don't think redis-benchmark is generally considered to be on the same level as something like memtier, which is a much better benchmarking tool. I would not be inclined to accept this if we intend to continue using redis-benchmark as just an easy adhoc tool that comes out of the box. If we want to start using it more for validating the performance of Redis, I think something like this would be justified. Also, anecdotally I have known users who have "latency goals", and so they care less about the percentile response you are proposing and more about what percentage of requests will hit them. So I think we should definitely include a backwards compatible version too.
@madolson I believe both tools have different strong points. IMHO redis-benchmark wins on its ease of use, its out-of-the-box tests for more than sets/gets, and of course being available when users clone redis. Memtier is more of a full-blown benchmarking tool, but I believe they can complement each other. That said, I believe they should both present extended latency metrics.
Regarding the latency goals, I think it's a valid point that users rely on them; that view is basically the inverse of the percentile (the CDF), and it enables answering questions like "What is the percentage of requests served within X ms?". I would keep both:
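That "latency goal" view is just the empirical CDF evaluated at a threshold. A minimal sketch in Python, for illustration only — `served_within` is a hypothetical helper, not code from this PR:

```python
from bisect import bisect_right

def served_within(sorted_latencies_ms, threshold_ms):
    """Percentage of requests served in <= threshold_ms.
    This is the empirical CDF: the inverse view of a percentile."""
    if not sorted_latencies_ms:
        return 0.0
    count = bisect_right(sorted_latencies_ms, threshold_ms)
    return 100.0 * count / len(sorted_latencies_ms)

samples = sorted([0.3, 0.4, 0.4, 0.6, 0.9, 1.2, 2.5, 7.0])
pct = served_within(samples, 1.0)  # share of requests at or under 1 ms
```

Whereas a percentile answers "what latency covers q% of requests", this answers "what share of requests meet latency goal X" — the same curve read in the other direction.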
@filipecosta90 I think that makes sense. So it would be great if you can update the PR to be able to produce both options. I'll take a pass through and leave some comments. Would like to hear other thoughts from @redis/core-team as well.
I generally agree there's good justification for both tools. Ideally I think … For example, … As for this PR, I think the main justification to accept it is the fact that …
@madolson wdyt about the following output? Mimicking the previous behavior, on the cumulative distribution of latencies we iterate with a step of 0.1 ms up to 2 ms and with a step of 1 ms afterwards:
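The stepping described above could be sketched like this — an illustrative Python helper, not the PR's C code; only the step sizes come from the comment:

```python
import math

def cdf_steps(max_latency_ms):
    """Thresholds at which the cumulative latency distribution is
    reported: 0.1 ms steps up to 2 ms, then 1 ms steps up to the
    maximum observed latency."""
    steps = [round(0.1 * i, 1) for i in range(1, 21)]  # 0.1 .. 2.0
    steps += [float(t) for t in range(3, math.ceil(max_latency_ms) + 1)]
    return steps

thresholds = cdf_steps(4.0)  # 0.1, 0.2, ..., 2.0, 3.0, 4.0
```

At each threshold the report would print the percentage of requests served within it, so runs with different latency profiles still produce rows at comparable thresholds.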
Let's try to move ahead with merging this - my only "discomfort" is the INFO-like … Bump @madolson - any additional comments before I approve this?
@itamarhaber do you think that a multi-line summary would fit the summary best? Like the following: … WDYT?
@filipecosta90 @itamarhaber Not that I want to turn this into a huge issue... but to me a good compromise would be columns: … At least to me it looks UNIX-ly familiar and, as a bonus, it's easier to parse.
My comment was to enhance readability and parse-ability (for automation) - I'm good with either option (rows or columns). |
will push the changes with the column format :) |
@yossigo and @itamarhaber, added the changes per the PR review:
Fixes #3535
This PR is the first step to enable a consistent full percentile analysis on query latency so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics, and the csv output format.
Apart from extending the reported metrics, it should also reduce the overhead of keeping all latency samples in an array. We are moving from keeping track of all data points to keeping track of a data sketch, by using an HDR histogram. We've also micro-benchmarked insertions to ensure that this will not impact the benchmark itself (adding a datapoint takes 6 ns; results here).
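For intuition on the array-vs-sketch trade-off, here is a toy fixed-resolution histogram in Python. This only illustrates the general idea — the PR uses the HdrHistogram C library, which additionally keeps the bucket count low by varying bucket widths while preserving a configured number of significant figures:

```python
class FixedHistogram:
    """Toy latency sketch: one counter per 10-microsecond bucket
    instead of storing every sample. Memory is bounded by the bucket
    count and recording is O(1), which is why a histogram beats an
    ever-growing array of raw samples."""
    BUCKET_US = 10  # 0.01 ms resolution

    def __init__(self, max_us=10_000_000):
        self.buckets = [0] * (max_us // self.BUCKET_US + 1)
        self.total = 0

    def record(self, latency_us):
        # Clamp out-of-range values into the last bucket.
        idx = min(latency_us // self.BUCKET_US, len(self.buckets) - 1)
        self.buckets[idx] += 1
        self.total += 1

    def percentile_us(self, q):
        """Upper bound of the first bucket covering at least q% of samples."""
        target = self.total * q / 100.0
        seen = 0
        for i, count in enumerate(self.buckets):
            seen += count
            if seen >= target:
                return (i + 1) * self.BUCKET_US
        return len(self.buckets) * self.BUCKET_US

h = FixedHistogram()
for latency in (500, 500, 600, 700, 5000):  # microseconds
    h.record(latency)
```

The price of the sketch is bounded quantization error (here, one bucket width), which is exactly the precision/memory trade-off an HDR histogram makes tunable.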
1) Changes to the benchmark summary
Currently `redis-benchmark` outputs variable-precision latency distributions that make it hard to compare multiple runs, given that there is no assurance that we can retrieve fixed quantile information (we iterate over latency values rather than quantiles, so q50, q95, q99, etc. are sometimes not presented). Here is an example of the current summary output:
1.1) Current latency report
1.2) Proposed latency report
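The difference between the two reports boils down to querying fixed quantiles rather than iterating over latency values. A nearest-rank sketch of the fixed-quantile approach (illustrative only; the PR reads these values from the HDR histogram rather than a sorted array):

```python
import math

def quantile(sorted_samples, q):
    """Nearest-rank quantile: the smallest sample with at least q%
    of all samples at or below it."""
    rank = math.ceil(q / 100.0 * len(sorted_samples))
    return sorted_samples[max(rank - 1, 0)]

samples = sorted([0.3, 0.4, 0.4, 0.6, 0.9, 1.2, 2.5, 7.0])
# Always report the same quantiles, so runs are directly comparable.
report = {q: quantile(samples, q) for q in (50, 95, 99)}
```

Because the quantile set is fixed, every run emits q50/q95/q99 rows regardless of how the latencies happen to be distributed — the property the current variable-precision output lacks.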
2) Changes to the reported instant metrics
The current output while running a benchmark can be misleading, given that it presents only the overall average ops/sec. Right beside the overall ops/sec we now output the instantaneous ops/sec. In addition, we also output the average overall and instantaneous latency.
2.1) Current output while running benchmark
2.2) Proposed output while running benchmark
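The relationship between the two rates can be sketched as follows (hypothetical names; the real benchmark maintains these counters in its event loop):

```python
def rates(total_requests, elapsed_s, window_requests, window_s):
    """Overall ops/sec averages over the whole run; instantaneous
    ops/sec covers only the last reporting window, so a stall shows
    up immediately instead of being diluted by the long-run average."""
    return total_requests / elapsed_s, window_requests / window_s

# e.g. 500k requests in 10 s overall, but only 10k in the last second:
overall, inst = rates(500_000, 10.0, 10_000, 1.0)
```

Printing both side by side is what makes a transient slowdown visible: the overall average barely moves while the instantaneous rate drops sharply.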
3) Changes to the csv output
Following the proposed changes to the overall latency report, we've added the latency metrics (and consequently a header, given that we now have more than just ops/sec in the CSV).
3.1) Current csv output
3.2) Proposed csv output
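A sketch of the resulting CSV shape with a header row (column names here are illustrative, not necessarily the exact headers the PR emits):

```python
import csv
import io

# Hypothetical per-test results: name, throughput, then latency metrics.
rows = [
    {"test": "SET", "rps": 95000.0, "avg_latency_ms": 0.52,
     "p50_latency_ms": 0.45, "p95_latency_ms": 1.10, "p99_latency_ms": 2.30},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()    # header line: needed now that each row carries
writer.writerows(rows)  # more than just ops/sec
output = buf.getvalue()
```

With a header in place, downstream tooling can pick columns by name instead of relying on a fixed single-value-per-row format, which also keeps the output extensible if more metrics are added later.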