Description
Context
Today Redis provides partial server-side latency information on command execution in the form of a total usec counter for a given command and an average usec_per_call, which essentially tracks only the call() time of that command.
These stats report the actual CPU time a command execution took, which is only a portion of the total time Redis spent processing the command; the total also includes the time the command waited due to Redis multiplexing. For example, say two commands (e.g., GET and EVAL) are fully received in the socket buffer, and Redis then executes aeProcessEvents. First Redis processes the GET (reads the socket, parses, then executes the command), which takes 5us, and places the response in the client output buffer. Then Redis executes the EVAL, which takes 1000us, and places its response in the client output buffer. In beforeSleep, Redis sends the responses to the clients. At this point, the latency of both the GET and the EVAL is approximately 5 + 1000us (assuming little overhead from other functionality in beforeSleep and an instantaneous writeToClient). The reason is that before flushing the GET response from the client output buffer, Redis had to process the EVAL; similarly, before processing the EVAL, Redis had to process the GET.
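The timeline above can be sketched with a minimal simulation (hypothetical timings, not actual Redis code), contrasting what usec_per_call reports with the latency a client would perceive:

```python
# Minimal sketch: two commands fully buffered in the socket, processed in
# one event-loop iteration, responses flushed together in beforeSleep.
# usec_per_call counts only each command's own call() time, while the
# client-perceived server-side latency includes waiting for the other command.

commands = [("GET", 5), ("EVAL", 1000)]  # (name, call() time in usec)

elapsed = 0
stats = {}
for name, call_us in commands:
    elapsed += call_us  # command executes; response only buffered, not sent
    stats[name] = {"usec_per_call": call_us}

# beforeSleep: all buffered responses are written out at the same time,
# so every command in this iteration perceives the accumulated elapsed time.
for name in stats:
    stats[name]["perceived_us"] = elapsed

print(stats)
# GET's usec_per_call is 5, but its perceived latency is ~1005us,
# because the EVAL ran before the responses were flushed.
```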
EVAL was used to illustrate that a slowdown in one of the tasks during processSingleFileEvent may slow down the rest of the commands queued for processing during the current iteration of the event loop. That can be particularly interesting for commands, e.g., GET, that have to wait for other commands performing an fsync during AOF. While it may be trivial to understand why a SET command is slow during AOF, from the client side it does not make sense to perceive additional latency on a GET as a side effect of that.
Proposal
A server-side, per-command latency metric that approaches as closely as possible the time from receiving the last byte of a command until the first byte of its response is sent to the client. The per-command latency can be reported using histograms (e.g., LATENCY HISTOGRAM) to give insight into spikes, as opposed to the current commandstats reporting, which displays only an aggregate.
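One way to sketch this measurement (helper names and bucketing scheme are assumptions, not an agreed design): timestamp the client when the last byte of the command is parsed, compute the delta when the first reply byte is written, and record it in power-of-two buckets, roughly the shape of what LATENCY HISTOGRAM reports:

```python
# Sketch under assumptions: on_command_parsed / on_first_reply_byte are
# hypothetical hooks, and buckets use power-of-two upper bounds in usec.
import time


class CommandLatencyHistogram:
    def __init__(self):
        self.buckets = {}  # bucket upper bound (usec) -> count

    def record(self, usec):
        # Find the smallest power-of-two bound >= usec (min bucket is 1).
        bound = 1
        while bound < max(usec, 1):
            bound *= 2
        self.buckets[bound] = self.buckets.get(bound, 0) + 1


hist = {}  # command name -> CommandLatencyHistogram


def on_command_parsed(client):
    # Called once the last byte of the command has been read and parsed.
    client["recv_us"] = time.monotonic_ns() // 1000


def on_first_reply_byte(client, cmd):
    # Called when the first byte of the response is written to the socket.
    usec = time.monotonic_ns() // 1000 - client["recv_us"]
    hist.setdefault(cmd, CommandLatencyHistogram()).record(usec)
```

A histogram per command keeps spikes visible: a single 1000us GET shows up as a count in a high bucket rather than being averaged away.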
The per-command latency could incorporate various components, including multiplexing overhead, processing, module blocking, AOF, etc. We could also consider blocking-command latency, but that does not appear to be actual server-side-induced latency. As a result, an enhanced version of slowlog could be implemented to break latency down into all of these components.
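An enhanced slowlog entry along those lines might carry one field per component (the field names here are hypothetical, purely to illustrate the breakdown):

```python
# Sketch of an enhanced slowlog record splitting total latency into the
# components mentioned above; field names are assumptions, not a design.
from dataclasses import dataclass


@dataclass
class SlowlogEntry:
    command: str
    call_us: int = 0       # actual command execution (today's usec_per_call)
    multiplex_us: int = 0  # waiting behind other commands in the event loop
    aof_us: int = 0        # time spent in AOF write / fsync
    module_us: int = 0     # time blocked inside a module

    @property
    def total_us(self):
        return self.call_us + self.multiplex_us + self.aof_us + self.module_us


# The GET from the earlier example: 5us of its own work, ~1000us waiting
# behind the EVAL in the same event-loop iteration.
entry = SlowlogEntry("GET", call_us=5, multiplex_us=1000)
print(entry.total_us)  # 1005
```

Such a record would make it obvious that the GET was slow because of multiplexing, not because of its own execution.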
While the best way to measure latency is to track end-to-end latency from the client side, this server-side metric is very useful to triage issues and identify which component is contributing to the latency (i.e., whether it is Redis or the network link).