Description
Context
Today Redis provides partial server-side latency information on command execution in the form of a total usec counter for a given command and an average usec_per_call, which essentially tracks only the call() time of that command.
These stats report the actual CPU time a command execution took, which is only a portion of the total time Redis spent processing the command; the total also includes the time the command waited due to Redis multiplexing. For example, say two commands (e.g., GET and EVAL) are fully received in the socket buffer, and Redis then executes aeProcessEvents. First Redis processes the GET (reads the socket, parses, then executes the command), which takes 5us, and places the response in the client output buffer. Then Redis executes the EVAL, which takes 1000us, and places its response in the client output buffer. In beforeSleep, Redis sends the responses to the clients. At this point, the latency of both the GET and the EVAL is approximately 5 + 1000us (assuming little overhead from other functionality in beforeSleep and an instantaneous writeToClient). The reason is that before flushing the GET response from the client output buffer, Redis had to process the EVAL; similarly, before processing the EVAL, Redis had to process the GET.
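The timeline above can be sketched with a minimal simulation (hypothetical timings, not actual Redis code), contrasting what usec_per_call reports with the latency a client would perceive:

```python
# Minimal sketch: two commands fully buffered in the socket, processed in
# one event-loop iteration, responses flushed together in beforeSleep.
# usec_per_call counts only each command's own call() time, while the
# client-perceived server-side latency includes waiting for the other command.

commands = [("GET", 5), ("EVAL", 1000)]  # (name, call() time in usec)

elapsed = 0
stats = {}
for name, call_us in commands:
    elapsed += call_us  # command executes; response only buffered, not sent
    stats[name] = {"usec_per_call": call_us}

# beforeSleep: all buffered responses are written out at the same time,
# so every command in this iteration perceives the accumulated elapsed time.
for name in stats:
    stats[name]["perceived_us"] = elapsed

print(stats)
# GET's usec_per_call is 5, but its perceived latency is ~1005us,
# because the EVAL ran before the responses were flushed.
```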
EVAL was used to illustrate that a slowdown in one of the tasks during processSingleFileEvent may slow down the rest of the commands queued for processing during the current iteration of the event loop. That can be particularly interesting for commands, e.g., GET, that have to wait for other commands performing an fsync during AOF. While it may be trivial to understand why a SET command is slow during AOF, from the client side it does not make sense to perceive additional latency on a GET as a side effect of that.
Proposal
A server-side, per-command latency metric that approaches as closely as possible the time from receiving the last byte of a command until the first byte of its response is sent to the client. The per-command latency can be reported using histograms (e.g., LATENCY HISTOGRAM) to give insight into spikes, as opposed to the current commandstats reporting, which displays only an aggregate.
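One way to sketch this measurement (helper names and bucketing scheme are assumptions, not an agreed design): timestamp the client when the last byte of the command is parsed, compute the delta when the first reply byte is written, and record it in power-of-two buckets, roughly the shape of what LATENCY HISTOGRAM reports:

```python
# Sketch under assumptions: on_command_parsed / on_first_reply_byte are
# hypothetical hooks, and buckets use power-of-two upper bounds in usec.
import time


class CommandLatencyHistogram:
    def __init__(self):
        self.buckets = {}  # bucket upper bound (usec) -> count

    def record(self, usec):
        # Find the smallest power-of-two bound >= usec (min bucket is 1).
        bound = 1
        while bound < max(usec, 1):
            bound *= 2
        self.buckets[bound] = self.buckets.get(bound, 0) + 1


hist = {}  # command name -> CommandLatencyHistogram


def on_command_parsed(client):
    # Called once the last byte of the command has been read and parsed.
    client["recv_us"] = time.monotonic_ns() // 1000


def on_first_reply_byte(client, cmd):
    # Called when the first byte of the response is written to the socket.
    usec = time.monotonic_ns() // 1000 - client["recv_us"]
    hist.setdefault(cmd, CommandLatencyHistogram()).record(usec)
```

A histogram per command keeps spikes visible: a single 1000us GET shows up as a count in a high bucket rather than being averaged away.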
The per-command latency could incorporate various components, including multiplexing overhead, processing, module blocking, AOF, etc. We could also consider blocking-command latency, but that does not appear to be actual server-side-induced latency. As a result, an enhanced version of slowlog could be implemented to break latency down into all of these components.
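An enhanced slowlog entry along those lines might carry one field per component (the field names here are hypothetical, purely to illustrate the breakdown):

```python
# Sketch of an enhanced slowlog record splitting total latency into the
# components mentioned above; field names are assumptions, not a design.
from dataclasses import dataclass


@dataclass
class SlowlogEntry:
    command: str
    call_us: int = 0       # actual command execution (today's usec_per_call)
    multiplex_us: int = 0  # waiting behind other commands in the event loop
    aof_us: int = 0        # time spent in AOF write / fsync
    module_us: int = 0     # time blocked inside a module

    @property
    def total_us(self):
        return self.call_us + self.multiplex_us + self.aof_us + self.module_us


# The GET from the earlier example: 5us of its own work, ~1000us waiting
# behind the EVAL in the same event-loop iteration.
entry = SlowlogEntry("GET", call_us=5, multiplex_us=1000)
print(entry.total_us)  # 1005
```

Such a record would make it obvious that the GET was slow because of multiplexing, not because of its own execution.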
While the best way to measure latency is to track end-to-end latency from the client side, this server-side metric is very useful to triage issues and identify which component is contributing to the latency (i.e., whether it is Redis or the network link).