Conversation
This comment has been minimized.
This comment has been minimized.
5eba0a0 to
da9767b
Compare
f685ccc to
6f5ac1a
Compare
The proxy handles errors by synthesizing responses; however, depending on where such an error occurs, it may not be recorded by a metrics layer. This change introduces an `request_errors_total` counter that records the number/kind of errors encountered. This is immediately helpful for debugging and diagnostics, but could also be helpful to view in the UI. We may want to revisit this metric, especialy its labels, before relying on it more prominently. But in the meantime it's extraordinarily useful as it is.
6f5ac1a to
b98e38a
Compare
hawkw
left a comment
There was a problem hiding this comment.
seems very straightforward — lgtm!
kleimkuhler
left a comment
There was a problem hiding this comment.
I have a question about the implementation of error-metrics. It's not a blocker on merging, but once that's resolved, everything else looks good to me.
linkerd/error-metrics/src/lib.rs
Outdated
| pub struct RecordErrorLayer<L, K: Hash + Eq> { | ||
| label: L, | ||
| errors: Arc<Mutex<IndexMap<K, Counter>>>, | ||
| } | ||
|
|
||
| pub struct RecordError<L, K: Hash + Eq, S> { | ||
| label: L, | ||
| errors: Arc<Mutex<IndexMap<K, Counter>>>, | ||
| inner: S, | ||
| } |
There was a problem hiding this comment.
What are your thoughts on changing these structs to:
pub struct RecordErrorLayer<L, K: Hash + Eq> {
label: L,
registry: Registry<K>,
}
pub struct RecordError<L, K: Hash + Eq, S> {
label: L,
registry: Registry<K>,
inner: S,
}
When reading through this change, this change is what I expected to see and how I thought about it when understanding the rest of the impls.
There was a problem hiding this comment.
Alternatively, this could be similar to the Registry in #428 where:
type Registry<L> = Arc<Mutex<IndexMap<L, Arc<Metrics>>>>;
There was a problem hiding this comment.
Eh, the Registry type is really there as a public handle for getting layers and reporting the results... It's a bit confusing for Registry to produce layers but those layers just hold Registries?
This might all be clearer if I split the types up into separate files?
There was a problem hiding this comment.
Yea it helps, but I think what I'm reacting to is it seems like each *-metrics crate has it's own way of implementing what it's concept of a Registry is and they are just different enough that I feel like I'm reviewing new impls.
Reviewing this change, the other open #428 metrics change, and the previous http-metrics crate change all have a similar purpose but don't really make their metrics recording services the same way.
I'm okay merging this as is, but it may just be something to be aware of that as more metrics are added, see if it makes sense to introduce a reusable "metrics registry".
This release includes the results from continued profiling & performance analysis. In addition to modifying internals to prevent unwarranted memory growth, we've introduced new metrics to aid in debugging and diagnostics: a new `request_errors_total` metric exposes the number of requests that receive synthesized responses due to proxy errors; and a suite of `stack_*` metrics expose proxy internals that can help us identify unexpected behavior. --- * trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426) * Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427) * Add the request_errors_total metric (linkerd/linkerd2-proxy#417) * Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428) * concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429) * profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406) * http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430) * lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)
This release includes the results from continued profiling & performance analysis. In addition to modifying internals to prevent unwarranted memory growth, we've introduced new metrics to aid in debugging and diagnostics: a new `request_errors_total` metric exposes the number of requests that receive synthesized responses due to proxy errors; and a suite of `stack_*` metrics expose proxy internals that can help us identify unexpected behavior. --- * trace: update `tracing-subscriber` dependency to 0.2.1 (linkerd/linkerd2-proxy#426) * Reimplement the Lock middleware with tokio::sync (linkerd/linkerd2-proxy#427) * Add the request_errors_total metric (linkerd/linkerd2-proxy#417) * Expose the number of service instances in the proxy (linkerd/linkerd2-proxy#428) * concurrency-limit: Share a limit across Services (linkerd/linkerd2-proxy#429) * profiling: add benchmark and profiling scripts (linkerd/linkerd2-proxy#406) * http-box: Box HTTP payloads via middleware (linkerd/linkerd2-proxy#430) * lock: Generalize to protect a guarded value (linkerd/linkerd2-proxy#431)
The proxy handles errors by synthesizing responses; however, depending
on where such an error occurs, it may not be recorded by a metrics
layer.
This change introduces a
request_errors_totalcounter that recordsthe number/kind of errors encountered. This is immediately helpful for
debugging and diagnostics, but could also be helpful to view in the UI.
We may want to revisit this metric, especialy its labels, before relying
on it more prominently. But in the meantime it's extraordinarily useful
as it is.