Conversation
7b12ab4 to
f0f8c3b
Compare
f0f8c3b to
66b9bf0
Compare
|
Rebased on master after merging retries support. I think the build may fail since the |
olix0r
left a comment
There was a problem hiding this comment.
This looks really good to me! Do you have any thoughts about the way to highlight these errors in metrics?
|
Also, it appears that the Cargo.lock is not up-to-date... |
66b9bf0 to
9255190
Compare
Do we want a separate metric for them? Or just an additional label? Do we want to surface them by default, or just provide the ability to, such as by adding a custom response header that Route Response Classifiers could look for? |
9255190 to
85474e5
Compare
85474e5 to
7b4b7f5
Compare
7b4b7f5 to
2a995b8
Compare
|
Several tests appear to hang on this branch: |
|
Looks like gRPC errors. My guess is something about my reference to the linkerd2-proxy-api is out of sync with this repo... |
|
Oh I see, its related to having removed a bind timeout for the discovery tests... |
|
Yea, so the tests are checking that if we don't get We can either:
|
|
@seanmonstar For now, i'm in favor of removing this timeout. We don't want to stall forever when we're doing service discovery, but we should be conditioned to fail open rather than erroring hard as we do today. |
|
@olix0r to clarify, you suggest just removing these tests? I believe adding some sort of |
|
@seanmonstar I think we should remove tests that depend on the old timeout logic, though I think those tests cover cases we still care about. It's not clear to me why they rely on the timeout. |
|
@olix0r those tests are specifically checking that we don't hang if for various reasons we aren't able to determine where to route to. For reference, I've already written a |
|
@olix0r Newest commit re-introduces the bind timeout, but it only applies to the time a request will wait in the buffer until the balance layer is ready. Once it's dispatched through ( |
|
I'm still seeing test failures here |
|
Hrm, it's passing locally for me. Fun! |
|
Tested on both OSX and Linux (at 4bbd0c3) and I reliably get: |
|
I'm still shocked it passes locally, as I've found why it definitely shouldn't. The |
4bbd0c3 to
8adcabe
Compare
|
Many of the discovery tests were exercising things like not exceeding our gRPC Destination Service stream count, or that reconnects make the proxy "forget" old addresses, and in both cases, the request will wait for a capacity to free up, or an update from the controller. Thus, they were hitting our I've changed those tests to instead expect a client side timeout, asserting that the request doesn't proceed past the our There was one test that was specifically testing the |
Cargo.toml
Outdated
| linkerd2-timeout = { path = "lib/timeout" } | ||
|
|
||
| linkerd2-proxy-api = { git = "https://github.com/linkerd/linkerd2-proxy-api", rev = "1ad7018", version = "0.1.4" } | ||
| linkerd2-proxy-api = { git = "https://github.com/linkerd/linkerd2-proxy-api", branch = "sean/timeouts" } #tag = "v0.1.4", version = "0.1.4" } |
There was a problem hiding this comment.
Should point back at master (or a tag) before merging.
| fn start<B>(self, rsp: &http::Response<B>) -> Eos { | ||
| if rsp.extensions().get::<timeout::ProxyTimedOut>().is_some() { | ||
| return Eos::Error("timeout"); | ||
| } |
There was a problem hiding this comment.
This seems fine, though you might also consider doing this slightly differently: Instead of making the classify module enumerate all of the types of errors it may observfe, we could instead have modules add a classify::ProxyError(&'static str) marker onto responses -- so this would be able to work with arbitrary error types in the stack, not just timeouts.
This then becomes something to the effect of:
if let Some(ProxyError(msg)) = rsp.extensions().get::<ProxyError>() {
return Eos::Error(msg);
}There was a problem hiding this comment.
That'd seem to have app types present in proxy modules... Unless you mean just a general proxy::http::classify::ProxyError thingy?
olix0r
left a comment
There was a problem hiding this comment.
Modulo the proxy-api dependency and the Error specialization issue i flagged (not critical, though something we'll likely want to change at some point), I think this is basically good to merge. 👍
8adcabe to
64ff0d7
Compare
|
Oh noes, seems I was bitten by #170 not yet being merged, and thus |
e973750 to
fd07179
Compare
If a service profile specifies a timeout for a given route, this applies a timeout waiting for responses on that route. Signed-off-by: Sean McArthur <[email protected]>
fd07179 to
8fa37e2
Compare
|
Updated, green, just needs final r+. |
proxy: update pinned version to 5b507a9 This picks up the following proxy commits: * eaabc48 Update tower-grpc * e9561de Update h2 to 0.1.16 * 28fd5e7 Add Route timeouts (linkerd/linkerd2-proxy#165) * 5637372 Re-flag tcp_duration tests as flaky * 20cbd18 Revise several log levels and messages (linkerd/linkerd2-proxy##177) * ae16978 Remove flakiness from 'profiles' tests * 49c29cd canonicalize: Only log errors at the WARN level when falling back (linkerd/linkerd2-proxy#174) * 486dd13 Make outbound router honor `l5d-dst-override` header (linkerd/linkerd2-proxy#173) * 7adc50d Make timeouts for canonicalization DNS queries tuneable (linkerd/linkerd2-proxy#175) * 3188179 Try reducing CI flakiness by reducing RUST_TEST_THREADS to 1 Some of these changes will probably need changelog entries: * Improve logging when rejecting malformed HTTP/2 pseudo-headers (hyperium/h2#347) * Improve logging for gRPC errors (tower-rs/tower-grpc#111) * Add Route timeouts (linkerd/linkerd2-proxy#165) * Downgrade several of the noisiest log messages to TRACE (linkerd/linkerd2-proxy##177) * Add an environment variable for configuring the DNS canonicalization timeout (linkerd/linkerd2-proxy#175) * Make outbound router honor `l5d-dst-override` header (linkerd/linkerd2-proxy#173) Perhaps all the logging related changes can be grouped into one changelog entry, though... Signed-off-by: Eliza Weisman <[email protected]>
This picks up the following proxy commits: * eaabc48 Update tower-grpc * e9561de Update h2 to 0.1.16 * 28fd5e7 Add Route timeouts (linkerd/linkerd2-proxy#165) * 5637372 Re-flag tcp_duration tests as flaky * 20cbd18 Revise several log levels and messages (linkerd/linkerd2-proxy##177) * ae16978 Remove flakiness from 'profiles' tests * 49c29cd canonicalize: Only log errors at the WARN level when falling back (linkerd/linkerd2-proxy#174) * 486dd13 Make outbound router honor `l5d-dst-override` header (linkerd/linkerd2-proxy#173) * 7adc50d Make timeouts for canonicalization DNS queries tuneable (linkerd/linkerd2-proxy#175) * 3188179 Try reducing CI flakiness by reducing RUST_TEST_THREADS to 1 Some of these changes will probably need changelog entries: * Improve logging when rejecting malformed HTTP/2 pseudo-headers (hyperium/h2#347) * Improve logging for gRPC errors (tower-rs/tower-grpc#111) * Add Route timeouts (linkerd/linkerd2-proxy#165) * Downgrade several of the noisiest log messages to TRACE (linkerd/linkerd2-proxy##177) * Add an environment variable for configuring the DNS canonicalization timeout (linkerd/linkerd2-proxy#175) * Make outbound router honor `l5d-dst-override` header (linkerd/linkerd2-proxy#173) Perhaps all the logging related changes can be grouped into one changelog entry, though... Signed-off-by: Eliza Weisman <[email protected]>
simulate-proxy sends HTTP example data. Modify this test script to also send TCP example data. Part of linkerd#132 Signed-off-by: Andrew Seigner <[email protected]>
If a service profile specifies a timeout for a given route, this applies
a timeout waiting for responses on that route.
Timeouts show up in proxy metrics like this: