Fail requests that loop through the gateway#545
Conversation
When the gateway forwards requests, it now adds a `Forwarded` header including the source identity, the local identity, and the destination authority.
This change uses the gateway's `Forwarded` header to detect if the request has already transited through this gateway. This is determination is made by comparing ID strings, so this will prevent gateway daisy-chaining when clusters do not use distinct identity domains.
hawkw
left a comment
There was a problem hiding this comment.
this looks good to me! my main concern was about whether it's correct to use 508 Loop Detected here or not, i could be convinced either way but thought i'd bring it up. my other comments are not blockers.
linkerd/app/gateway/src/gateway.rs
Outdated
| if let Some(by) = fwd_by(fwd) { | ||
| if by == local_identity.as_ref() { | ||
| tracing::info!(fwd, "Loop detected"); | ||
| return rsp(http::StatusCode::LOOP_DETECTED); |
There was a problem hiding this comment.
my understanding is that 508 Loop Detected is specific to WebDAV (per MDN). the name of the status is definitely correct in this case, but the documented semantics of the code seem to mean something entirely different. is it correct to overload the meaning of that status here, or should we use a different error code?
There was a problem hiding this comment.
Good point!
I think that leaves us with 502. I'll do that and upgrade the info to a warn.
There was a problem hiding this comment.
Leaving this as 508 for now, as it's more descriptive to users. I'm less worried about the WebDAV-origin, because it's semantically correct and we're not violating any part of the RFC.
linkerd/app/gateway/src/gateway.rs
Outdated
| { | ||
| if let Some(by) = fwd_by(fwd) { | ||
| if by == local_identity.as_ref() { | ||
| tracing::info!(fwd, "Loop detected"); |
There was a problem hiding this comment.
should this be a warning? also, do we want to consider eventually adding a metric label or something so that the responses that fail due to looping are counted separately? of course, this can be a follow up; just a thought.
There was a problem hiding this comment.
Alternatively, we could throw an error here and let the error-handler take the wheel...
This ensures that error metrics are recorded and that logging is emitted uniformly. This also ensures that gRPC requests don't get HTTP error responses.
kleimkuhler
left a comment
There was a problem hiding this comment.
Looks good after the status code change!
This change uses the gateway's `Forwarded` header to detect if the request has already transited through this gateway. This is determination is made by comparing ID strings, so this will prevent gateway daisy-chaining when clusters do not use distinct identity domains.
The proxy can now operate as gateway, routing requests from its inbound proxy to the outbound proxy, without passing the requests to a local application. This supports Linkerd's multicluster feature by adding a `Forwarded` header to propagate the original client identity and assist in loop detection. --- * Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527) * Test loop detection (linkerd/linkerd2-proxy#532) * fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534) * app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533) * Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540) * gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544) * gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547) * Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
This change modifies the linkerd-gateway component to use the inbound proxy, rather than nginx, for gateway. This allows us to detect loops and propagate identity through the gateway. This change also cleans up port naming to `mc-gateway` and `mc-probe` to resolve conflicts with Kubernetes validation. --- * proxy: v2.99.0 The proxy can now operate as gateway, routing requests from its inbound proxy to the outbound proxy, without passing the requests to a local application. This supports Linkerd's multicluster feature by adding a `Forwarded` header to propagate the original client identity and assist in loop detection. --- * Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527) * Test loop detection (linkerd/linkerd2-proxy#532) * fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534) * app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533) * Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540) * gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544) * gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547) * Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545) * inject: Support config.linkerd.io/enable-gateway This change introduces a new annotation, config.linkerd.io/enable-gateway, that, when set, enables the proxy to act as a gateway, routing all traffic targetting the inbound listener through the outbound proxy. This also removes the nginx default listener and gateway port of 4180, instead using 4143 (the inbound port). * proxy: v2.100.0 This change modifies the inbound gateway caching so that requests may be routed to multiple leaves of a traffic split. --- * inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
* outbound: Prevent loops (#525) This change modifies the outbound proxy to fail to build services targetting localhost:4140 (where 4140 is the outbound port). This prevents looping and will result in 502s. * Add loop detection to inbound & TCP forwarding (#527) e77fe18 introduced loop detection to the outbound HTTP proxy. This change extends this behavior to the inbound HTTP proxy and the TCP proxy for both inbound and outbound. This helps ensure malicious requests can't consume proxy resources. * Test loop detection (#532) This change adds (flakey) tests for loop detection. The tests are flakey because they require static ports to work properly. (We cannot configure the original dst port to be the same as the interface port if the interface port is not known). * fallback: Unwrap errors recursively (#534) This change modifies the fallback layer to inspect error sources recursively to determine if the given error type is satisfied. A stack-helper is also added for this case. * app: Split inbound/outbound constructors into components (#533) This change does not change any functionality. It only restructures the inbound and outbound proxy modules so that the clients and servers can be instantiated separately. This will support gatewaying requests between the inbound and outbound proxy. * Introduce a gateway between inbound and outbound (#540) When the proxy receives inbound requests without an original dst address (or with a original dst address matching the inbound listener), the proxy currently fails these requests. This change modifies the proxy to attempt to accept these requests and forward them back through the outbound router. The gateway requires that all requests are received over an mTLS-secured connection. It also refines the destination through DNS to determine the canonical-form name as well as an outbound original dst IP. All gatewayed destinations must have a suffix as set by the `LINKERD2_PROXY_INBOUND_GATEWAY_SUFFIXES` environment variable. All requests that do not meet these criteria are failed with a `403 Forbidden` status. * gateway: Add a Forwarded header When the gateway forwards requests, it now adds a `Forwarded` header including the source identity, the local identity, and the destination authority. * Fail requests that loop through the gateway This change uses the gateway's `Forwarded` header to detect if the request has already transited through this gateway. This is determination is made by comparing ID strings, so this will prevent gateway daisy-chaining when clusters do not use distinct identity domains. * gateway: Return errors instead of responses (#547) This ensures that error metrics are recorded and that logging is emitted uniformly. This also ensures that gRPC requests don't get HTTP error responses. * Fail requests that loop through the gateway (#545) This change uses the gateway's `Forwarded` header to detect if the request has already transited through this gateway. This is determination is made by comparing ID strings, so this will prevent gateway daisy-chaining when clusters do not use distinct identity domains. * inbound: Do not cache gateway services (#549) When the inbound caches gateway services, it eagerly obtains an outbound service to cache. If the outbound service employs a traffic split, this inbound service is pinned to a specific leaf, and requests will never be routed to the other leaf. This change moves the gateway fallback to be outside all of the inbound caches, so that outbound splits work as intended.
This change uses the gateway's
Forwardedheader to detect if therequest has already transited through this gateway. This is
determination is made by comparing ID strings, so this will prevent
gateway daisy-chaining when clusters do not use distinct identity
domains.
I've tested this manually: tapping the gateway's client, we see responses like:
{ "source": { "ip": "10.42.0.25", "port": 54748, "metadata": { "control_plane_ns": "linkerd", "deployment": "web", "namespace": "emojivoto", "pod": "web-9c9cc6d7d-9wn9d", "pod_template_hash": "9c9cc6d7d", "serviceaccount": "web", "tls": "loopback" } }, "destination": { "ip": "172.24.0.4", "port": 4143, "metadata": { "namespace": "emojivoto", "remote_cluster": "east", "remote_gateway": "linkerd-gateway", "remote_gateway_namespace": "linkerd-multicluster", "remote_service": "voting-svc", "remote_service_namespace": "emojivoto", "server_id": "linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.east.k3d.example.com", "service": "voting-svc-east", "tls": "true" } }, "routeMeta": null, "proxyDirection": "OUTBOUND", "responseInitEvent": { "id": { "base": 1, "stream": 23 }, "sinceRequestInit": { "nanos": 1886209 }, "httpStatus": 508, "headers": [ { "name": ":status", "valueStr": "508" }, { "name": "date", "valueStr": "Sat, 30 May 2020 19:02:03 GMT" } ] } }