http: adding per-route internal redirects#5123
http: adding per-route internal redirects#5123alyssawilk merged 20 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
api/envoy/api/v2/route/route.proto
Outdated
There was a problem hiding this comment.
these need to be swapped IMO. Passthrough is default, reject is option 1.
There was a problem hiding this comment.
+1 for swapping, though per the big convo below I think this is going to get changed anyway.
There was a problem hiding this comment.
Thanks for this nice writeup. Few thoughts on the design:
- Why rely on the upstream being an envoy? why not make this generic such that Envoy will auto follow redirects from any upstream (HTTP 302 has the redirect URL). Typical deployments with Envoys at the edge are more likely to benefit from this option. Nginx has something similar I believe.
- what is the idea behind the x-envoy-original-url header? Put another way, why rely on a header returned by an untrusted upstream envoy (x-envoy-original-url) instead of keeping track of the original url inside the caller Envoy itself? (may be this is what you meant but the docs are not clear).
- The idea of rejecting a redirect and returning 500 seems a bit out of place in this "follow_redirects" option. In my mind,
follow_redirects: on (aka handle) | off (aka passthrough,default)is different from 'on receiving a response code XYZ, send 500 upstream'. The latter is what you are doing with on301,send500. Couldn't this be accomplished with a simpler disallow_redirects option which if true will cause envoy to send 500 automatically.
There was a problem hiding this comment.
-
Any upstream can send this. The header directs Envoy instances to do an internal redirect but there's no documented requirement the originator of the URL is an Envoy instance. I'll try to make that more clear.
-
I think you misread the docs.
Redirects are initiated by an upstream sending the redirect command. Envoy then sends the x-envoy-original-url to the new upstream. No upstream ever sends x-envoy-original-url, and we protect against untrusted downstreams sending it. I'll add a workflow example -
The use case I am most familiar with is sending the first request to an auth server, then redirecting to a content URL to serve the response. In this case it is often the case that is's unsafe to ever return the content URL to the user, as they can use it to bypass auth checks. As such I'm inclined to leave both the defaults and the sanity checks as returning 500. I could add a fail-open option (split handle into handle-or-fail or handle-or-pass-through) but realistically any failed internal redirect indicates a bug on the upstream or configuration and I'm not inclined to add extra complexity to more gracefully handle user failure :-P
There was a problem hiding this comment.
- I was talking about
Redirect handling is triggered by an upstream server sending a 302 response with an
x-envoy-internal-redirect header containing the fully qualified URL (http://host/path) to redirect
to
piece in the docs above. Which seems to indicate that the upstream service needs to be aware of Envoy being the Downstream server instead of using HTTP 302 <url>
- I dont disagree with the use case. What I am stating is that
If the Envoy receiving this response is not configured for internal redirects, or is
explicitly configured to REJECT, the 302 will be converted into a 500 and the
x-envoy-internal-redirect will be removed, to prevent leaking potentially private URLs to untrusted
clients.
for the scenario above, isn't it sufficient to do the following:
allow_redirectswhich if set to true will cause envoy to accept and forward redirect responses to downstream. If set to false, will cause envoy to return a 500 to downstream without revealing any info about the redirect URL sent by the upstream.follow_redirects: on|off. If allow_redirects is set to true and follow_redirects is on, then you follow any 302 response from upstream. If allow_redirects is true and follow_redirects is off, then you forward the 302 response as is to downstream. if allow_redirects is false, then follow_redirects setting will be ignored. Any 302 response from upstream will be turned into a HTTP 500 response (after scrubbing). It might be worthwhile to do (in follow ups) afollow_redirect_depth: Nwhich controls the number of times we are willing to follow redirects (to avoid infinite loops, etc.).
There was a problem hiding this comment.
ahhh, that totally makes more sense, my bad. So you'd prefer the originating proxy not need to be Envoy-aware, and Envoy intercept all 302s and use the location header? That's doable and I'm interested to hear if folks think it's a more compelling use case. My concern is twofold. first, intercepting redirects in general is dangerous: often if you're doing cross-domain redirects you're not going to get the correct cookies, or may forward cookies or other PII from one server which should have them to another which is not. Between that and URL leakage concerns I'd rather have the upstream do 302 capturing which is opt-out (you have to do something special to get this behavior) and fail-closed (you don't leak urls by accident) rather than this being something that people turn up without a lot of thought in the hopes of getting a latency boost skipping a client RTT.
We can almost achieve both both with an x-envoy-internal-redirect-fail-close header where default IR behavior was what you're asking for (intercepting 302), we pass-through by default or if #redirects > N (where N is 1 for now but we can always extend in a follow-up), but then the use case I care about doesn't work for multi-level proxying (if pass-through is default we can't allow an L2 envoy to pass a fail-closed URL to an L1 envoy without a configured option PASS_THROUGH_EVEN_FOR_FAIL_CLOSE which feels needlessly complicated)
Any thoughts on how to handle both cases gracefully?
We definitely want to be able to mix regular 302s and IR-302s on the same route so we really can't have the auto-redirect behavior. the best I can think of is to have explicit
handle / pass through for regular 302s and
handle / pass-through / reject for IR-302s
We could either have 2 full set of config, or handle the likely common permutations:
PASS_ALL_THROUGH (302 proxied, 302 IR proxied) <- allow handling IR at L1
HANDLE_EXPLICIT (302 proxied, 302-IR handled) <- what we mainly want
HANDLE_ALL (302 handled, 302-IR handled) <- what you're requesting?
REJECT_EXPLICIT (302 proxied, 302 IR rejected) <- default
It's more complicated than I'd like but I think it gets us both what we want, and with a docced up chart of what happens in what case I think it's doable.
There was a problem hiding this comment.
I'm inclined to agree with @rshriram that offering opt-in 302 following is going to be a compelling feature. @alyssawilk the reason you need to differentiate "IR" from "regular" is really because you share a route in which you want to do both? That makes sense to me. I think I would be inclined to just have 2 different config options for "regular" and for header driven "IR" with clear docs. I think this would be the easiest to understand. With that said, I don't feel very strongly about single config vs. split. @rshriram thoughts?
There was a problem hiding this comment.
between poor GH formatting and terminology mismatch, i am having a hard time decoding @alyssawilk 's use case for the multi-level envoy stuff.
Between that and URL leakage concerns I'd rather have the upstream do 302 capturing which is opt-out (you have to do something special to get this behavior) and fail-closed (you don't leak urls by accident) rather than this being something that people turn up without a lot of thought in the hopes of getting a latency boost skipping a client RTT.
We can almost achieve both both with an x-envoy-internal-redirect-fail-close header where default IR behavior was what you're asking for (intercepting 302), we pass-through by default or if #redirects > N (where N is 1 for now but we can always extend in a follow-up), but then the use case I care about doesn't work for multi-level proxying (if pass-through is default we can't allow an L2 envoy to pass a fail-closed URL to an L1 envoy without a configured option PASS_THROUGH_EVEN_FOR_FAIL_CLOSE which feels needlessly complicated)
Any thoughts on how to handle both cases gracefully?
Would it be possible to scribble a diagram on paper napkin and stick it in slack so that its easier to visualize the actors involved?
There was a problem hiding this comment.
I think Matt's got it, and I agree with him that 2 configs is going to be less confusing than one.
I'll put together the change Matt and I thinks makes sense along with docs, and ask @rshriram to vet that they're substantially more clear than my github comments :-D
There was a problem hiding this comment.
Ok, draft version up - docs are not cleaned up and now has E_INSUFFICIENT_INTEGRATION_TESTS too, but I'm hoping this is enough to clarify a bit for @rshriram . WDYT?
Signed-off-by: Alyssa Wilk <[email protected]>
f355f7e to
806cf09
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
mattklein123
left a comment
There was a problem hiding this comment.
Just looking at the docs/config at a high level this LGTM modulo some small comments. @rshriram can you take a look at this at a high level so we can finalize the design before spending time on the real code review?
api/envoy/api/v2/route/route.proto
Outdated
| HANDLE_REDIRECT = 1; | ||
| } | ||
| // TODO(alyssawilk) it's really annoying to have a completely seprate enum | ||
| // because proto3 doesn't have defaults. Check with htuch@ for better |
There was a problem hiding this comment.
I think maybe you could use an EnumValue (?) WKT and then have different defaults?
api/envoy/api/v2/route/route.proto
Outdated
|
|
||
| // Configures :ref:`internal redirect <arch_overview_internal_redirects>` behavior. | ||
| enum RedirectAction { | ||
| PASS_THROUGH_REDIRECT = 0; |
There was a problem hiding this comment.
Do you mind adding documentation for all the enum values so it's easier to review the options inline?
| returning the redirected response as the response of the original request. | ||
|
|
||
| There are two modes of redirect handling, "Internal redirects", for internal or private URLs, | ||
| general redirects, which capture all 302s. They are configured separately via the |
| <envoy_api_field_route.RouteAction.redirect_action>` fields in | ||
| route configuration respectively. | ||
|
|
||
| Internal redirect handling is triggered via the x-envoy-internal-redirect header. Any response |
There was a problem hiding this comment.
In the final change can you add x-envoy-internal-redirect to the HCM headers doc and cross link?
| <envoy_api_field_route.RouteAction.redirect_action>` fields in | ||
| route configuration respectively. | ||
|
|
||
| Internal redirect handling is triggered via the x-envoy-internal-redirect header. Any response |
There was a problem hiding this comment.
I think it might help to define somewhere in this doc a use case for an "internal redirect." This might help readers that might be confused as to why anyone would do this?
| 2. Have a *location* header with a valid, fully qualified URL matching the scheme of the original request. | ||
| 3. The request must have been fully processed by Envoy. | ||
| 4. The request must not have a body | ||
| 5. The request must have not been previously redirected, as determined by the presence of an x-envoy-original-url header |
There was a problem hiding this comment.
doesn't this mandate that the downstream has to be an Envoy? which won't work for the common case.
|
So this (the current state) is what I was talking about in my earlier comment. Why have to behaviors for internal and external redirect? Why wont the following work for both: The flow is something like this:
This will work with 2 levels of envoy as well, with the level-2 envoys configured to "forward" redirects (i.e. default) while level-1 envoys can be configured to abort. What am I missing in the Internal redirect use case that is not captured by the above example? |
|
Two issues with that |
|
Alternately given the fail-closed behavior seems to be confusing, Google can very easily take the code as written and just have an internal filter which does fail-close redirects using the same recreateStream added here. Our internal filter can fail-closed and it can even reuse the existing x-google header control we have today. If anyone else ever needs fail-close functionality or we decide we want it upstream we can always add it later. I'm going to go through and remove the internal redirect path when I get a chance because I think it'll neatly resolve the confusion and debate, and get faster progress on today's use case, which doesn't even require upstream cooperation (the istio folks want to do their redirects in a filter too =P) |
Signed-off-by: Alyssa Wilk <[email protected]>
4785164 to
607bd0b
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
607bd0b to
fb61e6f
Compare
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
@alyssawilk LMK when this is ready for review again. /wait |
|
Matt: has been ready for a week, I just thought you were swamped at KubeCon :-P PTAL at your convenience. I'll assume you have some comments so I'll deal with merge conflict once you take a look |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, LGTM. Much clearer to me now with the extra comments, etc. Just some small things.
/wait
| and then shipped upstream with all of the normal Envoy request sanitization taking place. Note that | ||
| HTTP connection manager sanitization such as clearing untrusted headers will only be applied once, | ||
| per-route header modifications will be applied on both the original route and the second route, even | ||
| if they are the same, so be mindful configuring header modification rules to avoid duplicating |
There was a problem hiding this comment.
Ping on this. Can you move this into a warning/attention block?
| // n.b. we do not currently change the codecs to point at the new stream | ||
| // decoder because the decoder callbacks are complete. It would be good to | ||
| // null out that pointer but should not be necessary. | ||
| HeaderMapPtr request_headers(parent_.request_headers_.release()); |
There was a problem hiding this comment.
ping on this nit, or just resolve if you don't want to
| request_headers.removeEnvoyExpectedRequestTimeoutMs(); | ||
| request_headers.removeEnvoyForceTrace(); | ||
| request_headers.removeEnvoyIpTags(); | ||
| request_headers.removeEnvoyOriginalUrl(); |
There was a problem hiding this comment.
Can you add a sanitizing test for this in the unit tests for this file? I think there is already a test that covers sanitizing.
| namespace Utility { | ||
|
|
||
| /** | ||
| * Given a fully qualified URL, splits the string_view provided into scheme, |
There was a problem hiding this comment.
Worth having some specific unit tests for this new class?
Signed-off-by: Alyssa Wilk <[email protected]>
|
/wait Sorry, I somehow missed this PR before, but I definitely want to review this feature... I'll try to get to it later today. |
mattklein123
left a comment
There was a problem hiding this comment.
Nice. LGTM pending any @PiotrSikora comments and one small doc nit.
|
|
||
| .. warning:: | ||
| Note that HTTP connection manager sanitization such as clearing untrusted headers will only be | ||
| applied once, per-route header modifications will be applied on both the original route and the |
There was a problem hiding this comment.
nit: "once. Per-route..." (I think new sentence reads more clearly)
There was a problem hiding this comment.
Thanks for working on this, @alyssawilk!
Unfortunately, there are a few issues at the high level that don't make this usable in the real world, especially when dealing with generic HTTP traffic, where proxy configurations aren't tightly coupled with upstream applications.
Notably:
-
Intercepting all-or-none of the 302 responses means that applications cannot mix even simple things as end-user redirects to
/loginfor unauthorized users with internal redirects to offload content serving on the same route. Similarly, any redirects to locations not served by the same proxy are going to result in errors. There has to be a way for applications to request internal redirect. -
Applications have no way of knowing whether this feature is enabled or not, which requires very tight coordination between proxies and applications, and otherwise may result in leakage of secret URLs and/or credentials. Yes, this means that we need some Envoy-specific headers (feel free to blame me for this, since I was supposed to standardize this years ago), and applications need to be aware of this feature, but we cannot blindly intercept all redirects anyway (see above).
Sadly, it seems that those issues were introduced during the code review process, and initial PR was in a much better shape. Sorry for being late to the party!
Happy to talk more about this over Hangouts.
|
|
||
| Envoy supports handling 302 redirects internally, that is capturing a 302 redirect response, | ||
| synthesizing a new request, sending it to the upstream specified by the new route match, and | ||
| returning the redirected response as the response of the original request. |
There was a problem hiding this comment.
Nit: response to the original request.
| route configuration. When redirect handling is on, any 302 response from upstream is | ||
| subject to the redirect being handled by Envoy. | ||
|
|
||
| For a redirect to be handled successfully it must pass the following checks |
There was a problem hiding this comment.
Nit: missing ":" here and "." at the end of few sentences in the list.
| Any failure will result in redirect being passed downstream instead. | ||
|
|
||
| Once the redirect has passed these checks, the request headers which were shipped to the original | ||
| upstream will be modified by |
There was a problem hiding this comment.
Nit: missing ":" here and "." at the end of the sentences in the list.
source/common/router/router.cc
Outdated
|
|
||
| // As with setupRetry, redirects are not supported for streaming requests yet. | ||
| if (downstream_end_stream_ && | ||
| !callbacks_->decodingBuffer() && // Redirects woth body not yet supported. |
There was a problem hiding this comment.
s/woth body/with request body/
| attempting_internal_redirect_with_complete_stream_ = | ||
| upstream_request_->stream_info_.lastUpstreamRxByteReceived() && downstream_end_stream_; | ||
|
|
||
| // As with setupRetry, redirects are not supported for streaming requests yet. |
There was a problem hiding this comment.
Do you want to restrict this to GET and HEAD requests? If not, then we should convert other requests to GET in convertRequestHeadersForInternalRedirect().
There was a problem hiding this comment.
Nope - I want to add support for bodies, though in a follow-up (or series of follow-up) PRs.
There was a problem hiding this comment.
302 redirects are historically turned into GET requests by the eyeballs and intercepting proxies, so the target of the redirection might simply not support the same method as the original request. I'd suggest converting redirects into GET (and leaving HEAD as-is) for 302 and 303 redirects, and keeping original method for 307 redirects.
As for supporting request bodies, I'm all for it (though it adds non-trivial amount of complexity), but if they are not supported as of this PR, then I'd still restrict internal redirects to GET and HEAD methods in this PR, and lift that restriction in one of the follow-up PRs that adds support for request bodies.
|
@PiotrSikora it sounds like you want what I originally wanted, which is trusted-upstream driven redirects. I still want that feature but I don't need it today and don't insist on it being introduced in upstream Envoy (though I'm happy to do so at some point). The existing redirect feature is a latency-add on which I (and Matt, and Shriram) also believe has value. Unless you're strongly convinced that allowing per-route automatic redirects has no value I'd say we land this as its own feature, and can always land a upstream-driven redirect feature using the same code in future. |
Signed-off-by: Alyssa Wilk <[email protected]>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM pending discussion with @PiotrSikora on general usefulness.
PiotrSikora
left a comment
There was a problem hiding this comment.
@alyssawilk yes, we definitely want the same thing :)
Regarding this PR, IMHO the issues I've mentioned yesterday are problematic enough (especially the first one), that this feature as-is is not going to be very useful in practice, at least not when Envoy is acting as a reverse proxy... I guess there might be some value when Envoy is acting as a forward proxy next to the client, to follow redirects on its behalf, but I'm not sure if there is a need for that, really.
Having said that, if you have a valid use case for which this works, then by all means please go ahead and land this feature, I didn't mean to veto it.
| attempting_internal_redirect_with_complete_stream_ = | ||
| upstream_request_->stream_info_.lastUpstreamRxByteReceived() && downstream_end_stream_; | ||
|
|
||
| // As with setupRetry, redirects are not supported for streaming requests yet. |
There was a problem hiding this comment.
302 redirects are historically turned into GET requests by the eyeballs and intercepting proxies, so the target of the redirection might simply not support the same method as the original request. I'd suggest converting redirects into GET (and leaving HEAD as-is) for 302 and 303 redirects, and keeping original method for 307 redirects.
As for supporting request bodies, I'm all for it (though it adds non-trivial amount of complexity), but if they are not supported as of this PR, then I'd still restrict internal redirects to GET and HEAD methods in this PR, and lift that restriction in one of the follow-up PRs that adds support for request bodies.
|
So I agree that to turn this on there has to be a tight knit integration between the AFE and the proxy. We absolutely could use this at the GFE for cookiesless services (though not for many other existing IR cases). Given we already have three envoy developers from three different companies who think this will be useful I'm going to claim we're good on that front, despite your reservations. I suspect we will add the IRs you and I want later but I'm more interested in getting this code landed than working out the details of a feature my team doesn't need for 2 years (I'm adding this today for another google team who wants filter-driven redirects now) To your other point, given this is done for latency reduction, I would strongly prefer to not add extra code to avoid bodyless-posts - it adds more complexity and more corner cases for a feature which can be used today and should be extended soon. Any other concerns or am I good to go with Matt's approval (and another master merge for conflicts)? |
|
No other concerns, you're good to go. Thanks! |
Signed-off-by: Alyssa Wilk <[email protected]>
01a8405 to
c59c96c
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
8129645 to
a478bbf
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
|
Hey @mattklein123 I think we're good - can I get a final stamp for the master merge? |
Allows both upstream-driven and filter-controlled internal redirects, basically rerunning the whole filter chain for a new stream. The current implementation is limited to requests-sans-bodies and complete-requests, and num-redirects = 1, but could be fairly easily extended (probably in a follow-up) to remove any of these restrictions. I do need to add more unit tests here, but I want to make sure we're happy both the validation we're doing and where we do it. For example while this implementation forces N=1 for upstream internal redirects it allows filters to impose their own separate limits and allows them to screw up w.r.t. redirect loops. We could globally enforce by disallowing recreateStream if is_internally_created_ true but I could imagine wanting different limits for a filter redirect than an external redirect so am mildly inclined to allow filters to enforce on their own with internal checks as the router filter does. TODO(alyssawilk) in a follow-up before killing off the initial stream, pass it the original StreamInfo and copy relevant fields (downstream timing info etc.) Risk Level: Medium (some refactors of existing code, new high risk code paths well guarded) Testing: E2E tests. E_INSUFFICIENT_UNIT_TESTS Docs Changes: inline Release Notes: yep. Part of envoyproxy#3250 Signed-off-by: Alyssa Wilk <[email protected]> Signed-off-by: Fred Douglas <[email protected]>
Allows both upstream-driven and filter-controlled internal redirects, basically rerunning the whole filter chain for a new stream.
The current implementation is limited to requests-sans-bodies and complete-requests, and num-redirects = 1, but could be fairly easily extended (probably in a follow-up) to remove any of these restrictions.
I do need to add more unit tests here, but I want to make sure we're happy both the validation we're doing and where we do it. For example while this implementation forces N=1 for upstream internal redirects it allows filters to impose their own separate limits and allows them to screw up w.r.t. redirect loops. We could globally enforce by disallowing recreateStream if is_internally_created_ true but I could imagine wanting different limits for a filter redirect than an external redirect so am mildly inclined to allow filters to enforce on their own with internal checks as the router filter does.
TODO(alyssawilk) in a follow-up before killing off the initial stream, pass it the original StreamInfo and copy relevant fields (downstream timing info etc.)
Risk Level: Medium (some refactors of existing code, new high risk code paths well guarded)
Testing: E2E tests. E_INSUFFICIENT_UNIT_TESTS
Docs Changes: inline
Release Notes: yep.
Part of #3250