Skip to content

http: setting :scheme consistently in Envoy#14899

Merged
alyssawilk merged 10 commits intoenvoyproxy:mainfrom
alyssawilk:scheme_alternate
Feb 24, 2021
Merged

http: setting :scheme consistently in Envoy#14899
alyssawilk merged 10 commits intoenvoyproxy:mainfrom
alyssawilk:scheme_alternate

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

Commit Message: Setting :scheme for headers in-Envoy.
Additional Description: This affects a number of components which do :scheme (rather than X-Forwarded-Proto) based logic for HTTP/1.1 traffic, up to but not exclusively wasm/lua filters looking for :scheme, gRPC access loggers, CSRF filter and oath2 filter
Risk Level: High
Testing: new unit tests, integration tests
Release Notes: inline
Runtime guard: envoy.reloadable_features.add_and_validate_scheme_header
Part 1 of #14587

@alyssawilk
Copy link
Copy Markdown
Contributor Author

So I was in the middle of adding the below comment and it occurred to me I have no interest in plumbing the actual config knobs any time soon. I'm just going to remove it as all I really want is a way to indicate "use HTTP3 upstream" and we can sort out config knobs later when we have a working implementation.

Config comment saved for future me :-P

// Http2Protocol options is used for shared configuration between HTTP/2 and
// HTTP/3, or where the best analog is.
//
// So when used here, any custom Settings should be HTTP/3 SETTINGS or
// extensions as defined in
// https://tools.ietf.org/html/draft-ietf-quic-http-29 or above,
//
// initial_stream_window_size in HTTP/2 translates to
// initial_max_stream_data_bidi_[local|remote], etc.
Http2ProtocolOptions http2_protocol_options = 1;

Signed-off-by: Alyssa Wilk <[email protected]>
Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements to :scheme handling.

default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value
to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* grpc_stats: the default value for :ref:`stats_for_all_methods <envoy_v3_api_field_extensions.filters.http.grpc_stats.v3.FilterConfig.stats_for_all_methods>` is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* http: resolving inconsistencies between :scheme and X-Forwarded-Proto. :scheme will now be set for all HTTP/1.1 requests. Thchanges the behavior of the gRPC access logger, Wasm filters, CSRF filter and oath2 filter for HTTP/1 traffic, where :scheme was previously not set. This change also validates that for front-line Envoys (Envoys configured with `xff_num_trusted_hops` set to 0) that https schemed requests can not be sent over non-TLS connections. All behavioral changes listed here can be temporarily reverted by setting `envoy.reloadable_features.add_and_validate_scheme_header` to false.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: Thchanges -> This changes

// Envoys, where :scheme is set based on transport security.
if (!request_headers.Scheme() &&
Runtime::runtimeFeatureEnabled("envoy.reloadable_features.add_and_validate_scheme_header")) {
request_headers.setReferenceScheme(request_headers.getForwardedProtoValue());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe for scheme to keep a reference to getForwardedProtoValue? The forwarded proto value may not be a reference. If forwarded proto is held by value and someone were to remove the forwarded proto header from the header map, scheme would point to a header value that is already freed.

Not sure if the sequence I describe is possible by configuring Envoy to remove X-Forwarded-Proto before forwarding the request.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this should probably be a copy.

request_headers.setReferenceForwardedProto(connection.ssl() ? Headers::get().SchemeValues.Https
: Headers::get().SchemeValues.Http);
}
// If :scheme is not set, sets :scheme based on X-Forwarded-Proto.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does envoy do if ForwardedProto != "http" and != "https" ?

// forward the received Host field-value.
headers.setHost(absolute_url.hostAndPort());
// Add the scheme. This will be validated in the HCM to ensure no https://
// requests are accepted over unencrypted connections by front-line Envoys.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect fully qualified URLs when xff_num_trusted_hops != 0?

The scheme validation in HCM is only done if xff_num_trusted_hops == 0

Also, it seems that the scheme in the fully qualified URL overwrites the scheme in the X-forwarded-proto when that header is also present.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect fully qualified URLs when xff_num_trusted_hops != 0?

I don't think so. I think if this is not an absolute URL we need to synthesize scheme depending on whether the underlying transport is TLS or not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy never sends fully qualified urls, so in an envoy mesh no, but if num_trusted_hops > 0 and your trusted proxy is sending fully qualified
scheme_one://host urls with x-forwarded-proto scheme_two
that's kind of on you. Right now scheme and x-forwarded proto are known to be inconsistent so we can not (yet) validate and reject if they're not the same.

I'm was the HCM set scheme when it's not present because we'd talked about moving more sanitization.
If scheme is not present, we want to set based on encryption or XFP, depending on if this is a frontline Envoy, and I don't think that logic belongs in the codec even if it's a problem specific to HTTP/1.1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see, so here we only set scheme in the absolute URL case, because otherwise we rely on the HCM to set XFP and then from that set scheme? Can you add some more comments?

sendRawHttpAndWaitForResponse(lookupPort("http"),
"GET https://www.redirect.com HTTP/1.1\r\nHost: host\r\n\r\n",
&response, true);
EXPECT_FALSE(response.find("HTTP/1.1 302\r\n") == 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using: EXPECT_THAT(response, StartsWith("..."));

The EXPECT_FALSE here caught my eye. I think that the setup for this test case may be the same as AbsolutePathUsingHttpsDisallowedAtFrontline above. Should the two tests be different?

sendRawHttpAndWaitForResponse(lookupPort("http"),
"GET https://www.redirect.com HTTP/1.1\r\nHost: host\r\n\r\n",
&response, true);
EXPECT_TRUE(response.find("HTTP/1.1 403 Forbidden\r\n") == 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPECT_THAT(response, StartsWith("HTTP/1.1 403 Forbidden\r\n"));

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks for working on this. Just to check my understanding, the upstream scheme setting is still around, and will overwrite whatever this does, so this is more about getting consistency for internal filters as step 1, right?

default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value
to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* grpc_stats: the default value for :ref:`stats_for_all_methods <envoy_v3_api_field_extensions.filters.http.grpc_stats.v3.FilterConfig.stats_for_all_methods>` is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* http: resolving inconsistencies between :scheme and X-Forwarded-Proto. :scheme will now be set for all HTTP/1.1 requests. Thchanges the behavior of the gRPC access logger, Wasm filters, CSRF filter and oath2 filter for HTTP/1 traffic, where :scheme was previously not set. This change also validates that for front-line Envoys (Envoys configured with `xff_num_trusted_hops` set to 0) that https schemed requests can not be sent over non-TLS connections. All behavioral changes listed here can be temporarily reverted by setting `envoy.reloadable_features.add_and_validate_scheme_header` to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading just the docs, I found "https schemed requests can not be sent over non-TLS connections" a bit confusing. Perhaps rephrase to make it more clear this is on the downstream/listener side? It sounds kind of like it applies on the upstream/proxy side. The implementation makes sense to me but I'm worried users won't fully understand what is being changed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In thinking about this more, I'm not sure we can do the scheme blocking right now. Consider encrypted traffic being forwarded to an unencrypted backend. With the upstream scheme setting still in place, that would break. Right? I think we need to do this type of blocking at the very end of the migration and then think carefully about whether it should be a config flag or not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll only break if their upstream thinks it's an L1.
Basically as written today, this will reject https-over-http at the edge, where it should, and only affects
http2 in the clear (who does that?)
http absolute urls with htttps:// over http, which were previously downgraded to http and served the wrong content.
at non-edge, it'll break if your Envoy is misconfigured, to look like an edge Envoy and you're sending https traffic in the clear.

If we want to allow previously supported broken config, because hey, it worked, I can limit this to be an HTTP/1.1 check for frontline Envoys, at which point it will only affect HTTP/1.1 with absolute URLs. I assume you'd prefer that? I could also reinstate prior behavior where https://foo.bar/thing gets silently downgraded to http://foo.bar/thing though I think that's a bug worth fixing and I think fixing it flag guarded is the way to go. WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and only affects http2 in the clear (who does that?)

For mesh cases I think this happens all the time. Per below, I think I don't fully understand the desired state and what we do or don't do today. If this is only edge traffic I agree this makes sense, but I'm concerned that people may have misconfigured internal proxies that might have issues? That's my main concern, so I would like to better understand what we are doing now and want to do.

// Envoys, where :scheme is set based on transport security.
if (!request_headers.Scheme() &&
Runtime::runtimeFeatureEnabled("envoy.reloadable_features.add_and_validate_scheme_header")) {
request_headers.setReferenceScheme(request_headers.getForwardedProtoValue());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this should probably be a copy.

uint32_t xff_num_trusted_hops, bool connection_is_ssl) {
if (xff_num_trusted_hops == 0 &&
Runtime::runtimeFeatureEnabled("envoy.reloadable_features.add_and_validate_scheme_header") &&
!connection_is_ssl &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perf nit, I would put !connection_is_ssl before the runtime lookup to short circuit faster and avoid the runtime lookup when not needed.

absl::optional<std::pair<std::reference_wrapper<const absl::string_view>, Http::Code>>
HeaderUtility::requestHeadersValidAtHcm(const RequestHeaderMap& headers,
uint32_t xff_num_trusted_hops, bool connection_is_ssl) {
if (xff_num_trusted_hops == 0 &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more comments about the xff_num_trusted_hops check? Every time I wade back into this code I get pretty confused and it would be good to leave some crumbs for the next person.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per some other comment threads, I'm pretty scared of breaking something with this new check. I'm almost definitely not fully understanding, but perhaps in the extra comments can you explain why this is safe and shouldn't break anyone today? Especially with how this check interacts with the fact that we still set upstream scheme?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a giant 15 line comment going through all the cases, but let's sort out if we'd

  1. prefer to consistently validate non https-over-plaintext at the risk of breaking folks with previously working bad configs
  2. not consistently validate, but make sure we validate the new case of using scheme of absolute URLs.
    first :-)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you spell out (1) more? I think I'm just not understanding the desired behavior and what we do today. I know for sure that people forward HTTPS incoming traffic to upstream HTTP, and I want to make sure we don't break that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and refreshed my memory on all the xff settings, and I'm pretty sure we will break people if we don't also check that use remote address is true? Consider the case where an L1 forwards directly to an internal mesh over plaintext. Wouldn't that break unless we check if using remote address? Sorry for all the questions but I want to make sure I'm fully understanding this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so yeah, I could check use_remote and XFP. Assuming we do that a way we're happy with if you look at the grid of

  1. edge envoys
  2. properly configured non-edge (trust XFP/XFF)
  3. badly configured non-edge (don't trust XFP/XFF)
    this PR as is would validate properly for 1, be a no-op for 2, and break 3 for https-in-plaintext.

We can instead move the checks to the codec, at which point we'd validate https://absolute/url-over-plaintext at edge Envoys, allow https schemed http2 in plaintext at edge envoys (not a case I'm terribly worried about) and allow the badly configured envoys (3) to continue to be badly configured without breaking them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are talking past each other, or I'm very confused (quite likely). Let me be really specific with the case I am concerned about:

  1. Edge Envoy configured with use_remote_address = true, num_trusted_hop = 0, terminating TLS. In this case we don't trust XFF/XFP, set them, and with this change also either set scheme = XFP or verify it is not HTTPS over plaintext. So far so good. This makes sense.
  2. Envoy 1 forwards to a mesh Envoy over plain text (either just because, or because using IPSEC, or whatever - I think this is a perfectly valid use case). Mesh Envoy is configured with use_remote_address = false, num_trusted_hop = 0. Therefore it gets the address from XFF and respects XFP. Scheme in this case would be "https" in the common case.

With this change, wouldn't we now fail these requests which I think are legitimate? This is why I think this check needs to be gated on use_remote_address = true? Please tell me what I'm missing here. :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we are talking past each other :-)
I agree moving to use_remote_address makes sense and I'm happy to do that.
I'm also trying to determine if we want to lower even that risk by moving these checks to the HTTP/1 codec, since the absolute URL https://-over-http use case is the only security risk opened by this pr. I'll switch to use_remote once I know if I'm doing an in-place switch or moving-to-codec switch!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah +1 to just doing the most limited check possible. I think we could consider a more strict check later once the rest of the work is done?

// forward the received Host field-value.
headers.setHost(absolute_url.hostAndPort());
// Add the scheme. This will be validated in the HCM to ensure no https://
// requests are accepted over unencrypted connections by front-line Envoys.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect fully qualified URLs when xff_num_trusted_hops != 0?

I don't think so. I think if this is not an absolute URL we need to synthesize scheme depending on whether the underlying transport is TLS or not?

@mattklein123 mattklein123 self-assigned this Feb 2, 2021
Signed-off-by: Alyssa Wilk <[email protected]>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating. Few more comments and some existing ones. Thanks for bearing with all my questions!

/wait

default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value
to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* grpc_stats: the default value for :ref:`stats_for_all_methods <envoy_v3_api_field_extensions.filters.http.grpc_stats.v3.FilterConfig.stats_for_all_methods>` is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature `envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default`.
* http: resolving inconsistencies between :scheme and X-Forwarded-Proto. :scheme will now be set for all HTTP/1.1 requests. This changes the behavior of the gRPC access logger, Wasm filters, CSRF filter and oath2 filter for HTTP/1 traffic, where :scheme was previously not set. This change also validates that for front-line Envoys (Envoys configured with `xff_num_trusted_hops` set to 0) that https schemed requests can not be sent over non-TLS connections. All behavioral changes listed here can be temporarily reverted by setting `envoy.reloadable_features.add_and_validate_scheme_header` to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you ref link xff_num_trusted_hops and maybe also link to https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/headers#config-http-conn-man-headers-x-forwarded-for since this subject is so confusing (IMO)?

Also, my comment got dropped about "sent over non-TLS connections" being a bit confusing IMO. Can you clarify this is a downstream/listener check?

namespace {

absl::string_view getScheme(absl::string_view forwarded_proto, bool is_ssl) {
if (!forwarded_proto.empty() && (forwarded_proto == Headers::get().SchemeValues.Https ||
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think forwarded proto must not be empty at this point so that empty check can be dropped? It's set in the line before this function is called? (Even if it's somehow set to the empty string I think it's safe to drop this check)

// forward the received Host field-value.
headers.setHost(absolute_url.hostAndPort());
// Add the scheme. This will be validated in the HCM to ensure no https://
// requests are accepted over unencrypted connections by front-line Envoys.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see, so here we only set scheme in the absolute URL case, because otherwise we rely on the HCM to set XFP and then from that set scheme? Can you add some more comments?

absl::optional<std::pair<std::reference_wrapper<const absl::string_view>, Http::Code>>
HeaderUtility::requestHeadersValidAtHcm(const RequestHeaderMap& headers,
uint32_t xff_num_trusted_hops, bool connection_is_ssl) {
if (xff_num_trusted_hops == 0 &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per some other comment threads, I'm pretty scared of breaking something with this new check. I'm almost definitely not fully understanding, but perhaps in the extra comments can you explain why this is safe and shouldn't break anyone today? Especially with how this check interacts with the fact that we still set upstream scheme?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this. This change makes me feel much more comfortable. I think we can potentially do the more aggressive block in the HCM later but I think better to start with this.

/wait

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be updated now with the more limited check?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment now that the validation is here and limited to H1?

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

(sorry for force-push: merged from non-clean main)

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the iteration. This LGTM. I would definitely like either @antoniovicente or @yanavlasov to take another pass. Thank you!

@mattklein123
Copy link
Copy Markdown
Member

@alyssawilk also just remembered, is there anything in #14867 (comment) that you want to fix in this PR?

@alyssawilk
Copy link
Copy Markdown
Contributor Author

I had planned on waiting until the next PR. scheme is set, but it's still not "correct" for L2 Envoys so shouldn't be used.

@mattklein123
Copy link
Copy Markdown
Member

I had planned on waiting until the next PR. scheme is set, but it's still not "correct" for L2 Envoys so shouldn't be used.

OK as long as you are tracking SGTM. I just wanted to make sure we don't have stale/broken docs that need fixing!

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the change.

@alyssawilk alyssawilk merged commit d1a28c4 into envoyproxy:main Feb 24, 2021
@alyssawilk alyssawilk deleted the scheme_alternate branch August 18, 2021 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants