Commit b41ba59

upstream: add transport socket failure reason to stream info and log (#6018)
*Description*: Fixes #5603
*Risk Level*: Low (not changing flow, add more info)
*Testing*: unit test
*Docs Changes*: Added
*Release Notes*: Added

Signed-off-by: Lizan Zhou <[email protected]>
1 parent 083e6df commit b41ba59

60 files changed, +328 -98 lines changed

api/envoy/data/accesslog/v2/accesslog.proto

Lines changed: 6 additions & 0 deletions
@@ -135,6 +135,12 @@ message AccessLogCommon {
   // that ID in this field and cross reference later. It can also be used to
   // determine if a canary endpoint was used or not.
   envoy.api.v2.core.Metadata metadata = 17;
+
+  // If upstream connection failed due to transport socket (e.g. TLS handshake), provides the
+  // failure reason from the transport socket. The format of this field depends on the configured
+  // upstream transport socket. Common TLS failures are in
+  // :ref:`TLS trouble shooting <arch_overview_ssl_trouble_shooting>`.
+  string upstream_transport_failure_reason = 18;
 }
 
 // Flags indicating occurrences during request/response processing.

docs/root/configuration/access_log.rst

Lines changed: 13 additions & 2 deletions
@@ -28,7 +28,7 @@ Format Strings
 --------------
 
 Format strings are plain strings, specified using the ``format`` key. They may contain
-either command operators or other characters interpreted as a plain string.
+either command operators or other characters interpreted as a plain string.
 The access log formatter does not make any assumptions about a new line separator, so one
 has to specified as part of the format string.
 See the :ref:`default format <config_access_log_default_format>` for an example.
@@ -78,7 +78,7 @@ For example, with the following format provided in the configuration:
       }
     }
   }
-
+
 The following JSON object would be written to the log file:
 
 .. code-block:: json
@@ -228,6 +228,17 @@ The following command operators are supported:
   Local address of the upstream connection. If the address is an IP address it includes both
   address and port.
 
+.. _config_access_log_format_upstream_transport_failure_reason:
+
+%UPSTREAM_TRANSPORT_FAILURE_REASON%
+  HTTP
+    If upstream connection failed due to transport socket (e.g. TLS handshake), provides the failure
+    reason from the transport socket. The format of this field depends on the configured upstream
+    transport socket. Common TLS failures are in :ref:`TLS trouble shooting <arch_overview_ssl_trouble_shooting>`.
+
+  TCP
+    Not implemented ("-")
+
 %DOWNSTREAM_REMOTE_ADDRESS%
   Remote address of the downstream connection. If the address is an IP address it includes both
   address and port.

docs/root/intro/arch_overview/ssl.rst

Lines changed: 20 additions & 0 deletions
@@ -158,3 +158,23 @@ infrastructure.
 
 Client TLS authentication filter :ref:`configuration reference
 <config_network_filters_client_ssl_auth>`.
+
+.. _arch_overview_ssl_trouble_shooting:
+
+Trouble shooting
+----------------
+
+When Envoy originates TLS when making connections to upstream clusters, any errors will be logged into
+:ref:`UPSTREAM_TRANSPORT_FAILURE_REASON<config_access_log_format_upstream_transport_failure_reason>` field or
+:ref:`AccessLogCommon.upstream_transport_failure_reason<envoy_api_field_data.accesslog.v2.AccessLogCommon.upstream_transport_failure_reason>` field.
+Common errors are:
+
+* ``Secret is not supplied by SDS``: Envoy is still waiting SDS to deliver key/cert or root CA.
+* ``SSLV3_ALERT_CERTIFICATE_EXPIRED``: Peer certificate is expired and not allowed in config.
+* ``SSLV3_ALERT_CERTIFICATE_UNKNOWN``: Peer certificate is not in config specified SPKI.
+* ``SSLV3_ALERT_HANDSHAKE_FAILURE``: Handshake failed, usually due to upstream requires client certificate but not presented.
+* ``TLSV1_ALERT_PROTOCOL_VERSION``: TLS protocol version mismatch.
+* ``TLSV1_ALERT_UNKNOWN_CA``: Peer certificate CA is not in trusted CA.
+
+More detailed list of error that can be raised by BoringSSL can be found
+`here <https://github.com/google/boringssl/blob/master/crypto/err/ssl.errordata>`_
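
The error list added above could be turned into a small lookup when triaging access logs. The following is only a sketch: `KnownFailure`, `kKnownFailures`, and `explainTransportFailure` are hypothetical names that do not exist in Envoy, the hints paraphrase the documentation in this hunk, and substring matching is an assumption, since the docs state that the format of the field depends on the configured transport socket.

```cpp
#include <string>

// One documented failure token and its explanation (paraphrased from ssl.rst above).
struct KnownFailure {
  const char* token;
  const char* hint;
};

// Hypothetical table; the tokens are the ones listed in the troubleshooting section.
const KnownFailure kKnownFailures[] = {
    {"Secret is not supplied by SDS",
     "Envoy is still waiting for SDS to deliver the key/cert or root CA."},
    {"SSLV3_ALERT_CERTIFICATE_EXPIRED",
     "Peer certificate is expired and not allowed in config."},
    {"SSLV3_ALERT_CERTIFICATE_UNKNOWN",
     "Peer certificate is not in the config-specified SPKI."},
    {"SSLV3_ALERT_HANDSHAKE_FAILURE",
     "Handshake failed, usually because the upstream requires a client certificate."},
    {"TLSV1_ALERT_PROTOCOL_VERSION", "TLS protocol version mismatch."},
    {"TLSV1_ALERT_UNKNOWN_CA", "Peer certificate CA is not in the trusted CA."},
};

// Hypothetical helper, not part of Envoy: returns the hint for the first
// documented token found anywhere in `reason`, or an empty string if none match.
std::string explainTransportFailure(const std::string& reason) {
  for (const KnownFailure& known : kKnownFailures) {
    if (reason.find(known.token) != std::string::npos) {
      return known.hint;
    }
  }
  return "";
}
```

An empty result means the reason is not one of the six documented cases; the BoringSSL error data linked above would be the next place to look.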

docs/root/intro/version_history.rst

Lines changed: 3 additions & 1 deletion
@@ -6,6 +6,8 @@ Version history
 * access log: added a new flag for upstream retry count exceeded.
 * access log: added a :ref:`gRPC filter <envoy_api_msg_config.filter.accesslog.v2.GrpcStatusFilter>` to allow filtering on gRPC status.
 * access log: added a new flag for stream idle timeout.
+* access log: added a new field for upstream transport failure reason in :ref:`file access logger<config_access_log_format_upstream_transport_failure_reason>` and
+  :ref:`gRPC access logger<envoy_api_field_data.accesslog.v2.AccessLogCommon.upstream_transport_failure_reason>` for HTTP access logs.
 * admin: the admin server can now be accessed via HTTP/2 (prior knowledge).
 * buffer: fix vulnerabilities when allocation fails.
 * build: releases are built with GCC-7 and linked with LLD.
@@ -17,7 +19,7 @@ Version history
 * config: finish cluster warming only when a named response i.e. ClusterLoadAssignment associated to the cluster being warmed comes in the EDS response. This is a behavioural change from the current implementation where warming of cluster completes on missing load assignments also.
 * config: use Envoy cpuset size to set the default number or worker threads if :option:`--cpuset-threads` is enabled.
 * cors: added :ref:`filter_enabled & shadow_enabled RuntimeFractionalPercent flags <cors-runtime>` to filter.
-* ext_authz: added an configurable option to make the gRPC service cross-compatible with V2Alpha. Note that this feature is already deprecated. It should be used for a short time, and only when transitioning from alpha to V2 release version.
+* ext_authz: added an configurable option to make the gRPC service cross-compatible with V2Alpha. Note that this feature is already deprecated. It should be used for a short time, and only when transitioning from alpha to V2 release version.
 * ext_authz: migrated from V2alpha to V2 and improved the documentation.
 * ext_authz: authorization request and response configuration has been separated into two distinct objects: :ref:`authorization request
   <envoy_api_field_config.filter.http.ext_authz.v2.HttpService.authorization_request>` and :ref:`authorization response

include/envoy/http/codec.h

Lines changed: 3 additions & 1 deletion
@@ -134,8 +134,10 @@ class StreamCallbacks {
   /**
    * Fires when a stream has been remote reset.
    * @param reason supplies the reset reason.
+   * @param transport_failure_reason supplies underlying transport failure reason.
    */
-  virtual void onResetStream(StreamResetReason reason) PURE;
+  virtual void onResetStream(StreamResetReason reason,
+                             absl::string_view transport_failure_reason) PURE;
 
   /**
    * Fires when a stream, or the connection the stream is sending to, goes over its high watermark.

include/envoy/http/conn_pool.h

Lines changed: 2 additions & 1 deletion
@@ -46,10 +46,11 @@ class Callbacks {
   /**
    * Called when a pool error occurred and no connection could be acquired for making the request.
    * @param reason supplies the failure reason.
+   * @param transport_failure_reason supplies the details of the transport failure reason.
    * @param host supplies the description of the host that caused the failure. This may be nullptr
    *             if no host was involved in the failure (for example overflow).
    */
-  virtual void onPoolFailure(PoolFailureReason reason,
+  virtual void onPoolFailure(PoolFailureReason reason, absl::string_view transport_failure_reason,
                              Upstream::HostDescriptionConstSharedPtr host) PURE;
 
   /**

include/envoy/network/connection.h

Lines changed: 6 additions & 0 deletions
@@ -261,6 +261,12 @@ class Connection : public Event::DeferredDeletable, public FilterManager {
    * @return std::chrono::milliseconds The delayed close timeout value.
    */
   virtual std::chrono::milliseconds delayedCloseTimeout() const PURE;
+
+  /**
+   * @return std::string the failure reason of the underlying transport socket, if no failure
+   *         occurred an empty string is returned.
+   */
+  virtual absl::string_view transportFailureReason() const PURE;
 };
 
 typedef std::unique_ptr<Connection> ConnectionPtr;

include/envoy/network/transport_socket.h

Lines changed: 6 additions & 0 deletions
@@ -105,6 +105,12 @@ class TransportSocket {
    */
   virtual std::string protocol() const PURE;
 
+  /**
+   * @return std::string the last failure reason occurred on the transport socket. If no failure
+   *         has been occurred the empty string is returned.
+   */
+  virtual absl::string_view failureReason() const PURE;
+
   /**
    * @return bool whether the socket can be flushed and closed.
    */

include/envoy/stream_info/stream_info.h

Lines changed: 11 additions & 0 deletions
@@ -364,6 +364,17 @@ class StreamInfo {
    * @return SNI value for downstream host.
    */
   virtual const std::string& requestedServerName() const PURE;
+
+  /**
+   * @param failure_reason the upstream transport failure reason.
+   */
+  virtual void setUpstreamTransportFailureReason(absl::string_view failure_reason) PURE;
+
+  /**
+   * @return const std::string& the upstream transport failure reason, e.g. certificate validation
+   *         failed.
+   */
+  virtual const std::string& upstreamTransportFailureReason() const PURE;
 };
 
 } // namespace StreamInfo
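
The interface hunks above add one accessor per layer: `failureReason()` on the transport socket, `transportFailureReason()` on the connection, an extra `transport_failure_reason` argument on `onPoolFailure()`, and a setter/getter pair on `StreamInfo`. A minimal sketch of how these pieces might chain together follows. All `Fake*` types are hypothetical stand-ins, `std::string_view` (C++17) replaces `absl::string_view` to keep the example dependency-free, and the wiring itself is an assumption because the files that connect these layers are not among the hunks shown here.

```cpp
#include <string>
#include <string_view>

// Hypothetical transport socket that remembers its last failure.
class FakeTransportSocket {
public:
  void failHandshake(std::string_view reason) { failure_reason_ = std::string(reason); }
  // Empty when no failure has occurred, as documented in transport_socket.h.
  std::string_view failureReason() const { return failure_reason_; }

private:
  std::string failure_reason_;
};

// Hypothetical connection that forwards the reason of its transport socket.
class FakeConnection {
public:
  explicit FakeConnection(const FakeTransportSocket& socket) : socket_(socket) {}
  std::string_view transportFailureReason() const { return socket_.failureReason(); }

private:
  const FakeTransportSocket& socket_;
};

// Hypothetical stream info: takes a view, but stores an owned copy so the
// value outlives the connection that produced it.
class FakeStreamInfo {
public:
  void setUpstreamTransportFailureReason(std::string_view failure_reason) {
    upstream_transport_failure_reason_ = std::string(failure_reason);
  }
  const std::string& upstreamTransportFailureReason() const {
    return upstream_transport_failure_reason_;
  }

private:
  std::string upstream_transport_failure_reason_;
};

// Hypothetical pool-failure handler: records the reason it was handed.
void onPoolFailure(std::string_view transport_failure_reason, FakeStreamInfo& stream_info) {
  stream_info.setUpstreamTransportFailureReason(transport_failure_reason);
}
```

Copying into a `std::string` at the `StreamInfo` boundary matches the signatures in the hunk: the setter accepts a view, while the getter returns `const std::string&`.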

source/common/access_log/access_log_formatter.cc

Lines changed: 8 additions & 0 deletions
@@ -350,6 +350,14 @@ StreamInfoFormatter::StreamInfoFormatter(const std::string& field_name) {
         return UnspecifiedValueString;
       }
     };
+  } else if (field_name == "UPSTREAM_TRANSPORT_FAILURE_REASON") {
+    field_extractor_ = [](const StreamInfo::StreamInfo& stream_info) {
+      if (!stream_info.upstreamTransportFailureReason().empty()) {
+        return stream_info.upstreamTransportFailureReason();
+      } else {
+        return UnspecifiedValueString;
+      }
+    };
   } else {
     throw EnvoyException(fmt::format("Not supported field in StreamInfo: {}", field_name));
   }
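
The extractor added in this hunk falls back to the unspecified-value placeholder when no failure reason was recorded. The standalone sketch below reproduces that branch outside Envoy. `FakeStreamInfo` and `extractUpstreamTransportFailureReason` are hypothetical names introduced for illustration, and the `"-"` placeholder is taken from the access log documentation in this commit.

```cpp
#include <string>

// Hypothetical stand-in for StreamInfo; only the accessor used here is modeled.
struct FakeStreamInfo {
  std::string upstream_transport_failure_reason;
  const std::string& upstreamTransportFailureReason() const {
    return upstream_transport_failure_reason;
  }
};

// Placeholder written to the access log for unset values.
const std::string UnspecifiedValueString = "-";

// Same branching as the lambda in the hunk: the recorded reason if there is
// one, otherwise the placeholder.
std::string extractUpstreamTransportFailureReason(const FakeStreamInfo& stream_info) {
  if (!stream_info.upstreamTransportFailureReason().empty()) {
    return stream_info.upstreamTransportFailureReason();
  }
  return UnspecifiedValueString;
}
```

With this shape, a request that never hit a transport failure logs `-`, the same value the documentation gives for TCP logs where the operator is not implemented.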
