Bug description
Hi All,
@Stono, @martinhighton and I have been looking into an issue with some of our Kubernetes applications getting 503 errors. We're seeing roughly 5-10 an hour between the two applications we've been targeting in this investigation. I have obtained the Istio debug logs and some packet captures, and the captures show that the source side is the one sending the TCP reset. The applications talking to each other aren't particularly busy, so I'm guessing there's some sort of connection timeout issue.
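One thing we're considering trying while we chase the timeout theory is stopping the client sidecar from reusing idle keep-alive connections that the upstream side may have already dropped. A rough sketch with a DestinationRule (the host is our location service, but the values are illustrative, and idleTimeout needs an Istio version that supports it):

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: location-service-pool
  namespace: location-service
spec:
  host: app.location-service.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL        # keep mTLS, since our mesh uses it
    connectionPool:
      http:
        idleTimeout: 30s            # drop pooled connections before the app can
        maxRequestsPerConnection: 1 # or give up on keep-alive reuse entirely
EOF

We haven't confirmed this is the root cause, so treat it as a workaround sketch rather than a fix.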

I've looked at the debug logs for a while and I can't figure out how to read them; I haven't managed to find much documentation on them online. I'm assuming that the 'Cxx' value is a connection ID, so all of the entries under, say, [C22] relate to one connection between the source and destination Envoy. If that is the case, should the connection number in the destination logs match the one in the source logs?
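For anyone following along, this is roughly how we've been isolating a single connection's entries on each side (the pod and namespace names below are placeholders for our real ones):

kubectl -n source-ns logs source-pod -c istio-proxy | grep -F '[C22]'   # source sidecar
kubectl -n dest-ns logs dest-pod -c istio-proxy | grep -F '[C26]'       # destination sidecar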
There's not a lot to see in the packet capture beyond the reset itself. Here are what I believe to be the relevant debug logs from the istio-proxy containers.
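For reference, we raised the sidecar log level via the Envoy admin endpoint, roughly like this (assuming the default admin port 15000 and curl being available in the istio-proxy image):

kubectl exec source-pod -c istio-proxy -- curl -s -X POST http://localhost:15000/logging?level=debug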
Source
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:332] [C5][S4000182525086368835] cluster 'outbound|80||app.location-service.svc.cluster.local' match for URL '/api/location'
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:393] [C5][S4000182525086368835] router decoding headers:
':authority', 'app.dest.app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'https'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxx-xxx-xxxx'
'x-envoy-decorator-operation', 'app.dest.app.svc.cluster.local:80/*'
'x-istio-attributes', 'CiEKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIFEgNhcHAKMwodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USEhIQbG9jYXRpb24tc2VydmljZQpNCgpzb3VyY2UudWlkEj8SPWt1YmVybmV0ZXM6Ly9tb3Zlcy1sb2dpc3RpY3MtNmY1Yzk0NGM1OS1qZzRtcC5tb3Zlcy1sb2dpc3RpY3MKRAoYZGVzdGluYXRpb24uc2VydmljZS5ob3N0EigSJmFwcC5sb2NhdGlvbi1zZXJ2aWNlLnN2Yy5jbHVzdGVyLmxvY2FsCkIKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEicSJWlzdGlvOi8vbG9jYXRpb24tc2VydmljZS9zZXJ2aWNlcy9hcHA='
'x-envoy-expected-rq-timeout-ms', '10000'
'x-b3-traceid', 'xxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-sampled', '0'
[2019-06-28 13:02:36.789][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C22] using existing connection
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.791][36][debug][router] [external/envoy/source/common/router/router.cc:754] [C5][S4000182525086368835] upstream headers complete: end_stream=false
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:932] [C5][S4000182525086368835] performing retry
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:1156] [C5][S4000182525086368835] resetting pool request
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C22] request reset
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C22] closing data_to_write=0 type=1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C22] closing socket: 1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C22] SSL shutdown: rc=0
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C22]
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C22] disconnect. resetting 0 pending requests
[2019-06-28 13:02:36.792][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C22] client disconnected, failure reason:
[2019-06-28 13:02:36.803][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:88] creating a new connection
Destination
[2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:393] [C21][S3699665653477458718] router decoding headers:
':authority', 'app.dest-app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'http'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxxxxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxxxx-xxxxxxx-xxxxxxx'
'content-length', '0'
'x-forwarded-client-cert', 'By=spiffe://cluster.local/ns/dest-app/sa/default;Hash=76bb5f18ded8f10606e5f67fe910ddf377a8ffa51f50a7ae180dad792ac7dde2;Subject="";URI=spiffe://cluster.local/ns/source-app/sa/default'
'x-b3-traceid', 'xxxxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-parentspanid', 'xxxxxxxxxxxxx'
'x-b3-sampled', '0'
[2019-06-28 13:02:36.790][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C26] using existing connection
[2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:1210] [C21][S3699665653477458718] pool ready
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C26] remote close
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C26] closing socket: 0
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C26] disconnect. resetting 1 pending requests
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C26] request reset
[2019-06-28 13:02:36.791][37][debug][router] [external/envoy/source/common/router/router.cc:671] [C21][S3699665653477458718] upstream reset: reset reason connection termination
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C21][S3699665653477458718] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1329] [C21][S3699665653477458718] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Fri, 28 Jun 2019 13:02:36 GMT'
'server', 'istio-envoy'
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/authn/http_filter.cc:48] Called AuthenticationFilter : onDestroy
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
[2019-06-28 13:02:36.791][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C26] client disconnected, failure reason:
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
[2019-06-28 13:02:36.791][37][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac
[2019-06-28 13:02:36.791][37][debug][filter] [src/istio/control/client_context_base.cc:140] Report attributes: attributes {
key: "connection.mtls"
value {
bool_value: true
}
}
attributes {
key: "connection.requested_server_name"
value {
string_value: "outbound_.80_._.app.dest-a
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C21] remote close
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C21] closing socket: 0
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C21] SSL shutdown: rc=1
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C21]
We're having trouble figuring out exactly what is going on here. Could someone help us read these debug logs? The source Envoy just seems to reset the connection. If the connection numbers are supposed to match, then the source Envoy may be trying to reuse a connection that no longer exists on the destination Envoy; if that's not the case, I'm not sure why the source Envoy is resetting the connection.
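If it does turn out that connections are being closed underneath Envoy, one interim workaround we may try is retrying on connection resets at the routing layer, since the failing request is an OPTIONS call and safe to repeat. A sketch only, with illustrative names and values:

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: location-service-retries
  namespace: location-service
spec:
  hosts:
  - app.location-service.svc.cluster.local
  http:
  - route:
    - destination:
        host: app.location-service.svc.cluster.local
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: connect-failure,refused-stream,reset  # retry requests whose connection was torn down
EOF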
Please let me know if you need any more info on this problem.
--
Cheers
Chris
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[x] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
Envoy should not issue a 503 unless there is a genuine problem with the source or destination application.
Steps to reproduce the bug
This happens randomly; we have no reliable way to reproduce it.
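The best we can do is watch for occurrences. Assuming access logging to stdout is enabled, the 503s show up in the destination sidecar's access log with the UC (upstream connection termination) response flag (pod and namespace names are placeholders):

kubectl -n dest-ns logs -f dest-pod -c istio-proxy | grep ' 503 UC '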
Version (include the output of istioctl version --remote and kubectl version)
Kubelet v1.13
Istio v1.2.2
Environment where bug was observed (cloud vendor, OS, etc)
GKE
master 1.13.6-gke.13 (Nodes v1.12.7-gke.24)
Container-Optimized OS from Google, kernel v4.14.127+, docker://17.3.2