Skip to content

Envoy issuing 503's intermittently  #15285

@chrismckean

Description

@chrismckean

Bug description

Hi All,

@Stono @martinhighton and myself have been looking into an issue with some of our K8 applications getting 503 errors. We're seeing roughly 5-10 an hour between the 2 applications we've been targeting in this investigation. I have obtained the istio debug logs and some packet captures. From the captures, you can see that the source application is the one sending the reset (below). The applications talking aren't particularly busy, so I'm guessing there's some sort of connection timeout issue.

image

I've looked at the debug logs for a while and I can't figure out how to read them. I've not managed to find a lot of documentation on them online. I'm assuming that the 'Cxx' value is a connection. So, all of the entries under, say, C22 relate to one connection between the source and destination envoy? If that is the case should the connection number on the destination logs match the source logs??

There's not a lot to see from the packet capture. Just that line above. Here are what I believe to be the relevant debug logs from the istio container

Source

[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:332] [C5][S4000182525086368835] cluster 'outbound|80||app.location-service.svc.cluster.local' match for URL '/api/location'
[2019-06-28 13:02:36.789][36][debug][router]
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:332] [C5][S4000182525086368835] cluster 'outbound|80||app.location-service.svc.cluster.local' match for URL '/api/location'
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:393] [C5][S4000182525086368835] router decoding headers:
':authority', 'app.dest.app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'https'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxx-xxx-xxxx'
'x-envoy-decorator-operation', 'app.dest.app.svc.cluster.local:80/*'
'x-istio-attributes', 'CiEKGGRlc3RpbmF0aW9uLnNlcnZpY2UubmFtZRIFEgNhcHAKMwodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USEhIQbG9jYXRpb24tc2VydmljZQpNCgpzb3VyY2UudWlkEj8SPWt1YmVybmV0ZXM6Ly9tb3Zlcy1sb2dpc3RpY3MtNmY1Yzk0NGM1OS1qZzRtcC5tb3Zlcy1sb2dpc3RpY3MKRAoYZGVzdGluYXRpb24uc2VydmljZS5ob3N0EigSJmFwcC5sb2NhdGlvbi1zZXJ2aWNlLnN2Yy5jbHVzdGVyLmxvY2FsCkIKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEicSJWlzdGlvOi8vbG9jYXRpb24tc2VydmljZS9zZXJ2aWNlcy9hcHA='
'x-envoy-expected-rq-timeout-ms', '10000'
'x-b3-traceid', 'xxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-sampled', '0'

[2019-06-28 13:02:36.789][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C22] using existing connection
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.789][36][debug][router] [external/envoy/source/common/router/router.cc:1210] [C5][S4000182525086368835] pool ready
[2019-06-28 13:02:36.791][36][debug][router] [external/envoy/source/common/router/router.cc:754] [C5][S4000182525086368835] upstream headers complete: end_stream=false
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:932] [C5][S4000182525086368835] performing retry
[2019-06-28 13:02:36.792][36][debug][router] [external/envoy/source/common/router/router.cc:1156] [C5][S4000182525086368835] resetting pool request
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C22] request reset
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C22] closing data_to_write=0 type=1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C22] closing socket: 1
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C22] SSL shutdown: rc=0
[2019-06-28 13:02:36.792][36][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C22] 
[2019-06-28 13:02:36.792][36][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C22] disconnect. resetting 0 pending requests
[2019-06-28 13:02:36.792][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C22] client disconnected, failure reason: 
[2019-06-28 13:02:36.803][36][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:88] creating a new connection

Destination

2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:393] [C21][S3699665653477458718] router decoding headers:
':authority', 'app.dest-app'
':path', '/api/location'
':method', 'OPTIONS'
':scheme', 'http'
'accept', 'application/json, application/json, application/*+json, application/*+json'
'authorization', 'Basic xxxxxxxxxxxxx'
'user-agent', 'source-app/xxxxxxxxxxxxxxxxxx '
'accept-encoding', 'gzip,deflate'
'x-forwarded-proto', 'http'
'x-request-id', 'xxxxx-xxxxxxx-xxxxxxx'
'content-length', '0'
'x-forwarded-client-cert', 'By=spiffe://cluster.local/ns/dest-app/sa/default;Hash=76bb5f18ded8f10606e5f67fe910ddf377a8ffa51f50a7ae180dad792ac7dde2;Subject="";URI=spiffe://cluster.local/ns/source-app/sa/default'
'x-b3-traceid', 'xxxxxxxxxxxxxxx'
'x-b3-spanid', 'xxxxxxxxxxxxxx'
'x-b3-parentspanid', 'xxxxxxxxxxxxx'
'x-b3-sampled', '0'

[2019-06-28 13:02:36.790][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C26] using existing connection
[2019-06-28 13:02:36.790][37][debug][router] [external/envoy/source/common/router/router.cc:1210] [C21][S3699665653477458718] pool ready
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C26] remote close
[2019-06-28 13:02:36.790][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C26] closing socket: 0
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C26] disconnect. resetting 1 pending requests
[2019-06-28 13:02:36.791][37][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C26] request reset
[2019-06-28 13:02:36.791][37][debug][router] [external/envoy/source/common/router/router.cc:671] [C21][S3699665653477458718] upstream reset: reset reason connection termination
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1137] [C21][S3699665653477458718] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
[2019-06-28 13:02:36.791][37][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1329] [C21][S3699665653477458718] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Fri, 28 Jun 2019 13:02:36 GMT'
'server', 'istio-envoy'

[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/authn/http_filter.cc:48] Called AuthenticationFilter : onDestroy
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
[2019-06-28 13:02:36.791][37][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C26] client disconnected, failure reason: 
[2019-06-28 13:02:36.791][37][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
[2019-06-28 13:02:36.791][37][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac
[2019-06-28 13:02:36.791][37][debug][filter] [src/istio/control/client_context_base.cc:140] Report attributes: attributes {
  key: "connection.mtls"
  value {
    bool_value: true
  }
}
attributes {
  key: "connection.requested_server_name"
  value {
    string_value: "outbound_.80_._.app.dest-a
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:518] [C21] remote close
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/common/network/connection_impl.cc:188] [C21] closing socket: 0
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:270] [C21] SSL shutdown: rc=1
[2019-06-28 13:02:36.792][37][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C21] 

We're having trouble figuring out exactly what is going on here. Could someone possibly help read these debug logs? The source envoy just seems to reset the connection. If the connection numbers should match then the source envoy is trying to use a connection that maybe doesn't exist on the destination envoy anymore? If this isn't the case I'm not sure why the source envoy is resetting the connection

Please let me know if somebody needs any more info on this problem.

--

Cheers

Chris

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[x ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
Envoy not to issue a 503 unless there is a problem with the source, destination app

Steps to reproduce the bug
This happens randomly

Version (include the output of istioctl version --remote and kubectl version)
Kubelet v1.13
Istio v.1.2.2

Environment where bug was observed (cloud vendor, OS, etc)
GKE
master 1.13.6-gke.13 (Nodes v1.12.7-gke.24)
Container-Optimized OS from Google Kernel v.4.14.127+ docker://17.3.2

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions