Description
Issue Type:
- Bug report
What happened:
Defining a new service (Elasticsearch) and marking its default port (9200) as opaque makes the service unavailable, returning a 502 Bad Gateway on every request.
What you expected to happen:
I would expect the service to be available. If the opaque-port definition is removed, it works smoothly.
How to reproduce it (as minimally and precisely as possible):
The first step is to deploy an Elasticsearch cluster (I deployed a single-node cluster), define the corresponding Service, and mark port 9200 as opaque (see the sketch after the Service listing below). I just happened to find this problem using Elasticsearch, but I don't think the specific service is relevant to this particular problem.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.0.102.110 <none> 9200/TCP 18m
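For reference, a minimal sketch of how the port can be marked as opaque, assuming the config.linkerd.io/opaque-ports annotation is set on the Service; I'm not pasting my exact manifests, so treat the commands below as illustrative rather than the exact setup:
# Mark port 9200 on the Service as opaque (this reproduces the 502):
kubectl -n ghae annotate service elasticsearch config.linkerd.io/opaque-ports=9200 --overwrite
# Removing the annotation again makes the service reachable:
kubectl -n ghae annotate service elasticsearch config.linkerd.io/opaque-ports-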
Then you can just go to any other pod in your cluster and try to perform a regular curl:
curl http://elasticsearch:9200
You will get a 502 Bad Gateway error and, looking at the logs of the Linkerd sidecar on the Elasticsearch pod, you can see:
[ 1472.597759s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.16:46202}:server{port=4143}:direct: linkerd_tls::server: Peeked bytes from TCP stream sz=301
[ 1472.598753s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.16:46202}:server{port=4143}:direct: linkerd_tls::server: Accepted TLS connection client.id=Some(ClientId("unicorn.ghae.serviceaccount.identity.linkerd.cluster.local")) alpn=Some("transport.l5d.io/v1")
[ 1472.598802s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.16:46202}:server{port=4143}:direct: linkerd_transport_header::server: Read transport header header=TransportHeader { port: 9200, name: None, protocol: Some(Http2) }
[ 1472.598856s] INFO ThreadId(01) inbound:accept{client.addr=10.244.3.16:46202}: linkerd_app_core::serve: Connection closed error=A named target must be provided on gateway connections
[ 1476.432186s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_tls::server: Peeked bytes from TCP stream sz=118
[ 1476.432228s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_detect: DetectResult protocol=Some(Http1) elapsed=6.4µs
[ 1476.432238s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_proxy_http::server: Creating HTTP service version=Http1
[ 1476.432251s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_proxy_http::server: Handling as HTTP version=Http1
[ 1476.432371s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_proxy_http::server: The client is shutting down the connection res=Ok(())
[ 1476.432400s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32826}: linkerd_app_core::serve: Connection closed
[ 1480.556736s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_tls::server: Peeked bytes from TCP stream sz=117
[ 1480.556779s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_detect: DetectResult protocol=Some(Http1) elapsed=5.601µs
[ 1480.556791s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_proxy_http::server: Creating HTTP service version=Http1
[ 1480.556801s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_proxy_http::server: Handling as HTTP version=Http1
[ 1480.556908s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_proxy_http::server: The client is shutting down the connection res=Ok(())
[ 1480.556939s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.244.3.1:32932}: linkerd_app_core::serve: Connection closed
[ 1482.284320s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.1:56880}:server{port=9200}: linkerd_app_inbound::prevent_loop: addr=10.244.3.18:9200 self.port=4143
[ 1482.284420s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.1:56880}:server{port=9200}:forward:tcp: linkerd_proxy_transport::connect: Connecting server.addr=127.0.0.1:9200
[ 1482.284546s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.1:56880}:server{port=9200}:forward:tcp: linkerd_proxy_transport::connect: Connected local.addr=127.0.0.1:56646 keepalive=None
[ 1482.284570s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.1:56880}:server{port=9200}:forward:tcp: linkerd_proxy_transport::metrics: client connection open
[ 1482.285240s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.3.1:56880}: linkerd_app_core::serve: Connection closed
Looking at the logs of the Linkerd sidecar on the pod where the request is initiated (the client side), we can find:
[ 1837.792907s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_http::client: method=GET uri=http://miguelangelp.ghaekube.net/sessions/_auth_request_endpoint version=HTTP/1.0
[ 1837.792928s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_http::client: headers={"host": "miguelangelp.ghaekube.net", "x-forwarded-for": "127.0.0.1", "l5d-dst-canonical": "nginx-unicorn.ghae.svc.cluster.local:9100"}
[ 1837.792941s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_http::h1: Using one-off client use_absolute_form=false is_missing_host=false
[ 1837.792990s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_transport::connect: Connecting server.addr=127.0.0.1:9100
[ 1837.793128s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_transport::connect: Connected local.addr=127.0.0.1:46200 keepalive=None
[ 1837.793148s] DEBUG ThreadId(01) server{port=9100}:http{v=h2}:http1{name=nginx-unicorn.ghae.svc.cluster.local:9100}:profile:http1: linkerd_proxy_transport::metrics: client connection open
[ 1837.830592s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}: linkerd_cache: Caching new service target=Accept { orig_dst: OrigDstAddr(10.0.102.110:9200), protocol: () }
[ 1837.830713s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}: linkerd_timeout::failfast: TCP Server service has become unavailable
[ 1837.830781s] DEBUG ThreadId(01) server{orig_dst=10.0.89.66:11211}: linkerd_dns: resolve_srv name=linkerd-dst-headless.linkerd.svc.cluster.local.
[ 1837.831946s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:poll_profile: linkerd_service_profiles::client: profile received: DestinationProfile { fully_qualified_name: "elasticsearch.ghae.svc.cluster.local", opaque_protocol: false, routes: [], retry_budget: Some(RetryBudget { retry_ratio: 0.2, min_retries_per_second: 10, ttl: Some(Duration { seconds: 10, nanos: 0 }) }), dst_overrides: [WeightedDst { authority: "elasticsearch.ghae.svc.cluster.local.:9200", weight: 10000 }], endpoint: None }
[ 1837.831995s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:poll_profile: linkerd_service_profiles::client: profile received: DestinationProfile { fully_qualified_name: "elasticsearch.ghae.svc.cluster.local", opaque_protocol: false, routes: [], retry_budget: Some(RetryBudget { retry_ratio: 0.2, min_retries_per_second: 10, ttl: Some(Duration { seconds: 10, nanos: 0 }) }), dst_overrides: [WeightedDst { authority: "elasticsearch.ghae.svc.cluster.local.:9200", weight: 10000 }], endpoint: None }
[ 1837.832010s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:poll_profile: linkerd_service_profiles::client: profile received: DestinationProfile { fully_qualified_name: "elasticsearch.ghae.svc.cluster.local", opaque_protocol: false, routes: [], retry_budget: Some(RetryBudget { retry_ratio: 0.2, min_retries_per_second: 10, ttl: Some(Duration { seconds: 10, nanos: 0 }) }), dst_overrides: [WeightedDst { authority: "elasticsearch.ghae.svc.cluster.local.:9200", weight: 10000 }], endpoint: None }
[ 1837.832051s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}: linkerd_detect: DetectResult protocol=Some(Http1) elapsed=9.3µs
[ 1837.832063s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_proxy_http::server: Creating HTTP service version=Http1
[ 1837.832075s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_cache: Caching new service target=Logical { orig_dst: OrigDstAddr(10.0.102.110:9200), protocol: Http1, profile: Some(..), logical_addr: Some(LogicalAddr(NameAddr { name: "elasticsearch.ghae.svc.cluster.local", port: 9200 })) }
[ 1837.832125s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_proxy_http::server: Handling as HTTP version=Http1
[ 1837.832201s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_service_profiles::http::route_request: Updating HTTP routes routes=0
[ 1837.832233s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_service_profiles::split: Updating targets=[Target { addr: Name(NameAddr { name: "elasticsearch.ghae.svc.cluster.local.", port: 9200 }), weight: 10000 }]
[ 1837.832256s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_proxy_api_resolve::resolve: Resolving dst=elasticsearch.ghae.svc.cluster.local:9200 context={"ns":"ghae", "nodeName":"aks-agentpool-32179278-vmss000004"}
[ 1837.832283s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_timeout::failfast: HTTP Logical service has become unavailable
[ 1837.832475s] DEBUG ThreadId(01) server{orig_dst=10.0.89.66:11211}: linkerd_dns: addrs=[10.244.0.4:8086, 10.244.2.3:8086, 10.244.5.28:8086]
[ 1837.832489s] DEBUG ThreadId(01) server{orig_dst=10.0.89.66:11211}: linkerd_proxy_dns_resolve: addrs=[10.244.0.4:8086, 10.244.2.3:8086, 10.244.5.28:8086]
[ 1837.833354s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_timeout::failfast: HTTP Balancer service has become unavailable
[ 1837.833393s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}: linkerd_proxy_api_resolve::resolve: Add endpoints=1
[ 1837.833456s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}: linkerd_proxy_http::client: Building HTTP client settings=OrigProtoUpgrade
[ 1837.833469s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}: linkerd_app_outbound::tcp::opaque_transport: Using opaque transport target_port=9200 opaque_port=4143
[ 1837.833481s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}: linkerd_tls::client: Initiating TLS connection server.id=default.ghae.serviceaccount.identity.linkerd.cluster.local
[ 1837.833489s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}: linkerd_proxy_transport::connect: Connecting server.addr=10.244.3.18:4143
[ 1837.833626s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:h2: linkerd_proxy_transport::connect: Connected local.addr=10.244.3.16:51070 keepalive=Some(10s)
[ 1837.834643s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:h2: linkerd_tls::client: alpn=Ok("transport.l5d.io/v1")
[ 1837.834681s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:h2: linkerd_app_outbound::tcp::opaque_transport: Wrote transport header sz=34
[ 1837.834693s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:h2: linkerd_proxy_transport::metrics: client connection open
[ 1837.834843s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:orig-proto-upgrade: linkerd_proxy_http::client: method=GET uri=http://elasticsearch:9200/ version=HTTP/1.1
[ 1837.834859s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:orig-proto-upgrade: linkerd_proxy_http::client: headers={"host": "elasticsearch:9200", "user-agent": "curl/7.58.0", "accept": "*/*", "l5d-dst-canonical": "elasticsearch.ghae.svc.cluster.local:9200"}
[ 1837.834869s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:orig-proto-upgrade: linkerd_proxy_http::orig_proto: Upgrading request version=HTTP/1.1 absolute_form=false
[ 1837.835154s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}:logical{dst=elasticsearch.ghae.svc.cluster.local:9200}:concrete{addr=elasticsearch.ghae.svc.cluster.local:9200}:endpoint{server.addr=10.244.3.18:9200}:h2: linkerd_proxy_http::h2: failed error=connection error: broken pipe
[ 1837.835193s] WARN ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_app_core::errors: Failed to proxy request: connection error: broken pipe client.addr=10.244.3.16:59012
[ 1837.835201s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_app_core::errors: Closing server-side connection
[ 1837.835221s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_app_core::errors: Handling error with HTTP response status=502 Bad Gateway version=HTTP/1.1
[ 1837.835262s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}:server{orig_dst=10.0.102.110:9200}:http{v=1.x}: linkerd_proxy_http::server: The stack is tearing down the connection
[ 1837.835290s] DEBUG ThreadId(01) outbound:accept{client.addr=10.244.3.16:59012}: linkerd_app_core::serve: Connection closed
[ 1837.835305s] DEBUG ThreadId(01) evict{target=Accept { orig_dst: OrigDstAddr(10.0.102.110:9200), protocol: () }}: linkerd_cache: Awaiting idleness
[ 1837.835312s] DEBUG ThreadId(01) evict{target=Logical { orig_dst: OrigDstAddr(10.0.102.110:9200), protocol: Http1, profile: Some(..), logical_addr: Some(LogicalAddr(NameAddr { name: "elasticsearch.ghae.svc.cluster.local", port: 9200 })) }}: linkerd_cache: Awaiting idleness
Anything else we need to know?:
You can reach $POD_ID:9200 directly with no problem at all (a sketch of that check is below). I've already talked to @mateiidavid and he asked me to open this issue (for more context, see our previous conversation: https://linkerd.slack.com/archives/C89RTCWJF/p1621515416063200).
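This is roughly how the direct-to-pod check looks; the namespace matches the logs above, but the label selector is an assumption on my part, not taken from the actual manifests:
# Resolve the Elasticsearch pod IP and curl it directly (label selector is illustrative):
POD_IP=$(kubectl -n ghae get pod -l app=elasticsearch -o jsonpath='{.items[0].status.podIP}')
curl http://$POD_IP:9200   # succeeds, no 502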
Please let me know if there's anything I can do to help.
Environment:
- linkerd/namerd version, config files
Linkerd 2.10.1
- Platform, version, and config files (Kubernetes, DC/OS, etc):
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"8ab00ff68a1763b5a506a7073cb8e67b12dfbcd7", GitTreeState:"clean", BuildDate:"2021-03-10T23:40:01Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration
Azure Kubernetes Service
Thank you very much!