
Failed requests to destination when using headless services #11065

@someone-stole-my-name

Description

What is the issue?

When communicating with a headless service through its DNS name, the proxy continuously tries to discover the service's profile from the destination controller (/io.linkerd.proxy.destination.Destination/GetProfile), and every one of these requests fails. The failed requests do not show up in tap and, on 2.13, they do not count towards the overall ("global") success rate of destination (see #11066).

Ideally, this setup should not result in failed requests to destination. Since those requests fail, I assume profiles are not available for these services either?
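
Judging by the error in the logs below, the destination controller appears to treat the first label of 10-244-0-28.server.default.svc.cluster.local as a pod hostname and then look for an endpoint in default/server that has that hostname. With CoreDNS defaults, a headless service endpoint whose pod has no hostname set is published under its dashed IP instead, so no endpoint matches. The Python sketch below is a hypothetical, simplified model of that lookup (the Endpoint type and both helper functions are made up for illustration and are not Linkerd's code), only meant to show why the name cannot be resolved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Hypothetical, simplified stand-in for one address in a Kubernetes Endpoints object.
@dataclass
class Endpoint:
    ip: str
    hostname: Optional[str] = None  # only set when the pod spec sets a hostname


def split_pod_dns_name(
    fqdn: str, suffix: str = ".svc.cluster.local"
) -> Optional[Tuple[str, str, str]]:
    """Split '<host>.<service>.<namespace>.svc.cluster.local' into (host, service, namespace).

    Returns None for names without a per-pod host label, such as
    '<service>.<namespace>.svc.cluster.local'.
    """
    if not fqdn.endswith(suffix):
        return None
    labels = fqdn[: -len(suffix)].split(".")
    if len(labels) != 3:
        return None
    host, service, namespace = labels
    return host, service, namespace


def find_pod_by_hostname(
    fqdn: str, endpoints: Dict[Tuple[str, str], List[Endpoint]]
) -> Endpoint:
    """Assumed lookup rule: match the host label against endpoint hostnames only."""
    parts = split_pod_dns_name(fqdn)
    if parts is None:
        raise ValueError(f"not a per-pod DNS name: {fqdn}")
    host, service, namespace = parts
    for ep in endpoints.get((namespace, service), []):
        if ep.hostname == host:
            return ep
    # Reached when no endpoint hostname equals the host label,
    # e.g. a dashed-IP label while endpoint hostnames are unset.
    raise LookupError(
        f"no pod found in Endpoints {namespace}/{service} for hostname {host}"
    )


# The pod behind the headless service 'server' has no hostname set.
endpoints = {("default", "server"): [Endpoint(ip="10.244.0.28")]}
try:
    find_pod_by_hostname("10-244-0-28.server.default.svc.cluster.local", endpoints)
except LookupError as err:
    print(err)  # no pod found in Endpoints default/server for hostname 10-244-0-28
```

In this model, the lookup would only succeed if the endpoint carried a hostname equal to the first DNS label; whether the real controller behaves exactly this way has not been verified here.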

How can it be reproduced?

  • Apply the manifests from this gist
  • Open the viz dashboard and go to /namespaces/linkerd/deployments/linkerd-destination

Logs, error output, etc

[    18.992783s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: tower::balance::p2c::service: updating from discover
[    18.992895s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: tower::buffer::worker: service.ready=true processing request
[    18.993060s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=10.244.0.24:8086}:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(69), flags: (0x4: END_HEADERS) }
[    18.993115s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=10.244.0.24:8086}:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(69) }
[    18.993126s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=10.244.0.24:8086}:h2:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(69), flags: (0x1: END_STREAM) }
[    18.996186s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=10.244.0.24:8086}:h2:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(69), flags: (0x5: END_HEADERS | END_STREAM) }
[    18.996308s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile: linkerd_tonic_watch: Request failed status=status: Unknown, message: "failed to get pod for hostname 10-244-0-28: no pod found in Endpoints default/server for hostname 10-244-0-28", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Wed, 28 Jun 2023 01:50:06 GMT"} }
[    18.996334s] DEBUG ThreadId(01) inbound:accept{client.addr=10.244.0.26:56094}:server{port=8080}:http:http{name=10-244-0-28.server.default.svc.cluster.local:8080}:profile: linkerd_tonic_watch: Recovering
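
To see how often this happens, the failed lookups can be counted from proxy debug logs like the ones above. This is a minimal Python sketch that assumes the exact line shape shown here (a "linkerd_tonic_watch: Request failed" entry carrying an http{name=...} span and a quoted message); count_failed_profile_lookups is a hypothetical helper, and the patterns may need adjusting for other proxy versions or log formats.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted gRPC message on 'Request failed' lines such as:
#   linkerd_tonic_watch: Request failed status=status: Unknown, message: "...", details: []
FAILED_RE = re.compile(r'linkerd_tonic_watch: Request failed .*?message: "([^"]*)"')
# Captures the name being resolved from the http{name=...} span, including the port.
NAME_RE = re.compile(r"http\{name=([^}]+)\}")


def count_failed_profile_lookups(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count failed lookups per (target name, error message)."""
    counts: Counter = Counter()
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed is None:
            continue  # e.g. the 'Recovering' lines that follow each failure
        name = NAME_RE.search(line)
        target = name.group(1) if name else "<unknown>"
        counts[(target, failed.group(1))] += 1
    return counts
```

For example, the output of kubectl logs <pod> -c linkerd-proxy (with the proxy at debug log level) could be piped into a script that calls count_failed_profile_lookups(sys.stdin) and prints the most common entries.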

output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.13.4 but the latest stable version is 2.13.5
    see https://linkerd.io/2.13/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.13.4 but the latest stable version is 2.13.5
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-6b566cf687-mzs2w (stable-2.13.4)
        * linkerd-identity-77bbfc58bb-mgrwh (stable-2.13.4)
        * linkerd-proxy-injector-6f5b6c8798-nw9f5 (stable-2.13.4)
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-version for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-59c76c4d75-vqkw5 (stable-2.13.4)
        * prometheus-b7b44d965-dpflx (stable-2.13.4)
        * tap-7c8fb95758-tn5zb (stable-2.13.4)
        * tap-injector-586d58cf8f-x8t9r (stable-2.13.4)
        * web-7cf5484879-9g888 (stable-2.13.4)
    see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cp-version for hints

Status check results are √

Environment

  • Kubernetes version: 1.26
  • Environment: EKS & Minikube

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None