Description
What is the issue?
Since updating to 2.13.5, we have been experiencing sporadic issues where the linkerd-proxy tries to connect to endpoints that have not existed for days.
We think it's triggered by a failure to connect to linkerd-destination (which is a separate problem).
Restarting linkerd-destination solves the issue.
We also compared the endpoints from linkerd diagnostics endpoints myservice... with kubectl get endpoints myservice, and they seem to match. So we think the stale data is held by the proxies rather than by linkerd-destination.
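For anyone who wants to repeat that comparison, here is a minimal sketch of the set difference involved. It assumes the addresses from both commands have already been collected as "ip:port" strings. The function name `diff_endpoints` and the sample addresses are hypothetical, and extracting the addresses from each command's output is deliberately left out.

```python
def diff_endpoints(destination_addrs, kube_addrs):
    """Compare two collections of "ip:port" strings.

    destination_addrs: addresses reported by the destination controller.
    kube_addrs: addresses listed in the Kubernetes Endpoints object.
    """
    destination = set(destination_addrs)
    kube = set(kube_addrs)
    return {
        # Known to the destination controller but gone from Kubernetes.
        "only_in_destination": sorted(destination - kube),
        # Present in Kubernetes but not reported by the destination controller.
        "only_in_kubernetes": sorted(kube - destination),
    }


# Hypothetical sample addresses, not taken from the affected cluster.
result = diff_endpoints(
    ["10.0.0.1:80", "10.0.0.2:80"],
    ["10.0.0.2:80", "10.0.0.3:80"],
)
print(result)
```

Two empty lists would correspond to what we observed: the control plane and Kubernetes agree, which points at the proxies.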
The affected proxies did not have any pending endpoints after the issue, but we currently don't have data showing what this looked like during the issue:
❯ curl -s localhost:8000/metrics | grep endpoints | grep myservice
outbound_http_balancer_endpoints{parent_group="core",parent_kind="Service",parent_namespace="prod",parent_name="myservice",parent_port="80",parent_section_name="",backend_group="",backend_kind="default",backend_namespace="",backend_name="service",backend_port="",backend_section_name="",endpoint_state="pending"} 0
outbound_http_balancer_endpoints{parent_group="core",parent_kind="Service",parent_namespace="prod",parent_name="myservice",parent_port="80",parent_section_name="",backend_group="",backend_kind="default",backend_namespace="",backend_name="service",backend_port="",backend_section_name="",endpoint_state="ready"} 419
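To capture this during the next incident, the counts could be polled and recorded. Below is a rough sketch that parses the proxy's metrics text and sums `outbound_http_balancer_endpoints` per `endpoint_state` for one service. The helper name `balancer_endpoints` is hypothetical, the sample lines are shortened versions of the output above, and the label regex is a simplification that does not handle escaped quotes in label values.

```python
import re

METRIC = "outbound_http_balancer_endpoints"
# name{labels} value  (comment lines starting with '#' do not match)
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def balancer_endpoints(metrics_text, parent_name):
    """Return {endpoint_state: count} for the given parent_name."""
    counts = {}
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if labels.get("parent_name") != parent_name:
            continue
        state = labels.get("endpoint_state", "")
        counts[state] = counts.get(state, 0.0) + float(match.group(3))
    return counts


# Shortened sample; the real lines carry more labels.
sample = """
outbound_http_balancer_endpoints{parent_name="myservice",endpoint_state="pending"} 0
outbound_http_balancer_endpoints{parent_name="myservice",endpoint_state="ready"} 419
"""
counts = balancer_endpoints(sample, "myservice")
print(counts)
```

Feeding it the body returned by the curl command above, on a schedule, would show whether the pending or ready counts change while the issue is happening.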
How can it be reproduced?
Not clear. We updated to 2.13.5 and have since had 3 incidents over the course of a few days.
Logs, error output, etc
{
"message": "HTTP/1.1 request failed",
"attributes": {
"threadId": "ThreadId(1)",
"spans": [
{
"name": "outbound"
},
{
"name": "proxy",
"addr": "172.20.207.194:80"
},
{
"name": "rescue",
"client": {
"addr": "10.250.154.125:56774"
}
}
],
"level": "INFO",
"fields": {
"error": "logical service myservice.prod.svc.cluster.local:80: Service.myservice:80: endpoint 10.250.162.250:80: operation was canceled: connection was not ready"
},
"timestamp": "[104343.356432s]",
"target": "linkerd_app_core::errors::respond"
}
}
}
{
"message": "Unexpected error",
"attributes": {
"threadId": "ThreadId(1)",
"spans": [
{
"name": "outbound"
},
{
"name": "proxy",
"addr": "172.20.207.194:80"
},
{
"name": "rescue",
"client": {
"addr": "10.250.154.125:56774"
}
}
],
"level": "WARN",
"fields": {
"error": "logical service myservice.prod.svc.cluster.local:80: Service.prod.myservice:80: endpoint 10.250.162.250:80: operation was canceled: connection was not ready"
},
"timestamp": "[104343.356447s]",
"target": "linkerd_app_outbound::http::server"
}
}
}
{
"message": "Service failed",
"attributes": {
"threadId": "ThreadId(1)",
"spans": [
{
"name": "outbound"
},
{
"name": "proxy",
"addr": "172.20.207.194:80"
},
{
"ns": "prod",
"port": "80",
"name": "service"
},
{
"name": "endpoint",
"addr": "10.250.162.250:80"
}
],
"level": "WARN",
"fields": {
"error": "channel closed"
},
"timestamp": "[104343.925284s]",
"target": "linkerd_reconnect"
}
}
}
{
"message": "Failed to connect",
"attributes": {
"threadId": "ThreadId(1)",
"spans": [
{
"name": "outbound"
},
{
"name": "proxy",
"addr": "172.20.207.194:80"
},
{
"ns": "prod",
"port": "80",
"name": "service"
},
{
"name": "endpoint",
"addr": "10.250.162.250:80"
}
],
"level": "WARN",
"fields": {
"error": "Connection refused (os error 111)"
},
"timestamp": "[104344.409821s]",
"target": "linkerd_reconnect"
}
}
}
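To list which endpoint addresses the proxy keeps retrying, log records shaped like the ones above can be filtered on the `linkerd_reconnect` target. This is only a sketch: it assumes each record is valid JSON with `target`, `fields`, and `spans` nested under `attributes` as in our log pipeline, and the function name `failing_endpoints` is hypothetical.

```python
import json


def failing_endpoints(records):
    """Map endpoint address -> list of reconnect errors seen for it."""
    addrs = {}
    for record in records:
        attrs = record.get("attributes", {})
        # Only reconnect failures name the endpoint in a dedicated span.
        if attrs.get("target") != "linkerd_reconnect":
            continue
        error = attrs.get("fields", {}).get("error", "")
        for span in attrs.get("spans", []):
            if span.get("name") == "endpoint" and "addr" in span:
                addrs.setdefault(span["addr"], []).append(error)
    return addrs


# Trimmed copy of the "Failed to connect" record above.
sample = json.loads("""
{
  "message": "Failed to connect",
  "attributes": {
    "spans": [
      {"name": "outbound"},
      {"name": "endpoint", "addr": "10.250.162.250:80"}
    ],
    "level": "WARN",
    "fields": {"error": "Connection refused (os error 111)"},
    "target": "linkerd_reconnect"
  }
}
""")
stale_candidates = failing_endpoints([sample])
print(stale_candidates)
```

Any address returned here that is absent from kubectl get endpoints myservice is a candidate stale endpoint.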
output of linkerd check -o short
linkerd-identity
----------------
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2023-11-08T07:32:15Z
see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
certificate will expire on 2023-10-18T13:43:33Z
see https://linkerd.io/2.14/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
certificate will expire on 2023-10-21T08:56:13Z
see https://linkerd.io/2.14/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
certificate will expire on 2023-11-08T09:31:39Z
see https://linkerd.io/2.14/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints
control-plane-version
---------------------
‼ control plane is up-to-date
is running version 2.13.5 but the latest stable version is 2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-control for hints
‼ control plane and cli versions match
control plane running stable-2.13.5 but cli running stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-version-control for hints
linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
some proxies are not running the current version:
* linkerd-destination-5b5ddcf5d4-45glv (v2.207.0)
* linkerd-destination-5b5ddcf5d4-94j2x (v2.207.0)
* linkerd-destination-5b5ddcf5d4-g95bm (v2.207.0)
* linkerd-destination-5b5ddcf5d4-gxvtz (v2.207.0)
* linkerd-destination-5b5ddcf5d4-jmfn8 (v2.207.0)
* linkerd-identity-9559b4d7f-96kv7 (v2.207.0)
* linkerd-identity-9559b4d7f-gqddq (v2.207.0)
* linkerd-identity-9559b4d7f-gx4bz (v2.207.0)
* linkerd-identity-9559b4d7f-sfkb7 (v2.207.0)
* linkerd-identity-9559b4d7f-sl7ck (v2.207.0)
* linkerd-proxy-injector-6688d4487f-b6w99 (v2.207.0)
* linkerd-proxy-injector-6688d4487f-bmgqm (v2.207.0)
* linkerd-proxy-injector-6688d4487f-cf2ss (v2.207.0)
* linkerd-proxy-injector-6688d4487f-ffzlj (v2.207.0)
* linkerd-proxy-injector-6688d4487f-n8hxk (v2.207.0)
* linkerd-sp-validator-dbcc64849-7846s (v2.207.0)
* linkerd-sp-validator-dbcc64849-7d6kn (v2.207.0)
* linkerd-sp-validator-dbcc64849-7pqw7 (v2.207.0)
* linkerd-sp-validator-dbcc64849-92jzx (v2.207.0)
* linkerd-sp-validator-dbcc64849-jkws8 (v2.207.0)
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
linkerd-destination-5b5ddcf5d4-45glv running v2.207.0 but cli running stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints
linkerd-viz
-----------
‼ linkerd-viz pods are injected
could not find proxy container for prometheus-scrape-1-5585795fbd-l4sn5 pod
see https://linkerd.io/2.14/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
container "linkerd-proxy" in pod "prometheus-scrape-1-5585795fbd-l4sn5" is not ready
see https://linkerd.io/2.14/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are up-to-date
some proxies are not running the current version:
* grafana-864b6b8ddb-jxlpk (v2.207.0)
* metrics-api-5484cdf977-llg6t (v2.207.0)
* tap-58654c968b-7q5hm (v2.207.0)
* tap-injector-55597d88c7-xd7wp (v2.207.0)
* web-cbdb85945-b5s27 (v2.207.0)
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
grafana-864b6b8ddb-jxlpk running v2.207.0 but cli running stable-2.14.1
see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints
linkerd-smi
-----------
‼ Linkerd extension command linkerd-smi exists
exec: "linkerd-smi": executable file not found in $PATH
see https://linkerd.io/2.14/checks/#extensions for hints
Status check results are √
Environment
- EKS
- Kubernetes 1.24
- Linkerd 2.13.5
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe