Description
What is the issue?
We run the OpenTelemetry Collector as a DaemonSet and send traces from our pods to the collector via the node IP address. When we restart the collector, each pod receives a new pod IP, but the node IP stays the same. Some proxies keep trying to connect to the old pod IP and therefore fail to reconnect.
Example:
Emojiservice is supposed to send traces to the collector service. The collector runs as a DaemonSet and exposes port 4317 (gRPC). We inject the node IP into emojiservice via the Downward API so that it sends traces to 10.167.0.1:4317. This address resolves to the collector's pod IP, 10.169.10.1:4317.
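For context, the node IP injection looks roughly like this (a sketch; the container name, image, and the `OTEL_EXPORTER_OTLP_ENDPOINT` wiring are illustrative, not our exact manifest):

```yaml
# Sketch: inject the node IP via the Kubernetes Downward API so the app
# sends traces to the collector DaemonSet pod on its own node.
apiVersion: v1
kind: Pod
metadata:
  name: emojiservice
spec:
  containers:
    - name: emojiservice
      image: emojiservice:latest   # illustrative
      env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP   # the node's IP, e.g. 10.167.0.1
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://$(NODE_IP):4317"
```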
Now we restart the DaemonSet and the collector's pod IP changes to 10.169.11.1. The emojiservice sidecar, however, still emits debug log entries showing attempts to connect to the old IP (see log example).
I believe this is related to #8956, but I don't know whether, or how, the diagnostics command can be used for IP-based connections.
How can it be reproduced?
Call instances of a DaemonSet using the node IP address obtained from the Kubernetes Downward API. Restart the pods of the DaemonSet so that they are assigned new pod IPs. Observe that connections made via the node IP still target the old pod IP and fail to be re-established.
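The steps above can be sketched as kubectl commands run against a live cluster (the DaemonSet name, namespace, and label selector are illustrative):

```shell
# 1. Note the current pod IPs and node IPs of the collector DaemonSet.
kubectl -n observability get pods -l app=otel-collector -o wide

# 2. Restart the DaemonSet so every collector pod gets a new pod IP.
kubectl -n observability rollout restart daemonset/otel-collector
kubectl -n observability rollout status daemonset/otel-collector

# 3. Compare: the pod IPs have changed, while the node IPs are unchanged.
kubectl -n observability get pods -l app=otel-collector -o wide

# 4. Tail the injected proxy's debug log in an application pod:
#    it still dials the old pod IP behind the node IP address.
kubectl logs <app-pod> -c linkerd-proxy --tail=100 | grep Connecting
```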
Logs, error output, etc
```json
{
  "id": "AQAAAYcDrhY0uLmpXQAAAABBWWNEcmgya0FBQjMxcXIxZEpFSnJBQTQ",
  "content": {
    "timestamp": "2023-03-21T10:19:13.332Z",
    "tags": [
      "short_image:proxy",
      "kube_container_name:linkerd-proxy",
      "image_tag:stable-2.12.4",
      "pod_phase:running",
      "source:proxy",
      "kube_ownerref_kind:replicaset",
      "container_name:linkerd-proxy",
      "cloud_provider:gcp"
    ],
    "attributes": {
      "threadId": "ThreadId(1)",
      "spans": [
        {
          "name": "outbound"
        },
        {
          "name": "proxy",
          "addr": "10.167.0.1:4317"
        }
      ],
      "level": "DEBUG",
      "fields": {
        "server": {
          "addr": "10.169.10.1:4317"
        },
        "message": "Connecting"
      },
      "timestamp": "[ 75611.397246s]",
      "target": "linkerd_proxy_transport::connect"
    }
  }
}
```

Output of `linkerd check -o short`:
```
» linkerd check -o short
Status check results are √
```
Environment
- Kubernetes Version: v1.24.10-gke.2300
- Environment: GKE
- Linkerd Version: stable-2.12.4
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None