What happened?
Adding --experimental-enable-distributed-tracing works, but causes a memory leak, of about 1GB per hour in our setup. Instead of expected ~2GB, it got to about 12GB in 7 hours.
What did you expect to happen?
Stable memory usage around 1.8-2.0 GB of RSS.
How can we reproduce it (as minimally and precisely as possible)?
Run with --experimental-enable-distributed-tracing for few hours. It is sufficient to enable it on one member.
Anything else we need to know?
The tracing collector endpoint doesn't need to be configured or listening. Having otelcol on 4317 doesn't change anything (beyond actually making tracing work).
Etcd version (please run commands below)
Details
$ etcd --version
etcd Version: 3.5.0
Git SHA: f99cada05
Go Version: go1.16.6
Go OS/Arch: linux/amd64
$ etcdctl version
etcdctl version: 3.5.0
API version: 3.5
Etcd configuration (command line flags or environment variables)
etcd, Kubernetes, OKD / Openshift 4.9, 3 members.
etcd --experimental-enable-distributed-tracing --logger=zap --log-level=info --initial-advertise-peer-urls=https://10.10.0.102:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com..crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://10.10.0.102:2379 --listen-client-urls=https://0.0.0.0:2379,unixs://10.10.0.102:0 --listen-peer-urls=https://0.0.0.0:2380 --metrics=extensive --listen-metrics-urls=https://0.0.0.0:9978
running in cri-o
Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)
Details
[root@master-1 /]# etcdctl member list -w table
+------------------+---------+------------------------------+--------------------------+--------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+------------------------------+--------------------------+--------------------------+------------+
| 10f8cf6269xxx | started | master-2.example.com | https://10.10.0.103:2380 | https://10.10.0.103:2379 | false |
| a2bbe7149xxx | started | master-1.example.com | https://10.10.0.102:2380 | https://10.10.0.102:2379 | false |
| acb2c160xxx | started | master-0.example.com | https://10.10.0.101:2380 | https://10.10.0.101:2379 | false |
+------------------+---------+------------------------------+--------------------------+--------------------------+------------+
Relevant log output
No fatal issues in the logs.
What happened?
Adding
--experimental-enable-distributed-tracingworks, but causes a memory leak, of about 1GB per hour in our setup. Instead of expected ~2GB, it got to about 12GB in 7 hours.What did you expect to happen?
Stable memory usage around 1.8-2.0 GB of RSS.
How can we reproduce it (as minimally and precisely as possible)?
Run with
--experimental-enable-distributed-tracingfor few hours. It is sufficient to enable it on one member.Anything else we need to know?
The tracing collector endpoint doesn't need to be configured or listening. Having
otelcolon 4317 doesn't change anything (beyond actually making tracing work).Etcd version (please run commands below)
Details
Etcd configuration (command line flags or environment variables)
etcd, Kubernetes, OKD / Openshift 4.9, 3 members.
etcd --experimental-enable-distributed-tracing --logger=zap --log-level=info --initial-advertise-peer-urls=https://10.10.0.102:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-master-1.example.com.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com..crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-master-1.example.com.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://10.10.0.102:2379 --listen-client-urls=https://0.0.0.0:2379,unixs://10.10.0.102:0 --listen-peer-urls=https://0.0.0.0:2380 --metrics=extensive --listen-metrics-urls=https://0.0.0.0:9978running in
cri-oEtcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)
Details
Relevant log output
No fatal issues in the logs.