Possible Memory Leak in Proxy Sidecar #2012

## Bug Report

### What is the issue?

Yesterday one of our meshed services started crashlooping because its liveness probe failed. Interestingly, the probe failed because the service was not able to resolve the IP of another service (`java.net.UnknownHostException: xxx.default.svc.cluster.local.`). The restarts did not resolve the problem, and eventually we had to recreate the whole Pod manually. As far as I can see, the health checks of both our service and the proxy failed, but the proxy was never restarted.

The proxy sidecar of that Pod started to hog memory 5h30m before it failed:

![screenshot 2018-12-20 02](https://user-images.githubusercontent.com/1698599/50279801-16051180-044b-11e9-93dd-0d5ab24f5e40.png)

When the actual failures began, it stopped allocating memory, but spammed this message at a rate of `~7k` per second:

```
2018-12-19T21:01:33.000210171Z WARN trust_dns_proto::udp::udp_stream could not get next random port, delaying
```
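For anyone wanting to verify a rate like this from saved proxy logs, here is a minimal sketch that counts the warning per one-second bucket. It assumes the logs were captured with timestamps in the format shown above (for example via `kubectl logs <pod> -c linkerd-proxy --timestamps`, where the container name is an assumption about the setup). The function name `warnings_per_second` and the sample lines are hypothetical; only the first sample line is taken from this report.

```python
import re
from collections import Counter

# Matches the RFC 3339 timestamp prefix, truncated to whole seconds.
TS_SECOND = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")
NEEDLE = "could not get next random port"


def warnings_per_second(lines):
    """Count lines containing NEEDLE, grouped by one-second bucket."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = TS_SECOND.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Synthetic sample: the first line is from the report, the others are
# copies with made-up timestamps, used only to exercise the function.
MESSAGE = " WARN trust_dns_proto::udp::udp_stream could not get next random port, delaying"
sample = [
    "2018-12-19T21:01:33.000210171Z" + MESSAGE,
    "2018-12-19T21:01:33.000350000Z" + MESSAGE,
    "2018-12-19T21:01:34.000100000Z" + MESSAGE,
]
counts = warnings_per_second(sample)
for second, n in sorted(counts.items()):
    print(second, n)
```

On a real log file, passing the open file object to `warnings_per_second` would give the per-second counts, which can then be compared against the memory and CPU graphs.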

It also consumed a lot more CPU time:

![screenshot 2018-12-20 03](https://user-images.githubusercontent.com/1698599/50279809-1bfaf280-044b-11e9-9a27-7f5df582b3de.png)

Since the failure happened last night, we just restarted the Pods and did not have time to collect more data or investigate further. I will collect more data if we see growing memory consumption again, so hints on where to look would be appreciated.


### How can it be reproduced?

Unfortunately I do not know.


#### `linkerd check` output

```text
% linkerd check
kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[ok]
linkerd-api: no invalid service profiles...................................[ok]
linkerd-version: can determine the latest version..........................[ok]
linkerd-version: cli is up-to-date.........................................[ok]
linkerd-version: control plane is up-to-date...............................[ok]

Status check results are [ok]
```

### Environment

- Kubernetes Version: `v1.11.5`
- Cluster Environment: Built from scratch on AWS.
- Host OS: `CoreOS`
- Linkerd version: `2.1.0-stable`

### Possible solution

* restart the Pod manually

### Additional context

* So far, we have only injected Linkerd into two services.
* We added Linkerd to our production system yesterday morning.
* There was another proxy that kept allocating memory.
  * We preemptively recreated the Pod, because it was late.
  * It was a different deployment, written in a different language.
* Affected services get around 10 requests per minute.


