description
When a client/server application uses HTTP/2 streams (here, gRPC) behind an Istio sidecar, there is a limit of 100 streams per TCP connection.
reproducing
Test is done on a GKE 1.9.6 K8s cluster
Istio is 0.7.1 + auto-inject sidecar
Istio installed using the istio.yaml file, without mTLS
auto-inject installed by following the Istio docs
then
git clone [email protected]:prune998/goHelloGrpcStream.git
cd goHelloGrpcStream
git checkout prune/envoy-perf
kubectl apply -f kubernetes/deployment-autoinject-istio.yml
This will create 2 deployments, a loadtest-client and a greeter-server, with the Istio sidecar injected.
The loadtest-client pod will not do anything at first, so Istio has some time to start.
Then start the client by hand and watch the logs of the server.
The client will open 400 TCP connections with a gRPC stream in each.
kubectl exec -ti $(kubectl get pods --selector=app=loadtest-client -o jsonpath='{.items..metadata.name}') /root/loadtest_client
On the server:
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:08:06Z" level=info msg="Reveived Stream message Ping 0" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:08:06Z" level=info msg="Reveived Stream message Ping 1" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:08:06Z" level=info msg="Reveived Stream message Ping 2" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:08:06Z" level=info msg="Reveived Stream message Ping 3" endpoint=SayHelloStream port=7788
...
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:08:09Z" level=info msg="Reveived Stream message Ping 200" endpoint=SayHelloStream port=7788
Actually, the number of possible TCP connections depends on the number of CPUs on your node: 2 CPUs = 200 streams max, 4 CPUs = 400 streams max.
If you have 4 CPUs or more, start the client with a higher number of streams:
kubectl exec -ti $(kubectl get pods --selector=app=loadtest-client -o jsonpath='{.items..metadata.name}') -- /root/loadtest_client -clients=500
If you inject the sidecar by hand, you can also add the start option --concurrency=1 (Envoy's worker-thread count).
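A minimal sketch of the observed scaling, assuming (as the numbers above suggest) that each Envoy worker thread holds its own upstream connection, each capped at 100 concurrent streams. The 100 is the empirical per-connection cap this issue is about, not a documented constant:

```go
package main

import "fmt"

// observedStreamCap is the per-connection stream limit seen in this test.
// Its origin is exactly what this issue is about -- it was not configured
// anywhere by us.
const observedStreamCap = 100

// maxStreams models the observed behaviour: one upstream TCP connection
// per Envoy worker thread (one per CPU by default, or forced with
// --concurrency), each carrying at most observedStreamCap streams.
func maxStreams(workers int) int {
	return workers * observedStreamCap
}

func main() {
	for _, cpus := range []int{1, 2, 4} {
		fmt.Printf("%d worker(s) -> %d streams max\n", cpus, maxStreams(cpus))
	}
}
```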
making it work
This issue does not show up when there is only one Istio sidecar (in fact, one Envoy proxy) in the middle.
Edit the deployment and set the sidecar injection to false:
"sidecar.istio.io/inject": "false"
Then start the loadtest-client again:
kubectl exec -ti $(kubectl get pods --selector=app=loadtest-client -o jsonpath='{.items..metadata.name}') -- /root/loadtest_client -clients=500
stern -s 1s greeter-server
...
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 0" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 1" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x istio-proxy [libprotobuf INFO src/istio/mixerclient/check_cache.cc:155] Add a new Referenced for check cache: Absence-keys: source.uid, Exact-keys: context.protocol, destination.service, destination.uid, source.ip,
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 2" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 3" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 4" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 5" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 6" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:31:59Z" level=info msg="Reveived Stream message Ping 7" endpoint=SayHelloStream port=7788
...
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:32:38Z" level=info msg="Reveived Stream message Ping 398" endpoint=SayHelloStream port=7788
greeter-server-658bbbbb9-qsm5x greeter-server time="2018-04-13T18:32:39Z" level=info msg="Reveived Stream message Ping 399" endpoint=SayHelloStream port=7788
Discussion
- I'm not able to tell where this limitation comes from.
- The client and the server obviously have no such limitation when Istio is not involved.
- I can reproduce it using the Istio Ingress controller, so I can confirm this only happens between the 2 Envoy processes.
- I took tons of tcpdump captures and added tons of metrics, and I can confirm the limitation is in the first Istio proxy.
To be sure everybody understands, here is a diagram.
Let's assume all pods see only one CPU, so only ONE TCP connection is possible. We start the client with 200 streams.

This is color coded... follow the lines with the same color to see what happens.
- This is not linked (I think) to the circuit breaker, as the default minimum in Envoy is 1024 and we are far below that. Also, if I set the CB really low to trigger an error, I do get UO 503 errors.
- As the HTTP/2 protocol specifies, when the client connects to the server, the server immediately sends the max_concurrent_streams SETTINGS back to the client. I can see that happening when the server's Istio-Proxy connects to the greeter-server. However, this information does not seem to be propagated to the other (the first) Istio-Proxy. In this scenario, the first proxy should use the CB value, or the default value, 1024. Not 100...
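To make the SETTINGS exchange concrete, here is a sketch (not Envoy's code) that decodes an HTTP/2 SETTINGS frame and extracts SETTINGS_MAX_CONCURRENT_STREAMS (identifier 0x3), the value a server advertises right after the connection preface:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	frameTypeSettings        = 0x4
	settingsMaxConcStreamsID = 0x3 // SETTINGS_MAX_CONCURRENT_STREAMS
)

// parseSettings decodes a single HTTP/2 frame (9-byte header + payload)
// and returns the settings it carries as an id -> value map.
func parseSettings(frame []byte) (map[uint16]uint32, error) {
	if len(frame) < 9 {
		return nil, errors.New("frame too short")
	}
	length := int(frame[0])<<16 | int(frame[1])<<8 | int(frame[2])
	if frame[3] != frameTypeSettings {
		return nil, errors.New("not a SETTINGS frame")
	}
	payload := frame[9:]
	if len(payload) < length || length%6 != 0 {
		return nil, errors.New("bad SETTINGS payload")
	}
	settings := make(map[uint16]uint32)
	// Each setting is 6 bytes: 2-byte identifier, 4-byte value.
	for off := 0; off < length; off += 6 {
		id := binary.BigEndian.Uint16(payload[off:])
		val := binary.BigEndian.Uint32(payload[off+2:])
		settings[id] = val
	}
	return settings, nil
}

func main() {
	// A SETTINGS frame advertising max_concurrent_streams = 100,
	// mimicking what this issue reports seeing on the wire.
	frame := []byte{
		0x00, 0x00, 0x06, // length = 6
		0x04,             // type = SETTINGS
		0x00,             // flags
		0x00, 0x00, 0x00, 0x00, // stream id = 0
		0x00, 0x03, // id = SETTINGS_MAX_CONCURRENT_STREAMS
		0x00, 0x00, 0x00, 0x64, // value = 100
	}
	s, err := parseSettings(frame)
	if err != nil {
		panic(err)
	}
	fmt.Println("max_concurrent_streams =", s[settingsMaxConcStreamsID]) // 100
}
```

The point of the bullet above is that this value, announced by the server to its local proxy, never reaches the client-side proxy.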
- When the client starts closing some streams, the other (>100) streams start to connect to the server. So Istio-Proxy queues all the streams, waiting for the oldest ones to close. You can test that by adding the option -cnxDelay=60s when starting the client. After a minute you'll see some disconnections from the old streams and some new Reveived Stream message log lines with higher ping IDs.
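The queueing behaviour described above can be modelled as a counting semaphore of 100 slots: stream 101 only proceeds once an earlier stream closes. This is a sketch of the observed behaviour, not of Envoy's internals:

```go
package main

import "fmt"

// streamGate models a proxy that admits at most `capacity` concurrent
// streams per connection and holds back the rest until a slot frees up.
type streamGate struct {
	slots chan struct{}
}

func newStreamGate(capacity int) *streamGate {
	return &streamGate{slots: make(chan struct{}, capacity)}
}

// tryOpen admits a stream if a slot is free; otherwise it reports the
// stream as queued (rejected here, to keep the example synchronous).
func (g *streamGate) tryOpen() bool {
	select {
	case g.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// closeStream frees one slot, letting a queued stream through.
func (g *streamGate) closeStream() { <-g.slots }

func main() {
	gate := newStreamGate(100)
	admitted := 0
	for i := 0; i < 200; i++ {
		if gate.tryOpen() {
			admitted++
		}
	}
	fmt.Println("admitted immediately:", admitted) // only 100 of 200

	gate.closeStream() // one old stream disconnects...
	fmt.Println("one more admitted:", gate.tryOpen()) // true
}
```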
- When Istio-Proxy starts queueing the streams, I see no counters about it. The cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter.upstream_cx_active counter for the outgoing cluster is 1 and cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter.upstream_rq_active is 200. Only cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter.upstream_cx_tx_bytes_total reflects the real number of requests, which is (mostly) 100 x 97 bytes.
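These counters come from Envoy's admin stats endpoint, which prints plain `name: value` lines. A small helper to pull them out (the stat names below are copied from the output above; the parsing is a sketch of that text format):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStats turns Envoy's plain-text stats output ("name: value" lines)
// into a map. Non-numeric or malformed lines are skipped.
func parseStats(raw string) map[string]int64 {
	stats := make(map[string]int64)
	for _, line := range strings.Split(raw, "\n") {
		name, val, ok := strings.Cut(line, ": ")
		if !ok {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64)
		if err != nil {
			continue
		}
		stats[strings.TrimSpace(name)] = n
	}
	return stats
}

func main() {
	// Sample mirroring the counters quoted in this issue.
	raw := `cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter.upstream_cx_active: 1
cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter.upstream_rq_active: 200`
	stats := parseStats(raw)
	prefix := "cluster.out.greeter-server.default.svc.cluster.local|grpc-greeter."
	fmt.Println("active connections:", stats[prefix+"upstream_cx_active"]) // 1
	fmt.Println("active requests:   ", stats[prefix+"upstream_rq_active"]) // 200
}
```

The mismatch (1 active connection carrying 200 active requests, with only ~100 actually on the wire) is exactly the symptom described.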
conclusion
I really don't know where this limit of 100 streams per TCP connection comes from, but it's a serious limitation.
I tried going through the Istio and Envoy code but found nothing, and building my own Envoy binary took almost 3 hours... :)
Help would be very much appreciated...