
Improving load balancer performance#2491

Merged
arkodg merged 2 commits into moby:master from ahjumma:master
Feb 17, 2020

Conversation

@ahjumma
Contributor

@ahjumma ahjumma commented Dec 16, 2019

TL;DR
Updates to improve load balancer performance at a high load scenario.

Resolves moby/moby#35082

Detailed explanation is provided at moby/moby#35082 (comment)

The IPVS module used by the swarm load balancer has a performance issue
under high load. The conn_reuse_mode=0 sysctl variable can be set to
handle this by reusing existing connection entries in the IPVS table.

Under high load, the IPVS module was dropping TCP SYN packets whenever
port reuse was detected against a connection in TIME_WAIT state, forcing
clients to re-initiate TCP connections after request timeout events.
With conn_reuse_mode=0, the IPVS module avoids this special handling of
existing entries in its connection table.
Combined with expire_nodest_conn=1, the swarm load balancer can handle
a high load of requests and forward connections to newly joining
backend services.

Signed-off-by: Andrew Kim <[email protected]>
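The sysctls named in this commit are written through paths under /proc/sys. A minimal Python sketch of that mapping follows (illustrative only, not the PR's actual Go change to libnetwork; `IPVS_SETTINGS`, `sysctl_proc_path`, and `apply_settings` are hypothetical names):

```python
# Illustrative sketch: the sysctl values this commit applies inside the
# load-balancer network namespace, and how a key maps to its /proc path.
IPVS_SETTINGS = {
    "net.ipv4.vs.conn_reuse_mode": "0",     # skip special handling of reused ports
    "net.ipv4.vs.expire_nodest_conn": "1",  # expire conns whose backend is gone
}

def sysctl_proc_path(key: str) -> str:
    """Map a sysctl key such as net.ipv4.vs.conn_reuse_mode to its /proc/sys path."""
    return "/proc/sys/" + key.replace(".", "/")

def apply_settings(settings, write=None):
    """Write each setting; `write` is injectable so this can be dry-run or tested."""
    if write is None:
        def write(path, value):
            with open(path, "w") as f:
                f.write(value)
    for key, value in settings.items():
        write(sysctl_proc_path(key), value)
```

Writing these files requires being inside the load balancer's network namespace (e.g. via nsenter, as in the commands later in this thread) and root privileges.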
@arkodg
Contributor

arkodg commented Dec 30, 2019

Thanks for running the tests and contributing! Had a few questions:

  1. Should we also set net/ipv4/vs/expire_quiescent_template to 1, which should expire persistent connections to a real server with weight 0 (after the backend goes down, thanks to the net/ipv4/vs/expire_nodest_conn=1 setting)? However, I see there is an open issue, "kube-proxy ipvs conn_reuse_mode setting causes errors with high load from single client" (kubernetes/kubernetes#81775), so I would like to understand the negative implications of setting net/ipv4/vs/conn_reuse_mode to 0.
    cc: @lbernail

  2. Will this setting work for most kernels?

@geekdave

This fix resolved the same issue for me. My findings here: moby/moby#35082 (comment)

Happy to help in any way to move this forward!

@geekdave

@arkodg do you think this fix is likely to be introduced at this layer? Trying to understand if we should add scripting to our deployment that adds this after every docker stack deploy:

sudo nsenter --net=/var/run/docker/netns/{your_load_balancer} sysctl -w net.ipv4.vs.conn_reuse_mode=0
sudo nsenter --net=/var/run/docker/netns/{your_load_balancer} sysctl -w net.ipv4.vs.expire_nodest_conn=1

...or whether we should wait for a fix here. Any guidance would be much appreciated!

@ahjumma
Contributor Author

ahjumma commented Feb 13, 2020

@geekdave
That's our current solution to this problem.

default_network_id=$(docker network ls | grep "myservice_default" | awk '{print $1}')
sudo nsenter --net=/var/run/docker/netns/lb_${default_network_id:0:9} sysctl -w net.ipv4.vs.conn_reuse_mode=0
sudo nsenter --net=/var/run/docker/netns/lb_${default_network_id:0:9} sysctl -w net.ipv4.vs.expire_nodest_conn=1

Once your stack is running, you do not need to re-run these commands on consecutive `docker stack deploy`s, since the load balancer sandbox is re-used throughout the lifetime of your stack.
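The shell snippet above derives the sandbox name as `lb_` plus the first 9 characters of the network ID. That lookup could be sketched in Python as follows (illustrative only; `lb_netns_name` is a hypothetical helper, and the parsing assumes the whitespace-separated `docker network ls` output shown in the snippet):

```python
# Hypothetical helper mirroring the shell pipeline above: given the text
# output of `docker network ls`, find the stack's default network ID and
# derive the lb_* namespace name ("lb_" + first 9 characters of the ID).
def lb_netns_name(network_ls_output: str, network_name: str) -> str:
    for line in network_ls_output.splitlines():
        fields = line.split()
        # Column 0 is NETWORK ID, column 1 is NAME (header line won't match).
        if len(fields) >= 2 and fields[1] == network_name:
            return "lb_" + fields[0][:9]
    raise LookupError(f"network {network_name!r} not found")
```

The resulting name would then be passed to `nsenter --net=/var/run/docker/netns/<name>` as in the commands above.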

@geekdave

Thanks for this, @ahjumma ! We have some tooling that performs an initial stack deploy on fresh machines. I'm considering adding some scripting that waits in a loop until the ${service}_default network is created before running your commands. Does that seem like the right approach?
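The wait-in-a-loop approach described above could be sketched like this (illustrative Python; `exists` is a hypothetical callback that would typically shell out to `docker network ls`, and the timeout values are arbitrary):

```python
import time

def wait_for_network(exists, name, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Poll until exists(name) is true or the timeout elapses.

    `exists` and `sleep` are injectable so the loop can be tested without
    a real Docker daemon.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if exists(name):
            return True
        sleep(interval)
    return False
```

Once this returns True, the sysctl commands from the previous comment could be applied to the freshly created lb_* sandbox.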

@lbernail
Contributor

👋 In kube-proxy we set:

  • net.ipv4.vs.conn_reuse_mode=0
  • net.ipv4.vs.expire_nodest_conn=1
  • net.ipv4.vs.expire_quiescent_template=1

expire_nodest_conn is required to avoid blackholing traffic to existing destinations after removing the RealServer (RST on next packet to backend for TCP, ICMP destination unreachable for UDP)

expire_quiescent_template is great when you set weight to 0 to avoid creating new connections to a real server with weight 0

conn_reuse_mode matters a lot for performance: if you set it to 1, a new connection reusing the same 5-tuple gets its first packet dropped, which means the new connection waits 1s before retrying.
However, setting it to 0 leads to some problems in kube-proxy: if ports are reused fast enough, a realserver always has Active/InActive connections and is never garbage collected (we currently garbage-collect backends with weight 0 asynchronously when they have no connections, but we are looking into ways to forcefully delete the backend after a while to avoid this situation)
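The garbage-collection problem described here can be illustrated with a toy model (illustrative Python only, not kube-proxy's actual implementation; `Backend` and `gc_quiescent` are hypothetical names):

```python
# Toy model: kube-proxy-style async gc removes backends that have weight 0
# AND no remaining connections. With conn_reuse_mode=0 and fast port reuse,
# a weight-0 backend keeps showing Active/InActive connections, so the
# "no connections" condition is never met and it is never collected.
class Backend:
    def __init__(self, name, weight, connections):
        self.name = name
        self.weight = weight
        self.connections = connections

def gc_quiescent(backends):
    """Keep any backend that still has weight or still has connections."""
    return [b for b in backends if not (b.weight == 0 and b.connections == 0)]
```

In this model, a drained weight-0 backend disappears on the next gc pass, while a weight-0 backend whose 5-tuples keep being reused stays forever, which is why a forceful time-based deletion is being considered.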

@ahjumma
Contributor Author

ahjumma commented Feb 13, 2020

> Thanks for running the tests and contributing! Had a few questions:
>
>   1. Should we also set net/ipv4/vs/expire_quiescent_template to 1, which should expire persistent connections to a real server with weight 0 (after the backend goes down, thanks to the net/ipv4/vs/expire_nodest_conn=1 setting)? However, I see there is an open issue (kubernetes/kubernetes#81775), so I would like to understand the negative implications of setting net/ipv4/vs/conn_reuse_mode to 0.
>     cc: @lbernail
>   2. Will this setting work for most kernels?

Hi @arkodg.

It does seem net.ipv4.vs.expire_quiescent_template=1 would also be helpful.

Regarding your concern about possible problematic behaviours of net.ipv4.vs.conn_reuse_mode=0: net.ipv4.vs.expire_nodest_conn=1 is what allows bypassing the problem.
This is acknowledged in kubernetes/kubernetes/issues/81775#issuecomment-542542167, which you referenced.

For the second question, what is the earliest Linux kernel that Docker Swarm needs to support?

@arkodg
Contributor

arkodg commented Feb 14, 2020

Thanks @lbernail @ahjumma.
Can we also include net.ipv4.vs.expire_quiescent_template=1?

Further improving load balancer performance by expiring
connections to servers with weights set to 0.

Signed-off-by: Andrew Kim <[email protected]>
@ahjumma
Contributor Author

ahjumma commented Feb 14, 2020

Hi @arkodg ,

According to @lbernail's previous finding, kernel 2.6 is supported.

@arkodg arkodg requested review from euanh and selansen February 14, 2020 21:59
@arkodg
Contributor

arkodg commented Feb 14, 2020

Can you PTAL as well, @lbernail?

@selansen
Contributor

will look into it soon.

Contributor

@selansen selansen left a comment


Went through the entire discussion. Looks good to me.

@arkodg arkodg merged commit 6659f7f into moby:master Feb 17, 2020
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Feb 17, 2020
full diff: moby/libnetwork@feeff4f...6659f7f

includes:

- moby/libnetwork#2317 Allow bridge net driver to skip IPv4 configuration of bridge interface
    - adds support for a `com.docker.network.bridge.inhibit_ipv4` label/configuration
    - addresses moby#37430 Prevent bridge network driver from setting IPv4 address on bridge interface
- moby/libnetwork#2454 Support for com.docker.network.host_ipv4 driver label
    - addresses moby#30053 Unable to choose outbound (external) IP for containers
- moby/libnetwork#2491 Improving load balancer performance
    - addresses moby#35082 [SWARM] Very poor performance for ingress network with lots of parallel requests

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@cuongnv4sapo

cuongnv4sapo commented Jun 19, 2020

I did the same steps but the ingress LB performance is still low. Did I miss any steps?
sudo nsenter --net=/var/run/docker/netns/lb_o7wjqxluj sysctl -w net.ipv4.vs.conn_reuse_mode=0
sudo nsenter --net=/var/run/docker/netns/lb_o7wjqxluj sysctl -w net.ipv4.vs.expire_nodest_conn=1
sudo nsenter --net=/var/run/docker/netns/lb_o7wjqxluj sysctl -w net.ipv4.vs.expire_quiescent_template=1

Do I have to restart the server?
Or do I have to set the same configs on the host where the swarm is deployed?

evol262 pushed a commit to evol262/moby that referenced this pull request Jan 12, 2022
relates to moby#35082, moby/libnetwork#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
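The per-sandbox idea in this commit message could be sketched as follows (illustrative Python, not the actual Go change; the namespace directory and name patterns follow the nsenter commands quoted earlier in this thread, and `lb_sandboxes` is a hypothetical name):

```python
# Illustrative sketch: select the network-namespace entries that should
# receive the IPVS sysctls per-sandbox rather than only system-wide.
# Docker keeps its netns files under /var/run/docker/netns; load-balancer
# sandboxes are named lb_* and the ingress sandbox is ingress_sbox.
NETNS_DIR = "/var/run/docker/netns"

def lb_sandboxes(entries):
    """Filter netns directory entries down to lb_* and ingress_sbox sandboxes."""
    return [e for e in entries if e.startswith("lb_") or e == "ingress_sbox"]
```

A real implementation would list `NETNS_DIR`, enter each selected namespace, and write the three sysctl values there.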
evol262 pushed a commit to evol262/libnetwork that referenced this pull request May 31, 2022
Pull moby/moby#43146 and
moby/moby#43670 into 20.10

relates to #35082, moby#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>


Successfully merging this pull request may close these issues.

[SWARM] Very poor performance for ingress network with lots of parallel requests

6 participants