Skip to content

Apply peformance tuning to new sandboxes also#43146

Merged
thaJeztah merged 1 commit intomoby:masterfrom
evol262:fix/ingress-namespace-performance
May 18, 2022
Merged

Apply peformance tuning to new sandboxes also#43146
thaJeztah merged 1 commit intomoby:masterfrom
evol262:fix/ingress-namespace-performance

Conversation

@evol262
Copy link

@evol262 evol262 commented Jan 12, 2022

relates to #35082, moby/libnetwork#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry [email protected]

Signed-off-by: Ryan Barry [email protected]

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah
Copy link
Member

Looks like this replaces #42897 (looks like there was a bad rebase on that one?)

I see there's a typo in the commit message (peformance instead of performance); if you're updating, perhaps you could fix that one as well (in case we need to search in git history).

I'm also curious: why move the values inline here instead of retaining the loadBalancerConfig package-level variable?

I think that was my suggestion in f014fa2#r718371177, where I initially thought both were setting the same options. The variable was only used in a single place, so I suggested to either move them closer to where they're used, or just to inline them (to reduce the cognitive overload of finding "where" they're used, and the reverse ("what options are in loadBalancerConfig and ingressConfig ?)

That said, the earlier version did set two different options in two places; see f014fa2, so I'm a bit confused now why this PR is different from the earlier iteration; I think this PR is missing some of the changes from f014fa2 / #42897

@evol262 evol262 force-pushed the fix/ingress-namespace-performance branch from 33379c5 to 9c471df Compare January 14, 2022 00:48
@evol262 evol262 requested a review from samuelkarp January 14, 2022 00:49
@evol262
Copy link
Author

evol262 commented Jan 14, 2022

Yep -- the rebase did clobber the old PR when trying to sign off one of the commits.

The change to libnetwork/controller.go also got missed when I pulled it over. Added now. @thaJeztah is exactly right about the inlining -- it was a suggestion from the previous PR

@thaJeztah thaJeztah added area/networking Networking area/performance kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/2-code-review labels Jan 20, 2022
@evol262
Copy link
Author

evol262 commented May 12, 2022

@thaJeztah @samuelkarp @corhere can I get a re-review?

@thaJeztah thaJeztah added this to the 22.06.0 milestone May 12, 2022
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look ok, but looks like one change is in the wrong commit; can you fix that?

relates to moby#35082, moby/libnetwork#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
@evol262 evol262 force-pushed the fix/ingress-namespace-performance branch from 9c471df to 0dd3a2e Compare May 17, 2022 19:45
@evol262 evol262 requested a review from thaJeztah May 17, 2022 19:46
@evol262
Copy link
Author

evol262 commented May 17, 2022

CI failure looks unrelated here. Can someone kick it?

@thaJeztah
Copy link
Member

(kicked CI)

@corhere is #43146 (comment) ok with you, or do you still prefer to have it addressed?

@corhere
Copy link
Contributor

corhere commented May 17, 2022

@thaJeztah I would prefer to have it addressed, but I wouldn't block merging on it It's been addressed; merge away! :shipit:

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (kicked CI once more just to be sure, see my comment higher up)

@thaJeztah
Copy link
Member

Oh; I was wrong; looks like it is a known flaky (but on specific architectures); #41561

For posterity; failure was;

=== RUN   TestNetworkLocalhostTCPNat
    nat_test.go:55: assertion failed: hi yall (msg string) !=  (string)
--- FAIL: TestNetworkLocalhostTCPNat (0.99s)

anyway, let's see if CI goes green and get this in

@thaJeztah thaJeztah dismissed samuelkarp’s stale review May 18, 2022 16:28

Looks like #43146 (comment) was addressed

@thaJeztah
Copy link
Member

Bringing this one in; CI is green now 👍

@thaJeztah thaJeztah merged commit 1aea4c2 into moby:master May 18, 2022
@xinfengliu
Copy link
Contributor

xinfengliu commented May 20, 2022

I found another issue. The sysctl settings are not applied after the 1st time starting docker daemon since OS boot. Then after systemctl restart docker, the settings are applied.

<right after booting OS>
docker@ub2204-1:~$ sudo nsenter --net=/var/run/docker/netns/lb_wgyv3nagr sysctl net.ipv4.vs.expire_nodest_conn
net.ipv4.vs.expire_nodest_conn = 0
docker@ub2204-1:~$ sudo sysctl net.ipv4.vs.expire_nodest_conn
net.ipv4.vs.expire_nodest_conn = 0

<restart docker daemon>
docker@ub2204-1:~$ sudo systemctl restart docker

docker@ub2204-1:~$ sudo nsenter --net=/var/run/docker/netns/lb_wgyv3nagr sysctl net.ipv4.vs.expire_nodest_conn
net.ipv4.vs.expire_nodest_conn = 1
docker@ub2204-1:~$ sudo sysctl net.ipv4.vs.expire_nodest_conn
net.ipv4.vs.expire_nodest_conn = 1

(Above testing was run with the dockerd binary including this PR.)

This means the settings would get lost after the user reboots the system and they probably would not notice that.

@thaJeztah
Copy link
Member

@xinfengliu nice find! Looks like there's some other code-path that should also apply it.

Could you perhaps open a ticket for that? Just to be sure it's not forgotten (easy to miss comments on merged pull requests); contributions welcome as well of course!

@xinfengliu
Copy link
Contributor

xinfengliu commented May 22, 2022

@thaJeztah

After further checking, I find it is because ip_vs module is not loaded by default in my system, docker loads ip_vs on-demand in addServiceBinding, but docker tries to set the ipvs sysctl parameters for the first time in NewSandbox, the ip_vs module has not been loaded yet.

May 22 13:22:53 ub2204-1 dockerd[1053]: time="2022-05-22T13:22:53.350537245Z" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
May 22 13:22:53 ub2204-1 dockerd[1053]: time="2022-05-22T13:22:53.350561161Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
May 22 13:22:53 ub2204-1 dockerd[1053]: time="2022-05-22T13:22:53.350567369Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
May 22 13:22:53 ub2204-1 dockerd[1053]: time="2022-05-22T13:22:53.350577244Z" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"

Not sure if we should create an moby issue for it (addressing it in moby code). I guess we could use sync.Once in addLBBackend to set those sysctl parameters.

@evol262
Copy link
Author

evol262 commented May 23, 2022

Even if we did use sync.Once, there's a basic issue of idempotency. It should not be the case that "the first time I start dockerd, it behaves differently".

Other modules are loaded (nf_conntrack[_netlink]) elsewhere as part of bringup to avoid this, and the same thing would make sense here. I'll find a good point to inject it.

evol262 pushed a commit to evol262/moby that referenced this pull request May 31, 2022
Previously, with the patch from moby#43146, it was possible for a
network configured with a single ingress or load balancer on a
distribution which does not have the `ip_vs` kernel module loaded
by default to try to apply sysctls which did not exist yet, and
subsequently dynamically load the module as part of ipvs/netlink.go.

This module is vendored, and not a great place to try to tie back
into core libnetwork functionality, so also ensure that the sysctls
(which are idempotent) are called after ingress/lb creation once
`ipvs` has been initialized.

Signed-off-by: Ryan Barry <[email protected]>
evol262 pushed a commit to evol262/libnetwork that referenced this pull request May 31, 2022
Pull moby/moby#43146 and
moby/moby#43670 into 20.10

relates to #35082, moby#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
evol262 pushed a commit to evol262/libnetwork that referenced this pull request May 31, 2022
Pull moby/moby#43146 and
moby/moby#43670 into 20.10

relates to #35082, moby#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
evol262 pushed a commit to evol262/libnetwork that referenced this pull request May 31, 2022
Pull moby/moby#43146 and
moby/moby#43670 into 20.10

relates to #35082, moby#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
evol262 pushed a commit to evol262/libnetwork that referenced this pull request Jun 1, 2022
Pull moby/moby#43146 and
moby/moby#43670 into 20.10

relates to #35082, moby#2491

Previously, values for expire_quiescent_template, conn_reuse_mode,
and expire_nodest_conn were set only system-wide. Also apply them
for new lb_* and ingress_sbox sandboxes, so they are appropriately
propagated

Signed-off-by: Ryan Barry <[email protected]>
Co-authored-by: Bjorn Neergaard <[email protected]>
@evol262 evol262 deleted the fix/ingress-namespace-performance branch March 9, 2023 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking Networking area/performance kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/4-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants