network: Use tc filtering rules in bridge mode#827
network: Use tc filtering rules in bridge mode#827sboeuf merged 4 commits intokata-containers:masterfrom
Conversation
|
/test |
| return nil | ||
| } | ||
|
|
||
| func addQdiscIngress(index int) error { |
There was a problem hiding this comment.
What happens if index < 0? Does netlink.QdiscAdd() handle that? Is it a valid index? (might be given QdiscAttrs.Index is an int wheres all other members are uint32?!?)
There was a problem hiding this comment.
@jodh-intel netlink.QdiscAdd will handle the case where the network index is negative.
It will error out if it cannot find an interface with the specified index.
| return nil | ||
| } | ||
|
|
||
| func addRedirectTCFilter(sourceIndex, destIndex int) error { |
There was a problem hiding this comment.
Same question - what if either of these two params are negative?
a735361 to
2e26a2b
Compare
|
@sboeuf I was planning to replace the bridge mode initially. I pushed the changes for testing. |
|
cc @mcastelino PTAL. |
mcastelino
left a comment
There was a problem hiding this comment.
Minor comment. We need to undo the qdisk we added on ingress on the CNI/CNM created interface.
virtcontainers/network.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func tcRemoveNetwork(endpoint Endpoint) error { |
There was a problem hiding this comment.
@amshinde you need to remove the ingress qdisc you added to the veth/incoming interface. As part of the cleanup you should remove that too.
So you need to undo what do you did before
if err := addQdiscIngress(attrs.Index); err != nil {
return err
}
|
Some more context for this review |
|
@bergwolf Can you take a look as well, I think you guys are using |
2e26a2b to
7bd8355
Compare
|
/retest |
| return err | ||
| } | ||
|
|
||
| if err := removeRedirectTCFilter(link); err != nil { |
There was a problem hiding this comment.
@amshinde you should not need to do this. Removing the qdisk removes the filters. But does not hurt.
7bd8355 to
658af9a
Compare
|
/retest |
Codecov Report
@@ Coverage Diff @@
## master #827 +/- ##
=========================================
Coverage ? 66.09%
=========================================
Files ? 87
Lines ? 10705
Branches ? 0
=========================================
Hits ? 7076
Misses ? 2897
Partials ? 732 |
|
@grahamwhaley Is the metrics failure here a valid failure? |
|
@amshinde probably not - we had a bunch of fails for the last 36h - I have tweaked the settings as per XXX to avoid false failures whilst I look at how to get things more stable. but, as we discussed, you are better off running the network cpu test and/or other network tests locally to check if you have made any network impact. I'm half way through adding that test to the report gen btw. |
|
s/XXX/ kata-containers/ci#72 / |
virtcontainers/network.go
Outdated
| } | ||
| return untapNetworkPair(endpoint) | ||
| case NetXConnectTCFilterModel: | ||
| netPair.NetInterworkingModel = NetXConnectTCFilterModel |
There was a problem hiding this comment.
This is not needed. The switch already checks this variable, there's no reason to reassign it.
virtcontainers/network.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func tcRedirectNetwork(endpoint Endpoint, numCPUs uint32, disableVhostNet bool) error { |
There was a problem hiding this comment.
Just a comment about naming here. I think it'd be clearer to call it setupTCFiltering().
virtcontainers/network.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func tcRemoveNetwork(endpoint Endpoint) error { |
virtcontainers/network.go
Outdated
| case "enlightened": | ||
| *n = NetXConnectEnlightenedModel | ||
| return nil | ||
| case "tcfilter": |
There was a problem hiding this comment.
I know it's not only related to this PR, but we should define a constant for this. We don't want to play around with strings like this.
| return nil | ||
| } | ||
|
|
||
| func addQdiscIngress(index int) error { |
There was a problem hiding this comment.
A comment about this function role would be nice (along those lines):
// addQdiscIngress creates a traffic rule for any incoming traffic to the interface
// designated by its index.| return nil | ||
| } | ||
|
|
||
| func addRedirectTCFilter(sourceIndex, destIndex int) error { |
virtcontainers/network.go
Outdated
|
|
||
| for _, qdisc := range qdiscs { | ||
| ingress, ok := qdisc.(*netlink.Ingress) | ||
|
|
| # Used when the Container network interface can be bridged using | ||
| # macvtap. | ||
| # | ||
| # - tcfilter |
There was a problem hiding this comment.
Have we measured the perf impact before we take this decision?
There was a problem hiding this comment.
@sboeuf I ran some iperf tests to compare macvtap, bridge and tcfilter solutions on an Intel nuc.
With macvtap I got around 15Gbs throughtput, tcfiltering giving 13.5 Gbs while bridge gave around 12.9 to 13.5 Gbs.
So we get slightly better performance with tcfiltering as compared to bridge mode, but less performance as compared to macvtap. But I think, given that we get added functionality in terms of support for ipvlan, we can make this as the default. In case ipvlan support is not needed, users can choose to use macvtap instead.
WDYT?
There was a problem hiding this comment.
Oh that's cool, thanks for running those tests :)
I don't have any strong opinion on this, but we need the community feedback here. I think we need to decide if we want the default to be the most performant or the most functional.
/cc @bergwolf @jodh-intel @egernst @mcastelino @WeiZhang555 @grahamwhaley @kata-containers/runtime
There was a problem hiding this comment.
@amshinde have you measured if this impacts launch time latency? As this is simpler, we may be a bit faster to launch a kata-container with this. If @egernst can let us know how this performs when pushing 40Gbps, that may give us a better data point.
/cc @grahamwhaley
There was a problem hiding this comment.
@mcastelino I did see the latency was about the same, but did see some improvements in jitter, but this was over a tiny data set. I'll rerun the tests few more times for latency and post my results.
658af9a to
71e7fa6
Compare
|
@amshinde ping :) |
|
@sboeuf I have addressed all your comments. PTAL. |
|
/test |
|
@amshinde thanks! |
8152832 to
20774ea
Compare
|
/retest |
|
@sboeuf As discussed, I have dropped the commit to make this mode as the default. Will open another PR to address that. |
|
@amshinde I'll merge as soon as we get the CI all green! |
|
@sboeuf Can we merge this? |
|
@jodh-intel Can you take another look? |
20774ea to
6e13e69
Compare
|
/retest |
|
Retriggering CI for code coverage. |
Introduce a new mode that uses tc filters to redirect traffic from the network interface created by the network plugin to a tap interface that we connect to the VM. This mode will help support ipvlan as well. Fixes kata-containers#144 Signed-off-by: Archana Shinde <[email protected]>
The test verifies tc filter setup by creating a test veth interface. Signed-off-by: Archana Shinde <[email protected]>
Introduce constants for the network model strings, so as to avoid using the strings directly at multiple places. Signed-off-by: Archana Shinde <[email protected]>
Document this mode for users to be able to use it. Signed-off-by: Archana Shinde <[email protected]>
6e13e69 to
c38792e
Compare
|
/retest |
|
@amshinde a long time ago I added a mode called enlighted interworking mode. The goal of that was to use it after we added tc support. The way I saw it implemented was
That way the user does not make the call. We do. |
A PR now needs *two* labels to be applied before it can be merged. One label must be a backport label from the list below and the other a forward port label: - backport labels: `needs-backport`, `no-backport-needed`, `backport`. - forward-port labels: `needs-forward-port`, `no-forward-port-needed`, `forward-port`. This is to make the maintainer think carefully before merging a PR and hopefully maximise efficient porting. Related: kata-containers/kata-containers#634 Fixes: kata-containers#827. Signed-off-by: James O. D. Hunt <[email protected]>
Instead of creating a bridge, use tc filter rules to redirect
traffic between veth and tap interface.
Signed-off-by: Archana Shinde [email protected]