multi-pool: add support for excluding IP pools from masquerade#40132
multi-pool: add support for excluding IP pools from masquerade#40132gandro merged 1 commit intocilium:mainfrom
Conversation
21ebd25 to
cf2fd17
Compare
|
I have tried this on a local build, and it functionally works for our use case. |
c52d715 to
da8d87d
Compare
|
Thanks for the review so far @bimmlerd I've made an attempt at moving the configuration into the iptables cell as requested, and I've updated the iptables implementation such that it adds a rule at the top of the For the bpf implementation I'm unsure how to pass this. Would really appreciate some suggestions. |
da8d87d to
94cd267
Compare
|
Hi @bimmlerd I added support for multiple CIDRs to the flag (comma-separated) and changed the BPF implementation to use LPM trie maps. I'd appreciate a review on the way the config is done now, and where the map update logic is put in loader.go. The ip-masq-agent BPF map lookup was the same as ours so I made a common function for that as well. I've also found that by default the iptables implementation doesn't masquerade the additional IP pools, but leaving in our iptables implementation anyway to be more explicit about it. Thanks |
|
Administrative note: we're in feature freeze at the moment - since this is new it'll not be merged until 1.18 is branched. Freeze usually also means that review of bugfixes is prioritized at the cost of feature PRs such as this one, might take a bit of patience 🙏 that said, had a brief look and it looks better - but the CODEOWNERs will give detailed reviews |
|
We're encountering a similar issue where a subnet advertised via BGP (used as a secondary Cilium IP pool) requires internet access through an external firewall that performs sNAT. While this PR is currently on hold (or is there a chance it could be merged soon?), we're wondering if it's possible to specify a wildcard range in Specifying @hassenius Are there any known workarounds to achieve this behavior in the meantime? Also, is there an estimated timeline for when this PR might be merged or released? |
The workaround we will use until this is merged is to revert back to iptables masq - so no BPF-based masquerade. Since this is a new feature I imagine it will go into 1.19. |
Thank you. We have come up with this nonMasqueradeCIDR list to exclude our internal subnet (10.0.0.0/8). It's been working fine so far 😊 |
354b66a to
c7fb743
Compare
|
@julianwiedmann & @brb I've re-done the PR to implement both configuration knobs that I mentioned, edit: I did a cursory look, fixed a couple silly bugs, but overall behaviour looks like its working. I only tested the basic happy path as it's quite late so I'll keep testing over the next few days. This is the stateDB table - the CIDRs field might get quite unwieldy on larger pools so I might remove that or limit how many are shown here. |
|
|
Looked into the failing test "K8s Network E2E tests (dual)" - looks to be timing out on endpoint creation. Found some other pipelines from unrelated MRs with the same issue (I downloaded the logs and checked): Could we give it a retry? Also thanks @thehonker - appreciate you testing it out |
fristonio
left a comment
There was a problem hiding this comment.
cilium/endpoint: Looks good 🚀
One question to clarify my understanding of the behavior.
julianwiedmann
left a comment
There was a problem hiding this comment.
Thank you for the fast turnaround! I like that we're now getting much closer to a proposal how we expect folks to actually use this feature. And we don't require them to push down magic IP ranges into the datapath :).
Let's see how IPAM folks feel about this.
qmonnet
left a comment
There was a problem hiding this comment.
Thanks! I just looked at the docs, I've got minor style suggestions (inline below).
|
/test |
gandro
left a comment
There was a problem hiding this comment.
High-level, I think this is a solid approach. Structure-wise, I've left some feedback for sig-ipam
|
@julianwiedmann if you are happy with the latest changes, could you remove the dont-merge/discussion label? |
gandro
left a comment
There was a problem hiding this comment.
IPAM-wise this looks good to me!
|
/test |
julianwiedmann
left a comment
There was a problem hiding this comment.
Thanks! Just a few minor comments inline.
|
Thanks a lot @julianwiedmann, appreciate the review (and the idea for this approach). I've removed the unnecessary add_world_entry and set_identity_mark calls from the tests and added two additional tests to verify behaviour without the SKIP_MASQ flag |
|
/test |
This adds a new flag only-masquerade-default-pool which disables masquerading for pods which have IPs allocated non-default pools. A per-pool annotation (ipam.cilium.io/skip-masquerade) is also added to configure this behaviour in a more granular way. This can be particularly useful when using the newly supported combination of multi-pool ipam and tunneling, where you may have the default pool be tunnelled, and additional pools handled natively (for example in conjunction with bgp control plane) This feature is supported with a new stateDB table for CiliumPodIPPools. At IP allocation time, the chosen pool is looked up on stateDB to look for the skip-masquerade annotation. Signed-off-by: alimehrabikoshki <[email protected]>
|
/test |
|
Looks like we have CI fallout in the multipool workflow from this change, although it's not clear to me why CI on the PR itself didn't catch it:
|
|
Seems like it's a race. But yeah, indeed strange that we didn't catch it in this PR. We need to check in |
|
For posterity: The CI issue has been fixed: #43380 |
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXXline if the commit addresses a particularGitHub issue.
Fixes: <commit-id>tag, thenplease add the commit author[s] as reviewer[s] to this issue.
This adds a new flag only-masquerade-default-pool which disables masquerading for pods which have IPs allocated non-default pools.
A per-pool annotation (ipam.cilium.io/skip-masquerade) is also added to configure this behaviour in a more granular way.
This can be particularly useful when using the newly supported combination of multi-pool ipam and tunneling, where you may have the default pool be tunnelled, and additional pools handled natively (for example in conjunction with bgp control plane)
This feature is supported with a new stateDB table for CiliumPodIPPools. At IP allocation time, the chosen pool is looked up on stateDB to look for the skip-masquerade annotation.
Fixes: #40131