Skip to content

bpf: nat: Adding a new config field to enable SNAT for remote node. #35823#37568

Merged
joestringer merged 1 commit intocilium:mainfrom
behzad-mir:remoteNode-SNAT
Aug 19, 2025
Merged

bpf: nat: Adding a new config field to enable SNAT for remote node. #35823#37568
joestringer merged 1 commit intocilium:mainfrom
behzad-mir:remoteNode-SNAT

Conversation

@behzad-mir
Copy link
Copy Markdown
Contributor

@behzad-mir behzad-mir commented Feb 11, 2025

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Following the issues discussed in 35823 and 17177 we are adding a new config field "enable-remote-node-snat" to enable SNAT for remote node in BPF routing mode.

Currently, Cilium ebpf based masquerading does not perform SNAT on packets destined for a remote node, even if the node CIDR is not listed in the non-masquerade list. This behavior differs from the upstream ip-masq-agent. This behavior prevents pods-remote nodes communication in a BPF-masquerade enabled cluster when pod and node network are in different subnets.

With this change if the new confimap field "enable-remote-node-snat" is set to "true"
then packets destined to remote nodes will be SNATed in bpf routing mode.

Changes are reflected in bpf/lib/nat.h :

Before:

if (identity_is_remote_node(remote_ep->sec_identity))
     return NAT_PUNT_TO_STACK;

After:

if (identity_is_remote_node(remote_ep->sec_identity)) {
    #ifdef ENABLE_REMOTE_NODE_SNAT
    {
	    target->addr = IPV4_MASQUERADE;
	    return NAT_NEEDED;
    }
    #endif
    return NAT_PUNT_TO_STACK;
}

In order to verify and test the expected behavior, we followed the repro in 35823 in a cilium bpf routing cluster with the addition of enable-remote-node-snat = "true" in the configmap. ICMP packet from a pod to a remote node initiated.
The TCP-dump from remote node indicates that the source IPs for incoming packet SNATed to the ip of source node.

Fixes: #35823

Add a new config field to enable remote node masquerading in BPF routing mode. This can help to establish the pods-remote nodes communication in a BPF-masquerade enabled cluster when pod and node network are in different subnets

@behzad-mir behzad-mir requested review from a team as code owners February 11, 2025 23:54
@behzad-mir behzad-mir requested a review from thorn3r February 11, 2025 23:54
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 11, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Feb 11, 2025
@behzad-mir behzad-mir changed the title bpf: nat: Adding a new config field to enable SNAT for remote node. bpf: nat: Adding a new config field to enable SNAT for remote node. #35823 Feb 11, 2025
@pchaigno
Copy link
Copy Markdown
Member

Thanks for the contribution! Could you give a bit of context in the commit description, also describe how you tested this, and fix the CI failure? I'll mark as draft in the meantime to avoid notifying reviewers.

@pchaigno pchaigno marked this pull request as draft February 12, 2025 12:36
@pchaigno pchaigno added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Feb 12, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 12, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 12, 2025
@behzad-mir
Copy link
Copy Markdown
Contributor Author

Thanks for the contribution! Could you give a bit of context in the commit description, also describe how you tested this, and fix the CI failure? I'll mark as draft in the meantime to avoid notifying reviewers.

@pchaigno
Thank you for the comment. I addressed the issue you mentioned here. Let me know if the PR looks good or if anything is missing here.

@behzad-mir behzad-mir marked this pull request as ready for review February 12, 2025 23:36
Copy link
Copy Markdown
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the additional context in the commit description and not the PR description?

In order to verify and test ithe change. We followed the repro in #35823 and verified that the source address is getting SNATed to the source node IP.

Can you describe exactly how you did that with the current code? It does not work for me.

@behzad-mir
Copy link
Copy Markdown
Contributor Author

Could you add the additional context in the commit description and not the PR description?

In order to verify and test ithe change. We followed the repro in #35823 and verified that the source address is getting SNATed to the source node IP.

Can you describe exactly how you did that with the current code? It does not work for me.

Did you follow these steps that Victor provide here:
#35823 (comment)

You have to make sure that when you run k exec -it -n kube-system cilium-zg2mz -- cilium bpf ipmasq list the node subent is not included here.

@behzad-mir
Copy link
Copy Markdown
Contributor Author

Could you add the additional context in the commit description and not the PR description?

In order to verify and test ithe change. We followed the repro in #35823 and verified that the source address is getting SNATed to the source node IP.

Can you describe exactly how you did that with the current code? It does not work for me.

@pchaigno I also updated the commit message with more context about the issue. Let me know if you needed to have a repro with current code.

@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

@behzad-mir
Copy link
Copy Markdown
Contributor Author

Picking up the review from Timo here, as I was assigned initially for loader anyway. Overall looks good, we should just make sure the config variable is covered in verifier tests.

@joestringer is it possible to merge the PR with outstanding request for change from @ti-mo? I think @rgo3 is reviewing in his stead.

@joestringer
Copy link
Copy Markdown
Member

is it possible to merge the PR with outstanding request for change from @ti-mo? I think @rgo3 is reviewing in his stead.

Yes, as long as Robin is good with it, he covers the @cilium/loader code owner. 👍

Copy link
Copy Markdown
Contributor

@rgo3 rgo3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loader changes LGTM!

@behzad-mir
Copy link
Copy Markdown
Contributor Author

is it possible to merge the PR with outstanding request for change from @ti-mo? I think @rgo3 is reviewing in his stead.

Yes, as long as Robin is good with it, he covers the @cilium/loader code owner. 👍

Can you and @pchaigno also make sure that the requested changes are covered or not, thanks.

@vipul-21
Copy link
Copy Markdown
Contributor

/test

@behzad-mir
Copy link
Copy Markdown
Contributor Author

@joestringer @pchaigno I addressed all of the comments and waiting for your review.

@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

1 similar comment
@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

2 similar comments
@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

@kevinvalk
Copy link
Copy Markdown

Sorry to be a party pooper here... but this flag does not differentiate traffic for node InternalIP and ExternalIP. Both of these IPs will receive identity reserved:remote-node (6) and with this flag both traffic to these IPs will be masqueraded. I believe this will make this flag quite limited in its use. In my case for example, my InternalIPs are reachable native, but ExternalIPs are not....

Wouldn't it be way nicer and "easier" (in the end) to somehow allow end user to define which identities should be assigned to InternalIP and ExternalIP node addresses? That way, one could configure that all externalIPs should be given reserved:world identity which also would result in it being nicely masqueraded.

I tried to figure out where in the code node addresses are "assigned" identities, but did not fully grasp how and where this was done.

…led.

Currently, Cilium ebpf based masquerading does not perform SNAT
on packets destined for a remote node, even if the node CIDR is not listed in the non-masquerade list.
This behavior differs from the upstream ip-masq-agent.

With this change if the new confimap field "enable-remote-node-snat" is set to "true"
then packets destined to remote nodes will be SNATed in bpf routing mode.

The proposed change has been verified in a cilium bpf routing cluster with enable-remote-node-snat = "true".
ICMP packet from a pod to a remote node initied.
The TCP-dump from remote node indicates that the source IPs for incoming packet SNATed to the ip of source node.

Signed-off-by: Behzad Mirkhanzadeh <[email protected]>
@tamilmani1989
Copy link
Copy Markdown
Contributor

/test

@tamilmani1989
Copy link
Copy Markdown
Contributor

Sorry to be a party pooper here... but this flag does not differentiate traffic for node InternalIP and ExternalIP. Both of these IPs will receive identity reserved:remote-node (6) and with this flag both traffic to these IPs will be masqueraded. I believe this will make this flag quite limited in its use. In my case for example, my InternalIPs are reachable native, but ExternalIPs are not....

Wouldn't it be way nicer and "easier" (in the end) to somehow allow end user to define which identities should be assigned to InternalIP and ExternalIP node addresses? That way, one could configure that all externalIPs should be given reserved:world identity which also would result in it being nicely masqueraded.

I tried to figure out where in the code node addresses are "assigned" identities, but did not fully grasp how and where this was done.

@kevinvalk that's correct. I believe the intention of PR is to make the behavior consistent with upstream ip-masq-agent. This PR will snat remote internal node ips if ip masq agent config excludes node cidr from non-masquerade list and new flag enable-remote-node-snat is enabled. May be this work can be extended in future to address the usecase you specified.

@tamilmani1989
Copy link
Copy Markdown
Contributor

/ci-clustermesh

@tamilmani1989
Copy link
Copy Markdown
Contributor

/ci-ginkgo

@joestringer
Copy link
Copy Markdown
Member

@kevinvalk I wonder if this is a use case you could capture into the masquerade discussion thread in #36335 ? I do feel overall that we need a more cohesive configuration for all of the masquerade settings, and I think it would help for us to collect together the use cases in one place so that if someone were to try to come up with a cohesive idea about how all the cases could be configured through a single configuration interface then they would have one place to consult.

As far as I understand the common parameters with some of this recent increase in interest is (a) running without tunneling and (b) on networks with nodes across multiple subnets. I'm not entirely sure that more NAT is going to get us into a better space here, because for instance enabling this new flag could cause surprising policy behavior where traffic from Pods towards remote nodes could be misclassified. That said, the main reason I haven't pushed back against this more is that there is an argument for consistency with how upstream ip-masq-agent works. The configuration is arguably pretty clear, even if that causes other features to work poorly.

While it'd be ideal to make this work for all the use cases, I don't think it's fair to block this PR which is useful for some people based on the goal of supporting another use case. From that perspective I agree with @tamilmani1989 that we could consider solving this problem on top.

@joestringer
Copy link
Copy Markdown
Member

Acknowledging the CI is all green now. The test stability in the main tree has not been as green as I'd like, but getting it green again is a step by step process. Thanks for helping by filing and interacting on issues that you've found.

I had been holding back on review of this for a few reasons - one, deferring to @cilium/sig-datapath to better balance the maintenance workload; two, because CI hadn't passed and unless a PR is consistently passing CI it's hard for a third party to understand whether the failures are introduced by the PR or not, and three because I was under the impression some of the feedback had not yet been addressed. Unfortunately given how long it's dragged out, and due to summer vacations the reviewers have rotated a bit. However I think this has received plenty of scrutiny from the primary owners - @cilium/sig-datapath and @cilium/loader . On the last part I think what happened is that some feedback was provided on a couple of the documentation blurbs but not all of them, and you had applied the feedback on the ones which received comments. Documentation/network/concepts/masquerading.rst is not entirely consistent on that front, but that's easy enough to tidy up in a subsequent PR. For future I would suggest setting a line limit suggestion of 80 characters to conform most submissions to that line length, and that could avoid a couple of the back/forth discussion threads seen above.

@tamilmani1989
Copy link
Copy Markdown
Contributor

Thanks @joestringer for approval and merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/snat Relates to SNAT or Masquerading of traffic kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

Difference in cilium bpf masq compared to upstream ip-masq-agent BPF masquerading does not masquerade traffic to remote node's ExternalIP