wireguard: tracking only nodeIPs as AllowedIPs in tunneling mode by smagnani96 · Pull Request #35895 · cilium/cilium

smagnani96 · 2024-11-11T15:38:47Z

I tried to capture all the details in the commit messages.

wireguard: test: remove unneeded nodeEncryption test: while preparing unit tests for subsequent commits, I noticed that TestAgent_PeerConfig_WithEncryptNode can be removed, as we always expect node IPs to be in allowedIPs regardless of whether node encryption is enabled. This behavior should already be covered by the other tests.
wireguard: test: rework agent tests: unfortunately, this commit contributes a lot of LOCs, even though it does not modify the testing logic. With this commit, I'd like to reuse the logic we have in TestAgent_PeerConfig and TestAgent_AllowedIPsRestoration for the agent behaviors in different routing modes introduced by the subsequent commits.
wireguard: reduce syscall ConfigureDevice when not needed: as described in the commit message, there are a few cases in which we still configure the device even when it is not necessary (no endpoint changes, no key changes, no IPs to add). This could result in unneeded syscalls being executed anyway.
daemon: wireguard: add WireguardTrackAllIPsFallback flag: this is a precautionary flag, to be used in case the new agent logic introduced in the next commit breaks connectivity. In tunneling mode, we will only track nodeIPs, as all pod-to-pod connections should go through the overlay. In case we're missing something, enabling this flag allows the agent to track pod IPs even in tunnel mode.
wireguard: track only nodeIPs with tunneling enabled: pretty self-explanatory. Since in tunneling mode we only track the node IPs provided in UpdatePeer(), we don't need to register for IPCache events.

Tracking only nodeIPs in WireGuard AllowedIPs with overlay routing, while preserving the native routing behavior of tracking both node and pod IPs from IPCache events.

jschwinger233 · 2024-12-18T09:20:15Z

I created a kind cluster with Cilium (VXLAN routing) using this PR; the wg output from one of the nodes is:

$ nscontainer kind-worker wg
interface: cilium_wg0
  public key: e7DcIto1CcNWZasdIPiyES1qpET0KPCU8R73zZd28FI=
  private key: (hidden)
  listening port: 51871
  fwmark: 0x1e00

peer: +t7+WQjJyRpVsTBEsrgKWXypwxL5FZOcPZ9y1o4oMCQ=
  endpoint: 172.20.0.2:51871
  allowed ips: 10.244.0.0/24, fd00:10:244::/64, 172.20.0.2/32, fc00:c111::2/128
  latest handshake: 1 minute, 37 seconds ago
  transfer: 3.63 KiB received, 3.57 KiB sent

peer: WY3mQEhOPIG4B/H7tPzhcbaHzBHC3lrkkNEFOLQQukk=
  endpoint: 172.20.0.5:51871
  allowed ips: 10.244.2.0/24, fd00:10:244:2::/64, 172.20.0.5/32, fc00:c111::5/128
  latest handshake: 1 minute, 41 seconds ago
  transfer: 11.79 KiB received, 11.93 KiB sent

peer: vr/azY9vQlnrHIt051zSa7e7SnDcWvBnLIH3Q3S9CAQ=
  endpoint: 172.20.0.3:51871
  allowed ips: fd00:10:244:1::/64, 172.20.0.3/32, fc00:c111::3/128, 10.244.1.0/24
  latest handshake: 2 minutes ago
  transfer: 14.43 KiB received, 14.00 KiB sent

There are still CIDRs in the allowed IPs; are they expected?

asauber

Approving for CLI, since the new flag and its usage look correct.

jschwinger233

Just a small question regarding forceUpdate; the rest LGTM.

pkg/wireguard/agent/agent.go

smagnani96 · 2025-01-10T14:00:54Z

/test

gandro

Awesome work! Looks correct to me. I have left a few minor suggestions, though none of them are blocking to get this merged

pkg/wireguard/agent/agent_test.go

pkg/wireguard/agent/agent.go

smagnani96 · 2025-01-18T11:35:53Z

> Awesome work! Looks correct to me. I have left a few minor suggestions, though none of them are blocking to get this merged

Awesome review, many many thanks!
I have just addressed the comments, hopefully improving clarity, especially in the tests (2nd commit).
There's also a small improvement in the agent (last commit): in UpdatePeer we no longer lock IPCache when not needed.

Could you please take a look at the changes? 🙏

smagnani96 · 2025-01-18T11:36:17Z

/test

gandro

Thanks for addressing the feedback! It looks almost good to me; there's one minor thing regarding the Init() function that I would take a look at.

pkg/wireguard/agent/agent.go

This commit removes `TestAgent_PeerConfig_WithEncryptNode` from the WireGuard agent unit tests. There are two main reasons for this:
1. the test doesn't specifically require nodeEncryption to be enabled, as it doesn't check the daemon config;
2. the nodeEncryption use case is already covered in `TestAgent_PeerConfig`: we always insert the node IPs into allowedIPs regardless of the config.
For this reason, I believe this test can be safely removed.
Signed-off-by: Simone Magnani <[email protected]>

This commit reworks the two tests for the WireGuard agent, `TestAgent_AllowedIPsRestoration` and `TestAgent_PeerConfig`, while preserving the tested logic. The main changes are as follows:
1. add an `assertAllowedIPs` also inside `TestAgent_PeerConfig` to reduce the number of assertions: expected allowedIPs are now provided as a list rather than being tested manually on multiple lines.
2. introduce `type config struct` to hold the test parameters and `type expectation struct` to hold the expected allowedIPs for a given peer. This is useful for subsequent commits, as we can reuse the testing logic that we currently have for the agent in different routing modes. In fact, we are trying to slim down the set of allowedIPs tracked in WireGuard, and this can lead to different agent behaviors depending on the underlying datapath configuration (tunnel, native without CIDRs, native with CIDRs, etc.). In addition to the name of a given test and the routing mode, this new struct defines the expectations that must be met every time the corresponding `assertAllowedIPs` is invoked in the two tests. A single expectation is a `[]*net.IPNet` representing all the allowedIPs that we expect to be present at a given assertion.
3. modify the two tests according to (2): currently, the only agent behavior is the native routing mode, in which we track all IPs.
Signed-off-by: Simone Magnani <[email protected]>

Whenever we call `updatePeerByConfig` in the WireGuard agent, we always perform `ConfigureDevice`, even when there are no new allowedIPs to insert. This could result in many unneeded syscalls, such as when:
* finishing restoring IPs for a peer with no IP to be removed;
* handling an IPCache delete event with no new IP to add;
* running the periodic NodeValidateImplementation with no changes to the node pubkey/IPs.
The first call to `UpsertPeer()` will always configure the device, as the nodeIPs are new. On the other hand, when restoring allowedIPs or when handling an IPCache event, we can avoid configuring the device when no relevant changes are recorded. Note that IPs that need to be removed are still correctly handled and removed in any case.
Signed-off-by: Simone Magnani <[email protected]>

This commit adds a hidden flag, `WireguardTrackAllIPsFallback`. Subsequent commits will modify the logic of the agent depending on the underlying routing mode, with significant changes expected mainly in tunneling mode. This flag, which is false by default, is introduced in case the new changes break connectivity in ways we did not foresee. If that is the case, reinstalling Cilium with `wireguard-track-all-ips-fallback` enabled should restore the previously known agent behavior, which is to add all IPs to its allowedIPs regardless of the datapath config.
Signed-off-by: Simone Magnani <[email protected]>
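The decision the flag feeds into can be summarized as a single predicate. The sketch below is an assumption-laden illustration: `agentConfig`, `trackPodIPs`, and the `--tunneling` flag are hypothetical, and Go's standard `flag` package has no notion of hidden flags, so only the flag name `wireguard-track-all-ips-fallback` comes from the commit message. Node IPs are tracked in every mode; pod IPs from IPCache are tracked with native routing, or with tunneling only when the fallback is set.

```go
package main

import (
	"flag"
	"fmt"
)

// agentConfig is a hypothetical subset of the daemon configuration.
type agentConfig struct {
	TunnelingEnabled             bool
	WireguardTrackAllIPsFallback bool
}

// trackPodIPs reports whether pod IPs (learned from IPCache) should be added
// to peers' allowedIPs on top of the always-tracked node IPs.
func trackPodIPs(c agentConfig) bool {
	// Native routing: always. Tunneling: only if the fallback flag is set.
	return !c.TunnelingEnabled || c.WireguardTrackAllIPsFallback
}

func main() {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	var cfg agentConfig
	fs.BoolVar(&cfg.TunnelingEnabled, "tunneling", true, "hypothetical: overlay routing mode")
	fs.BoolVar(&cfg.WireguardTrackAllIPsFallback, "wireguard-track-all-ips-fallback", false,
		"track all IPs in allowedIPs regardless of the routing mode")
	if err := fs.Parse([]string{"--wireguard-track-all-ips-fallback"}); err != nil {
		panic(err)
	}
	fmt.Println(trackPodIPs(cfg)) // true: tunneling with the fallback enabled
}
```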

In the WireGuard agent, we constantly update the peers' configuration whenever IPs are upserted/removed, in addition to keeping track of the peers' node IPs. This results in a multitude of entries both in our userspace AllowedIPs map and in the kernel trie, requiring more memory and more computation on each lookup. Let's take the datapath configuration into account, so that we can avoid tracking addresses when not needed.

Node IPs of a peer are always tracked, to cover scenarios such as node encryption, overlay, and mixed IPAM modes. That said, the following two routing scenarios have been identified:
1. Tunnel: when using the overlay, we don't need to track all IPs, since pod-to-pod connections would use node IPs in the overlay traffic. Therefore, no further addresses are needed, so we don't even add the agent as a listener for IPCache events: nodeIPs changes will always be handled by the UpdatePeer() logic. Another advantage of this approach is that IPCache updates don't need to be locked by the agent.
2. Native routing: as before this patch, in this configuration we keep track of each IP, since we have no other way of knowing a priori from which set of addresses a node will receive traffic. Configurations such as IPAM legacy, Azure, AlibabaCloud, "crd" mode, and delegated fall into this scenario.

Restoration logic when switching from tunneling to native routing should work without problems: here the only IPs restored from allowedIPs are node IPs, which are either kept if equal to the new ones or removed if they changed. Upon restart, the agent tracks all pod IPs from IPCache.

Restoration logic when switching from native routing to tunneling should not cause problems either: for the whole duration of the restoration, until `RestoreFinished` is called, the allowedIPs contain both the new nodeIPs and all the restored ones. In this case, the agent does not subscribe to IPCache changes/state, so it no longer tracks these pod IPs, leading to their removal once restoration has finished. The new pod-to-pod connectivity should go through the tunnel, so nodeIPs should be sufficient; if we do see failures, we can use the new flag `WireguardTrackAllIPsFallback` (`wireguard-track-all-ips-fallback`), which preserves the old behavior of the agent even with this patch applied.

In addition, this commit adds the tunneling scenario to the current agent tests, covering both the default behavior and the fallback one when WireguardTrackAllIPsFallback is provided. The fallback testing is intended for tunneling mode only, as in native routing mode we already have the old behavior of the agent.

Fixes: cilium#35331
Signed-off-by: Simone Magnani <[email protected]>

gandro

Thank you!

smagnani96 · 2025-01-20T10:46:38Z

/test

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label Nov 11, 2024

smagnani96 added the dont-merge/preview-only label Nov 11, 2024

jschwinger233 added the feature/wireguard label Nov 28, 2024

smagnani96 force-pushed the pr/allowed-ips branch 2 times, most recently from 8db9850 to 28d5978 December 2, 2024 00:27

smagnani96 changed the title from "wip - slim down allowedIPs for wireguard" to "Optimize allowedIPs for wireguard with CIDRs" Dec 2, 2024

smagnani96 force-pushed the pr/allowed-ips branch 3 times, most recently from b1455f0 to 954a71c December 3, 2024 08:01

smagnani96 force-pushed the pr/allowed-ips branch from 954a71c to 3489a08 December 11, 2024 17:46

smagnani96 changed the title from "Optimize allowedIPs for wireguard with CIDRs" to "Wireguard: Tracking AllowedIPs according to dataplane configuration" Dec 11, 2024

smagnani96 force-pushed the pr/allowed-ips branch from 3489a08 to ce0907a December 16, 2024 18:54

smagnani96 marked this pull request as ready for review December 16, 2024 20:52

smagnani96 requested review from a team as code owners December 16, 2024 20:52

smagnani96 requested review from jschwinger233, nebril and ysksuzuki December 16, 2024 20:52

smagnani96 marked this pull request as draft December 16, 2024 20:56

smagnani96 force-pushed the pr/allowed-ips branch from ce0907a to 49d5916 December 17, 2024 11:17

smagnani96 force-pushed the pr/allowed-ips branch from 49d5916 to fd7133d December 18, 2024 18:04

smagnani96 requested a review from asauber January 7, 2025 12:03

asauber approved these changes Jan 8, 2025

jschwinger233 reviewed Jan 9, 2025

pkg/wireguard/agent/agent.go (outdated, resolved)

smagnani96 force-pushed the pr/allowed-ips branch from fcdfdc1 to a302b8c January 10, 2025 13:20

jschwinger233 approved these changes Jan 13, 2025

smagnani96 requested a review from gandro January 13, 2025 11:09

gandro approved these changes Jan 14, 2025

pkg/wireguard/agent/agent_test.go (outdated, resolved)

pkg/wireguard/agent/agent.go (outdated, resolved)

pkg/wireguard/agent/agent.go (outdated, resolved)

smagnani96 force-pushed the pr/allowed-ips branch from a302b8c to 5ed63ed January 18, 2025 11:18

smagnani96 removed the dont-merge/waiting-for-review label Jan 18, 2025

gandro reviewed Jan 20, 2025

pkg/wireguard/agent/agent.go (outdated, resolved)

smagnani96 added 5 commits January 20, 2025 11:11

smagnani96 force-pushed the pr/allowed-ips branch from 5ed63ed to f757637 January 20, 2025 10:19

gandro approved these changes Jan 20, 2025

smagnani96 changed the title from "Wireguard: tracking only nodeIPs as AllowedIPs in tunneling mode" to "wireguard: tracking only nodeIPs as AllowedIPs in tunneling mode" Jan 20, 2025

nebril approved these changes Jan 21, 2025

julianwiedmann added this pull request to the merge queue Jan 21, 2025

Merged via the queue into cilium:main with commit d89d141 Jan 21, 2025

kaworu mentioned this pull request Mar 3, 2025

CI: Conformance AWS-CNI: hubble-relay: Failed to create peer notify client for peers change notification #37289

Closed

smagnani96 deleted the pr/allowed-ips branch March 18, 2025 11:32

julianwiedmann mentioned this pull request Mar 20, 2025

wireguard: don't tunnel pod->remote-node when n2n encryption is enabled #38226

Closed

julianwiedmann mentioned this pull request May 16, 2025

workflows: Add WireGuard in the Conformance Multi-Pool workflow #39561

Merged


julianwiedmann merged 5 commits into cilium:main from smagnani96:pr/allowed-ips

6 participants

Conversation

smagnani96 commented Nov 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jschwinger233 commented Dec 18, 2024

Uh oh!

asauber left a comment

Choose a reason for hiding this comment

Uh oh!

jschwinger233 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

smagnani96 commented Jan 10, 2025

Uh oh!

gandro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

smagnani96 commented Jan 18, 2025

Uh oh!

smagnani96 commented Jan 18, 2025

Uh oh!

gandro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gandro left a comment

Choose a reason for hiding this comment

Uh oh!

smagnani96 commented Jan 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

smagnani96 commented Nov 11, 2024 •

edited

Loading