Skip to content

Multigateway support for Egress Gateway#39304

Merged
pchaigno merged 6 commits intocilium:mainfrom
carlos-abad:multigateway
May 21, 2025
Merged

Multigateway support for Egress Gateway#39304
pchaigno merged 6 commits intocilium:mainfrom
carlos-abad:multigateway

Conversation

@carlos-abad
Copy link
Copy Markdown
Contributor

Add Multigateway support to egress gateway.

Link to the CFP

Fixes: #38341

Add support for multiple gateways to Cilium Egress Gateway.

@carlos-abad carlos-abad requested review from a team as code owners May 3, 2025 01:30
@carlos-abad carlos-abad requested review from aanm and ysksuzuki May 3, 2025 01:30
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 3, 2025
@carlos-abad carlos-abad requested a review from qmonnet May 3, 2025 01:30
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 3, 2025
@carlos-abad
Copy link
Copy Markdown
Contributor Author

@bowei
Copy link
Copy Markdown
Member

bowei commented May 4, 2025

@bowei

@carlos-abad carlos-abad force-pushed the multigateway branch 2 times, most recently from f3cffa3 to 72f2914 Compare May 5, 2025 16:56
@tommyp1ckles tommyp1ckles added the release-note/major This PR introduces major new functionality to Cilium. label May 5, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 5, 2025
@tommyp1ckles tommyp1ckles added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. feature/egress-gateway Impacts the egress IP gateway feature. labels May 5, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 5, 2025
Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits on the docs.

Regarding the UX, I agree with Tom: do we really need this requirement on the first item of the list being in egressGateway as well?

@carlos-abad carlos-abad force-pushed the multigateway branch 2 times, most recently from 38feeee to ece11c1 Compare May 6, 2025 18:36
@carlos-abad
Copy link
Copy Markdown
Contributor Author

Regarding the UX, I agree with Tom: do we really need this requirement on the first item of the list being in egressGateway as well?

Not really. I removed that requirement following Tom's guidance.

Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK from my side, thank you!

@ysksuzuki ysksuzuki removed their request for review May 7, 2025 15:02
@carlos-abad carlos-abad requested a review from tommyp1ckles May 7, 2025 15:54
Copy link
Copy Markdown
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlos-abad Nice work!

@joestringer
Copy link
Copy Markdown
Member

/test

@pchaigno pchaigno enabled auto-merge May 20, 2025 20:44
auto-merge was automatically disabled May 21, 2025 00:46

Head branch was pushed to by a user without write access

@tommyp1ckles
Copy link
Copy Markdown
Contributor

Integration tests: The failing test is unrelated (github.com/cilium/cilium/pkg/fswatcher). I reran this test locally (privileged and unprivileged) and it worked perfectly. Maybe a flake?

yeah that ones a known flake

@tommyp1ckles
Copy link
Copy Markdown
Contributor

/test

@pchaigno pchaigno enabled auto-merge May 21, 2025 06:37
Copy link
Copy Markdown
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A couple of comments inline for my codeowners.

Since the boolean v6needed is determined from the CEGP destinationCIDRs
field, we can move it to the PolicyConfig object rather than the
PolicyGatewayConfig.

This change will be very useful for the multigateway Egress Policy
since we plan to have multiple policyGatewayConfig objects per
PolicyConfig. This change will safe from storing the exact same boolean
value in all policyGatewayConfig objects.

Signed-off-by: Carlos Abad <[email protected]>
This change adds support for parsing the multiple gateways
from a multigateway CEGP and storing them in a PolicyConfig object.

This change doesn't modify how the BPF tables are written.

Signed-off-by: Carlos Abad <[email protected]>
As discussed in the design the assingment follows a round-robin
algorithm. To create repeatable and stable assignments the lists of
gaterways and endpoints are ordered before the rounds robin-assignment.

Design: CFP-38341-multigateway-egress-gateway-policy

Signed-off-by: Carlos Abad <[email protected]>
auto-merge was automatically disabled May 21, 2025 17:16

Head branch was pushed to by a user without write access

To run this test manually please set the kind cluster using the egressgw
preset:
- make kind-egressgw
- make kind-image
- make kind-egressgw-install-cilium

And run it in unsafe mode.
cilium connectivity test --include-unsafe-tests --test seq-egress-gateway-multigateway

Signed-off-by: Carlos Abad <[email protected]>
Copy link
Copy Markdown
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pchaigno
Copy link
Copy Markdown
Member

/test

@carlos-abad
Copy link
Copy Markdown
Contributor Author

The Cilium E2E Upgrade test failed because Pod test-conn-disrupt-client-5556646476-lzgrb flow was interrupted (restart count does not match 0 != 1). I dug through the logs and found that the pod restarted. I couldn't find any other reference through the logs that will correlate to the changes in this PR. I also don't remember this failure being in the previous test failures we've seen in this CL.

Maybe it's a flake caused by a pod randomly restarting? If so, can someone rerun the tests?

@pchaigno pchaigno enabled auto-merge May 21, 2025 21:24
@pchaigno pchaigno added this pull request to the merge queue May 21, 2025
Merged via the queue into cilium:main with commit fc6a9f0 May 21, 2025
67 checks passed
Comment on lines +46 to +47
- egressGateway:
nodeSelector:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up that these changes were broken on arrival. Evidently we don't have CI coverage for this file, otherwise we would have noticed this earlier. But also in this case the developer didn't test this file and the reviewers didn't test this file, so we merged it in a broken state. This was fixed in #40112.

cc @cilium/egress-gateway as reviewers should we be doing anything differently to avoid introducing breakage like this into the tree?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend we introduce automatic testing for the examples. Maybe with a separate Github flow?

By the way, just mentioning that this file already had a bug in it. When I added the new field I replicated the field above, duplicating the existing bug.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree automated testing of example YAML files would be better than expect reviewers catch every indentation and whatnot issues.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah improving the automated testing seems reasonable. In this case, having a short test to load the YAML using the Cilium libraries to verify that it matches the right syntax would have helped; it doesn't have to be an end-to-end test.

By the way, just mentioning that this file already had a bug in it. When I added the new field I replicated the field above, duplicating the existing bug.

If you know about a bug, I would encourage you to submit a fix for it.

In this case the bug I was referring to is introduced by this specific change, because EgressGateways is defined as []EgressGateway and EgressGateway does not have a egressGateway field inside of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature/egress-gateway Impacts the egress IP gateway feature. kind/community-contribution This was a contribution made by a community member. release-note/major This PR introduces major new functionality to Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CFP: Multigateway support for Egress Gateway