Skip to content

fix: preserve policy status across multiple gatewayclasses#6153

Closed
zhaohuabing wants to merge 5 commits intoenvoyproxy:mainfrom
zhaohuabing:fix-policy-status-across-gc
Closed

fix: preserve policy status across multiple gatewayclasses#6153
zhaohuabing wants to merge 5 commits intoenvoyproxy:mainfrom
zhaohuabing:fix-policy-status-across-gc

Conversation

@zhaohuabing
Copy link
Copy Markdown
Member

@zhaohuabing zhaohuabing commented May 22, 2025

fixes: #6125
Release note: yes

This is a pretty major change on the policy status, so let's hold off for a bit before cherry-picking it into 1.4 and 1.3, just to be safe.

@zhaohuabing zhaohuabing requested a review from a team as a code owner May 22, 2025 05:58
@zhaohuabing zhaohuabing marked this pull request as draft May 22, 2025 05:59
@zhaohuabing zhaohuabing changed the title preserve policy status cross gc boundary preserve policy status across gc boundary May 22, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented May 22, 2025

Codecov Report

❌ Patch coverage is 0% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.69%. Comparing base (6f42531) to head (d0b7987).
⚠️ Report is 295 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/runner/runner.go 0.00% 57 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6153      +/-   ##
==========================================
- Coverage   70.80%   70.69%   -0.12%     
==========================================
  Files         220      220              
  Lines       37132    37189      +57     
==========================================
- Hits        26291    26289       -2     
- Misses       9300     9358      +58     
- Partials     1541     1542       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch 2 times, most recently from fd17502 to 3d16457 Compare May 22, 2025 06:43
@zhaohuabing zhaohuabing marked this pull request as ready for review May 22, 2025 06:43
@zhaohuabing zhaohuabing changed the title preserve policy status across gc boundary fix: preserve policy status across gc boundary May 22, 2025
@zhaohuabing zhaohuabing changed the title fix: preserve policy status across gc boundary fix: preserve policy status across multiple gatewayclasses May 22, 2025
@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch 3 times, most recently from 076c387 to e521ed8 Compare May 22, 2025 07:48
@zhaohuabing zhaohuabing marked this pull request as draft May 22, 2025 09:03
@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch 4 times, most recently from f1c1290 to 8fb5af5 Compare May 22, 2025 11:01
@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch from 8fb5af5 to b184eed Compare May 22, 2025 11:03
@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch 2 times, most recently from 9e8c3a8 to ec5caa3 Compare May 22, 2025 13:21
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
@zhaohuabing zhaohuabing force-pushed the fix-policy-status-across-gc branch from 9b6ec74 to 69bf81a Compare May 22, 2025 14:34
@zhaohuabing zhaohuabing marked this pull request as ready for review May 22, 2025 14:36
ancestorsFromOtherGatewayClasses := make([]gwapiv1a2.PolicyAncestorStatus, 0)
for _, ancestor := range policyStatus.Ancestors {
belongToCurrentGatewayClass := false
if ancestor.AncestorRef.Group == nil || *ancestor.AncestorRef.Group == "gateway.networking.k8s.io" {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes all xPolicy types use Gateway as the status ancestor, which is recommended by the Gateway API spec.
We may need to clean up the xPolicy status in a follow-up PR in case where other types of resource are used for policy status ancestor.

@zhaohuabing zhaohuabing marked this pull request as draft May 22, 2025 15:05
@zhaohuabing
Copy link
Copy Markdown
Member Author

zhaohuabing commented May 22, 2025

This fix takes a different approach from #4587 because #4587 has a loophole - the status merging logic can't delete an old status ancestor after a target is removed. I plan to raise a follow-up PR to fix that using the same strategy if we agree on this one.

We still have another issue: if the selector changes and no targets are selected anymore, the existing status can't be deleted. This is because the current translator ignores the xPolicy without targets and won't update their statuses. This will be addressed in a separate PR.

@zhaohuabing zhaohuabing marked this pull request as ready for review May 22, 2025 15:17
@zhaohuabing zhaohuabing marked this pull request as draft May 22, 2025 15:28
Signed-off-by: Huabing (Robin) Zhao <[email protected]>
@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented May 23, 2025

@zhaohuabing dont xRoutes status have the same issue ? I believe we solved those in the provider status updater layer ?

@zhaohuabing
Copy link
Copy Markdown
Member Author

zhaohuabing commented May 23, 2025

@zhaohuabing dont xRoutes status have the same issue ? I believe we solved those in the provider status updater layer ?

@arkodg Yes, we fixed a similar issue for xRoute in #4587, but I prefer this solution because the xRoute fix has a loophole, more in this comment: #6153 (comment)

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented May 27, 2025

hey @zhaohuabing thanks for sharing the past history :) , this looks like two issues

  1. around cleaning up old status condition entries. can we solve this with a separate util function similar to mergeRouteParentStatus that removes any condition (linked to EG controller) whose ObservedGeneration doesnt match the current ObservedGeneration of the client Object ?
    2 ) Regarding this specific issue of multiple Gateways linked to different GatewayClasses, this feels like we need to change the ancestor to GatewayClass instead of Gateway and merge properly in the status update layer ?

@zhaohuabing
Copy link
Copy Markdown
Member Author

zhaohuabing commented May 28, 2025

  1. around cleaning up old status condition entries. can we solve this with a separate util function similar to mergeRouteParentStatus that removes any cond

Agree.

2 ) Regarding this specific issue of multiple Gateways linked to different GatewayClasses, this feels like we need to change the ancestor to GatewayClass instead of Gateway and merge properly in the status update layer ?

I have no strong opinion on this. Things we may need to consider regarding Gateway vs GatewayClass as the policy status ancestor:

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented May 28, 2025

with Gateway as the ancestor, we cannot create unique status conditions, because the conditions may vary across Gateway linked to different GatewayClasses

@zhaohuabing zhaohuabing marked this pull request as draft June 4, 2025 05:46
@zhaohuabing
Copy link
Copy Markdown
Member Author

around cleaning up old status condition entries. can we solve this with a separate util function similar to mergeRouteParentStatus that removes any condition (linked to EG controller) whose ObservedGeneration doesnt match the current ObservedGeneration of the client Object ?

Thought it again - this can't be solved by ObservedGeneration.
ObservedGeneration comes form the Generation of the xPolicy resource, however, the xPolicy status can change without a bump of its Generation.
For example:
a xPolicy targets a Gateway gw. After gw is deleted, it's still in the status of xPolicy, and this status can't be cleaned because the ObservedGeneration of status is the same as the policy.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  generation: 11
  name: test
spec:
  connection:
    bufferLimit: 100M
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: gw
status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: gw
      namespace: default
    conditions:
    - lastTransitionTime: "2025-06-25T00:55:28Z"
      message: Policy has been accepted.
      observedGeneration: 11
      reason: Accepted
      status: "True"
      type: Accepted
    controllerName: gateway.envoyproxy.io/gatewayclass-controller

@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jun 25, 2025

@zhaohuabing same is true for xRoute and linked resources ( backendRef)

@zhaohuabing
Copy link
Copy Markdown
Member Author

zhaohuabing commented Jun 25, 2025

with Gateway as the ancestor, we ca

You mean this issue #774 ? I'm planning to work on that after fixing the xPolicy status.

@zhaohuabing zhaohuabing marked this pull request as ready for review June 25, 2025 02:10
@github-actions
Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Jul 30, 2025
@zhaohuabing zhaohuabing marked this pull request as draft July 31, 2025 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Policy with targetSelector targeting multiple gateways does not show both ancestors

2 participants