-
Notifications
You must be signed in to change notification settings - Fork 110
Fix webhook admission control deadlock during installation #2179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix webhook admission control deadlock during installation #2179
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2179 +/- ##
=======================================
Coverage 63.31% 63.31%
=======================================
Files 49 49
Lines 1878 1878
=======================================
Hits 1189 1189
Misses 597 597
Partials 92 92 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0f0389a to
f6b3c9c
Compare
|
/ok-to-test |
|
/cc @Fedosin |
|
/remove-approve Was added automatically as I'm co-release lead. |
|
/lgtm |
This fixes a chicken-and-egg bootstrap issue where the operator would get stuck during KnativeServing installation. Problem: - ValidatingWebhookConfiguration with failurePolicy=Fail intercepts Certificate resource creation - If Certificate resources are created before the webhook pod is ready, the API server rejects them - The activator deployment depends on the routing-serving-certs secret (generated from a Certificate resource) at runtime - Previous stage ordering would check all deployments (including activator) before creating Certificate resources, causing a deadlock Solution: 1. Added CheckWebhookDeployment() function that waits specifically for the webhook deployment to be ready before proceeding 2. Reordered reconciliation stages: - manifests.Install (creates all deployments) - CheckWebhookDeployment (waits for webhook to be ready) - InstallWebhookDependentResources (creates Certificate resources) - CheckDeployments (checks all deployments including activator) This ensures: - Webhook is ready before Certificate creation (avoids admission rejection) - Certificate resources exist before checking activator (avoids missing secret) - Clear error message if webhook deployment is missing from manifest Related functions: - pkg/reconciler/common/deployments.go: Added CheckWebhookDeployment() - pkg/reconciler/knativeserving/knativeserving.go: Reordered stages - pkg/reconciler/common/install.go: Added logging for consistency
f6b3c9c to
4ff3e18
Compare
|
PTAL @houshengbo We're currently preparing release 1.20 so it might be good to get this into this release if you're interested. |
| manifests.SetManifestPaths, // setting path right after applying manifests to populate paths | ||
| common.CheckDeployments, | ||
| manifests.SetManifestPaths, // setting path right after applying manifests to populate paths | ||
| common.CheckWebhookDeployment, // Wait for webhook to be ready before creating Certificate resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
knativeeventing.go also need this function, please add it there as well.
The name of the deployment resource for eventing is "eventing-webhook", you probably need to make the name as a param to the function "CheckWebhookDeployment".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about eventing needing this as well?
Serving needed it as there are two install steps where the first one blocked the second one (see line 126 and 129):
operator/pkg/reconciler/knativeserving/knativeserving.go
Lines 126 to 132 in 00cda2a
| manifests.Install, | |
| manifests.SetManifestPaths, // setting path right after applying manifests to populate paths | |
| common.CheckDeployments, | |
| common.InstallWebhookDependentResources, | |
| common.MarkStatusSuccess, | |
| common.DeleteObsoleteResources(ctx, ks, r.installed), | |
| } |
Eventing has only one install step which means nothing is broken (see line 143):
operator/pkg/reconciler/knativeeventing/knativeeventing.go
Lines 143 to 148 in 00cda2a
| manifests.Install, | |
| manifests.SetManifestPaths, // setting path right after applying manifests to populate paths | |
| common.CheckDeployments, | |
| common.MarkStatusSuccess, | |
| common.DeleteObsoleteResources(ctx, ke, r.installed), | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just deployed an KnativeEventing resource with the operator and a the feature transport-encryption: Permissive causing the Certificates to be created without any issues, I guess it works without further changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: houshengbo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
Hi,
this PR fixes a chicken-and-egg bootstrap issue where the operator would get stuck during KnativeServing installation.
Problem:
Solution:
This ensures:
activatormight restart very few times before CM-Certificate was created bywebhookand Secret with certificate data was created bycert-managerRelated functions:
Tested locally in my kind cluster with a KnativeServing with and without
system-internal-tlsenabled.Fixes #2178
Release Note
/kind bug