-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add leader election #6683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add leader election #6683
Conversation
e0ad305 to
7b5ee47
Compare
7b5ee47 to
26fb8d9
Compare
26fb8d9 to
210423a
Compare
210423a to
8eae52e
Compare
mattmoor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll TAL at the pkg PR next.
Gopkg.toml
Outdated
| [[override]] | ||
| name = "knative.dev/pkg" | ||
| branch = "master" | ||
| source = "github.com/pmorie/pkg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pro tip: I usually throw a // TODO(mattmoor): DO NOT SUBMIT around this, so that the bot catches it and leaves a comment.
cmd/autoscaler/main.go
Outdated
| if !leConfig.LeaderElect { | ||
| log.Printf("autoscaler will not run in leader-elected mode") | ||
| run(ctx) | ||
| panic("unreachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not hit when the context is cancelled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(which happens when K8s sends SIGTERM)
| return err == nil, nil | ||
| }); perr != nil { | ||
| log.Fatal("Timed out attempting to get k8s version: ", err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd left this as is, then you could save some kubeclient.Get calls below.
cmd/autoscaler/main.go
Outdated
| if err != nil { | ||
| logger.Fatalw("Failed to get hostname for leader election", zap.Error(err)) | ||
| } | ||
| id = id + "_" + string(uuid.NewUUID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hoisting this little bit into a helper would document it in the most straightforward way. At first I read the comment above and was like: "That's not what os.Hostname() does...?" :)
| # giving up; 10 seconds is the value used by core kubernetes controllers. | ||
| renewDeadline: "10s" | ||
| # retryPeriod is how long the leader election client waits between tries of | ||
| # actions; 2 seconds is the value used by core kuberntes controllers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # actions; 2 seconds is the value used by core kuberntes controllers. | |
| # actions; 2 seconds is the value used by core kubernetes controllers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still seems to be a problem
cmd/autoscaler/main.go
Outdated
| logger.Fatalw("leaderelection lost") | ||
| }, | ||
| }, | ||
| // WatchDog: electionChecker, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
cmd/autoscaler/main.go
Outdated
| if err := eg.Wait(); err != nil && err != http.ErrServerClosed { | ||
| logger.Errorw("Error while running server", zap.Error(err)) | ||
| } | ||
| panic("unreachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use logger.Fatal in place of panic
cmd/autoscaler/main.go
Outdated
| eg, egCtx := errgroup.WithContext(ctx) | ||
| eg.Go(func() error { | ||
| return customMetricsAdapter.Run(ctx.Done()) | ||
| leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we get all of this for free in the other binaries due to their use of sharedmain?
| }) | ||
| if err != nil { | ||
| logger.Fatalw("Failed to create admission controller", zap.Error(err)) | ||
| logger.Fatalw("error creating lock: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we leaking resources on uninstall?
8eae52e to
9799bb1
Compare
9799bb1 to
a303651
Compare
|
/retest |
a303651 to
58a39e5
Compare
|
/retest |
9aef37a to
1e9ef61
Compare
vagababov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lint
knative-prow-robot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vagababov: 1 warning.
Details
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
vagababov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
mattmoor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/hold
Mostly nits. If we want to get this baking, feel free to /unhold and follow-up, or I can restamp if you want to knock these off now.
cmd/webhook/main.go
Outdated
| // config validation constructors | ||
| tracingconfig "knative.dev/pkg/tracing/config" | ||
| defaultconfig "knative.dev/serving/pkg/apis/config" | ||
| configvalidation "knative.dev/serving/pkg/apis/config/validation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of an odd place to put this. Can we make it knative.dev/serving/pkg/leaderelection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the defaulting config is here because defaulting happens as part of the pkg/apis stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I had something similar in eventing that already went in, mind if we do this in a follow-up?
| @@ -0,0 +1,69 @@ | |||
| # Copyright 2018 The Knative Authors | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Copyright 2018 The Knative Authors | |
| # Copyright 2020 The Knative Authors |
| resourceLock: "leases" | ||
| leaseDuration: "15s" | ||
| renewDeadline: "10s" | ||
| retryPeriod: "2s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the above should come from defaults, right? If so we generally let the defaulting provide them so that folks can safely tweak the knobs via kubectl edit and not upset the three-way merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, I will do that in a follow-up.
| # giving up; 10 seconds is the value used by core kubernetes controllers. | ||
| renewDeadline: "10s" | ||
| # retryPeriod is how long the leader election client waits between tries of | ||
| # actions; 2 seconds is the value used by core kuberntes controllers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still seems to be a problem
|
|
||
| // ValidateLeaderElectionConfig enriches the leader election config validation | ||
| // with extra validations specific to serving. | ||
| func ValidateLeaderElectionConfig(configMap *corev1.ConfigMap) (*kle.Config, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with pkg/leaderelection (comment above) then we should rename ValidateConfig to avoid stutter.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mattmoor, pmorie The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/unhold |
|
/retest Looks like the failure in auto tls tests may be related to #7018 |
|
@pmorie: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
The following is the coverage report on the affected files.
|
mattmoor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/retest |
Fixes knative/pkg#1007
Proposed Changes
sharedmain, which is used by:/configis changed so that it is simple to enable leader election, but it is not enabled by defaultRelease Note