-
Notifications
You must be signed in to change notification settings - Fork 339
add leader election support to sharedmain #1019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vagababov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lint
injection/sharedmain/main.go
Outdated
| logger.Fatalw("Version check failed", zap.Error(err)) | ||
| } | ||
| // run is the main controller flow | ||
| run := func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite big. Why not extract it to a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the constructions in k8s controller manager, decided to save any cosmetics or discretionary factoring for after it worked. Since that function captures many variables from the enclosing context, extracting a function would be more laborious than it might appear - since you would have to introduce a type or function factory to inject those variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look closely, but I noticed logger. If it's just 2-3, then passing via args should be alright.
But, sure, we can brush it up later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I don't think it's worth it, but you're welcome to do it as a follow-on PR.
injection/sharedmain/main.go
Outdated
| run(ctx) | ||
| panic("unreachable") | ||
| } else { | ||
| // create a unique identifier so that two controllers on the same host don't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // create a unique identifier so that two controllers on the same host don't | |
| // Create a unique identifier so that two controllers on the same host don't |
injection/sharedmain/main.go
Outdated
| if err != nil { | ||
| logger.Fatalw("Failed to get hostname for leader election", zap.Error(err)) | ||
| } | ||
| id = id + "_" + string(uuid.NewUUID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| id = id + "_" + string(uuid.NewUUID()) | |
| id += "_" + string(uuid.NewUUID()) |
injection/sharedmain/main.go
Outdated
| logger.Fatalw("Failed to get hostname for leader election", zap.Error(err)) | ||
| } | ||
| id = id + "_" + string(uuid.NewUUID()) | ||
| log.Printf("%v will run in leader-elected mode with id %v", component, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm? or logger.Infof().
injection/sharedmain/main.go
Outdated
| }) | ||
| if err != nil { | ||
| logger.Fatalw("Failed to create admission controller", zap.Error(err)) | ||
| logger.Fatalw("error creating lock: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logger.Fatalw("error creating lock: %v", err) | |
| logger.Fatalw("Error creating lock: %v", zap.Error(err)) |
injection/sharedmain/main.go
Outdated
| Callbacks: leaderelection.LeaderCallbacks{ | ||
| OnStartedLeading: run, | ||
| OnStoppedLeading: func() { | ||
| logger.Fatalw("leaderelection lost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logger.Fatalw("leaderelection lost") | |
| logger.Fatal("leaderelection lost") |
should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this code correctly, the controller will start up, and try to acquire the leader lock. If it participates in the election and is not elected, it will sit and continue participating in the leader election. If it wins the election, it will serve (run the specified controllers) and will exit if it ever loses an election after that.
Is that correct?
injection/sharedmain/main.go
Outdated
| if !leConfig.LeaderElect { | ||
| log.Printf("%v will not run in leader-elected mode", component) | ||
| run(ctx) | ||
| panic("unreachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with logger.Fatal() instead, rather than panic. Same below.
knative-prow-robot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vagababov: 11 warnings.
Details
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
leaderelection/leaderelection.go
Outdated
| corev1 "k8s.io/api/core/v1" | ||
| ) | ||
|
|
||
| const ConfigMapNameEnv = "CONFIG_LEADERELECTION_NAME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint comments: exported const ConfigMapNameEnv should have comment or be unexported. More info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably unexported.
leaderelection/leaderelection.go
Outdated
| EnabledComponents map[string]bool | ||
| } | ||
|
|
||
| func (c *Config) GetComponentConfig(name string) ComponentConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint comments: exported method Config.GetComponentConfig should have comment or be unexported. More info.
| }, | ||
| } | ||
|
|
||
| for i, _ := range cases { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for i, _ := range cases { | |
| for i := range cases { |
Golint range-loop: should omit 2nd value from range; this loop is equivalent to for i := range ....
| }, | ||
| } | ||
|
|
||
| for i, _ := range cases { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for i, _ := range cases { | |
| for i := range cases { |
Golint range-loop: should omit 2nd value from range; this loop is equivalent to for i := range ....
injection/sharedmain/main.go
Outdated
| if err != nil { | ||
| if apierrors.IsNotFound(err) { | ||
| return kle.NewConfigFromMap(nil) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golint indent: if block ends with a return statement, so drop this else and outdent its block. More info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this comment.
| @@ -0,0 +1,4 @@ | |||
| # See the OWNERS docs at https://go.k8s.io/owners | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought ./hack/update-deps.sh deletes OWNERS files in vendor/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run it ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, if you run it applies :)
2ccb64d to
2ff59b4
Compare
2ff59b4 to
5dbd8d0
Compare
5dbd8d0 to
7c06a49
Compare
f78a2c2 to
e28f996
Compare
vaikas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!
injection/sharedmain/main.go
Outdated
| logger.Fatalw("leaderelection lost") | ||
| }, | ||
| }, | ||
| // TODO: investigate using watchdog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add an issue to track this and add a pointer to that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #1048
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
leaderelection/leaderelection.go
Outdated
| } | ||
|
|
||
| if leaseDurationStr, ok := data["leaseDuration"]; ok { | ||
| if leaseDuration, err := time.ParseDuration(leaseDurationStr); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless I'm totes reading this wrong, if a user specifies an invalid value here, it will not take effect and they will have no visibility that what they wanted didn't happen?
Should we at the very least log this error?
Just worrying about the lack of visibility here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point; I think that in the presence of an invalid value, you should receive the default value and have a log message. WDYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it gets the default value now, my worry was the lack of feedback, which the log message would provide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three choices here:
-
If you supply an invalid value, it fails loudly (i.e.
logger.Fatal) This is most appropriate for startup conditions, where it might block something like a rolling update. -
If an invalid config is supplied, the entire config should be considered invalid and rejected. In this case, this would be adding an else to
return nil, err -
If a partially-invalid config is supplied, do the best you can, and log errors about the parts that didn't work.
I'd most prefer option 1, but given that we're driving this from dynamically-updated configmaps, I think option 2 would be better than option 3, because it makes the application of the ConfigMap much more deterministic (all or nothing, vs all-except-my-typos).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... @nimakaviani added a validating webhook that allows us to register configmaps for validation and reject them outright if it returns an error. So the right answer here is multi-part:
- This should return an error
- Each downstream webhook should add this to the list of configmaps to validate.
Then the configmap update will be outright rejected and never saved to etcd.
e28f996 to
b4201f2
Compare
injection/sharedmain/main.go
Outdated
| } else if !apierrors.IsNotFound(err) { | ||
| logger.With(zap.Error(err)).Fatalf("Error reading ConfigMap %q", logging.ConfigMapName()) | ||
| } | ||
| // Watch the logging config map and dynamically update logging levels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this and the observability configmap to be subject to leader election, or do we want them to apply even when the controller is not leader-elected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are configured as part of the leader-elected code, I think that there is no notion where these configs apply when the controller is not the leader. If the controller is running and not the leader it is waiting to be the leader. Note, there is an exceptional case where the leader election lock may be lost but the process still running due to deadlock which will be addressed by #1048, but I don't think it changes the equation, and I think it would be fine to have logging and observability config watches stay within the leader-elected section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These watches are used to report information like Go resource usage and where to write structured logs (which may be written for background work even when not leader elected).
Similarly, I worry that the client version check on line 172 could lead to N leader candidates which will exit as soon as they are elected leader because they don't actually like the version of Kubernetes apiserver that they are attached to.
I think I'd prefer to see the leader election more narrowly wrapped around the controller start on 234-235 (and maybe subsequent).
injection/sharedmain/main.go
Outdated
| logger.Info("Starting controllers...") | ||
| go controller.StartAll(ctx.Done(), controllers...) | ||
|
|
||
| profilingServer := profiling.NewServer(profilingHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on profiling. Do we want the profiler to only run on the elected leader, or do we want to be able to check profiling on all the participants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I think on all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, fixed
injection/sharedmain/main.go
Outdated
|
|
||
| // If we have one or more admission controllers, then start the webhook | ||
| // and pass them in. | ||
| if len(webhooks) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question on the webhooks. In particular, one could imagine in leader-election mode having the webhooks run only only on the non-elected nodes, to offload some CPU work from the leader. I could also see just running the webhook on all of them so that the webhook has higher availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that webhooks will not be run leader-elected, as they should be horizontally scalable, they are already accessed behind a service, and they do not directly mutate the API server state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all your statements about webhooks -- my suggestion was to take this code out from inside run() and put it back in the general shared main.
I think @mattmoor 's "mink" distribution combines the webhooks and controllers into a single deployment, for example. It seems surprising to have leader election control this code.
injection/sharedmain/main.go
Outdated
| // Register webhook metrics | ||
| webhook.RegisterMetrics() | ||
|
|
||
| // possible bug? egCtx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a new comment. Should @vagababov or @mattmoor comment on whether this ctx is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that this was intentional because the ctx here is being passed into work that is enqueued on the errgroup. So it should key off of the SIGTERM not the first failure in the errgroup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it, it just looked a little fishy to me superficially, hence the note to myself.
injection/sharedmain/main.go
Outdated
| Callbacks: leaderelection.LeaderCallbacks{ | ||
| OnStartedLeading: run, | ||
| OnStoppedLeading: func() { | ||
| logger.Fatalw("leaderelection lost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this code correctly, the controller will start up, and try to acquire the leader lock. If it participates in the election and is not elected, it will sit and continue participating in the leader election. If it wins the election, it will serve (run the specified controllers) and will exit if it ever loses an election after that.
Is that correct?
leaderelection/leaderelection.go
Outdated
| } | ||
|
|
||
| if leaseDurationStr, ok := data["leaseDuration"]; ok { | ||
| if leaseDuration, err := time.ParseDuration(leaseDurationStr); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three choices here:
-
If you supply an invalid value, it fails loudly (i.e.
logger.Fatal) This is most appropriate for startup conditions, where it might block something like a rolling update. -
If an invalid config is supplied, the entire config should be considered invalid and rejected. In this case, this would be adding an else to
return nil, err -
If a partially-invalid config is supplied, do the best you can, and log errors about the parts that didn't work.
I'd most prefer option 1, but given that we're driving this from dynamically-updated configmaps, I think option 2 would be better than option 3, because it makes the application of the ConfigMap much more deterministic (all or nothing, vs all-except-my-typos).
leaderelection/leaderelection.go
Outdated
|
|
||
| if enabledComponents, ok := data["enabledComponents"]; ok { | ||
| tokens := strings.Split(enabledComponents, ",") | ||
| for i, _ := range tokens { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
leaderelection/leaderelection.go
Outdated
| for i, _ := range tokens { | ||
| str := tokens[i] | ||
|
|
||
| if str == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this check?
injection/sharedmain/main.go
Outdated
| if err != nil { | ||
| logger.Fatalw("Failed to get hostname for leader election", zap.Error(err)) | ||
| } | ||
| id += "_" + string(uuid.NewUUID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a helper in the leaderelection package? Seems like you use it in serving as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
leaderelection/leaderelection.go
Outdated
| RenewDeadline time.Duration | ||
| RetryPeriod time.Duration | ||
|
|
||
| EnabledComponents map[string]bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use sets.String from apimachinery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
leaderelection/leaderelection.go
Outdated
|
|
||
| if enabledComponents, ok := data["enabledComponents"]; ok { | ||
| tokens := strings.Split(enabledComponents, ",") | ||
| for i, _ := range tokens { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use sets.String as I suggest below, then the entire loop can be sets.NewString(tokens...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
leaderelection/leaderelection.go
Outdated
| ResourceLock string `json:"resourceLock"` | ||
| LeaseDuration time.Duration `json:"leaseDuration"` | ||
| RenewDeadline time.Duration `json:"renewDeadline"` | ||
| RetryPeriod time.Duration `json:"retryPeriod"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the json encodings here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, i'll remove them
b4201f2 to
a3b3051
Compare
yes, if leader election is enabled for the controller by name |
| } | ||
|
|
||
| // GetLeaderElectionConfig gets the leader election config. | ||
| func GetLeaderElectionConfig(ctx context.Context) (*kle.Config, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in controllers that do not use sharedmain, as are other methods from this file that are exported.
injection/sharedmain/main.go
Outdated
| logger.Info("Starting controllers...") | ||
| go controller.StartAll(ctx.Done(), controllers...) | ||
|
|
||
| profilingServer := profiling.NewServer(profilingHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I think on all
injection/sharedmain/main.go
Outdated
| EventRecorder: recorder, | ||
| }) | ||
| if err != nil { | ||
| logger.Fatalw("Error creating lock: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logger.Fatalw("Error creating lock: %v", err) | |
| logger.Fatalw("Error creating lock", zap.Error(err)) |
leaderelection/leaderelection.go
Outdated
| corev1 "k8s.io/api/core/v1" | ||
| ) | ||
|
|
||
| const ConfigMapNameEnv = "CONFIG_LEADERELECTION_NAME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably unexported.
leaderelection/leaderelection.go
Outdated
| return defaultComponentConfig() | ||
| } | ||
|
|
||
| func defaultConfig() Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func defaultConfig() Config { | |
| func defaultConfig() *Config { |
leaderelection/leaderelection.go
Outdated
| } | ||
|
|
||
| func defaultConfig() Config { | ||
| return Config{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return Config{ | |
| return &Config{ |
This way you don't have to take address of the returned object every time
After some thought, I am leaning toward option 1, to error loudly if there is an invalid value and refuse to start. The configmap will be protected by a validating webhook. In most cases this will prevent the configmap from being mutated into an invalid configuration, but it's still possible to get into a state whether the configmap contains invalid data. Since we believe it will be an exceptional condition for this to happen, I think it is okay to refuse to start. If the configmap contains invalid data, and the system is otherwise healthy, the configmap will only ever be able to be updated to a valid state. Similarly, we exit the controller during startup if there's an error parsing the log config. Note, currently this code does not reload a running process' configuration based on changes to the configmap. I spent a bit of time investigating whether the leader election code would tolerate being reconfigured at runtime (the fields of the config are exported and referenced within the code instead of unpacked from the config). I believe it would be possible to investigate whether this would work, but I do not believe the code is designed for that and it is not tested for it. I think we have a few options:
Many changes of configuration would likely be benign in isolation (ie, no other changes to deployed resources for knative controllers), but some could result in controller downtime or other negative effects (like disabling leader election without changing number of controller replicas). In general I expect that operators of knative will rarely change this configuration, and in my experience leader elected controllers frequently move between being the leader, to waiting for the lock, and back again. Since that is the case, i'm leaning toward retaining the existing behavior and reading only once OR making the controller exit (since they frequently exit and are restarted by the kubelet anyway due to losing the leader election lock). |
|
/retest |
98c490c to
0c44ffe
Compare
0c44ffe to
18aad98
Compare
18aad98 to
7aba263
Compare
vagababov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I left the same comments before, but those are mostly stylistical changes.
injection/sharedmain/main.go
Outdated
| <-egCtx.Done() | ||
|
|
||
| profilingServer.Shutdown(context.Background()) | ||
| // Don't forward ErrServerClosed as that indicates we're already shutting down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: comment here doesn't make much sense, since we're not forwarding anything, logging at best.
injection/sharedmain/main.go
Outdated
| if !leConfig.LeaderElect { | ||
| logger.Infof("%v will not run in leader-elected mode", component) | ||
| run(ctx) | ||
| logger.Fatal("unreachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this be reachable when <-ctx.Done() triggers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this no longer applies, I don't think. I'll remove it.
leaderelection/config.go
Outdated
|
|
||
| if resourceLock, ok := data["resourceLock"]; ok { | ||
| if !validResourceLocks.Has(resourceLock) { | ||
| return nil, fmt.Errorf("resourceLock: invalid value %q: valid values are \"leases\",\"configmaps\",\"endpoints\"", resourceLock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return nil, fmt.Errorf("resourceLock: invalid value %q: valid values are \"leases\",\"configmaps\",\"endpoints\"", resourceLock) | |
| return nil, fmt.Errorf(`resourceLock: invalid value %q: valid values are "leases", "configmaps", "endpoints"`, resourceLock) |
leaderelection/config.go
Outdated
| EnabledComponents: sets.NewString(), | ||
| } | ||
|
|
||
| if resourceLock, ok := data["resourceLock"]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and for the checks below, I'd totally support simplifying this to:
if x := data["resourceLock"]; !validResourceLocks(x) {
return fmt.Errorf(....
}"" is not a valid value is just as good as must not be empty, imo. But will shorten the file in half.
leaderelection/config.go
Outdated
| return defaultComponentConfig() | ||
| } | ||
|
|
||
| func defaultConfig() Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wouldn't we return pointer from the getgo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually delete this method entirely now.
|
The following is the coverage report on the affected files.
|
vagababov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
evankanderson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments; the only big one is that I still think the logging and metrics setup should be outside the leader election (so that we can monitor non-elected controllers).
| profilingServer := profiling.NewServer(profilingHandler) | ||
| eg, egCtx := errgroup.WithContext(ctx) | ||
| eg.Go(profilingServer.ListenAndServe) | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering why this is a goroutine, rather than a defer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method won't exit until the server is done or crashes. Is there a downside to using a goroutine here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a bit unusual, but it looks like the only leader election option is RunOrDie, so this is fine.
| run := func(ctx context.Context) { | ||
| cmw := SetupConfigMapWatchOrDie(ctx, logger) | ||
| controllers, _ := ControllersAndWebhooksFromCtors(ctx, cmw, ctors...) | ||
| WatchLoggingConfigOrDie(ctx, cmw, logger, atomicLevel, component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still wondering why we only want to configure logging and monitoring after being elected leader, rather than applying the logging and monitoring configs continuously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the pattern in the k8s controller manager, which is to basically do absolutely nothing until you're the leader. I don't see an advantage to watching these configs until you're the leader, but if you want me to change it, I will. Do you want me to change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be to have logs and metrics potentially exporting to a remote destination even for non-leader controllers (so you could count the number of standby working controllers, for example).
But this can be done later, so I'll approve as-is.
injection/sharedmain/main.go
Outdated
| } | ||
|
|
||
| RunLeaderElected(ctx, logger, run, component, leConfig) | ||
| logger.Fatal("unreachable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why logger.Fatal here, rather than allowing RunLeaderElected to exit at shutdown if necessary?
|
|
||
| // Create a unique identifier so that two controllers on the same host don't | ||
| // race. | ||
| id, err := kle.UniqueID() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR
It seems like we might want to include a metric here indicating the leader-election status.
Can you drop a // TODO: add monitoring for leader election status to this bit of the code (and maybe file a good-first-issue for it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
leaderelection/config.go
Outdated
| } | ||
|
|
||
| if leaseDurationStr, ok := data["leaseDuration"]; ok { | ||
| if leaseDuration, err := time.ParseDuration(leaseDurationStr); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining with Victor's comment above, what about:
var err error
if config.LeaseDuration, err = time.ParseDuration(data["leaseDuration"]); err != nil {
return nil, fmt.Errorf("leaseDuration: %+v", err)
}| return &config, nil | ||
| } | ||
|
|
||
| return NewConfigFromMap(configMap.Data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should pass the config from defaultConfig through NewConfigFromMap to make sure that it meets the expected invariants. WDYT?
|
|
||
| func defaultComponentConfig() ComponentConfig { | ||
| return ComponentConfig{ | ||
| LeaderElect: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default value. I assume you're just trying to make the default very clear?
| cm := os.Getenv(ConfigMapNameEnv) | ||
| if cm == "" { | ||
| return "config-leader-election" | ||
| } | ||
| return cm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cm := os.Getenv(ConfigMapNameEnv) | |
| if cm == "" { | |
| return "config-leader-election" | |
| } | |
| return cm | |
| if cm := os.Getenv(ConfigMapNameEnv); cm != "" { | |
| return cm | |
| } | |
| return "config-leader-election" |
evankanderson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
| run := func(ctx context.Context) { | ||
| cmw := SetupConfigMapWatchOrDie(ctx, logger) | ||
| controllers, _ := ControllersAndWebhooksFromCtors(ctx, cmw, ctors...) | ||
| WatchLoggingConfigOrDie(ctx, cmw, logger, atomicLevel, component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be to have logs and metrics potentially exporting to a remote destination even for non-leader controllers (so you could count the number of standby working controllers, for example).
But this can be done later, so I'll approve as-is.
| profilingServer := profiling.NewServer(profilingHandler) | ||
| eg, egCtx := errgroup.WithContext(ctx) | ||
| eg.Go(profilingServer.ListenAndServe) | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a bit unusual, but it looks like the only leader election option is RunOrDie, so this is fine.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: evankanderson, pmorie The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Add leader election config and to sharedmain * Add new dependencies * Extract method for RunLeaderElected * Make leader election config constructor validate * Rename leader election files * Always start profiling server whether component has LE lock or not * Fix entering unreachable section when leader election is disabled * Address PR feedback
* add leader election support to sharedmain (#1019) * Add leader election config and to sharedmain * Add new dependencies * Extract method for RunLeaderElected * Make leader election config constructor validate * Rename leader election files * Always start profiling server whether component has LE lock or not * Fix entering unreachable section when leader election is disabled * Address PR feedback * Fix missing import
For #1007; this PR adds WIP support for leader election to
sharedmain.