-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancellable leader election #57932
Cancellable leader election #57932
Conversation
/assign @cheftako |
a67cfa9
to
b013fef
Compare
/retest |
b013fef
to
ddf6fdb
Compare
/retest |
ddf6fdb
to
fdd468d
Compare
@kubernetes/sig-api-machinery-pr-reviews PTAL |
@ash2k where is your integration test that makes your case? It's pretty trivial to plumb signal-handlers through the stop channel and others have written code that depends on this external to k8s. You're basically forcing everyone who uses this to change, so let's be certain to dot all the i's and cross all the t's. |
@timothysc There seem to be some misunderstanding here. I'm saying once you start the leader election, there is no way to ask it to stop. I.e. there is no way to make the method Examples of my integration tests can be found here https://github.com/atlassian/smith/tree/master/it |
I'm ok with it, but you're changing common code that will affect multiple parties that use the client. I'm not certain how apimachinery folks communicate these changes, and what policies are in place nowadays. /cc @sttts |
Would be much easier to review if the conversion from stopCh to context would be separate. |
It would be harder to implemented the same logic with a stop channel - channels panic if closed more than once unlike the |
I don't argue against a context :) just that we have two changes mixed into one commit here. |
Would like us to move to contexts much more throughout the code base. |
I understand and do not argue that it's two changes in 1 commit. My point is that it would not be easier to review. But if you insist I'll do it in another branch as an experiment. I'll post a link. |
Two commits in this PR would be fine. In general, change sgtm. |
/retest |
@@ -144,7 +145,7 @@ func Run(c *config.CompletedConfig) error { | |||
} | |||
} | |||
|
|||
run := func(stop <-chan struct{}) { | |||
run := func(runCtx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nit as above. s/runCtx/ctx/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It clashes with ctx, err := CreateControllerContext(c, rootClientBuilder, clientBuilder, runCtx.Done())
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in general, variables of type context.Context
prefer to be named ctx
. Naming a "ControllerContext
" struct variable ctx
is probably worth changing so that readers don't get confused.
In fact, looking at the definition of ControllerContext
, I don't think it is named correctly. It looks more like a config struct. (-:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
One more comment. /lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been nicer to add new functionality to leaderelect without requiring clients to change in the same PR. Each controller could be updated in a follow-up PR so that you don't get caught up in all the approvals needed.
Since the work is already done here, I won't push for this to be split up. See my comment about how I think it could have been done, though.
@@ -144,7 +145,7 @@ func Run(c *config.CompletedConfig) error { | |||
} | |||
} | |||
|
|||
run := func(stop <-chan struct{}) { | |||
run := func(runCtx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in general, variables of type context.Context
prefer to be named ctx
. Naming a "ControllerContext
" struct variable ctx
is probably worth changing so that readers don't get confused.
In fact, looking at the definition of ControllerContext
, I don't think it is named correctly. It looks more like a config struct. (-:
le.maybeReportTransition() | ||
if !succeeded { | ||
glog.V(4).Infof("failed to acquire lease %v", desc) | ||
return | ||
} | ||
le.config.Lock.RecordEvent("became leader") | ||
glog.Infof("successfully acquired lease %v", desc) | ||
close(stop) | ||
}, le.config.RetryPeriod, JitterFactor, true, stop) | ||
once.Do(func() { close(tmpStop) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of tmpStop in this function? cancellation propagation?
JitterUntil
may call this anonymous func multiple times... aha
Okay, tmpStop signals to JitterUntil that it has succeeded. I suggest renaming it to done
. I also think that the entire wait
library tends to result in unreadable code like this, when we could have just written a for loop.
glog.Fatalf("error running controllers: %v", err) | ||
} | ||
} | ||
|
||
runCtx, cancel := context.WithCancel(context.Background()) | ||
defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer cancel()
is a pretty common pattern with context.WithCancel
. I think the expectation is that after creating the context, you are going to call some blocking Run func. If that func starts goroutines, it should pass the context (or child contexts) to them. So if Run ever exits, we will signal to any stray goroutines to clean up and exit.
@@ -146,28 +146,28 @@ type LeaderElector struct { | |||
} | |||
|
|||
// Run starts the leader election loop | |||
func (le *LeaderElector) Run(stop <-chan struct{}) { | |||
func (le *LeaderElector) Run(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also create an overload like this:
func (le *LeaderElector) Run() { le.Run(context.TODO()) }
Then updating clients to use the new thing would be much more graceful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go doesn't support overloading. We discussed adding WithContext style functions to maintain compatibility but the package is documented as having an alpha API. #57932 (review)
If people feel strongly we can do this, but I wouldn't fight for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - I am still learning quite a bit about golang.
I am not concerned with backward compatibility, but I thought the author's life would have been easier by temporarily preserving backward compatibility.
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ash2k, mikedanese The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This is too late for 1.11 given the potential for regression. I'm marking at as 1.12 to be explicit with that. |
/retest Review the full test history for this PR. Silence the bot with an |
1 similar comment
/retest Review the full test history for this PR. Silence the bot with an |
[MILESTONENOTIFIER] Milestone Pull Request Labels Incomplete @ash2k @cheftako @deads2k @fgrzadkowski @mikedanese @pmorie Action required: This pull request requires label changes. If the required changes are not made within 3 days, the pull request will be moved out of the v1.12 milestone. priority: Must specify exactly one of |
Automatic merge from submit-queue (batch tested with PRs 65256, 64236, 64919, 64879, 57932). If you want to cherry-pick this change to another branch, please follow the instructions here. |
Automatic merge from submit-queue (batch tested with PRs 60012, 63692, 63977, 63960, 64008). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Use Dial with context **What this PR does / why we need it**: `net/http/Transport.Dial` field is deprecated: ```go // DialContext specifies the dial function for creating unencrypted TCP connections. // If DialContext is nil (and the deprecated Dial below is also nil), // then the transport dials using package net. DialContext func(ctx context.Context, network, addr string) (net.Conn, error) // Dial specifies the dial function for creating unencrypted TCP connections. // // Deprecated: Use DialContext instead, which allows the transport // to cancel dials as soon as they are no longer needed. // If both are set, DialContext takes priority. Dial func(network, addr string) (net.Conn, error) ``` This PR switches all `Dial` usages to `DialContext`. Fixes #63455. **Special notes for your reviewer**: Also related: kubernetes/kubernetes#59287 kubernetes/kubernetes#58532 kubernetes/kubernetes#815 kubernetes/community#1166 kubernetes/kubernetes#58677 kubernetes/kubernetes#57932 **Release note**: ```release-note HTTP transport now uses `context.Context` to cancel dial operations. k8s.io/client-go/transport/Config struct has been updated to accept a function with a `context.Context` parameter. This is a breaking change if you use this field in your code. ``` /sig api-machinery /kind enhancement /cc @sttts Kubernetes-commit: ddf551c24b7d88454f8332ce6855e53281440958
What this PR does / why we need it:
Adds ability to cancel leader election. Useful in integration tests where the whole app is started and stopped in each test.
Special notes for your reviewer:
I used the
context
package - it is impossible/hard to achieve the same behaviour with just channels without spawning additional goroutines but it is trivial withcontext
. Seeacquire()
andrenew()
methods.Release note:
/kind enhancement
/sig api-machinery