Antiaffinity pods scheduled to the same node during scheduler leader-election #65257
/sig api-machinery
I would like to add the anti-affinity predicate to kubelet Admit to fix this issue.
This can happen when the original leader takes too long (e.g., longer than the lease duration) to renew its leadership.
#65094 can reduce this probability to some degree, but it cannot eliminate having more than one scheduler working concurrently during the transition window.
To solve this completely, we have to prevent any overlap between two leaders.
/sig-scheduling
@hzxuzhonghu Thanks for the information.
/remove-sig api-machinery
@wenjiaswe: Those labels are not set on the issue: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Automatic merge from submit-queue (batch tested with PRs 65094, 65533, 63522, 65694, 65702). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

set leader election client and renew timeout

**What this PR does / why we need it**: set the leader-election client timeout; set a timeout for tryAcquireOrRenew

**Which issue(s) this PR fixes**: Fixes #65090 #65257

**Release note**:
```release-note
NONE
```
We cannot do that: anti-affinity also supports topology keys such as "zone", but the kubelet should not know the status of other nodes. That's why we did not include pod affinity/anti-affinity in kubelet admit.
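To illustrate the point about topology keys: a zone-scoped anti-affinity rule like the hypothetical manifest below can only be evaluated with the pod state of every node in the zone, which the admitting kubelet does not have (names, labels, and image are placeholders; `topology.kubernetes.io/zone` is the current well-known zone key).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-a          # hypothetical name
  labels:
    app: web
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        # Zone-scoped: evaluating this needs the pods of all nodes in
        # the zone, not just the admitting kubelet's own node.
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: web
    image: nginx       # placeholder image
```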
@k82cn That makes sense to me.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@DylanBLE: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Two pods with an anti-affinity rule are scheduled to the same node.
What you expected to happen:
Two pods with an anti-affinity rule are scheduled to different nodes.
How to reproduce it (as minimally and precisely as possible):
na
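The report leaves reproduction steps empty. A manifest of the kind described would be two pods that each reject co-location with the other at node scope, e.g. the hypothetical spec below (names, labels, and image are placeholders), created twice with different names during a scheduler leader switch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pa             # create a second copy named pb
  labels:
    group: exclusive
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            group: exclusive
        # Node-scoped: no two pods with this label may share a node.
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx       # placeholder image
```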
Anything else we need to know?:
The bug is caused by the scheduler during a leader switch.
Here is what happened:
Suppose there are two schedulers, SA (active) and SB (standby), two pods with anti-affinity, PA and PB, and two nodes, NA and NB.
Here is the log of SA:

SA: tw-node2221, SB: tw-node2222.
SA lost the election at 16:03:35 but continued to schedule pods until it found conflicts in its cache, and then quit.
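The sequence above can be sketched as a toy model (hypothetical names, not scheduler code): because each scheduler checks anti-affinity only against its own cache, the stale leader SA and the new leader SB can both see NA as conflict-free and each bind one of the anti-affinity pods there.

```go
package main

import "fmt"

// schedulerCache maps a node name to the pods that scheduler
// believes are running there. Each scheduler has its own copy.
type schedulerCache map[string][]string

// violatesAntiAffinity reports whether placing a pod on node would
// conflict with peer, according to this scheduler's cache only.
func violatesAntiAffinity(cache schedulerCache, node, peer string) bool {
	for _, p := range cache[node] {
		if p == peer {
			return true
		}
	}
	return false
}

func main() {
	sa := schedulerCache{} // stale leader's view of the cluster
	sb := schedulerCache{} // new leader's view of the cluster

	// SA, not yet aware it lost the lease, sees no conflict on NA.
	if !violatesAntiAffinity(sa, "NA", "PB") {
		sa["NA"] = append(sa["NA"], "PA")
		fmt.Println("SA binds PA to NA")
	}
	// SB, consulting its own cache, also sees no conflict on NA,
	// so both anti-affinity pods end up on the same node.
	if !violatesAntiAffinity(sb, "NA", "PA") {
		sb["NA"] = append(sb["NA"], "PB")
		fmt.Println("SB binds PB to NA")
	}
}
```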
Environment:
- Kubernetes version (`kubectl version`): v1.5.6
- Kernel (`uname -a`): 4.4.64-1.el7.elrepo.x86_64