
Anti-affinity pods scheduled to the same node during scheduler leader election #65257

Closed
DylanBLE opened this issue Jun 20, 2018 · 15 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@DylanBLE
Contributor

DylanBLE commented Jun 20, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Two pods with pod anti-affinity toward each other were scheduled to the same node.

What you expected to happen:
The two pods should have been scheduled to different nodes.
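(Illustration added for clarity; the report itself does not include manifests. A pod of the kind described here might look roughly like this using the Go API types, where the app=demo label, pod name, and image are assumptions.)

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newAntiAffinityPod builds a pod that must not share a node
// (topologyKey kubernetes.io/hostname) with any other pod labeled app=demo.
// PA and PB from this report would each carry a rule of this form; the
// label, name, and image are hypothetical.
func newAntiAffinityPod(name string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"app": "demo"},
		},
		Spec: corev1.PodSpec{
			Affinity: &corev1.Affinity{
				PodAntiAffinity: &corev1.PodAntiAffinity{
					RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
						LabelSelector: &metav1.LabelSelector{
							MatchLabels: map[string]string{"app": "demo"},
						},
						TopologyKey: "kubernetes.io/hostname",
					}},
				},
			},
			Containers: []corev1.Container{{Name: "app", Image: "nginx"}},
		},
	}
}
```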

How to reproduce it (as minimally and precisely as possible):
N/A

Anything else we need to know?:
The bug is caused by the scheduler during a leader switch.
Here is what happened:
Suppose there are two schedulers, SA (active) and SB (standby), two pods with anti-affinity toward each other, PA and PB, and two nodes, NA and NB.

  1. SA failed to renew the lease.
  2. SB became the active scheduler.
  3. SA's main thread did not quit in time; it scheduled PA to NA and then quit.
  4. SB scheduled PB to NA because it did not know that PA had already been scheduled there.
  5. The final state was that PA and PB were both on node NA.

Here is the log of SA (SA: tw-node2221, SB: tw-node2222).
SA lost the election at 16:03:35 but continued to schedule pods until it found conflicts in its cache, and then quit.
[screenshot of SA's scheduler log]

Environment:

  • Kubernetes version (use kubectl version):
    v1.5.6
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
    4.4.64-1.el7.elrepo.x86_64
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 20, 2018
@DylanBLE
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 20, 2018
@DylanBLE
Contributor Author

I would like to add the anti-affinity predicate to the kubelet's Admit check to fix this issue.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2018
@hzxuzhonghu
Member

This can happen when the original leader takes too long (e.g., longer than LeaseDuration) in tryAcquireOrRenew, while in the meantime the leadership has been acquired by another candidate.
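For context, this is roughly how a scheduler-style component wires up client-go leader election; the lock name, namespace, and durations below are illustrative, and the v1.5.6 code paths differ in detail, so treat it as a sketch rather than the actual scheduler code. The relevant point is that LeaseDuration only bounds how long a deposed leader may keep believing it leads, and OnStoppedLeading must halt scheduling work immediately:

```go
package example

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

// runWithLeaderElection runs schedule only while this instance holds the
// lease. If the lease is lost, OnStoppedLeading fires and the process must
// stop scheduling at once; until it notices, a slow old leader can overlap
// with the new one, which is exactly the race described in this issue.
func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, id string, schedule func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "kube-scheduler", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a deposed leader may still believe it leads
		RenewDeadline: 10 * time.Second, // give up renewing well before the lease expires
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: schedule,
			OnStoppedLeading: func() {
				klog.Fatal("lost leader election lease, exiting immediately")
			},
		},
	})
}
```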

@hzxuzhonghu
Member

#65094 can reduce the probability to some degree, but it cannot eliminate the window in which more than one scheduler works concurrently during the transition.

@hzxuzhonghu
Member

hzxuzhonghu commented Jun 20, 2018

To solve this completely, we have to prevent any overlap between two leaders (a rough sketch follows the list):

  1. Add a field to LeaderElector to record the leader's last renew time.

  2. A candidate does not call Update until at least LeaseDuration has passed since that last renew time.

  3. Limit tryAcquireOrRenew so that a single attempt takes no more than LeaseDuration - RetryPeriod.
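A minimal sketch of that scheme, using made-up type and method names rather than the real client-go LeaderElector:

```go
package example

import (
	"context"
	"time"
)

// leaderElector is a stand-in for client-go's LeaderElector; only the fields
// needed to illustrate the three points above are shown.
type leaderElector struct {
	leaseDuration time.Duration
	retryPeriod   time.Duration

	// observedRenewTime records when we last saw the current leader renew
	// its record (point 1).
	observedRenewTime time.Time
}

// canTakeOver implements point 2: a candidate only writes the lock once the
// old leader's lease has fully expired from the candidate's point of view.
func (le *leaderElector) canTakeOver(now time.Time) bool {
	return now.Sub(le.observedRenewTime) >= le.leaseDuration
}

// attemptContext implements point 3: bound a single tryAcquireOrRenew attempt
// to LeaseDuration-RetryPeriod, so a stalled leader gives up (and stops
// scheduling) before any candidate is allowed to take over.
func (le *leaderElector) attemptContext(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, le.leaseDuration-le.retryPeriod)
}
```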


@krmayankk

krmayankk commented Jun 20, 2018

/sig-scheduling

@DylanBLE
Contributor Author

@hzxuzhonghu Thanks for the information.
I'm wondering if I can add the scheduler's anti-affinity predicate to the kubelet's Admit function.
It would reject the pod that conflicts with an existing pod's anti-affinity.
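For the record, the proposal would have looked something like the sketch below, using simplified stand-in types rather than the kubelet's real admit-handler interface. As noted further down the thread, only hostname-scoped terms can be evaluated with node-local data, which is why the idea was not pursued.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// admitResult is a simplified stand-in for the kubelet's admit result type.
type admitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// selectorMatches reports whether the label selector matches the given labels.
func selectorMatches(sel *metav1.LabelSelector, lbls map[string]string) bool {
	if sel == nil {
		return false
	}
	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return false
	}
	return s.Matches(labels.Set(lbls))
}

// antiAffinityAdmit sketches the proposed check: reject a new pod if any pod
// already on this node has a required anti-affinity term matching the new
// pod's labels. Only hostname-scoped terms can be evaluated here, which is
// exactly the limitation raised later in this thread.
func antiAffinityAdmit(newPod *corev1.Pod, podsOnNode []*corev1.Pod) admitResult {
	for _, existing := range podsOnNode {
		aff := existing.Spec.Affinity
		if aff == nil || aff.PodAntiAffinity == nil {
			continue
		}
		for _, term := range aff.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution {
			if term.TopologyKey != "kubernetes.io/hostname" {
				continue // zone-level terms need cluster-wide data the kubelet lacks
			}
			if selectorMatches(term.LabelSelector, newPod.Labels) {
				return admitResult{Admit: false, Reason: "PodAntiAffinity",
					Message: "violates anti-affinity of pod " + existing.Name}
			}
		}
	}
	return admitResult{Admit: true}
}
```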

@wenjiaswe
Contributor

wenjiaswe commented Jun 25, 2018

/remove-sig api-machinery
/sig scheduling

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jun 25, 2018
@k8s-ci-robot
Contributor

@wenjiaswe: Those labels are not set on the issue: sig/api-machinery

In response to this:

/remove-sig api-machinery
/sig scheduling

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2018
k8s-github-robot pushed a commit that referenced this issue Jul 3, 2018
Automatic merge from submit-queue (batch tested with PRs 65094, 65533, 63522, 65694, 65702). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

set leader election client and renew timeout

**What this PR does / why we need it**:

set leader-election client timeout

set timeout for tryAcquireOrRenew

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65090 #65257

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
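The first half of that change can be pictured as follows; this is an illustration under assumed names and values, not the PR's actual diff. The idea is to give the leader-election client its own per-request timeout so a hung apiserver call cannot stall a renew attempt indefinitely.

```go
package example

import (
	"time"

	"k8s.io/client-go/rest"
)

// leaderElectionClientConfig returns a copy of the base rest.Config with a
// per-request timeout, so the leader-election client cannot block a renew
// attempt longer than the renew deadline. The helper name and the choice of
// value are assumptions for illustration.
func leaderElectionClientConfig(base *rest.Config, renewDeadline time.Duration) *rest.Config {
	cfg := rest.CopyConfig(base)
	cfg.Timeout = renewDeadline
	return cfg
}
```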
@k82cn
Member

k82cn commented Jul 26, 2018

I'm wondering if I can add the scheduler's anti-affinity predicate to the kubelet's Admit function.

We cannot do that; anti-affinity also supports zone-level topologyKeys, but the kubelet does not (and should not) know the status of other nodes. That's why we did not include pod affinity/anti-affinity in kubelet admit.
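To illustrate the point, a term like the one below (labels assumed; on a v1.5-era cluster the zone label would be failure-domain.beta.kubernetes.io/zone rather than the one shown) can only be evaluated with knowledge of the pods on every node in the same zone, which a single kubelet does not have:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// zoneAntiAffinity spreads app=demo pods across zones. Admitting a pod against
// this term requires the pod lists of all nodes in the zone, which is why the
// check belongs in the scheduler rather than in kubelet admit.
var zoneAntiAffinity = &corev1.PodAntiAffinity{
	RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "demo"},
		},
		TopologyKey: "topology.kubernetes.io/zone",
	}},
}
```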

@DylanBLE
Contributor Author

@k82cn That makes sense to me.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 26, 2018
@DylanBLE
Contributor Author

/close

@k8s-ci-robot
Contributor

@DylanBLE: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
