kernel memcg notification enabled via experimental flag#38258
kernel memcg notification enabled via experimental flag#38258k8s-github-robot merged 2 commits intokubernetes:masterfrom
Conversation
efea36c to
451fe67
Compare
|
Jenkins GCI GKE smoke e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins GKE smoke e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins Kubemark GCE e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins GCI GCE e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins GCE e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins GCE etcd3 e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins unit/integration failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins GCE Node e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
Jenkins CRI GCE Node e2e failed for commit 451fe67. Full PR test history. The magic incantation to run this job again is |
|
lgtm |
| if obj.EvictionPressureTransitionPeriod == zeroDuration { | ||
| obj.EvictionPressureTransitionPeriod = metav1.Duration{Duration: 5 * time.Minute} | ||
| } | ||
| if obj.ExperimentalKernelMemcgNotification == nil { |
There was a problem hiding this comment.
ExperimentalKernelMemcgNotification is a bool, not a *bool.
There was a problem hiding this comment.
yeah, i just fixed a push with that. i was still messing w/ generators and just threw this up to make people aware i was working on it.
| fs.DurationVar(&s.EvictionPressureTransitionPeriod.Duration, "eviction-pressure-transition-period", s.EvictionPressureTransitionPeriod.Duration, "Duration for which the kubelet has to wait before transitioning out of an eviction pressure condition.") | ||
| fs.Int32Var(&s.EvictionMaxPodGracePeriod, "eviction-max-pod-grace-period", s.EvictionMaxPodGracePeriod, "Maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met. If negative, defer to pod specified value.") | ||
| fs.StringVar(&s.EvictionMinimumReclaim, "eviction-minimum-reclaim", s.EvictionMinimumReclaim, "A set of minimum reclaims (e.g. imagefs.available=2Gi) that describes the minimum amount of resource the kubelet will reclaim when performing a pod eviction if that resource is under pressure.") | ||
| fs.BoolVar(&s.ExperimentalKernelMemcgNotification, "experimental-kernel-memcg-notification", s.ExperimentalKernelMemcgNotification, "If enabled, the kubelet will integrate with the kernel memcg notification to determine if memory eviction thresholds are crossed rather than polling.") |
There was a problem hiding this comment.
I wonder if this would be better guarded by a feature gate than a flag; with a feature gate we don't have a new flag that needs to be removed later. Just a thought.
There was a problem hiding this comment.
i think its fair to assume that not all kernels are equal and there could be some distros that require it to be off so a flag not tied to feature gate seems right in the long run?
There was a problem hiding this comment.
Yeah true. This is the kind of switch that might need to be in for the long haul, due to the kernel issue. A flag is probably the appropriate solution.
There was a problem hiding this comment.
Should we put a note in this flag that it may cause soft lockups due to an unidentified kernel issue? I've heard some worries that we don't want people trying to "make Kubernetes faster" by turning this on without realizing it could cause other problems.
There was a problem hiding this comment.
we haven't proven that it causes soft lockups have we?
There was a problem hiding this comment.
We did, in some sense. Soft lockups are kernel problem, so proper fix is beyond our reach, but we did prove that this triggers soft lockup on debian in CVM. @dchen1107
451fe67 to
cd09953
Compare
|
i am having some trouble updating generated-protobuf and will finish this up in the morning. |
|
Jenkins verification failed for commit cd099531c7dd8ca0d3ca6da1747b2ad161f2385b. Full PR test history. The magic incantation to run this job again is |
|
@derekwaynecarr looks like running |
|
@dims -- yeah, i had docker issues last night that prevented me from running the full hack/update suite. i just fixed that up locally, and the generation stuff is all running now, should have a second commit for this pr soon with generated stuff. |
cd09953 to
bcca039
Compare
|
All things green. |
|
We don't want to start merging things until we are more confident in the root cause. |
|
@mtaufen - i am still biased to take the gate since we know it caused problems with older ubuntu trusty images. |
|
LGTM cc/ @fabioy we should make sure --experimental-kernel-memcg-notification is false in GKE production. |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
Automatic merge from submit-queue (batch tested with PRs 38318, 38258) |
|
@derekwaynecarr This merged, can you please prepare a cherry pick to the 1.5 branch. |
|
Ah perfect, will get that merged. |
|
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Kubelet integrates with kernel memcg notification API if and only if enabled via experimental flag.