Commit 1c68f50

jepio authored and AkihiroSuda committed
cgroup2: monitor OOMKill instead of OOM to prevent missing container OOM events
With the cgroupv2 configuration employed by Kubernetes, the pod cgroup (slice) and container cgroup (scope) both have the same memory limit applied. In that situation, the kernel considers an OOM event to be triggered by the parent cgroup (slice) and increments 'oom' there. The child cgroup (scope) only sees an 'oom_kill' increment. Since we monitor child cgroups for OOM events, check the OOMKill field so that we don't miss events.

This is not visible when running containers through docker or ctr, because they set the limits differently (only at the container level). An alternative would be to not configure limits at the pod level - that way the container limit will be hit and the OOM event will be generated correctly.

An interesting consequence is that OOM events also work correctly when spawning a pod with multiple containers, because:
a) if one of the containers has no limit, the pod has no limit, so OOM events in another container are reported correctly.
b) if all of the containers have limits, the pod limit is the sum of the container limits, so a container will be able to hit its own limit first.

Signed-off-by: Jeremi Piotrowski <[email protected]>
(cherry picked from commit 7275411)
Signed-off-by: Akihiro Suda <[email protected]>
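To make the difference between the two counters concrete, here is a minimal sketch of reading them from the contents of a cgroup v2 memory.events file, which consists of "key value" lines. The type memoryEvents, the function parseMemoryEvents, and the sample contents are illustrative assumptions and are not part of this commit or of containerd.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memoryEvents holds the two memory.events counters discussed in the
// commit message. The type and field names are hypothetical.
type memoryEvents struct {
	OOM     uint64 // "oom": the cgroup's memory limit was reached
	OOMKill uint64 // "oom_kill": a process in the cgroup was killed
}

// parseMemoryEvents extracts "oom" and "oom_kill" from text in the
// "key value" per-line format used by memory.events. Other keys are ignored.
func parseMemoryEvents(content string) (memoryEvents, error) {
	var ev memoryEvents
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		n, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return ev, fmt.Errorf("parse %q: %w", sc.Text(), err)
		}
		switch fields[0] {
		case "oom":
			ev.OOM = n
		case "oom_kill":
			ev.OOMKill = n
		}
	}
	return ev, sc.Err()
}

func main() {
	// Made-up sample for a container cgroup in the situation described
	// above: the kill is recorded, but "oom" stays at zero.
	container := "low 0\nhigh 0\nmax 0\noom 0\noom_kill 1\n"

	ev, err := parseMemoryEvents(container)
	if err != nil {
		panic(err)
	}
	fmt.Printf("oom=%d oom_kill=%d\n", ev.OOM, ev.OOMKill)
}
```

A watcher that only looks at the "oom" value of such a container cgroup would never see a change, which is the missed event this commit addresses.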
1 parent 9d0acfe commit 1c68f50

1 file changed

Lines changed: 3 additions & 3 deletions


pkg/oom/v2/v2.go

@@ -70,15 +70,15 @@ func (w *watcher) Run(ctx context.Context) {
 				continue
 			}
 			lastOOM := lastOOMMap[i.id]
-			if i.ev.OOM > lastOOM {
+			if i.ev.OOMKill > lastOOM {
 				if err := w.publisher.Publish(ctx, runtime.TaskOOMEventTopic, &eventstypes.TaskOOM{
 					ContainerID: i.id,
 				}); err != nil {
 					logrus.WithError(err).Error("publish OOM event")
 				}
 			}
-			if i.ev.OOM > 0 {
-				lastOOMMap[i.id] = i.ev.OOM
+			if i.ev.OOMKill > 0 {
+				lastOOMMap[i.id] = i.ev.OOMKill
 			}
 		}
 	}
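
The comparison in the patched loop can be tried in isolation. The following is a rough, standalone sketch of that logic only: the event type, the shouldPublish helper, and the sample values are hypothetical stand-ins, and the real watcher publishes a TaskOOM event through its publisher instead of returning a bool.

```go
package main

import "fmt"

// event is a hypothetical stand-in for the item the watcher receives:
// a container ID plus the counters read from its cgroup.
type event struct {
	id      string
	oom     uint64 // no longer consulted after this change
	oomKill uint64
}

// shouldPublish mirrors the patched comparison: report an OOM only when
// the oom_kill counter has grown past the last value seen for that
// container, then remember the new value.
func shouldPublish(last map[string]uint64, e event) bool {
	publish := e.oomKill > last[e.id]
	if e.oomKill > 0 {
		last[e.id] = e.oomKill
	}
	return publish
}

func main() {
	last := make(map[string]uint64)

	// Made-up sequence for one container whose "oom" counter never moves,
	// as in the Kubernetes configuration described in the commit message.
	events := []event{
		{id: "c1", oom: 0, oomKill: 0},
		{id: "c1", oom: 0, oomKill: 1},
		{id: "c1", oom: 0, oomKill: 1}, // repeated notification, same counter
		{id: "c1", oom: 0, oomKill: 2},
	}
	for _, e := range events {
		fmt.Printf("id=%s oom_kill=%d publish=%t\n", e.id, e.oomKill, shouldPublish(last, e))
	}
}
```

Keeping the last seen counter per container is what prevents a repeated notification with an unchanged oom_kill value from being published twice.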
