Update OOMKilled event handling #12714
Merged
fuweid merged 6 commits into containerd:main on Jan 7, 2026
Branch updated from f92208a to b61129b.
mikebrow (Member) reviewed on Dec 19, 2025 and left a comment:
A couple of comments on the events.go changes.
Branch updated from b00a96e to fafcfc8.
mikebrow reviewed on Dec 20, 2025.
Branch updated from fafcfc8 to 5f04a54.
Author (Member):
Hi @mikebrow, the output comes from the integration/remote package. Since we create a lot of pods and containers, it produces a lot of output. I will see if I can reduce it in a follow-up. I also updated the critest OOMKilled test case for the systemd cgroup driver. Please take a look when you have a chance. Thanks!
Branch updated from 5f04a54 to 93af93e.
Author (Member):
I will update the package later and then update it in containerd.
Author (Member):
May I have a review on this? Thanks!
All six commits are signed off with: Signed-off-by: Wei Fu <[email protected]>
dmcgowan approved these changes on Jan 7, 2026.
This pull request was referenced on Apr 27, 2026.
cmd/containerd-shim-runc-v2: add experimental OOM package
The OOM handling code is intended to live under pkg/oom/v2. However, the
cgroupv2 package still needs further refinement, such as exporting the
cgroup path and allowing callers to query specific stats instead of
returning all of them.
Until that work is complete, introduce the OOM package as experimental
and place it under containerd-shim-runc-v2.
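The commit message does not show the new package's API. Purely as an illustration, assuming the package detects OOM kills by reading the cgroup v2 `memory.events` file (whose `oom_kill` key counts processes in the cgroup killed by the OOM killer), the detection step might be sketched as below. `parseMemoryEvents` and `oomKilledSince` are hypothetical names, not containerd functions.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryEvents parses the flat-keyed contents of a cgroup v2
// memory.events file (lines of the form "<key> <value>") into a map.
// Hypothetical helper for illustration only.
func parseMemoryEvents(content string) (map[string]uint64, error) {
	events := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // tolerate blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed memory.events line %q", sc.Text())
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", fields[0], err)
		}
		events[fields[0]] = v
	}
	return events, sc.Err()
}

// oomKilledSince reports whether the oom_kill counter grew relative to a
// previously observed value, i.e. whether the kernel OOM-killed a process
// in this cgroup since the last check. It also returns the current counter
// so the caller can use it as the baseline for the next check.
func oomKilledSince(prev uint64, content string) (bool, uint64, error) {
	events, err := parseMemoryEvents(content)
	if err != nil {
		return false, prev, err
	}
	cur := events["oom_kill"]
	return cur > prev, cur, nil
}
```

Comparing against a stored baseline, rather than checking for a non-zero counter, keeps a single OOM kill from being reported more than once.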
cmd/containerd-shim-runc-v2: use experimental OOM package
We should always send the OOM event before the exit event.
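The ordering rule can be illustrated with a small sketch. The `Event` and `Publisher` types and the `oomKilled` callback are hypothetical simplifications, not the shim's real types; the topic strings follow containerd's `/tasks/oom` and `/tasks/exit` naming.

```go
package main

// Event is a hypothetical, simplified stand-in for the shim's task events.
type Event struct {
	Topic string
	ID    string
}

// Publisher records events in the order they are published.
type Publisher struct {
	events []Event
}

// Publish appends an event to the ordered log.
func (p *Publisher) Publish(e Event) { p.events = append(p.events, e) }

// handleExit checks whether an OOM kill was observed for the container
// before publishing its exit event. If so, the OOM event is published
// first, so consumers never see the exit without the preceding OOM.
func handleExit(p *Publisher, id string, oomKilled func(string) bool) {
	if oomKilled(id) {
		p.Publish(Event{Topic: "/tasks/oom", ID: id})
	}
	p.Publish(Event{Topic: "/tasks/exit", ID: id})
}
```

Checking for the OOM synchronously in the exit path, instead of relying on an independent watcher goroutine, is what removes the chance of the exit event overtaking the OOM event.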
internal/cri/server: check if OOM event occurred before update status
cri-integration: add stress test for TestOOMEventMonitor
The test was validated locally by running 100 pods for 100 rounds without
observing any failures. Due to limited resources in the CI environment,
the test parameters were reduced to 8 pods and 10 rounds.
```bash
FOCUS=TestOOMEventMonitor CGROUP_DRIVER=cgroupfs taskset -c 0,1 make cri-integration | tee /tmp/log
```
*: skip critest OOMKilled testcase for systemd cgroup
With the systemd cgroup driver, the container runtime uses a scope unit to
manage the cgroup path. According to the scope unit documentation:
We cannot rely on CollectMode=inactive-or-failed to preserve the cgroup path,
so there is a race between containerd and systemd garbage collection.
If systemd GC removes the scope unit's cgroup before containerd reads it,
containerd loses the opportunity to inspect the cgroup and determine the OOM status.
Therefore, we skip the OOMKilled test case.
In theory, this could be mitigated by inspecting the unit logs (e.g.,
journalctl -u XXX.scope) and searching for the "OOMKilled" keyword. However, this approach depends on journalctl and systemd logging behavior,
so it should be avoided.
Example journal output:
Ref: https://www.freedesktop.org/software/systemd/man/latest/systemd.scope.html