fix a failed test TestRunOOMExitCode#37194
Conversation
Force-pushed from d37fc44 to 1824e3d.
The reason might be that the current usage is already above the value we're trying to set as the limit; since the kernel can't guarantee the limit would be enforced, it replies with EBUSY. I have yet to verify that, though, and I don't know why it works with other kernels.
I was able to repro this on my laptop (Ubuntu 18.10, vanilla kernel 4.16.12) so it's not specific to RHEL:
kir@kd:~/git/rpm$ time docker run -m 4MB busybox sh -c 'x=aaaa; while true; do x=$x$x$x$x; done'; echo $?
docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:398: container init caused "process_linux.go:365: setting cgroup config for procHooks process caused \"failed to write 4194304 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/docker/c766d5121094498b1d88a37ebddf67aa7379b177cd9eb38efa9bcc4aa75aa842/memory.limit_in_bytes: device or resource busy\""": unknown.
ERRO[0000] error waiting for container: context canceled
real 0m0.732s
user 0m0.011s
sys 0m0.003s
125
Hmm, I was correct earlier: the kernel tries to set the limit, and if it can't (because current usage is above the new limit, or because the kernel can't free enough pages) it returns EBUSY, meaning the limit is too low. The code is in mm/memcontrol.c, function mem_cgroup_resize_limit.
So
(1) this is not kernel version specific (although recent kernels appear to try harder to release memory to set the limit)
(2) this is correct kernel behavior, not a bug
(3) it seems that docker needs more than 4MB to start a container.
I'd suggest to:
- remove the comment about RHEL 7.4 et al. kernels, replacing it with something like "it appears that 8MB of memory is the minimum needed to reliably start a container"
- raise the limit to 8MB
Also: this test, as well as TestEventsOOMDisableFalse / TestEventsOOMDisableTrue, is currently being skipped by CI because the CI machines don't have cgroup swap support enabled. IIRC, when we first enabled it on ppc64le there was an issue that might be ppc64le-specific, and we never got around to the other architectures. It's probably worth enabling this on janky so the tests can catch these failures in the future.
Ah, yes; didn't that relate to "swap" being enabled on some machines (so instead of an OOM, memory was swapped to disk)?
So, there's a memory limit and a memory+swap limit; by default, if you set the memory limit to, say, 8M, the mem+swap limit is set to 16M (by dockerd, I believe, not the kernel). Even if there's no swap available, the OOM killer can still be triggered. That's just off the top of my head; maybe I'm not aware of some nuances.
Signed-off-by: Anda Xu <[email protected]>
Force-pushed from 1824e3d to 1d9973c.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master   #37194      +/-   ##
==========================================
- Coverage   34.62%   34.59%   -0.04%
==========================================
  Files         605      605
  Lines       44765    44765
==========================================
- Hits        15499    15485      -14
- Misses      27166    27178      +12
- Partials     2100     2102       +2
```
👍 Fixed.
These tests were skipped because of an issue with the test, which was fixed by moby#37194.
Signed-off-by: Christopher Jones <[email protected]>
Signed-off-by: Anda Xu <[email protected]>
- What I did
The TestRunOOMExitCodeMain problem is very likely a CentOS-based kernel bug. When docker-runc tries to write the memory limit to /sys/fs/cgroup/memory/docker//memory.limit_in_bytes, it gets a "device or resource busy" error; there seems to be a conflict with a kernel process. A related link reporting a similar error: https://unix.stackexchange.com/questions/412040/cgroups-memory-limit-write-error-device-or-resource-busy
- How I did it
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)
cc @tonistiigi