
fix a failed test TestRunOOMExitCode#37194

Merged
thaJeztah merged 1 commit into moby:master from AntaresS:fix-test-failure
Jun 5, 2018

Conversation

@AntaresS
Contributor

@AntaresS AntaresS commented Jun 1, 2018

Signed-off-by: Anda Xu [email protected]

- What I did

  • fix the failed test - TestRunOOMExitCode
    • the original test fails on rhel_7.4, rhel_7.3-dm and rhel_7.4-dm in engine 18.03. When the memory limit is set lower than 7MB, docker-runc raises a "device or resource busy" error. Here is a failure snippet from the test:
01:18:52 FAIL: /go/src/github.com/docker/docker/integration-cli/docker_cli_run_unix_test.go:617: DockerSuite.TestRunOOMExitCode
01:18:52 
01:18:52 /go/src/github.com/docker/docker/integration-cli/docker_cli_run_unix_test.go:630:
01:18:52     ...open /go/src/github.com/docker/docker/integration-cli/docker_cli_run_unix_test.go: no such file or directory
01:18:52 ... value *errors.errorString = &errors.errorString{s:"wrong exit code for OOM container: expected 137, got 125 (output: \"/usr/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused \\\"process_linux.go:402: container init caused \\\\\\\"process_linux.go:367: setting cgroup config for procHooks process caused \\\\\\\\\\\\\\\"failed to write 4194304 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/docker/c31458a08eb2d786394076fd50f84106afc6d7d708053de7fe44a4e214a09391/memory.limit_in_bytes: device or resource busy\\\\\\\\\\\\\\\"\\\\\\\"\\\": unknown.\\ntime=\\\"2018-06-01T01:18:52Z\\\" level=error msg=\\\"error waiting for container: context canceled\\\" \\n\")"} ("wrong exit code for OOM container: expected 137, got 125 (output: \"/usr/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused \\\"process_linux.go:402: container init caused \\\\\\\"process_linux.go:367: setting cgroup config for procHooks process caused \\\\\\\\\\\\\\\"failed to write 4194304 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/docker/c31458a08eb2d786394076fd50f84106afc6d7d708053de7fe44a4e214a09391/memory.limit_in_bytes: device or resource busy\\\\\\\\\\\\\\\"\\\\\\\"\\\": unknown.\\ntime=\\\"2018-06-01T01:18:52Z\\\" level=error msg=\\\"error waiting for container: context canceled\\\" \\n\")")

The main problem is very likely a CentOS-based kernel bug: when docker-runc tries to write the memory limit to /sys/fs/cgroup/memory/docker//memory.limit_in_bytes, the write returns a "device or resource busy" error. There appears to be a conflict with a kernel process. A related link reporting a similar error: https://unix.stackexchange.com/questions/412040/cgroups-memory-limit-write-error-device-or-resource-busy

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

cc @tonistiigi

@AntaresS
Copy link
Copy Markdown
Contributor Author

AntaresS commented Jun 1, 2018

cc @thaJeztah @tiborvass

@AntaresS AntaresS force-pushed the fix-test-failure branch from d37fc44 to 1824e3d on June 1, 2018 23:25
Contributor

@tiborvass tiborvass left a comment


LGTM

Member


do we know why?

/cc @kolyshkin

Contributor


The reason might be that the current usage is above the value we're trying to use as a limit, and since the kernel can't guarantee this limit will be enforced, it replies with EBUSY. I have yet to verify that, though, and I don't have an idea why it works with other kernels.

Contributor

@kolyshkin kolyshkin Jun 4, 2018


I was able to repro this on my laptop (Ubuntu 18.10, vanilla kernel 4.16.12) so it's not specific to RHEL:

kir@kd:~/git/rpm$ time docker run -m 4MB busybox sh -c 'x=aaaa; while true; do x=$x$x$x$x; done'; echo $?
docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:398: container init caused "process_linux.go:365: setting cgroup config for procHooks process caused \"failed to write 4194304 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/docker/c766d5121094498b1d88a37ebddf67aa7379b177cd9eb38efa9bcc4aa75aa842/memory.limit_in_bytes: device or resource busy\""": unknown.
ERRO[0000] error waiting for container: context canceled

real 0m0.732s
user 0m0.011s
sys 0m0.003s
125

Contributor


Hmm, I was correct earlier: the kernel code tries to set the limit; if the limit can't be set, either because current usage is more than the new limit or because the kernel can't free any pages, it returns EBUSY (meaning the limit is too low). The code is in mm/memcontrol.c, function mem_cgroup_resize_limit.

So:
(1) this is not kernel-version specific (although recent kernels appear to try harder to release memory to set the limit);
(2) this is correct kernel behavior, not a bug;
(3) it seems that docker needs more than 4MB to start a container.

I'd suggest to

  1. remove the comment about rhel 7.4 et al kernels, replacing it with something like "it appears that 8MB of memory is the minimum needed to reliably start a container"

  2. raise the limit to 8MB


@tophj-ibm
Contributor

Also: this test, as well as TestEventsOOMDisableFalse / True, are currently being skipped by the CI due to the CI machines not having cgroup / swap enabled.

IIRC when we first enabled it on ppc64le, there was an issue that might be ppc64le-specific, and we never got around to the other architectures. It's probably worth enabling this on janky so the tests can catch these failures in the future.

@thaJeztah
Member

IIRC when we enabled it first ppc64le, there was an issue that might be a ppc64le specific issue

Ah, yes, didn't that relate to "swap" being enabled on some machines (therefore not resulting in OOM, but instead memory being swapped to disk)?

@kolyshkin
Contributor

So, there's a memory limit and a memory+swap limit; by default, if you set the memory limit to, say, 8M, the mem+swap limit is set to 16M (I believe by dockerd, not the kernel). Even if there's no swap available, the OOM killer can still be triggered. That's just off the top of my head; maybe I'm not aware of some nuances.

Signed-off-by: Anda Xu <[email protected]>
@AntaresS AntaresS force-pushed the fix-test-failure branch from 1824e3d to 1d9973c on June 4, 2018 21:29
@codecov

codecov Bot commented Jun 4, 2018

Codecov Report

Merging #37194 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #37194      +/-   ##
==========================================
- Coverage   34.62%   34.59%   -0.04%     
==========================================
  Files         605      605              
  Lines       44765    44765              
==========================================
- Hits        15499    15485      -14     
- Misses      27166    27178      +12     
- Partials     2100     2102       +2

@AntaresS
Contributor Author

AntaresS commented Jun 4, 2018

I'd suggest to remove the comment about rhel 7.4 et al kernels, replacing it with something like "it appears that 8MB of memory is the minimum needed to reliably start a container", and raise the limit to 8MB

👍

Fixed.

Member

@thaJeztah thaJeztah left a comment


LGTM

@thaJeztah thaJeztah merged commit ca25df3 into moby:master Jun 5, 2018
tophj-ibm added a commit to tophj-ibm/moby that referenced this pull request Jun 6, 2018
These tests were skipped because of an issue with the test, which was
fixed by moby#37194

Signed-off-by: Christopher Jones <[email protected]>
@thaJeztah thaJeztah changed the title from "fix a failed test" to "fix a failed test TestRunOOMExitCode" Jul 16, 2018

7 participants