[TESTING DO NOT MERGE] Enable OOM killer tests on ppc64le #37220

Closed
tophj-ibm wants to merge 2 commits into moby:master from tophj-ibm:test-ppc64le-ci-oom-tests

@tophj-ibm
Contributor

These tests were skipped because of an issue with the test, which may have been
fixed by #37194

Signed-off-by: Christopher Jones [email protected]

These tests were skipped because of an issue with the test, which was
fixed by moby#37194

Signed-off-by: Christopher Jones <[email protected]>
Signed-off-by: Christopher Jones <[email protected]>
@tophj-ibm
Contributor Author

Note that that PR was only for TestRunOOMExitCode, but it suggests there might have been a similar problem with -m.

@tophj-ibm
Contributor Author

still seeing failures, a seg fault, and something in runc?

16:01:10 ----------------------------------------------------------------------
16:01:10 FAIL: docker_cli_events_unix_test.go:81: DockerSuite.TestEventsOOMDisableTrue
16:01:10 
16:01:10 docker_cli_events_unix_test.go:118:
16:01:10     c.Fatalf("%v", errRun)
16:01:10 ... Error: wrong exit code for OOM container: expected 137, got 139 (output: "")
16:01:10 
16:01:10 
16:01:10 ----------------------------------------------------------------------



16:09:55 ----------------------------------------------------------------------
16:09:55 FAIL: docker_cli_run_unix_test.go:617: DockerSuite.TestRunOOMExitCode
16:09:55 
16:09:55 docker_cli_run_unix_test.go:631:
16:09:55     c.Assert(err, check.IsNil)
16:09:55 ... value *errors.errorString = &errors.errorString{s:"wrong exit code for OOM container: expected 137, got 125 (output: \"/usr/local/cli/docker: Error response from daemon: cannot start a stopped process: unknown.\\ntime=\\\"2018-06-06T21:09:55Z\\\" level=error msg=\\\"error waiting for container: context canceled\\\" \\n\")"} ("wrong exit code for OOM container: expected 137, got 125 (output: \"/usr/local/cli/docker: Error response from daemon: cannot start a stopped process: unknown.\\ntime=\\\"2018-06-06T21:09:55Z\\\" level=error msg=\\\"error waiting for container: context canceled\\\" \\n\")")
16:09:55 
16:09:56 
16:09:56 ----------------------------------------------------------------------

@thaJeztah
Member

ping @kolyshkin

@kolyshkin
Contributor

So, the test case expects a container that hit its memory limit to be killed by the kernel OOM killer via SIGKILL, which results in exit code 137 (128 + signal number; KILL is 9).

An exit code of 139 means the process was killed by SIGSEGV (128 + signal number; SEGV is 11). I can see that easily happening when a memory allocation is denied but the result is not checked, resulting in, say, a null pointer dereference.
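
To make the 128 + signal number arithmetic concrete, here is a minimal Go sketch that decodes the three exit codes seen in the logs above. The function name `describeExitCode` and its message strings are made up for illustration and are not part of the moby test suite. Only the two signals discussed here are named.

```go
package main

import "fmt"

// describeExitCode is a hypothetical helper (not from the moby code base).
// It applies the convention that an exit code above 128 means the process
// was terminated by signal number (code - 128).
func describeExitCode(code int) string {
	// Only the signals mentioned in this thread are given names.
	names := map[int]string{9: "SIGKILL", 11: "SIGSEGV"}
	if code > 128 {
		sig := code - 128
		if name, ok := names[sig]; ok {
			return fmt.Sprintf("killed by %s (signal %d)", name, sig)
		}
		return fmt.Sprintf("killed by signal %d", sig)
	}
	return fmt.Sprintf("no signal involved (status %d)", code)
}

func main() {
	// 137 is what the tests expect; 139 and 125 are what CI reported.
	for _, code := range []int{137, 139, 125} {
		fmt.Printf("%d: %s\n", code, describeExitCode(code))
	}
}
```

By this reading, 125 is not a signal exit at all, which fits the "Error response from daemon" text that accompanied it in the TestRunOOMExitCode failure.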

Now I think the test case should make sure the container is actually started and is operable, and only then create an OOM situation.

@arm64b
Contributor

arm64b commented Jun 12, 2018

Can we test it manually on the PPC platform, like we did in PR #36201? It will be easier to narrow down the root cause on that platform.

@tophj-ibm
Contributor Author

@arm64b yep, I'm now testing locally. I couldn't reproduce it on my own.

@tophj-ibm
Contributor Author

closing because i don't need the ci to test anymore, will reopen when i figure out the problem
