Fix flaky OOM tests #20627
If cgroup swap memory limit isn't enabled, then the -m flag doesn't work and the container that is created for both of these tests is very large. Because we are trying to run the containers out of memory, this takes a very long time and causes the tests to fail most of the time. Follow-up to moby#17913 Signed-off-by: Christopher Jones <[email protected]>
|
LGTM thank you very much @tophj-ibm |
|
Nice one @tophj-ibm! LGTM |
|
LGTM |
|
@tiborvass, @estesp, @calavera: Do we know this is true: "If cgroup swap memory limit isn't enabled, then the -m flag doesn't work.."? My understanding was that you don't have to have swap limits enabled to use the -m flag. (This test was passing before, and is now skipped for x86.) |
|
The original comment "If cgroup swap memory limit isn't enabled, then the -m flag doesn't work." is incorrect, though the fix itself is correct. Memory limiting, i.e. -m, works independently of swap limiting. The resulting behavior is affected by swap limiting, however, because the script running in the container will use swap space to avoid violating the container's memory limit.

The issue with the OOM tests is that the OOM killer doesn't kick in until the script exceeds its allowed memory+swap. When the memsw cgroup is disabled, swap is limited only by the system swap size, which can be many GB. On systems without memsw and with a large swap size, the shell script used by the OOM tests will time out before exceeding the system swap size.

That being said, the fix is correct because, when the memsw cgroup is enabled, --memory-swap defaults to double the value of -m/--memory, e.g. -m=10mb results in --memory-swap defaulting to 20mb. The tests rely on --memory-swap being set to a low value. This subtle default behavior is currently not documented in the man pages; only the online docs for run mention the default behavior of --memory-swap. I'll tackle updating the man page for docker run as part of #17215. |
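For anyone skimming the thread, the default described above can be sketched roughly as follows. This is only an illustration of the rule as stated in the comment, not Docker's actual code: the function name `effectiveMemSwapLimit`, the `unlimited` sentinel, and the convention that `0` means "flag not set" are all assumptions made for the example.

```go
package main

import "fmt"

// mib is one mebibyte in bytes.
const mib = int64(1) << 20

// unlimited is a hypothetical sentinel meaning "no cgroup-enforced ceiling";
// in that case the process is bounded only by the host's memory and swap.
const unlimited = int64(-1)

// effectiveMemSwapLimit is a hypothetical helper that returns the combined
// memory+swap ceiling (in bytes) a container must exceed before the OOM
// killer fires, following the rule described in the comment above.
//
// memory and memorySwap model -m/--memory and --memory-swap; 0 is assumed
// to mean "not set". swapLimitSupported models whether the memsw cgroup is
// enabled on the host.
func effectiveMemSwapLimit(memory, memorySwap int64, swapLimitSupported bool) int64 {
	if !swapLimitSupported {
		// Without memsw, -m still caps RAM, but swap usage is limited
		// only by the host swap size, so there is no combined ceiling.
		return unlimited
	}
	if memorySwap > 0 {
		// An explicit --memory-swap value is used as given.
		return memorySwap
	}
	if memory > 0 {
		// Default: memory+swap is double the memory limit.
		return 2 * memory
	}
	return unlimited
}

func main() {
	// memsw enabled: -m=10mb gives a 20 MiB memory+swap ceiling.
	fmt.Println(effectiveMemSwapLimit(10*mib, 0, true) / mib)

	// memsw disabled: no combined ceiling, so the OOM test script may
	// have to fill the whole host swap before it is killed.
	fmt.Println(effectiveMemSwapLimit(10*mib, 0, false) == unlimited)
}
```

With memsw enabled, the test container hits its ceiling after roughly 20 MiB; without it, the same `-m` value leaves swap effectively unbounded, which matches the timeouts seen in the OOM tests.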