failed to adjust OOM score for shim: invalid argument error -- for docker:dind in Kubernetes #4837

@skaegi

Description

We're running docker:dind in a Kubernetes pod for CI. With the Docker 20 version of dind (which now uses the v2 shim), this setup is broken and fails with a message similar to:

docker: Error response from daemon: io.containerd.runc.v2: failed to adjust OOM score for shim: set shim OOM score: write /proc/211/oom_score_adj: invalid argument

The valid range for oom_score_adj is -1000 to 1000. By default, Kubernetes uses 1000 for BestEffort pods. In this setup, the logic in https://github.com/containerd/containerd/blob/master/runtime/v2/shim/util_unix.go#L62 adds 1 to that value and tries to write 1001, which is rejected with the invalid argument error. (NOTE: This is not reproducible with Docker Desktop, where it uses -500 for BestEffort!)

When oom_score_adj is already set to 1000 (BestEffort), it does not make sense to add 1. We should consider adding a check for that case in AdjustOOMScore.
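
To illustrate the kind of check I have in mind, here is a minimal sketch in Go. It is not containerd's actual code: `nextShimOOMScore` and the two constants are hypothetical names used only for illustration, and the sketch assumes the shim's score is derived as "parent score plus one", as described above.

```go
package main

// Bounds of the valid oom_score_adj range described above.
// The constant names are hypothetical, chosen for this sketch.
const (
	oomScoreAdjMin = -1000
	oomScoreAdjMax = 1000
)

// nextShimOOMScore is a hypothetical helper, not part of containerd.
// It returns the parent's score plus one, clamped to the valid range,
// so a parent already at 1000 yields 1000 instead of the invalid 1001.
func nextShimOOMScore(parent int) int {
	score := parent + 1
	if score > oomScoreAdjMax {
		return oomScoreAdjMax
	}
	if score < oomScoreAdjMin {
		return oomScoreAdjMin
	}
	return score
}
```

With this clamping, a BestEffort parent at 1000 would keep the shim at 1000, while the Docker Desktop case of -500 would still become -499 as before.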


Steps to reproduce the issue:

  1. kubectl run mydind --privileged --image docker:dind (or specifically docker:20.10.0-dind)
  2. kubectl exec mydind -- docker run hello-world
  3. kubectl exec mydind -- sh -c 'echo 1001 > /proc/1/oom_score_adj'
  4. kubectl delete pod mydind

Describe the results you received:
Container fails with ...

$ kubectl exec mydind -- docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pulling fs layer
0e03bdcc26d7: Verifying Checksum
0e03bdcc26d7: Download complete
0e03bdcc26d7: Pull complete
Digest: sha256:1a523af650137b8accdaed439c17d684df61ee4d74feac151b5b337bd29e7eec
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: io.containerd.runc.v2: failed to adjust OOM score for shim: set shim OOM score: write /proc/211/oom_score_adj: invalid argument
: exit status 1: unknown.
time="2020-12-13T17:17:00Z" level=error msg="error waiting for container: context canceled"
command terminated with exit code 125

Output of containerd --version:

containerd github.com/containerd/containerd v1.4.3 269548fa27e0089a8b8278fc4fc781d7f65a939b
