[release/1.0] cmd/containerd-shim: aggressive memory reclamation #2058

Merged
crosbymichael merged 1 commit into containerd:release/1.0 from stevvooe:cherry-pick-#2055 on Jan 25, 2018

@stevvooe
Member

To avoid having the shim hold on to too much memory, we've made a few
adjustments to favor more aggressive reclamation of memory from the
operating system. Typically, this would be negligible, on the order of a
few megabytes, but this is impactful when running several containers.

The first fix is to lower the threshold used to determine when to run
the garbage collector. The second runs runtime/debug.FreeOSMemory at a
regular interval.

Under test, this results in a sustained memory usage of around 3.7 MB.

Signed-off-by: Stephen J Day [email protected]
(cherry picked from commit 0e8f084)
Signed-off-by: Stephen J Day [email protected]

Contributor

@mlaventure mlaventure left a comment

LGTM

@codecov-io

Codecov Report

Merging #2058 into release/1.0 will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff              @@
##           release/1.0    #2058   +/-   ##
============================================
  Coverage        50.57%   50.57%           
============================================
  Files               81       81           
  Lines             7163     7163           
============================================
  Hits              3623     3623           
  Misses            2847     2847           
  Partials           693      693
| Flag | Coverage Δ | |
|---|---|---|
| #linux | 50.57% <ø> (ø) | ⬆️ |


@Random-Liu
Member

Random-Liu commented Jan 25, 2018

@stevvooe Thanks for the fix.
@yanxuean @miaoyq We should pay attention to containerd-shim CPU usage after this change. The more aggressive garbage collection may introduce higher CPU usage.

@yanxuean
Member

Ok, we will test this.

@yanxuean
Member

yanxuean commented Jan 25, 2018

@stevvooe @Random-Liu @crosbymichael
This PR does reduce memory usage, from 8M to 4.6M.
It also introduces higher CPU usage, from 0.1% to 0.3% as reported by `ps`.

The records:

1. Before merging #2058

node-e2e-test output:

Jan 23 15:03:51.222: INFO: Still running...58.656282885s left
Jan 23 15:04:49.887: INFO: 36 pods are running on node ubuntu
Jan 23 15:04:49.888: INFO: Resource usage:
container cpu(cores) memory_working_set(MB) memory_rss(MB)
"kubelet" 0.045      63.37                  46.21
"runtime" 0.836      953.63                 483.70

The ps output (the shims show 0.3% CPU in top):

cloud@ubuntu:containerd$ ps --cols 120 -eo "pid,pcpu,vsz,rss,args" | grep contain
14971  7.2 2278496 97360 containerd
14982  4.6 1293556 54836 cri-containerd
16795  0.0  10484  2072 grep --color=auto contain
18647  0.1  12228  8604 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8
18708  0.1  12228  8044 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8
19180  0.1  12484  8220 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8
19297  0.1  12228  8172 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8
19807  0.1  12292  8300 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8
19991  0.1  12228  8544 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8

the top output:

top - 16:04:45 up 7 days,  3:36,  9 users,  load average: 2.01, 1.48, 0.80
Tasks: 309 total,   1 running, 308 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.6 us, 14.5 sy,  0.1 ni, 70.6 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:   8175960 total,  8108792 used,    67168 free,   233320 buffers
KiB Swap:        0 total,        0 used,        0 free.  5785412 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17598 root      20   0 2278424  90188  17768 S   4.0  1.1   0:14.36 containerd
21179 root      20   0  413696 105264  51168 S   2.6  1.3   0:11.28 kubelet
21312 root      20   0  376336  64840  14044 S   2.6  0.8   0:09.94 cadvisor
17606 root      20   0 1200796  46388  20728 S   1.7  0.6   0:07.13 cri-containerd
21227 root      20   0 10.660g 284088  67176 S   1.7  3.5   0:14.77 e2e_node.test
    9 root      20   0       0      0      0 S   0.3  0.0   1:24.64 rcuos/0
21158 root      20   0  621892  88668  59348 S   0.3  1.1   0:01.46 e2e_node.test
21297 root      20   0   12228   8288   3712 S   0.3  0.1   0:00.35 containerd-shim
22883 root      20   0   12228   8228   3712 S   0.3  0.1   0:00.33 containerd-shim
23148 root      20   0   12228   8288   3712 S   0.3  0.1   0:00.33 containerd-shim
23288 root      20   0   12228   8576   3712 S   0.3  0.1   0:00.34 containerd-shim
23389 root      20   0   12228   7936   3328 S   0.3  0.1   0:00.34 containerd-shim
23462 root      20   0   12228   8380   3712 S   0.3  0.1   0:00.34 containerd-shim
23479 root      20   0   12228   8116   3392 S   0.3  0.1   0:00.34 containerd-shim
23629 root      20   0   12228   8292   3712 S   0.3  0.1   0:00.33 containerd-shim
23653 root      20   0   12228   8572   3712 S   0.3  0.1   0:00.33 containerd-shim
23958 root      20   0   12228   8152   3520 S   0.3  0.1   0:00.33 containerd-shim
24029 root      20   0   12228   8148   3776 S   0.3  0.1   0:00.33 containerd-shim
24170 root      20   0   12228   8628   3776 S   0.3  0.1   0:00.34 containerd-shim

2. After merging #2058

node-e2e-test output:

Jan 25 15:14:12.645: INFO: Still running...58.006477252s left
Jan 25 15:15:10.661: INFO: 36 pods are running on node ubuntu
Jan 25 15:15:10.661: INFO: Resource usage:
container cpu(cores) memory_working_set(MB) memory_rss(MB)
"kubelet" 0.043      56.92                  45.30
"runtime" 0.977      229.07                 192.64

the ps output:

root@ubuntu:containerd# ps --cols 80 -eo "pid,pcpu,vsz,rss,args" | grep contain
16546  0.0  10484  1984 grep --color=auto contain
20101  3.3 2286620 81440 containerd
20109  1.6 1102508 41860 cri-containerd
23746  0.3  10116  4524 containerd-shim -namespace k8s.io -workdir /var/lib/cont
23807  0.3  10116  4828 containerd-shim -namespace k8s.io -workdir /var/lib/cont
24354  0.2 425784 21412 docker-containerd --config /var/run/docker/containerd/co
25210  0.3  10116  4652 containerd-shim -namespace k8s.io -workdir /var/lib/cont
25341  0.3  10116  4288 containerd-shim -namespace k8s.io -workdir /var/lib/cont
25343  0.3  10116  4556 containerd-shim -namespace k8s.io -workdir /var/lib/cont

the top output:

top - 15:06:09 up 7 days,  2:37,  9 users,  load average: 2.75, 1.40, 0.87
Tasks: 310 total,   2 running, 307 sleeping,   0 stopped,   1 zombie
%Cpu(s): 17.7 us, 18.8 sy,  0.0 ni, 63.2 id,  0.0 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem:   8175960 total,  8009440 used,   166520 free,   317420 buffers
KiB Swap:        0 total,        0 used,        0 free.  5861992 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20101 root      20   0 2278424  79692  18084 S   3.0  1.0   0:06.81 containerd
23685 root      20   0  489172 102244  51812 S   2.7  1.3   0:05.31 kubelet
23824 root      20   0  507884  61168  15272 S   2.3  0.7   0:03.93 cadvisor
20109 root      20   0 1102508  41860  19984 S   1.3  0.5   0:03.61 cri-containerd
23730 root      20   0 10.741g 299656  66736 S   1.0  3.7   0:11.82 e2e_node.test
28616 root      20   0   10116   4668   3736 S   1.0  0.1   0:00.38 containerd-shim
26169 root      20   0   10116   4576   3736 S   0.7  0.1   0:00.34 containerd-shim
26201 root      20   0   10116   4724   3800 S   0.7  0.1   0:00.37 containerd-shim
26319 root      20   0   10116   4352   3608 S   0.7  0.1   0:00.36 containerd-shim
26843 root      20   0   10116   4632   3736 S   0.7  0.1   0:00.32 containerd-shim
27328 root      20   0   10116   4372   3736 S   0.7  0.1   0:00.36 containerd-shim
27523 root      20   0   10116   4584   3736 S   0.7  0.1   0:00.33 containerd-shim
27729 root      20   0   10116   4868   3736 S   0.7  0.1   0:00.38 containerd-shim
28451 root      20   0   10116   4660   3736 S   0.7  0.1   0:00.36 containerd-shim
28997 root      20   0   10116   4628   3800 S   0.7  0.1   0:00.38 containerd-shim
30250 root      20   0   10116   4564   3736 S   0.7  0.1   0:00.35 containerd-shim

@crosbymichael
Member

LGTM

@crosbymichael crosbymichael merged commit 3f98e5d into containerd:release/1.0 Jan 25, 2018
@Random-Liu
Member

Random-Liu commented Jan 25, 2018

@yanxuean @miaoyq Let's do this:

  1. Get cri#571 ("Avoid containerd access as much as possible") merged, and collect new data. containerd/containerd-shim CPU and memory usage should become much better.
  2. Update containerd to this PR, and verify whether this change makes memory usage even lower or instead introduces significant CPU overhead.

If 2) doesn't show any significant CPU usage difference, we are good. But if 2) shows that the CPU usage of containerd-shim becomes significant, we may need to come back and tune the value here.

@thaJeztah
Member

Looks like this was in the 1.0.2 release. Could someone set the 1.0.2 milestone on this one? 🤗 🙏

@estesp estesp added this to the 1.0.2 milestone Mar 16, 2018
@estesp
Member

estesp commented Mar 16, 2018

Seems fine, but is the milestone reliable to report everything that went into a particular release? I don’t necessarily think so as it seems like for many fixes we simply set the cherry pick label and merge without setting the milestone (especially early in a cycle where we haven’t decided what/when about the next point release).
