cri: Reduce the cpu usage of the function redirectLogs in cri by payall4u · Pull Request #5286 · containerd/containerd

payall4u · 2021-03-30T12:32:03Z

I try to reduce the cpu usage of the function redirectLogs of cri.

Signed-off-by: Zhiyu Li [email protected]

k8s-ci-robot · 2021-03-30T12:32:12Z

Hi @payall4u. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

theopenlab-ci · 2021-03-30T12:43:58Z

Build succeeded.

containerd-build-arm64 : FAILURE in 6m 43s (non-voting)

theopenlab-ci · 2021-03-30T13:15:09Z

Build succeeded.

containerd-build-arm64 : FAILURE in 5m 43s (non-voting)

kzys · 2021-03-31T20:25:21Z

How about using bytes.Buffer? Would it be slower than fastJoin()?

Sure, bytes.Buffer is better. I've changed it.

kzys · 2021-03-31T20:25:46Z

How did you pick 100?

theopenlab-ci · 2021-04-01T04:00:55Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 5m 42s (non-voting)

kzys · 2021-04-01T16:13:58Z

The name fastBuffer is a bit odd. It is faster then the previous implementation, but once we replace the previous implementation with this one, we cannot compare...

That being said, I don't have a better suggestion. Both buf and lineBuffer are taken.

lineBuffer better. Sorry, I'm too cautious and made me want to find a meaningful name.

theopenlab-ci · 2021-04-01T16:52:19Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 6m 09s (non-voting)

kzys

LGTM!

estesp · 2021-04-01T18:11:42Z

/ok-to-test

payall4u · 2021-04-02T06:13:04Z

/test

k8s-ci-robot · 2021-04-02T06:13:18Z

@payall4u: The /test command needs one or more targets.
The following commands are available to trigger jobs:

/test pull-containerd-build
/test pull-containerd-node-e2e

Use /test all to run all jobs.

Details

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

theopenlab-ci · 2021-04-02T07:26:49Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 5m 52s (non-voting)

theopenlab-ci · 2021-04-05T03:49:15Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 5m 49s (non-voting)

theopenlab-ci · 2021-04-08T02:49:46Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 6m 19s (non-voting)

payall4u · 2021-04-08T06:09:53Z

ping @fuweid @mxpv

fuweid · 2021-04-09T03:35:28Z

@payall4u the CI issue has been fixed. would you rebase the master? sorry for any inconvenient.

Signed-off-by: Zhiyu Li <[email protected]>

theopenlab-ci · 2021-04-09T03:53:51Z

Build succeeded.

containerd-build-arm64 : SUCCESS in 6m 15s (non-voting)

estesp

LGTM

kzys

LGTM and CI is green.

@estesp or @fuweid Can you merge this into master?

mikebrow

age old issue.. should I optimize for cpu or memory. There should be a switch to choose IMO.

I may be reading your graph wrong, but this looks to be a pretty big hit to the memory footprint / pod? Do I read that right?

payall4u · 2021-04-15T01:23:39Z

age old issue.. should I optimize for cpu or memory. There should be a switch to choose IMO.

I may be reading your graph wrong, but this looks to be a pretty big hit to the memory footprint / pod? Do I read that right?

I'm sorry. I should add more information on the graph. The flame graph is for cpu usage. It's used to illustrate where the problem is.

estesp

LGTM

mikebrow · 2021-04-15T15:33:59Z

age old issue.. should I optimize for cpu or memory. There should be a switch to choose IMO.
I may be reading your graph wrong, but this looks to be a pretty big hit to the memory footprint / pod? Do I read that right?

I'm sorry. I should add more information on the graph. The flame graph is for cpu usage. It's used to illustrate where the problem is.

Yes, I was reading the part in your usage that showed after this fix virtual and resident memory footprint was (may be) up by a very large margin.. Some cloud products want to optimize for hosting a large number of small pods/containers.

before

after

IOW I was asking what is the memory footprint of this change.

payall4u · 2021-04-16T02:22:30Z

age old issue.. should I optimize for cpu or memory. There should be a switch to choose IMO.
I may be reading your graph wrong, but this looks to be a pretty big hit to the memory footprint / pod? Do I read that right?

I'm sorry. I should add more information on the graph. The flame graph is for cpu usage. It's used to illustrate where the problem is.

Yes, I was reading the part in your usage that showed after this fix virtual and resident memory footprint was (may be) up by a very large margin.. Some cloud products want to optimize for hosting a large number of small pods/containers.

before

after

IOW I was asking what is the memory footprint of this change.

You are right. CRI had to keep a buffer for every container, which length equal to config.MaxContainerLogLineSize + 46. Sure, maybe it's better to add a switch (or some things else) to choose saving cpu or memory.

I was reading the part in your usage that showed after this fix virtual and resident memory footprint was (may be) up by a very large margin

I will find out what happened here soon.

mikebrow · 2021-04-16T18:15:06Z

age old issue.. should I optimize for cpu or memory. There should be a switch to choose IMO.
I may be reading your graph wrong, but this looks to be a pretty big hit to the memory footprint / pod? Do I read that right?

I'm sorry. I should add more information on the graph. The flame graph is for cpu usage. It's used to illustrate where the problem is.

Yes, I was reading the part in your usage that showed after this fix virtual and resident memory footprint was (may be) up by a very large margin.. Some cloud products want to optimize for hosting a large number of small pods/containers.
before

after

IOW I was asking what is the memory footprint of this change.

You are right. CRI had to keep a buffer for every container, which length equal to config.MaxContainerLogLineSize + 46. Sure, maybe it's better to add a switch (or some things else) to choose saving cpu or memory.

I was reading the part in your usage that showed after this fix virtual and resident memory footprint was (may be) up by a very large margin

I will find out what happened here soon.

Thanks! Performance tuning options across platforms like ours have varied greatly over the decades. JVMs, for example, have some interesting patterns to consider. If we wanted to vary tuning by pod/container scope, I think we would need to work something out with the scheduler guys in k8s, maybe annotation hints or new fields in CRI. Keeping to containerd scope, we could add something to the config.toml and/or add params.

Cheers, Mike

payall4u · 2021-05-24T07:06:02Z

@mikebrow Hi, Mike. Sorry for the late reply. This should be caused by the golang version. The top results with VIRT are obtained from containerd 1.3.4 (compiled by go1.13) and containerd 1.4 (compiled by go1.15) . The binary will alloc a lot of memory from golang1.14. We can learn from strace and pmap:

$ strace -f -e memory ./containerd-master
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f04b1a92000
mmap(NULL, 131072, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f04b1a72000
mmap(NULL, 1048576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f04b1972000
mmap(NULL, 8388608, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f04b1172000
mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f04ad172000
mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f048d172000
# ...

$ pmap -p $(pidof containerd) -x
# ...
00007f048ae41000   36036      36      36 rw---   [ anon ]
00007f048ae41000       0       0       0 rw---   [ anon ]
00007f048d172000  263680       0       0 -----   [ anon ]
00007f048d172000       0       0       0 -----   [ anon ]
00007f049d2f2000       4       4       4 rw---   [ anon ]
00007f049d2f2000       0       0       0 rw---   [ anon ]
00007f049d2f3000  293564       0       0 -----   [ anon ]
00007f049d2f3000       0       0       0 -----   [ anon ]
# ...

If we play on containerd 1.3.4, it won't call mmap for the large memory.
It cause by runtime/mpagealloc#L299

BTW we can add a param with a default value to limit the cri log buffer length.
And if the log of container has a long line, containerd cri may oom.
Should I open a issue for this soon?

k8s-ci-robot added the needs-ok-to-test label Mar 30, 2021

payall4u mentioned this pull request Mar 30, 2021

Optimize cri redirect logs to reduce the CPU usage. #5285

Closed

5 tasks

payall4u force-pushed the optimize-cri-redirect-logs branch from 53611df to fea6c39 Compare March 30, 2021 12:35

payall4u changed the title ~~optimize cri redirect logs~~ Reduce the cpu usage of the function redirectLogs in cri. Mar 30, 2021

payall4u changed the title ~~Reduce the cpu usage of the function redirectLogs in cri.~~ Reduce the cpu usage of the function redirectLogs in cri Mar 30, 2021

payall4u changed the title ~~Reduce the cpu usage of the function redirectLogs in cri~~ cri: Reduce the cpu usage of the function redirectLogs in cri Mar 30, 2021

payall4u force-pushed the optimize-cri-redirect-logs branch from fea6c39 to afd9156 Compare March 30, 2021 13:07

kzys reviewed Mar 31, 2021

View reviewed changes

Comment thread pkg/cri/io/logger.go Outdated

Copy link
Copy Markdown

Member

kzys Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick 100?

payall4u force-pushed the optimize-cri-redirect-logs branch from afd9156 to 3e5f608 Compare April 1, 2021 03:53

kzys approved these changes Apr 1, 2021

View reviewed changes

payall4u force-pushed the optimize-cri-redirect-logs branch from 3e5f608 to 92482fa Compare April 1, 2021 16:44

kzys approved these changes Apr 1, 2021

View reviewed changes

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Apr 1, 2021

payall4u force-pushed the optimize-cri-redirect-logs branch from 92482fa to faa8688 Compare April 2, 2021 07:19

payall4u force-pushed the optimize-cri-redirect-logs branch from faa8688 to 9fab104 Compare April 5, 2021 03:42

payall4u force-pushed the optimize-cri-redirect-logs branch from 9fab104 to 4377d43 Compare April 8, 2021 02:42

optimize cri redirect logs

4bc8f69

Signed-off-by: Zhiyu Li <[email protected]>

payall4u force-pushed the optimize-cri-redirect-logs branch from 4377d43 to 4bc8f69 Compare April 9, 2021 03:46

estesp approved these changes Apr 12, 2021

View reviewed changes

kzys approved these changes Apr 14, 2021

View reviewed changes

mikebrow reviewed Apr 15, 2021

View reviewed changes

estesp approved these changes Apr 15, 2021

View reviewed changes

estesp merged commit 4f18131 into containerd:master Apr 15, 2021

payall4u deleted the optimize-cri-redirect-logs branch April 15, 2021 01:35

Conversation

payall4u commented Mar 30, 2021

Uh oh!

k8s-ci-robot commented Mar 30, 2021

Uh oh!

theopenlab-ci Bot commented Mar 30, 2021

Uh oh!

theopenlab-ci Bot commented Mar 30, 2021

Uh oh!

kzys Mar 31, 2021

Choose a reason for hiding this comment

Uh oh!

payall4u Apr 1, 2021

Choose a reason for hiding this comment

Uh oh!

kzys Mar 31, 2021

Choose a reason for hiding this comment

Uh oh!

theopenlab-ci Bot commented Apr 1, 2021

Uh oh!

kzys Apr 1, 2021

Choose a reason for hiding this comment

Uh oh!

payall4u Apr 1, 2021

Choose a reason for hiding this comment

Uh oh!

theopenlab-ci Bot commented Apr 1, 2021

Uh oh!

kzys left a comment

Choose a reason for hiding this comment

Uh oh!

estesp commented Apr 1, 2021

Uh oh!

payall4u commented Apr 2, 2021

Uh oh!

k8s-ci-robot commented Apr 2, 2021

Uh oh!

theopenlab-ci Bot commented Apr 2, 2021

Uh oh!

theopenlab-ci Bot commented Apr 5, 2021

Uh oh!

theopenlab-ci Bot commented Apr 8, 2021

Uh oh!

payall4u commented Apr 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fuweid commented Apr 9, 2021

Uh oh!

theopenlab-ci Bot commented Apr 9, 2021

Uh oh!

estesp left a comment

Choose a reason for hiding this comment

Uh oh!

kzys left a comment

Choose a reason for hiding this comment

Uh oh!

mikebrow left a comment

Choose a reason for hiding this comment

Uh oh!

payall4u commented Apr 15, 2021

Uh oh!

estesp left a comment

Choose a reason for hiding this comment

Uh oh!

mikebrow commented Apr 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

payall4u commented Apr 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mikebrow commented Apr 16, 2021

Uh oh!

payall4u commented May 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

payall4u commented Apr 8, 2021 •

edited

Loading

mikebrow commented Apr 15, 2021 •

edited

Loading

payall4u commented Apr 16, 2021 •

edited

Loading

payall4u commented May 24, 2021 •

edited

Loading