Skip to content

Optimize loading performance for cri recover#6680

Merged
estesp merged 1 commit intocontainerd:mainfrom
linxiulei:fast_load
Apr 11, 2022
Merged

Optimize loading performance for cri recover#6680
estesp merged 1 commit intocontainerd:mainfrom
linxiulei:fast_load

Conversation

@linxiulei
Copy link
Copy Markdown
Contributor

@linxiulei linxiulei commented Mar 14, 2022

This PR tries to decrease time duration to recover when cri service is restarted. Time needed for cri service to load db/sandboxes/containers/images is significantly high when under IO pressure.

Time to complete recover():

  • Without competing IOs + without opt: 21s
  • Without competing IOs + with opt: 14s
  • Competing IOs + without opt: 3m44s
  • Competing IOs + with opt: 33s

Test environment:

  • 250 Containers & 150 Images
  • BFQ scheduler
  • Kernel version: 5.15.15-amd64
  • Disk perf (limited by io.max): 120 riops + 120 wiops
  • Competing IO: fio --filename=file --size=100MB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=3600 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1

@k8s-ci-robot
Copy link
Copy Markdown

Hi @linxiulei. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@linxiulei
Copy link
Copy Markdown
Contributor Author

cc: @Random-Liu

@theopenlab-ci

This comment was marked as outdated.

Comment thread services/server/server.go Outdated
@theopenlab-ci

This comment was marked as outdated.

@estesp
Copy link
Copy Markdown
Member

estesp commented Mar 16, 2022

Looks like the restart test doesn't like your changes to the reload/recover flow?

@linxiulei
Copy link
Copy Markdown
Contributor Author

Silly me. The errgroup is supposed to not be reused after Wait()

@theopenlab-ci

This comment was marked as outdated.

@linxiulei
Copy link
Copy Markdown
Contributor Author

[X] CI / Windows Integration (windows-2022) (pull_request) seems flaky and described in #6652

@theopenlab-ci

This comment was marked as outdated.

@theopenlab-ci

This comment was marked as outdated.

@theopenlab-ci

This comment was marked as outdated.

@theopenlab-ci

This comment was marked as outdated.

Comment thread services/server/server_linux.go Outdated
Copy link
Copy Markdown
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase might fix the CI issue.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Apr 6, 2022

Build succeeded.

Comment thread services/server/server_linux.go Outdated
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Apr 7, 2022

Build succeeded.

Parallelizing them decreases loading duration.

Time to complete recover():
* Without competing IOs + without opt: 21s
* Without competing IOs + with opt: 14s
* Competing IOs + without opt: 3m44s
* Competing IOs + with opt: 33s

Signed-off-by: Eric Lin <[email protected]>
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Apr 9, 2022

Build succeeded.

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 40a16a0 into containerd:main Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants