Balanced reading from JBOD by amosbird · Pull Request #16423 · ClickHouse/ClickHouse

amosbird · 2020-10-27T09:55:56Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Better read task scheduling for JBOD architecture and MergeTree storage. New setting read_backoff_min_concurrency, which sets a lower limit on the number of reading threads.

Detailed description / Documentation draft:

Disk-aware read scheduling helps avoid tail-latency issues when reading large amounts of data from a JBOD array. I've observed a lot of read clustering: all threads read concurrently from one disk for 20 seconds, then switch to another disk for the next 20 seconds.

I've tested it in a production environment with a 12-disk JBOD array, and the results are very promising.

The baseline takes 573.039 sec. With the JBOD task split it takes 429.389 sec, and with random read task stealing it takes 185.612 sec.

It works well with the current read backoff mechanism.
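
To illustrate the JBOD task split, here is a minimal sketch of a disk-balanced assignment. It is not the code from MergeTreeReadPool.cpp: the `Part` struct and `assignPartsByDisk` function are hypothetical names made up for this example, and the round-robin dealing is a simplification of the idea that different threads should start reading from different disks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a data part; not ClickHouse's real type.
struct Part
{
    std::string name;
    std::string disk;   // name of the disk the part is stored on
};

// Returns, for each thread, the indices of the parts it should read.
// Parts are grouped by disk and dealt out one disk at a time, so that
// consecutive threads start on different disks.
std::vector<std::vector<size_t>> assignPartsByDisk(const std::vector<Part> & parts, size_t threads)
{
    assert(threads > 0);

    // Group part indices by disk name (std::map keeps the disk order deterministic).
    std::map<std::string, std::vector<size_t>> by_disk;
    for (size_t i = 0; i < parts.size(); ++i)
        by_disk[parts[i].disk].push_back(i);

    std::vector<std::vector<size_t>> per_thread(threads);
    size_t thread = 0;
    bool any_left = true;

    // In each round take one part from every disk that still has parts,
    // and move on to the next thread after every assignment.
    for (size_t round = 0; any_left; ++round)
    {
        any_left = false;
        for (auto & entry : by_disk)
        {
            const std::vector<size_t> & indices = entry.second;
            if (round >= indices.size())
                continue;
            per_thread[thread].push_back(indices[round]);
            thread = (thread + 1) % threads;
            any_left = true;
        }
    }
    return per_thread;
}
```

With two disks and two threads, each thread ends up reading from a single disk, so the two disks are read in parallel rather than one after the other.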

update

Random stealing incurs a reader reinitialization cost, so we now use a different scheme. First we check whether any backed-off threads can be resurrected. If not, we steal from the next one. Thanks to the pre-balanced workloads, the distribution should be fairly uniform in general.

With this steal strategy, the runtime ranges from 105 to 135 sec.
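
The steal order described in this update can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `pickVictim`, the `remaining` set of thread indices that still hold tasks, and the `active_threads` counter are hypothetical names, and the sketch assumes that read backoff stops the threads with the highest indices.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Picks the task queue that an idle thread should take work from.
//   remaining      - indices of thread queues that still hold tasks (hypothetical bookkeeping)
//   self           - index of the idle thread
//   active_threads - threads with index >= active_threads were stopped by read backoff (assumption)
// Precondition: remaining is not empty.
size_t pickVictim(const std::set<size_t> & remaining, size_t self, size_t active_threads)
{
    assert(!remaining.empty());

    // 1. Prefer a queue left behind by a backed-off thread: nobody else is reading it.
    auto it = remaining.lower_bound(active_threads);
    if (it != remaining.end())
        return *it;

    // 2. Otherwise take from the next thread after this one, wrapping around.
    it = remaining.upper_bound(self);
    if (it == remaining.end())
        it = remaining.begin();
    return *it;
}
```

Choosing the neighbouring thread instead of a random one keeps the choice deterministic, which fits the PR's observation that the workloads are already balanced before any stealing happens.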

amosbird · 2020-10-29T01:51:57Z

src/Storages/MergeTree/MergeTreeReadPool.cpp

        }
+
+        /// Before processing next thread, change volume if possible.
+        /// Different threads will likely start reading from different volumes,


It's actually different disks

amosbird · 2020-10-29T01:52:27Z

src/Storages/MergeTree/MergeTreeReadPool.cpp

+
+    {
+        /// Group parts by volume name.
+        /// We try minimize the number of threads concurrently read from the same volume.


It's disk instead of volume.

Fix comment

KochetovNicolai · 2020-10-29T16:22:39Z

Yandex third-party checks is broken in master
Integration tests (asan) is a CI issue

robot-clickhouse added the pr-improvement label (Pull request with some product improvements) Oct 27, 2020

KochetovNicolai self-assigned this Oct 27, 2020

amosbird force-pushed the jbodread branch 4 times, most recently from 52259fe to 55673c1 October 28, 2020 20:03

Balanced reading from JBOD

f995ef9

amosbird force-pushed the jbodread branch from 55673c1 to f995ef9 October 28, 2020 20:05

Refactor code a little bit. Add comment.

10bad32

amosbird commented Oct 29, 2020


Update MergeTreeReadPool.cpp

671d2b7

Fix comment

KochetovNicolai merged commit 1c10669 into ClickHouse:master Oct 29, 2020

This was referenced Dec 18, 2020

20.12.3.3 Less amount of data is returned if "read backoff" is in effect. #18137

Closed

Fix 18137 #18216

Merged

amosbird mentioned this pull request Aug 30, 2021

Speed up part loading for JBOD #28363

Merged

KochetovNicolai merged 3 commits into ClickHouse:master from amosbird:jbodread
