Skip to content

Per-query/per-server throttling for remote IO/local IO/BACKUPs#48242

Merged
robot-clickhouse merged 14 commits intoClickHouse:masterfrom
azat:better-throttling
Apr 5, 2023
Merged

Per-query/per-server throttling for remote IO/local IO/BACKUPs#48242
robot-clickhouse merged 14 commits intoClickHouse:masterfrom
azat:better-throttling

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Mar 30, 2023

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Per-query/per-server throttling for remote IO/local IO/BACKUPs (server settings: max_remote_read_network_bandwidth_for_server, max_remote_write_network_bandwidth_for_server, max_local_read_bandwidth_for_server, max_local_write_bandwidth_for_server, max_backup_bandwidth_for_server, settings: max_remote_read_network_bandwidth, max_remote_write_network_bandwidth, max_local_read_bandwidth, max_local_write_bandwidth, max_backup_bandwidth)

Documentation entry for user-facing changes

Previously it was possible to throttle only network and only globally.

This PR adds ability to throttle not only globally, but on a per-query basis.

Also ability to throttle local IO had been added (this is required for throttling BACKUPs), and also, what was the main purpose of this PR - throttle BACKUPs.

BACKUPs can eat all IO (even with backup_threads=1), and so the queries will be affected, and this does not allow to do BACKUPs on a running server.

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Mar 30, 2023
@azat azat force-pushed the better-throttling branch from df4f0af to 92bcdca Compare March 30, 2023 18:27
@serxa serxa self-assigned this Mar 30, 2023
@azat azat force-pushed the better-throttling branch from 2e0a2e5 to 672dbf7 Compare March 31, 2023 04:01
@azat
Copy link
Copy Markdown
Member Author

azat commented Mar 31, 2023

Stateless tests (asan) [3/4] — fail: 2, passed: 1255, skipped: 9

Stateless tests (release, wide parts enabled) — fail: 1, passed: 5182, skipped: 6

Stress test (ubsan) — Killed by signal (in clickhouse-server.log)

/build/src/Storages/StorageMergeTree.cpp:2088:80: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long'
    #0 0x30c2ae7b in DB::StorageMergeTree::backupData(DB::BackupEntriesCollector&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::optional<absl::lts_20211102::InlinedVector<std::__1::shared_ptr<DB::IAST>, 7ul, std::__1::allocator<std::__1::shared_ptr<DB::IAST>>>> const&) build_docker/./src/Storages/StorageMergeTree.cpp:2088:80
    #1 0x2dd641b5 in DB::BackupEntriesCollector::makeBackupEntriesForTableData(DB::QualifiedTableName const&) build_docker/./src/Backups/BackupEntriesCollector.cpp:703:18
    #2 0x2dd5c2a7 in DB::BackupEntriesCollector::makeBackupEntriesForTablesData() build_docker/./src/Backups/BackupEntriesCollector.cpp:676:9
    #3 0x2dd5848d in DB::BackupEntriesCollector::run() build_docker/./src/Backups/BackupEntriesCollector.cpp:119:5
    #4 0x2dd84da3 in DB::BackupsWorker::doBackup(std::__1::shared_ptr<DB::ASTBackupQuery> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::BackupInfo const&, DB::BackupSettings, std::__1::shared_ptr<DB::IBackupCoordination>, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::Context>, bool) build_docker/./src/Backups/BackupsWorker.cpp:359:59
    #5 0x2dd82405 in DB::BackupsWorker::startMakingBackup(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) build_docker/./src/Backups/BackupsWorker.cpp:248:13
    #6 0x2dd81d0a in DB::BackupsWorker::start(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context>) build_docker/./src/Backups/BackupsWorker.cpp:179:16
    #7 0x2f4d0d5f in DB::InterpreterBackupQuery::execute() build_docker/./src/Interpreters/InterpreterBackupQuery.cpp:39:30
    #8 0x2fa80e87 in DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) build_docker/./src/Interpreters/executeQuery.cpp:713:36
    #9 0x2fa7c64b in DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) build_docker/./src/Interpreters/executeQuery.cpp:1160:30
    #10 0x30e689af in DB::TCPHandler::runImpl() build_docker/./src/Server/TCPHandler.cpp:420:24
    #11 0x30e8c3f5 in DB::TCPHandler::run() build_docker/./src/Server/TCPHandler.cpp:2037:9
    #12 0x3231f15a in Poco::Net::TCPServerConnection::start() build_docker/./base/poco/Net/src/TCPServerConnection.cpp:43:3
    #13 0x32320135 in Poco::Net::TCPServerDispatcher::run() build_docker/./base/poco/Net/src/TCPServerDispatcher.cpp:115:20
    #14 0x3258c004 in Poco::PooledThread::run() build_docker/./base/poco/Foundation/src/ThreadPool.cpp:188:14
    #15 0x325877f4 in Poco::ThreadImpl::runnableEntry(void*) build_docker/./base/poco/Foundation/src/Thread_POSIX.cpp:335:27
    #16 0x7fdade5a5608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608) (BuildId: 7b4536f41cdaa5888408e82d0836e33dcf436466)
    #17 0x7fdade4ca132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/src/Storages/StorageMergeTree.cpp:2088:80 in 

@azat azat force-pushed the better-throttling branch from 672dbf7 to 1c86ade Compare April 3, 2023 13:25
Copy link
Copy Markdown
Member

@serxa serxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. But I'm still a little bit worried about disabling native copy even for files and disks because I'm not closely familiar with this code. Let's ask for some help with this case from @vitlibar.

Vitaly, here is the place, where native copy is disabled for files with throttler enabled (also note, the same logic applies for Disk in this PR). Might it lead to a degradation of backup in some corner cases?

@azat azat force-pushed the better-throttling branch from 1c86ade to 4a315cd Compare April 3, 2023 16:39
@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 4, 2023

@azat azat force-pushed the better-throttling branch 2 times, most recently from 4248aff to c8da660 Compare April 4, 2023 10:34
@vitlibar vitlibar self-assigned this Apr 4, 2023
@azat azat force-pushed the better-throttling branch from c8da660 to 6eb3d1c Compare April 4, 2023 14:34
azat added 7 commits April 5, 2023 09:39
Right now those two settings are applied only at start:
- max_remote_read_network_bandwidth_for_server
- max_remote_write_network_bandwidth_for_server

Signed-off-by: Azat Khuzhin <[email protected]>
This will be required later, for per-query bandwidth limiting.

Signed-off-by: Azat Khuzhin <[email protected]>
Controlled with:
- max_remote_read_network_bandwidth
- max_remote_write_network_bandwidth

Signed-off-by: Azat Khuzhin <[email protected]>
Server settings:
- max_local_read_bandwidth_for_server
- max_local_write_bandwidth_for_server

Query settings:
- max_local_read_bandwidth
- max_local_write_bandwidth

This is the preparation for adding ability to throttle BACKUPs

Signed-off-by: Azat Khuzhin <[email protected]>
Server settings:
- backup_read_bandwidth_for_server
- backup_write_bandwidth_for_server

Query settings:
- backup_read_bandwidth
- backup_write_bandwidth

Signed-off-by: Azat Khuzhin <[email protected]>
There is no need in separate read/write throttling, because you cannot
write faster then read anyway, and plus this makes the code less cleaner

(and also it will allow avoid implementing throttling backups to S3,
since it does not use common S3 writer).

Signed-off-by: Azat Khuzhin <[email protected]>
azat added 6 commits April 5, 2023 09:40
    $ yes | head | clickhouse-obfuscator --structure 'id String' --seed foo --input-format TSV --output-format TSV
    Poco::Exception. Code: 1000, e.code() = 0, Null pointer: _pInstance in file "/src/ch/clickhouse/base/poco/Util/include/Poco/Util/Application.h", line 446, Stack trace (when copying this message, always include the lines below):

    0. /src/ch/clickhouse/contrib/llvm-project/libcxx/include/exception:134: std::exception::capture() @ 0x15605622 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    1. /src/ch/clickhouse/contrib/llvm-project/libcxx/include/exception:112: std::exception::exception[abi:v15000]() @ 0x156055ed in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    2. /src/ch/clickhouse/base/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(String const&, int) @ 0x2ebd2d80 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    3. /src/ch/clickhouse/base/poco/Foundation/src/Exception.cpp:132: Poco::LogicException::LogicException(String const&, int) @ 0x2ebd3667 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    4. /src/ch/clickhouse/base/poco/Foundation/src/Exception.cpp:134: Poco::NullPointerException::NullPointerException(String const&, int) @ 0x2ebd3da7 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    5. /src/ch/clickhouse/base/poco/Foundation/src/Bugcheck.cpp:42: Poco::Bugcheck::nullPointer(char const*, char const*, int) @ 0x2ebc5851 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    6. /src/ch/clickhouse/base/poco/Util/include/Poco/Util/Application.h:446: Poco::Util::Application::instance() @ 0x1ee3e6cd in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    7. /src/ch/clickhouse/src/Interpreters/Context.cpp:358: DB::ContextSharedPart::ContextSharedPart() @ 0x263b0cca in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    8. /src/ch/clickhouse/contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:714: std::__unique_if<DB::ContextSharedPart>::__unique_single std::make_unique[abi:v15000]<DB::ContextSharedPart>() @ 0x26387b6e in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    9. /src/ch/clickhouse/src/Interpreters/Context.cpp:652: DB::Context::createShared() @ 0x26361a3c in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    10. /src/ch/clickhouse/programs/obfuscator/Obfuscator.cpp:1293: mainEntryClickHouseObfuscator(int, char**) @ 0x1ef6eb26 in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    11. /src/ch/clickhouse/programs/main.cpp:481: main @ 0x155e569a in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
    12. ? @ 0x7ffff7dd2790 in ?
    13. __libc_start_main @ 0x7ffff7dd284a in ?
    14. _start @ 0x155e51ae in /src/ch/clickhouse/.cmake-debug/programs/clickhouse
     (version 23.3.1.2537)

Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat force-pushed the better-throttling branch from 6eb3d1c to f0f58de Compare April 5, 2023 07:43
@robot-clickhouse robot-clickhouse merged commit 3d2c279 into ClickHouse:master Apr 5, 2023
@azat azat deleted the better-throttling branch April 5, 2023 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants