-
Notifications
You must be signed in to change notification settings - Fork 8.3k
DB::Exception: Cannot schedule a task #6833
Description
Describe the bug or unexpected behaviour
While running a 3 node clickhouse cluster with replicated tables, we saw a large number of the included stack trace. It seems that we reached a resource/config limit, but it's not clear which one (not cpu, memory or disk). There did appear to be a large number of inter cluster node tcp connections, and a significant number of open files (~3 million per cluster node).
Mostly just looking to see if there is an explanation for runaway tcp connections, and what can be expected in terms of inter node connections/communications.
We did see a large number of waiting tcp connections to each node in the cluster (~20k).
How to reproduce
-
Which ClickHouse server version to use
19.13.2.19 -
Which interface to use, if matters
HTTP -
Non-default settings, if any
The uncompressed cache is enabled as the queries typically return a small result set (couple hundred rows). -
Queries to run that lead to unexpected result
select data from table where date = '2019-09-04'
NOTE: These queries target a distributed table.
Also the queried table contains about 220 million rows.
Expected behavior
A clear and concise description of what you expected to happen.
Error message and/or stacktrace
(version 19.13.2.19)
2019.09.05 12:24:19.973783 [ 3189 ] {45412c11-a1ac-4e24-a9e8-e2bff828d1c1} <Error> executeQuery: Code: 439, e.displayText() = DB::Exception: Cannot schedule a task (version 19.13.2.19) (from 10.2.0.119:29106) (in query: select data from table where date = '2019-09-04'), Stack trace:
0. clickhouse-server(StackTrace::StackTrace()+0x30) [0x6f28600]
1. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0x316365f]
2. clickhouse-server(void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#1}::operator()() const+0x7e) [0x318fabe]
3. clickhouse-server(void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)+0x562) [0x3190ce2]
4. clickhouse-server(ThreadPoolImpl<std::thread>::scheduleOrThrow(std::function<void ()>, int, unsigned long)+0x4e) [0x3190fae]
5. clickhouse-server(ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)+0x15d) [0x5dbac4d]
6. clickhouse-server(DB::UnionBlockInputStream::readImpl()+0x2a0) [0x5dbb280]
7. clickhouse-server(DB::IBlockInputStream::read()+0x238) [0x5c53178]
8. clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x58) [0x5c72fb8]
9. clickhouse-server(DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::function<void (std::string const&)>, std::function<void (std::string const&)>)+0x614) [0x5ebeb44]
10. clickhouse-server(DB::HTTPHandler::processQuery(Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&)+0x16dc) [0x31b19fc]
11. clickhouse-server(DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&)+0x443) [0x31b4bb3]
12. clickhouse-server(Poco::Net::HTTPServerConnection::run()+0x2af) [0x6a76f2f]
13. clickhouse-server(Poco::Net::TCPServerConnection::start()+0xf) [0x6a6dc7f]
14. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x166) [0x6a6e046]
15. clickhouse-server(Poco::PooledThread::run()+0x77) [0x70fd527]
16. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x38) [0x70f96e8]
17. clickhouse-server() [0x76cb73f]
18. /lib64/libpthread.so.0(+0x7dd5) [0x7ff26c96edd5]
19. /lib64/libc.so.6(clone+0x6d) [0x7ff26c39602d]
Additional context
If the problem is too many tcp connections, would chproxy be a recommended solution?