Fix possible permanent "Cannot schedule a task" error#9154
Conversation
Fails on upstream/master too |
alexey-milovidov
left a comment
There was a problem hiding this comment.
Ok but I don't understand what is the fix.
|
No logic in ThreadPool was changed. It should work with unhandled exceptions regardless to the change. |
In case of exception it will shutdown it, and in some cases the global pool can be terminated, which will lead to "cannot schedule task" permanently for every new connection/query d83f249 has some details, if this is not enough, ping me, I can try to get back with reproducer |
We don't use GlobalThreadPool directly, only via ThreadFromGlobalPool. |
|
The proper fix will be:
|
Ok. Does |
So the exception need to be re-thrown somehow, this can be done from the
I'm suggesting to not shutdown the loop on exceptions, for now this can done only if this is the global pool, but I would prefer not to add this condition, since this will make it less readable and the exception will be thrown from the |
580b5b4 to
ff82acf
Compare
Decided to make it for global pool only (at least for now) |
|
@azat Is it possible to create a test case for that? What is happening with 00974_query_profiler ? @alexey-milovidov any chances to finish the review? |
It is easy to cover this with unit test, but this is not that interesting since this is the problem of the callers. |
Otherwise GlobalThreadPool can be terminated (for example due to an
exception from the ParallelInputsHandler::onFinish/onFinishThread, from
ParallelAggregatingBlockInputStream::Handler::onFinish/onFinishThread,
since writeToTemporaryFile() can definitelly throw) and the server will
not accept new connections (or/and execute queries) anymore.
Here is possible stacktrace (it is a bit inaccurate, due to
optimizations I guess, and it had been obtained with the
DB::tryLogCurrentException() in the catch block of the
ThreadPoolImpl::worker()):
2020.02.16 22:30:40.415246 [ 45909 ] {} <Error> ThreadPool: Unhandled exception in the ThreadPool(10000,1000,10000) the loop will be shutted down: Code: 241, e.displayText() = DB::Exception: Memory limit (total) exceeded: would use 279.40 GiB (attempt to allocate chunk of 4205536 bytes), maximum: 279.40 GiB, Stack trace (when copying this message, always include the lines below):
1. Common/Exception.cpp:35: DB::Exception::Exception(...)
...
6. Common/Allocator.h:102: void DB::PODArrayBase<8ul, 4096ul, Allocator<false, false>, 15ul, 16ul>::reserve<>(unsigned long) (.part.0)
7. Interpreters/Aggregator.cpp:1040: void DB::Aggregator::writeToTemporaryFileImpl<...>(...)
8. Interpreters/Aggregator.cpp:719: DB::Aggregator::writeToTemporaryFile(...)
9. include/memory:4206: DB::Aggregator::writeToTemporaryFile(...)
10. DataStreams/ParallelInputsProcessor.h:223: DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(...)
Refs: ClickHouse#6833 (comment)
(Reference to particular comment, since I'm not sure about the initial issue)
ff82acf to
a15b2da
Compare
So I tried to do this just for fun, but because of compression (for temporary files in aggregator) this approach cannot be used.
I have no idea, but after rebase it does not fails anymore (maybe some issue that has been fixed in master already? anyway logs wasn't verbose enough to investigate this) |
|
@azat query profiler test was temporary broken in master a few weeks ago, then fixed. |
|
The original issue
should not affect versions 20.3+, because |
It was found on 20.2.1.2337-23 (even though AFAIK there wasn't official release), and AFAICS the processors has been enabled: |
|
@azat But how it's related to |
Hm, query that triggers it had been initiated by the Distributed engine, and I guess it was possible in pre #8929 (v20.2.1.2425-testing-181-g1e8389eceb) (But I guess it is not important for stable releases) |
|
Sorry, this PR introduced race condition: (I will fix it) |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix possible permanent "Cannot schedule a task" error (due to unhandled exception in
ParallelAggregatingBlockInputStream::Handler::onFinish/onFinishThread)Refs: #6833