Check max_execution_time in the pipeline and pulling executors#31636
Check max_execution_time in the pipeline and pulling executors#31636KochetovNicolai merged 14 commits intoClickHouse:masterfrom
Conversation
|
Looking at the failures:
I'm investigating the true failures:
|
…ystatus Otherwise the query status would keep a pointer to the executor which is dying at that very moment
de3fe0e to
cbe3a47
Compare
|
Current failures:
I'll review this tomorrow plus any other thing that appears. Comments are welcome at any time. |
Co-authored-by: Azat Khuzhin <[email protected]>
|
02008_materialize_column failures because it's broken in master (currently reverted and being reapplied in #31693) |
| if (!executor) | ||
| executor = std::make_shared<PipelineExecutor>(pipeline.processors, pipeline.process_list_element); | ||
|
|
||
| if (!executor->checkTimeLimit()) |
There was a problem hiding this comment.
Note: if we throw exception here, executor is not stopped (until the moment of ~PullingPipelineExecutor)
There was a problem hiding this comment.
I understand that's not an issue, right? If nobody catches the exception the executor destructor will be called, if somebody does then it will need to handle it and eventually the executor will be cancelled.
There was a problem hiding this comment.
It is still not very cool. E.g. executor itself may continue working (e.g. in case of bug)., and we may wait for a long time in destructor.
| auto settings = process_list_element->context.lock()->getSettings(); | ||
| limits.max_execution_time = settings.max_execution_time; | ||
| overflow_mode = settings.timeout_overflow_mode; |
There was a problem hiding this comment.
I think it may be more reasonable to add timeout check logic into process_list_element.
There was a problem hiding this comment.
Makes sense to me. I'll add a checkTimeLimit() to process_list_element and use it from the PipeLineExecutor
|
Ey @KochetovNicolai does this need any changes or improvements? I want to build on top of this to be able to cancel scalar subqueries (#1576) but I'd rather keep things separate as otherwise I end up with large PRs that are hard to review. |
| bool continuing = process_list_element->checkTimeLimit(); | ||
| // We call cancel here so that all processors are notified and tasks waken up | ||
| // so that the "break" is faster and doesn't wait for long events | ||
| if (!continuing) | ||
| cancel(); |
There was a problem hiding this comment.
well, I don't like we don't call cancel in case of exception from process_list_element->checkTimeLimit().
What can we do:
- add softer check to
process_list_element, something likebool isTimeLimitExceeded()which newer throws PipelineExecutor::checkTimeLimituses thisisTimeLimitExceededversion and just cancel. So, it's like a soft check.- In other places like
PipelineExecutor::finalizeExecutionwe call regularprocess_list_element->checkTimeLimit()and throw exception if needed.
There was a problem hiding this comment.
What do you think of having something like:
bool PipelineExecutor::checkTimeLimitSoft()
{
try
{
return checkTimeLimit();
}
catch (...)
{
cancel();
return false;
}
}Then calling checkTimeLimitSoft inside the PullingPipelineExecutor and PullingAsyncPipelineExecutor and in the executeStepImpl loop, and checkTimeLimit and the start and end of the execution. I think that fits what you were saying right?
There was a problem hiding this comment.
Yes, the idea is the same.
But let's try to avoid try/catch(...) with ignoring exception if possible :)
|
Sorry for not commenting for so long |
|
We need just to check time limit for every iteration of executeImpl. But avoid calling |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve the
max_execution_timechecks. Fixed some cases when timeout checks do not happen and query could run too long.Detailed description / Documentation draft:
timeout_overflow_modeis set tothrow(default) thePipelineExecutoris the most likely to cancel the query mentioned in JOIN + GROUP BY doesn't respect timeouts or KILL requests #26554.timeout_overflow_modeis set tobreakboth pulling executors will stop pulling data and finish the query with whatever data was already output.num_queries_incrementfromQueryStatussince it wasn't used.The checks inside of
PipelineExecutordidn't seem excesive so I don't expect it to affect performance, but let's see what the tests say. Iis a better way or place to control the timeouts? cc @KochetovNicolaiCloses #26554 : the KILL part was implemented in #26675 for 21.9+ and this implements the timeout checks.
Closes #31657