Handle KILL requests while running pipeline executors#26675
Handle KILL requests while running pipeline executors#26675KochetovNicolai merged 2 commits intoClickHouse:masterfrom
Conversation
|
TSAN found a valid issue as QueryState::is_cancelled was being written / read by 2 threads at the same time. GRPC should be fine (Call::want_to_cancel was already atomic), and PipelineExecutor::cancel() should be fine too (as everything is either atomic or accessed under mutex, or both). |
Looking at stderr.log it's pointing to an issue in rocksdb: And looking at the logs, in stress_run_logs.tar.gz > stress_test_run_0.txt is complaining about a data race in Both seem completely unrelated to this change. I'm not sure what should be done here. Open tickets for both? |
|
Reported both issues. The failing integration tests look flaky |
|
Rebased on top of master/HEAD |
|
It's generally fine.
Actually, there is one |
The problem I had with that is that we need to call cancel() over the wrapper not the internal executor so that the whole process is cancelled. For example,
Maybe it could be done when creating the PipelineExecutor and removing it on finish()? If it's passed correctly (not NULL for external queries) then we could handle registration and deregistration there (give QueryStatus an array of pointers to PipelineExecutor and have it call to cancel() on all of them). Would that be safe? Another question related to this indirectly. Any ideas on how to improve timeout handling on the PipelineExecutor? Currently it's only checked very rarely (at the end) which can be hours in some extreme cases. |
Well, we have to call
Yes, I had the same idea. To use RAII or something.
Right now, timeout is checked in |
Looking at the code it seems that if the internal PipelineExecutor is cancelled, the PullingAsyncPipelineExecutor will be cancelled either in the next pull or in the destructor (if an exception is thrown) so it looks safe. I'm going to try and change this into a automatic registration in the PipelineExecutor (constructor and destructor maybe?) and keep a list of executors in the QueryStatus and see what happens. |
Yes. But also maybe we need changes from #26828 |
6e5d740 to
89fa2fd
Compare
|
Rebased on top of master (includes #26828) |
|
Rebased again on top of master, which should remove some of the failures in CI (01921_datatype_date32). The others seem unrelated too. |
|
Integration tests (thread) - flacky |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve handling of KILL QUERY requests.
Detailed description / Documentation draft:
Related to #26554. It addresses the KILL not working for those queries, but not the timeout not triggering.
Things that I tried before deciding in favor of using a callback:
process_list_entrywas killed in the handler loop. This worked well, except that it only works for the handlers that have external loops you can use to check the flag (TCPHandler, GRPCServer), leaving the HTTP handler not detecting it.cancelQuery. This allows anything to be called in addition to the current behaviour (calling cancel on input and output streams) which isn't enough in some cases.