Parallelize query processing right after reading FROM ... #48727

alexey-milovidov merged 22 commits into master
Conversation
- mostly simple or external storages
+ avoid parallelization after SourceFromSingleChunk, SystemOne
see test_dictionaries_redis/test_long.py::test_redis_dict_long
https://s3.amazonaws.com/clickhouse-test-reports/48727/bb60f10035f0ce7538b98bab02c219dff474fdae/integration_tests__asan__[1/6].html
test_storage_mysql/test.py:test_settings_connection_wait_timeout
Affected:
Note: I may have missed something; we probably need to add some tests/performance tests in addition.
+ disable parallelization for storage Null
Ok. Many performance tests use zeros/zeros_mt to check single-thread/multi-thread performance. Rather than changing the tests, it is much simpler to avoid parallelization after reading from such storages.
It'd be nice to understand the reason for the failure with dictionaries. If output is parallelized for dictionaries, the query returns a count that increments with max_threads:

SELECT count(), uniqExact(date), uniqExact(id) FROM redis_dict settings max_threads=2
1001 | 2 | 1000

SELECT count(), uniqExact(date), uniqExact(id) FROM redis_dict settings max_threads=3
1002 | 2 | 1000
...

The reason is that a row is generated somewhere for each stream without data, but I didn't figure out yet how/where.
```cpp
{
    auto pipe = read(column_names, storage_snapshot, query_info, context, processed_stage, max_block_size, num_streams);
    ...
    /// parallelize processing if not yet
```
- Should we do it here, or is it better to do it inside InterpreterSelectQuery?
- Are there any potential troubles with mutations and StorageFromMergeTreeDataPart?
> Should we do it here, or is it better to do it inside InterpreterSelectQuery?
I think it's ok to do it here with the following considerations ...
num_streams is provided by InterpreterSelectQuery as a recommendation, i.e. how many threads are available for data processing. The reading step has the following choices:

- (a) it knows the amount of data it can read, and it's not much data, so it creates only the necessary number of data streams based on the parameters passed to IStorage::read(), i.e. max_block_size/storage_limits. In this case, we don't want to adjust the number of streams, and parallelizeOutputAfterReading() can return false.

- (b) the amount of data is either unknown, or known but enough to utilize all available threads; in both cases, the output is parallelized to num_streams.
> Are there any potential troubles with mutations and StorageFromMergeTreeDataPart?
This change affects only storages that use the default plan step to read from storage, ReadFromStorageStep. Sophisticated engines use specialized steps to read from their storage, like ReadFromMergeTree in the MergeTree case.
StorageFromMergeTreeDataPart is not affected, since it uses the ReadFromMergeTree step, which overrides the read() method where resize() is added.
Just curious: how does it not fail lots of tests in CI? When I tried

Probably they were fixed in #48525?
Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Query processing is parallelized right after reading from a data source. Affected data sources are mostly simple or external storages like the table functions url, file.