Parallel parsing data formats #6553
Do we really have to disable parallel parsing for so many tests? For the few-line examples that are usually tested, parallel parsing doesn't make any difference. And I'm not sure what difference it makes for big files, so maybe we just have to leave it always on. If we do have to disable it for the tests, we can disable it globally in the configuration of the server that is used for the tests.
It will make the order of data non-deterministic. Task №2 for @nikitamikhaylov is to make an option for order-preserving parallel parsing of data formats.
dbms/src/IO/ReadHelpers.h
My mind just goes blank when I see this function and how it's used. At the very least, it should have a sane name, and no bool flag that modifies behavior.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Changelog entry (up to few sentences, not needed for non-significant PRs):
(!) This feature is enabled by default. (!)
Parallel parsing is implemented by the ParallelParsingBlockInputStream class. There are 3 different roles: Segmentator, Parser and Reader. Only the Parser is multithreaded. Here is how the class works. The Segmentator cuts the original file (or anything else read from a ReadBuffer) into small pieces (you can control this with the min_chunk_size_for_parallel_parsing setting). Then many parsers (you can also tune this with the max_threads_for_parallel_parsing setting) turn these pieces into Blocks. After that, the Blocks are inserted into the table without any shuffling (because this is order-preserving parallel parsing).
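To illustrate the data flow only, here is a minimal Python sketch of order-preserving parallel parsing. It is not the ParallelParsingBlockInputStream code from this PR: the function names (`segment`, `parse_chunk`, `parallel_parse`), the TSV input, and the parameters `min_chunk_size` and `max_threads` are hypothetical stand-ins for the roles and settings described above. Python threads also do not give real CPU parallelism here, so the sketch shows ordering, not speed.

```python
from concurrent.futures import ThreadPoolExecutor


def segment(data: bytes, min_chunk_size: int):
    """Segmentator: cut the input into chunks of at least min_chunk_size
    bytes, always ending a chunk on a row (newline) boundary."""
    min_chunk_size = max(min_chunk_size, 1)
    start = 0
    while start < len(data):
        # First newline at or after the minimum chunk size.
        end = data.find(b"\n", start + min_chunk_size - 1)
        end = len(data) if end == -1 else end + 1
        yield data[start:end]
        start = end


def parse_chunk(chunk: bytes):
    """Parser: turn one chunk of TSV text into a 'block' (a list of rows)."""
    return [line.split("\t") for line in chunk.decode().splitlines()]


def parallel_parse(data: bytes, min_chunk_size: int = 16, max_threads: int = 4):
    """Reader: return parsed blocks in the same order as the input chunks."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Executor.map returns results in submission order, so blocks stay
        # in input order even when parsers finish out of order.
        return list(pool.map(parse_chunk, segment(data, min_chunk_size)))


if __name__ == "__main__":
    sample = b"1\ta\n2\tb\n3\tc\n4\td\n"
    for block in parallel_parse(sample, min_chunk_size=5, max_threads=2):
        print(block)
```

The ordering guarantee in this sketch comes from collecting results in submission order rather than completion order. Whether the real class does it the same way is not stated in the description above.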
Old PR:
#5372