Improve checksum checks for async INSERT into Distributed on the sender#18853

Merged
alexey-milovidov merged 9 commits into ClickHouse:master from azat:dist-send-checksums
Jan 15, 2021
Conversation

@azat
Member

@azat azat commented Jan 8, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Check the per-block checksums of the distributed batch on the sender before sending (without reading the file twice: the checksums are verified while reading). This avoids the INSERT getting stuck on the receiver when the .bin file on the sender is truncated.
  • Avoid reading .bin files twice for batched INSERT (previously this was required to calculate rows/bytes so that squashing could be taken into account; now this information is included in the header, and backward compatibility is preserved).

Refs: #18369

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jan 8, 2021
@azat azat marked this pull request as draft January 8, 2021 00:34
@alexey-milovidov alexey-milovidov self-assigned this Jan 8, 2021
@azat azat changed the title Check per-block checksum of the distributed batch on the sender before sending Improve checksum checks for async INSERT into Distributed on the sender Jan 10, 2021
azat added 4 commits January 10, 2021 18:17
…e sending

This is already done for distributed_directory_monitor_batch_inserts=1,
so let's do the same for the non-batched mode; otherwise, if the file
is truncated, the receiver will just get stuck (it will wait for the
block, but the sender will never send it).
… on sending

Before this patch, StorageDistributedDirectoryMonitor read .bin files
in batch mode just to calculate the number of bytes/rows. This is very
inefficient, so let's just store them (rows/bytes) in the header.
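The idea of the commit above can be sketched as follows. This is an illustrative layout, not ClickHouse's actual wire format: the point is that a fixed-size header carrying precomputed rows/bytes lets a monitor process obtain batch statistics without re-reading and deserializing the whole payload.

```python
# Hypothetical file layout: a fixed-size header (rows, bytes) followed by the
# payload. A monitor can read batch statistics from the header alone.
import struct

HEADER_FMT = "<QQ"  # rows, bytes as little-endian uint64 (hypothetical layout)

def write_batch(path, blocks):
    """Write payload blocks prefixed by a header with precomputed rows/bytes."""
    rows = sum(r for r, _ in blocks)
    payload = b"".join(data for _, data in blocks)
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FMT, rows, len(payload)))
        f.write(payload)

def read_batch_stats(path):
    """Read only the fixed-size header; no need to scan the payload."""
    with open(path, "rb") as f:
        rows, nbytes = struct.unpack(HEADER_FMT, f.read(struct.calcsize(HEADER_FMT)))
    return rows, nbytes
```

Backward compatibility in the real patch means old files without these fields must still be readable; this sketch omits that versioning concern.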
@azat azat force-pushed the dist-send-checksums branch 2 times, most recently from f070693 to b7a5bd3 Compare January 10, 2021 17:26
azat added 2 commits January 10, 2021 21:23
A buffer for reading from a compressed file that only checks the checksums
of the compressed blocks, without any decompression, so the result can be
proxied.

Before this patch, the DirectoryMonitor checked the compressed file by
reading it one more time (without this check, the receiver may get stuck
on a truncated file). This is inefficient; we can just verify the
checksums before sending.

Note, however, that this may decrease the batch size used for sending
over the network.
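The structure of such a checking pass-through reader can be sketched like this. The framing here is hypothetical (a 4-byte crc32 plus a 4-byte length per block; ClickHouse actually uses a 16-byte CityHash128 checksum), but it shows the key property: each block's checksum is verified while streaming, and the raw compressed bytes are forwarded unchanged, so a truncated file fails on the sender instead of hanging the receiver.

```python
# Sketch of a checksum-verifying pass-through: verify each block's checksum
# while streaming, forward the raw bytes verbatim (no decompression).
# zlib.crc32 and the 4+4-byte framing stand in for ClickHouse's real format.
import io
import struct
import zlib

def check_and_forward(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy framed blocks from src to dst, raising on truncation or mismatch."""
    blocks = 0
    while True:
        prefix = src.read(8)  # 4-byte crc32 + 4-byte block length
        if not prefix:
            return blocks  # clean end of file
        if len(prefix) < 8:
            raise EOFError("truncated block prefix")
        checksum, length = struct.unpack("<II", prefix)
        block = src.read(length)
        if len(block) < length:
            raise EOFError("truncated block body")
        if zlib.crc32(block) != checksum:
            raise ValueError("checksum mismatch")
        dst.write(prefix)   # proxy the bytes verbatim...
        dst.write(block)    # ...no decompression needed
        blocks += 1
```

Because nothing is decompressed, the verification pass adds little beyond the cost of reading the file, which matches the "almost zero overhead" measurements later in this thread.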
@azat azat force-pushed the dist-send-checksums branch from b7a5bd3 to 2565d2a Compare January 10, 2021 18:23
@azat azat marked this pull request as ready for review January 10, 2021 18:24
@alexey-milovidov
Member

alexey-milovidov commented Jan 11, 2021

Fuzzer has found unrelated issue #18887

Since CompressedReadBufferBase now reads the checksum and the header at
once, it expects 25 bytes (16 + 9), not 16 (the checksum only).
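The 25-byte prefix mentioned above can be illustrated with a small parser: a 16-byte checksum followed by a 9-byte block header (a 1-byte codec method plus two little-endian uint32 sizes). The field meanings follow the ClickHouse compressed block layout as I understand it; treat this as a sketch, not a reference.

```python
# Sketch: split the 25-byte block prefix into checksum and header fields.
import struct

CHECKSUM_SIZE = 16
HEADER_SIZE = 9  # 1 (method) + 4 (compressed size) + 4 (decompressed size)

def parse_block_prefix(buf: bytes):
    """Parse the 16-byte checksum and the 9-byte header from one read."""
    if len(buf) < CHECKSUM_SIZE + HEADER_SIZE:
        raise EOFError("need 25 bytes (16 checksum + 9 header), got %d" % len(buf))
    checksum = buf[:CHECKSUM_SIZE]
    method, size_compressed, size_decompressed = struct.unpack_from(
        "<BII", buf, CHECKSUM_SIZE)
    return checksum, method, size_compressed, size_decompressed
```

Reading both pieces in one call is what changes the minimum expected read from 16 bytes to 25, as the comment above notes.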
@azat
Member Author

azat commented Jan 13, 2021

Here are some numbers; in general, they show that CheckingCompressedReadBuffer has almost zero overhead.

patched

create table data (key Int, value String) engine=Null();
create table dist (key Int, value String) engine=Distributed(test_shard_localhost, currentDatabase(), data);
system stop distributed sends dist;

-- insert into Null() directly, just forward blocks to the Null engine.
insert into dist select number, reinterpretAsString(number) from system.numbers limit 1e9 settings prefer_localhost_replica=1;
-- 0 rows in set. Elapsed: 16.888 sec. Processed 1.00 billion rows, 8.00 GB (59.23 million rows/s., 473.85 MB/s.)

-- async INSERT, with storing blocks on disk
insert into dist select number, reinterpretAsString(number) from system.numbers limit 1e9 settings prefer_localhost_replica=0;
-- 0 rows in set. Elapsed: 44.010 sec. Processed 1.00 billion rows, 8.00 GB (22.73 million rows/s., 181.83 MB/s.)

-- distributed_directory_monitor_batch_inserts=0
system flush distributed dist;
-- pending *.bin size is 7.5G
-- 0 rows in set. Elapsed: 22.445 sec.

-- distributed_directory_monitor_batch_inserts=1
system flush distributed dist;
-- first run
-- 0 rows in set. Elapsed: 50.845 sec.;
-- second run
-- 0 rows in set. Elapsed: 39.750 sec.;

upstream

-- distributed_directory_monitor_batch_inserts=0
system flush distributed dist;
-- 0 rows in set. Elapsed: 21.459 sec.

-- distributed_directory_monitor_batch_inserts=1
system flush distributed dist;
-- 0 rows in set. Elapsed: 45.488 sec.

And I believe that the batching-mode overhead can be eliminated by removing the extra file reading.

NOTE: the INSERT was done using the same version as system flush distributed dist; this is required to fill the metadata correctly.

@azat
Member Author

azat commented Jan 13, 2021

And I believe that batching mode overhead can be eliminated by removing extra file reading.

Confirmed: without splitting by block header it is 2x faster (since it does not require extra file reading, after the other patches in this series).

Before this patch, the batched mode of the DirectoryMonitor was 2x slower
than the non-batched mode; after it, they should perform more or less the
same.
@azat
Member Author

azat commented Jan 15, 2021

AST fuzzer — Logical error: 'Block structure mismatch in PipelineExecuting stream: different columns:

#19007

@alexey-milovidov alexey-milovidov merged commit ca825f3 into ClickHouse:master Jan 15, 2021
@azat azat deleted the dist-send-checksums branch January 15, 2021 18:13
azat added a commit to azat/ClickHouse that referenced this pull request Jan 19, 2021
- the sender will get ATTEMPT_TO_READ_AFTER_EOF (added in
  946c275) when the client just goes
  away, i.e. the server had been restarted, and it is incorrect to mark the
  file as broken in this case.

- since ClickHouse#18853 the file is checked on the sender locally, and
  if the file was truncated, CANNOT_READ_ALL_DATA will be thrown.
  But before ClickHouse#18853 the sender would not receive
  ATTEMPT_TO_READ_AFTER_EOF from the client if the file was truncated
  on the sender, since the client would just wait for more data, IOW just hang.

- and I don't see how ATTEMPT_TO_READ_AFTER_EOF can be received while
  reading a local file.