Fix distributed send that are scheduled by INSERT query#10486
Fix distributed send that are scheduled by INSERT query#10486alexey-milovidov merged 10 commits intoClickHouse:masterfrom
Conversation
2f368e4 to
133f94a
Compare
71a68e9 to
23bdd0b
Compare
7fb9dc5 to
4db6729
Compare
check failures looks unrelated |
Before this patch each INSERT query re-schedule distributed send, thus each time it resets the timer, while this is not the expected behaviour, since in on frequent INSERT distributed sends will not be triggered at all. Fix this by not resetting the timer.
Since in this case it will be scheduled from the DistributedBlockOutputStream with the distributed_directory_monitor_max_sleep_time_ms, and this will overwrite timer that was set by the DistributedBlockOutputStream, not good.
This also fixes hardlink code (when one file should be sent to multiple servers, i.e. internal_replication == false) of writeToShard() with distributed_storage_policy (i.e. when StorageDistributed::getPath() will path to different filesystems). Plus also cleanup DistributedBlockOutputStream::writeToShard() a little.
…tion After first INSERT for disk2 there should not be created any per-shard directory, so find(1) should report an error, like: find: '/disk2/data/test/dist_foo/default@127%2E0%2E0%2E2:9000': No such file or directory" But sometimes output can be fixed, and output of wc(1) will goes first and python's int() will parse it and not fail, but if find(1) stderr will goes first the int() will fail to parse. And here is an example of such mixing: $ docker run --name alpine --rm -it alpine top $ docker exec alpine sh -c 'echo foo >&2 | wc -c' foo 0 $ docker exec alpine sh -c 'echo foo >&2 | wc -c' 0 foo
|
Rebased against upstream/master to fix conflicts (previous HEAD was 4db67291ce40ac710bdb0a4c1984562f8c22c027 - for the test report) |
|
Check failures looks unrelated |
|
@alexey-milovidov any news on this one? |
…sleep_time_ms > 5min
In this case error_count can be decreased before checking it for
rescheduling send.
And actually this can be a problem not only when
distributed_directory_monitor_{max_,}sleep_time_ms > 5min, because all
threads can be occupated and the real timeout between sends will be > 5min.
Is there any chance that I can get P.S. That commit goes before this PR merge, so it was broken before. |
|
Unfortunately, data from this task has gone. |
|
Looks like the |
|
It's very interesting, how it was possible... |
Actually it is not, in a nut shell the problem is in racy check in
So it is better to backport that patch, although looks like poco is not under active development (and myabe you even tried to do this) ... |
|
Ok, I will disable this test for unbundled build.
I don't remember. |
Fix distributed send that are scheduled by INSERT query (cherry picked from commit ddc8416)
* Merge pull request #10268 from ClickHouse/max-rows-to-sort Added failing tests about "max_rows_to_sort" setting. (cherry picked from commit f7b1263) * Merge pull request #10486 from azat/dist-send-on-INSERT Fix distributed send that are scheduled by INSERT query (cherry picked from commit ddc8416) * Merge pull request #10516 from azat/dist-GROUP_BY-sharding_key-fixes Disable GROUP BY sharding_key optimization by default (and fix for WITH ROLLUP/CUBE/TOTALS) (cherry picked from commit 33d491e) * Merge pull request #10560 from Enmk/DateTime64_fixes Fixed comparing DateTime64 in WHERE against String value (cherry picked from commit 67efc7f) * Merge pull request #10563 from azat/SELECT-ALIAS-CAST Fix SELECT of column ALIAS which default expression type different from column type (cherry picked from commit 15e38c8) * Merge pull request #10569 from zhang2014/fix/ISSUES-10551 ISSUES-10551 add backward compatibility for create bloom filter index (cherry picked from commit db4c235) * Merge pull request #10587 from vitlibar/database-with-dictionary-init-in-constructor Move initialization of DatabaseWithDictionaries to constructor. (cherry picked from commit 2528e72) * Merge pull request #10588 from excitoon-favorites/fixmutations Fixed handling condition variable for synchronous mutations (cherry picked from commit 992e589) * Merge pull request #10611 from azat/optimize_skip_unused_shards-LowCardinality Fix optimize_skip_unused_shards with LowCardinality (cherry picked from commit 229f666) * Merge pull request #10641 from ClickHouse/storage-buffer-nullptr-dereference Fix nullptr dereference in StorageBuffer (cherry picked from commit 884c2aa) * Merge pull request #10659 from ClickHouse/fix_multiple_simultaneous_alters Fix alter-mutations assignment (cherry picked from commit 5eebe8c) * Merge pull request #10711 from ClickHouse/h3-range-check Range check in function h3EdgeAngle (cherry picked from commit d0b61f9) * Merge pull request #10741 from hczhcz/patch-0422 Fix OrNull and OrDefault (cherry picked from commit 699ef4f) * Merge pull request #10757 from ClickHouse/fix-parallel-mv Fix parallel MV (cherry picked from commit ef1c7da) * Merge pull request #10791 from oandrew/key-condition-source-type Use src_type for conversion in KeyCondition (cherry picked from commit a133389) * Merge pull request #10798 from ClickHouse/fix-date-lut-msan-ubsan Fix UBSan and MSan report in DateLUT (cherry picked from commit a2f220f) * Merge pull request #10821 from ClickHouse/fix-odbc-bridge-clickhouse Fix the issue with ODBC bridge and identifier_quoting_style = None #7984 (cherry picked from commit 5115ac2) * Merge pull request #10826 from azat/block-sort-fix Fix columns order after Block::sortColumns() (cherry picked from commit 75607db) * Merge pull request #10834 from ClickHouse/fix-msan-failure-cache-dictionary Fix MSan failure in cache dictionary (cherry picked from commit 6cba2da) * Merge pull request #10847 from ClickHouse/fix_alter_rename_and_constraints Fix constraints after column rename (cherry picked from commit 377ef65) * Merge pull request #10849 from ClickHouse/fix_optimize_and_alter_hangs Fix mutations and OPTIMIZE hangs when replica becomes inactive (cherry picked from commit 623b2e5) * Merge pull request #10859 from ClickHouse/fix_watch_livelock_with_database_atomic Fix livelock with WATCH queries and DatabaseAtomic (cherry picked from commit a47bf09) * Merge pull request #10870 from azat/fix-SIGSEGV-in-hash-table-for-string Fix SIGSEGV in StringHashTable (if such key does not exist) (cherry picked from commit 0433d9c) * Destructive IAggregateFunction::insertResultInto and ColumnAggregateFunction::convertToValues (#10890) * Destructive IAggregateFunction::insertResultInto and ColumnAggregateFunction::convertToValues * Try fix build. * Try fix build. * Fix build. * Make convertToValues static. * fix build. * Remove const casts. * Added comment. * Fix build. * Fix build. * Add test. * Fix test. (cherry picked from commit f653058) * Merge pull request #10895 from ClickHouse/fix_multiple_renames Fix error in multiple rename commands in a single query. (cherry picked from commit be4037a) * Merge pull request #10920 from ClickHouse/fix-notNullIn-with-null Fix not null in with null (cherry picked from commit 1bceb48) * Merge pull request #10940 from azat/dist-send-partially-written-read-fix Avoid sending partially written files by the DistributedBlockOutputStream (cherry picked from commit 0d76091) * Merge pull request #10952 from ClickHouse/fix-distributed-queries-incompatibility-19.16-20.1 Fix incompatibility of two-level aggregation between 19.16 and 20.1 (cherry picked from commit c957154) * Merge pull request #10964 from ObjatieGroba/bugfix_raw_column_size Fix incorrect Column byte size (cherry picked from commit b6b1947) * Merge pull request #10968 from ClickHouse/fix_database_atomic_terminate_called Fix exception from destructor in DatabaseAtomic (cherry picked from commit 103f66e) * Merge pull request #10970 from ClickHouse/try-fix-use-after-free-mergetree Try to fix use-after-free error in MergeTree (cherry picked from commit 421eb6d) * Merge pull request #10980 from azat/database-atomic-fixes Database atomic fixes (cherry picked from commit 6f1d522) Co-authored-by: alexey-milovidov <[email protected]> Co-authored-by: Vitaly Baranov <[email protected]> Co-authored-by: alesapin <[email protected]> Co-authored-by: Anton Popov <[email protected]> Co-authored-by: tavplubix <[email protected]> Co-authored-by: Nikolai Kochetov <[email protected]>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixes: #10263 (after that PR dist send via INSERT had been postponing on each INSERT)
Fixes: #8756 (that PR breaks distributed sends with all of the following conditions met (unlikely setup for now I guess):
internal_replication == false, multiple local shards (activates the hardlinking code) anddistributed_storage_policy(makeslink(2)fails onEXDEV))