Skip to content

Fix distributed send that are scheduled by INSERT query#10486

Merged
alexey-milovidov merged 10 commits intoClickHouse:masterfrom
azat:dist-send-on-INSERT
May 11, 2020
Merged

Fix distributed send that are scheduled by INSERT query#10486
alexey-milovidov merged 10 commits intoClickHouse:masterfrom
azat:dist-send-on-INSERT

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Apr 24, 2020

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixes: #10263 (after that PR dist send via INSERT had been postponing on each INSERT)
Fixes: #8756 (that PR breaks distributed sends with all of the following conditions met (unlikely setup for now I guess): internal_replication == false, multiple local shards (activates the hardlinking code) and distributed_storage_policy (makes link(2) fails on EXDEV))

@azat azat marked this pull request as draft April 24, 2020 23:10
@azat azat force-pushed the dist-send-on-INSERT branch from 2f368e4 to 133f94a Compare April 25, 2020 22:45
@azat azat marked this pull request as ready for review April 25, 2020 23:03
@azat azat force-pushed the dist-send-on-INSERT branch 2 times, most recently from 71a68e9 to 23bdd0b Compare April 26, 2020 13:55
@azat azat marked this pull request as draft April 26, 2020 18:48
@azat azat force-pushed the dist-send-on-INSERT branch from 7fb9dc5 to 4db6729 Compare April 27, 2020 00:02
@azat azat marked this pull request as ready for review April 27, 2020 00:02
@azat
Copy link
Copy Markdown
Member Author

azat commented Apr 28, 2020

ClickHouse build check — 16/17 builds are OK
Performance — 1 faster, 6 slower, 144 unstable

check failures looks unrelated

azat added 9 commits May 3, 2020 14:46
Before this patch each INSERT query re-schedule distributed send, thus
each time it resets the timer, while this is not the expected behaviour,
since in on frequent INSERT distributed sends will not be triggered at
all.

Fix this by not resetting the timer.
Since in this case it will be scheduled from the
DistributedBlockOutputStream with the
distributed_directory_monitor_max_sleep_time_ms, and this will overwrite
timer that was set by the DistributedBlockOutputStream, not good.
This also fixes hardlink code (when one file should be sent to multiple
servers, i.e. internal_replication == false) of writeToShard() with
distributed_storage_policy (i.e. when StorageDistributed::getPath() will
path to different filesystems).

Plus also cleanup DistributedBlockOutputStream::writeToShard() a little.
…tion

After first INSERT for disk2 there should not be created any per-shard
directory, so find(1) should report an error, like:

  find: '/disk2/data/test/dist_foo/default@127%2E0%2E0%2E2:9000': No such file or directory"

But sometimes output can be fixed, and output of wc(1) will goes first
and python's int() will parse it and not fail, but if find(1) stderr
will goes first the int() will fail to parse.

And here is an example of such mixing:

  $ docker run --name alpine --rm -it alpine top
  $ docker exec alpine sh -c 'echo foo >&2 | wc -c'
  foo
  0
  $ docker exec alpine sh -c 'echo foo >&2 | wc -c'
  0
  foo
@azat azat force-pushed the dist-send-on-INSERT branch from 4db6729 to 14fccf5 Compare May 3, 2020 11:48
@azat
Copy link
Copy Markdown
Member Author

azat commented May 3, 2020

Rebased against upstream/master to fix conflicts (previous HEAD was 4db67291ce40ac710bdb0a4c1984562f8c22c027 - for the test report)

@azat
Copy link
Copy Markdown
Member Author

azat commented May 3, 2020

Check failures looks unrelated

@azat
Copy link
Copy Markdown
Member Author

azat commented May 5, 2020

@alexey-milovidov any news on this one?

…sleep_time_ms > 5min

In this case error_count can be decreased before checking it for
rescheduling send.

And actually this can be a problem not only when
distributed_directory_monitor_{max_,}sleep_time_ms > 5min, because all
threads can be occupated and the real timeout between sends will be > 5min.
@azat azat mentioned this pull request May 11, 2020
16 tasks
@alexey-milovidov alexey-milovidov added the pr-bugfix Pull request with bugfix, not backported by default label May 11, 2020
@alexey-milovidov alexey-milovidov merged commit ddc8416 into ClickHouse:master May 11, 2020
@alexey-milovidov alexey-milovidov self-assigned this May 11, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented May 11, 2020

@azat azat deleted the dist-send-on-INSERT branch May 11, 2020 09:21
@azat
Copy link
Copy Markdown
Member Author

azat commented May 11, 2020

Looks like it has broke this test: https://clickhouse-test-reports.s3.yandex.net/10779/bd26547f9908d1b0658d75c3090eda20f5f2e2f9/functional_stateless_tests_(unbundled)/test_run.txt.out.log

Is there any chance that I can get /var/lib/clickhouse/data from that run (since I cannot reproduce the failure) ?

P.S. That commit goes before this PR merge, so it was broken before.

@alexey-milovidov
Copy link
Copy Markdown
Member

Unfortunately, data from this task has gone.

@azat
Copy link
Copy Markdown
Member Author

azat commented May 11, 2020

Looks like the tmp was a file (this is the only reason for such error in Poco::File::createDirectory)

@alexey-milovidov
Copy link
Copy Markdown
Member

It's very interesting, how it was possible...
I will try to find similar failures for recent commits.

@azat
Copy link
Copy Markdown
Member Author

azat commented May 11, 2020

It's very interesting, how it was possible...

Actually it is not, in a nut shell the problem is in racy check in Poco::File::createDirectory in the upstream poco, here is details:

So it is better to backport that patch, although looks like poco is not under active development (and myabe you even tried to do this) ...

@alexey-milovidov
Copy link
Copy Markdown
Member

Ok, I will disable this test for unbundled build.

and myabe you even tried to do this

I don't remember.

abyss7 pushed a commit to abyss7/ClickHouse that referenced this pull request May 18, 2020
Fix distributed send that are scheduled by INSERT query

(cherry picked from commit ddc8416)
nikitamikhaylov pushed a commit that referenced this pull request May 23, 2020
* Merge pull request #10268 from ClickHouse/max-rows-to-sort

Added failing tests about "max_rows_to_sort" setting.

(cherry picked from commit f7b1263)

* Merge pull request #10486 from azat/dist-send-on-INSERT

Fix distributed send that are scheduled by INSERT query

(cherry picked from commit ddc8416)

* Merge pull request #10516 from azat/dist-GROUP_BY-sharding_key-fixes

Disable GROUP BY sharding_key optimization by default (and fix for WITH ROLLUP/CUBE/TOTALS)

(cherry picked from commit 33d491e)

* Merge pull request #10560 from Enmk/DateTime64_fixes

Fixed comparing DateTime64 in WHERE against String value

(cherry picked from commit 67efc7f)

* Merge pull request #10563 from azat/SELECT-ALIAS-CAST

Fix SELECT of column ALIAS which default expression type different from column type

(cherry picked from commit 15e38c8)

* Merge pull request #10569 from zhang2014/fix/ISSUES-10551

ISSUES-10551 add backward compatibility for create bloom filter index

(cherry picked from commit db4c235)

* Merge pull request #10587 from vitlibar/database-with-dictionary-init-in-constructor

Move initialization of DatabaseWithDictionaries to constructor.

(cherry picked from commit 2528e72)

* Merge pull request #10588 from excitoon-favorites/fixmutations

Fixed handling condition variable for synchronous mutations

(cherry picked from commit 992e589)

* Merge pull request #10611 from azat/optimize_skip_unused_shards-LowCardinality

Fix optimize_skip_unused_shards with LowCardinality

(cherry picked from commit 229f666)

* Merge pull request #10641 from ClickHouse/storage-buffer-nullptr-dereference

Fix nullptr dereference in StorageBuffer

(cherry picked from commit 884c2aa)

* Merge pull request #10659 from ClickHouse/fix_multiple_simultaneous_alters

Fix alter-mutations assignment

(cherry picked from commit 5eebe8c)

* Merge pull request #10711 from ClickHouse/h3-range-check

Range check in function h3EdgeAngle

(cherry picked from commit d0b61f9)

* Merge pull request #10741 from hczhcz/patch-0422

Fix OrNull and OrDefault

(cherry picked from commit 699ef4f)

* Merge pull request #10757 from ClickHouse/fix-parallel-mv

Fix parallel MV

(cherry picked from commit ef1c7da)

* Merge pull request #10791 from oandrew/key-condition-source-type

Use src_type for conversion in KeyCondition

(cherry picked from commit a133389)

* Merge pull request #10798 from ClickHouse/fix-date-lut-msan-ubsan

Fix UBSan and MSan report in DateLUT

(cherry picked from commit a2f220f)

* Merge pull request #10821 from ClickHouse/fix-odbc-bridge-clickhouse

Fix the issue with ODBC bridge and identifier_quoting_style = None #7984

(cherry picked from commit 5115ac2)

* Merge pull request #10826 from azat/block-sort-fix

Fix columns order after Block::sortColumns()

(cherry picked from commit 75607db)

* Merge pull request #10834 from ClickHouse/fix-msan-failure-cache-dictionary

Fix MSan failure in cache dictionary

(cherry picked from commit 6cba2da)

* Merge pull request #10847 from ClickHouse/fix_alter_rename_and_constraints

Fix constraints after column rename

(cherry picked from commit 377ef65)

* Merge pull request #10849 from ClickHouse/fix_optimize_and_alter_hangs

Fix mutations and OPTIMIZE hangs when replica becomes inactive

(cherry picked from commit 623b2e5)

* Merge pull request #10859 from ClickHouse/fix_watch_livelock_with_database_atomic

Fix livelock with WATCH queries and DatabaseAtomic

(cherry picked from commit a47bf09)

* Merge pull request #10870 from azat/fix-SIGSEGV-in-hash-table-for-string

Fix SIGSEGV in StringHashTable (if such key does not exist)

(cherry picked from commit 0433d9c)

* Destructive IAggregateFunction::insertResultInto and ColumnAggregateFunction::convertToValues (#10890)

* Destructive IAggregateFunction::insertResultInto and ColumnAggregateFunction::convertToValues

* Try fix build.

* Try fix build.

* Fix build.

* Make convertToValues static.

* fix build.

* Remove const casts.

* Added comment.

* Fix build.

* Fix build.

* Add test.

* Fix test.

(cherry picked from commit f653058)

* Merge pull request #10895 from ClickHouse/fix_multiple_renames

Fix error in multiple rename commands in a single query.

(cherry picked from commit be4037a)

* Merge pull request #10920 from ClickHouse/fix-notNullIn-with-null

Fix not null in with null

(cherry picked from commit 1bceb48)

* Merge pull request #10940 from azat/dist-send-partially-written-read-fix

Avoid sending partially written files by the DistributedBlockOutputStream

(cherry picked from commit 0d76091)

* Merge pull request #10952 from ClickHouse/fix-distributed-queries-incompatibility-19.16-20.1

Fix incompatibility of two-level aggregation between 19.16 and 20.1

(cherry picked from commit c957154)

* Merge pull request #10964 from ObjatieGroba/bugfix_raw_column_size

Fix incorrect Column byte size

(cherry picked from commit b6b1947)

* Merge pull request #10968 from ClickHouse/fix_database_atomic_terminate_called

Fix exception from destructor in DatabaseAtomic

(cherry picked from commit 103f66e)

* Merge pull request #10970 from ClickHouse/try-fix-use-after-free-mergetree

Try to fix use-after-free error in MergeTree

(cherry picked from commit 421eb6d)

* Merge pull request #10980 from azat/database-atomic-fixes

Database atomic fixes

(cherry picked from commit 6f1d522)

Co-authored-by: alexey-milovidov <[email protected]>
Co-authored-by: Vitaly Baranov <[email protected]>
Co-authored-by: alesapin <[email protected]>
Co-authored-by: Anton Popov <[email protected]>
Co-authored-by: tavplubix <[email protected]>
Co-authored-by: Nikolai Kochetov <[email protected]>
@azat azat mentioned this pull request May 26, 2020
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants