
Support system replicas queries for distributed#4935

Merged
alesapin merged 4 commits into ClickHouse:master from zhang2014:feature/support_system_replicas
Jun 17, 2019

Conversation

zhang2014 (Contributor) commented Apr 8, 2019:

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Category :

  • Improvement

Short description :

Support SYSTEM SYNC REPLICA for distributed storage
Support SYSTEM START|STOP REPLICATED SENDS for distributed storage

@zhang2014 zhang2014 force-pushed the feature/support_system_replicas branch 2 times, most recently from 39006fe to c13b82f Compare April 8, 2019 07:19
@zhang2014 zhang2014 changed the title support system replicas queries for distributed Support system replicas queries for distributed Apr 8, 2019
target_table->checkPartitionCanBeDropped(partition);
}

ActionLock StorageMaterializedView::getActionLock(StorageActionBlockType type)
zhang2014 (Contributor, Author) commented:

For the materialized view, maybe we should push down the ActionLock?

KochetovNicolai added the can be tested and pr-improvement (Pull request with some product improvements) labels Apr 16, 2019
alesapin (Member) left a comment:

Using the same queries for Replicated and Distributed tables is confusing. It would be better to add separate queries, for example SYSTEM SYNC DISTRIBUTED, SYSTEM STOP DISTRIBUTED SENDS, etc.

Also, I'm not sure these queries are useful for Distributed tables. They only trigger an additional findFiles and fail if that function throws an exception. But StorageDistributedDirectoryMonitor already triggers this function as frequently as possible in a background thread, and sleeps only when findFiles throws. Which problem do these queries solve? Maybe the exponent in the backoff calculation is too big?

std::chrono::milliseconds{Int64(default_sleep_time.count() * std::exp2(error_count))},
std::chrono::milliseconds{max_sleep_time});
tryLogCurrentException(getLoggerName().data());
}
Member:

Need to write something to log about it.

void StorageDistributedDirectoryMonitor::syncReplicaSends()
{
if (quit || monitor_blocker.isCancelled())
throw Exception("Cancelled sync distributed sync replica sends.", ErrorCodes::ABORTED);
Member:

Unclear message.

zhang2014 (Contributor, Author):

> Using the same queries for Replicated and Distributed tables is confusing. It would be better to add separate queries, for example SYSTEM SYNC DISTRIBUTED, SYSTEM STOP DISTRIBUTED SENDS, etc.

Done

> Also, I'm not sure these queries are useful for Distributed tables. They only trigger an additional findFiles and fail if that function throws an exception. But StorageDistributedDirectoryMonitor already triggers this function as frequently as possible in a background thread, and sleeps only when findFiles throws. Which problem do these queries solve? Maybe the exponent in the backoff calculation is too big?

SYNC DISTRIBUTED is more convenient for writing tests, as well as for some special scenarios, such as ensuring that no distributed data remains to be sent before a data migration.

alesapin (Member):

> SYNC DISTRIBUTED is more convenient for writing tests, as well as for some special scenarios, such as ensuring that no distributed data remains to be sent before a data migration.

We don't wait here https://github.com/yandex/ClickHouse/pull/4935/files#diff-8890e3b1de70b013b79201d37463a0d4R98. findFiles (https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/Distributed/DirectoryMonitor.cpp#L182) just looks at the files in the directory at the current moment. If an exception is thrown from findFiles, or a concurrent insert happens while SYNC DISTRIBUTED runs, the logic of the query will be broken.

zhang2014 (Contributor, Author):

> We don't wait here https://github.com/yandex/ClickHouse/pull/4935/files#diff-8890e3b1de70b013b79201d37463a0d4R98. findFiles (https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/Distributed/DirectoryMonitor.cpp#L182) just looks at the files in the directory at the current moment.

We will call SYNC DISTRIBUTED after writes have stopped; its more general semantics should ensure that the cluster is consistent at the moment of the call, right? For example, when I execute SYNC DISTRIBUTED after INSERT INTO distributed_xxx VALUES(1)(2)(3)(4), I must then be able to read 1, 2, 3, 4 from the cluster. In fact, I can even use SYNC DISTRIBUTED directly to synchronize the cluster data, like this:

```
SYSTEM STOP DISTRIBUTED SENDS;
INSERT INTO distributed_xxx VALUES(1)(2)(3);
SYSTEM SYNC DISTRIBUTED;
INSERT INTO distributed_xxx VALUES(4)(5)(6);
SYSTEM SYNC DISTRIBUTED;
```

> If an exception is thrown from findFiles, or a concurrent insert happens while SYNC DISTRIBUTED runs, the logic of the query will be broken.

In my understanding this cannot happen, as ClickHouse uses a hard link to synchronize blocks of replica data (https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp#L563). At the same time, the DirectoryMonitor lock is always acquired when SYNC DISTRIBUTED is executed (https://github.com/yandex/ClickHouse/pull/4935/files#diff-8890e3b1de70b013b79201d37463a0d4R97).

alexey-milovidov added the pr-feature (Pull request with new product feature) label and removed the pr-improvement (Pull request with some product improvements) label May 9, 2019

static ConnectionPoolPtr createPool(const std::string & name, const StorageDistributed & storage);

void syncReplicaSends();
Member:

Misleading method name, because it's not about replicas.
(A Distributed table may look at shards without replicas at all.)


static ConnectionPoolPtr createPool(const std::string & name, const StorageDistributed & storage);

void syncReplicaSends();
Member:

Double whitespace.

throw Exception("Cancelled sync distributed sends.", ErrorCodes::ABORTED);

std::unique_lock lock{mutex};
findFiles();
Member:

The method findFiles must be renamed.

alexey-milovidov (Member):

Do you really need this command? In my opinion, it's much better to use synchronous distributed inserts (the setting insert_distributed_sync = 1).
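For reference, a minimal sketch of that alternative. The table name distributed_xxx is taken from the example earlier in this thread, and this assumes writes go through the Distributed table:

```sql
-- With insert_distributed_sync = 1, the INSERT itself blocks until the data
-- has been sent to the remote shards, so there is nothing left for the
-- background sends to flush afterwards.
SET insert_distributed_sync = 1;
INSERT INTO distributed_xxx VALUES (1)(2)(3);
```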

alexey-milovidov (Member) left a comment:

.

@zhang2014 zhang2014 force-pushed the feature/support_system_replicas branch from ce78f90 to 80788cd Compare May 29, 2019 02:44
zhang2014 (Contributor, Author):

> Do you really need this command? In my opinion, it's much better to use synchronous distributed inserts (the setting insert_distributed_sync = 1).

@alexey-milovidov

Motivation (from my friend):

Currently his ClickHouse cluster is deployed in Japan, the US, and China, and he needs ClickHouse to provide a regular way to synchronize data between the different nodes.

For now I recommend using the Distributed engine's replicas together with distributed_directory_monitor_batch_inserts, but this has some limitations (the synchronization interval cannot be controlled), and this PR solves that problem, just like what I said here.

zhang2014 (Contributor, Author):

This is the plan I gave him (translated from Chinese via Google Translate):

For example, allowing a delay of up to 20s between different regions:
    1. Configure distributed_directory_monitor_batch_inserts = 1, load_balancing = IN_ORDER (users.xml).
    2. Configure the cluster for the China, Japan, and US datacenters, for example:
    ```
        China datacenter configuration:
        <cluster_name_xxx>
            <shard>
                <replica>China</replica>
                <replica>Japan</replica>
                <replica>United States</replica>
            </shard>
        </cluster_name_xxx>
        Japan datacenter configuration:
        <cluster_name_xxx>
            <shard>
                <replica>Japan</replica>
                <replica>China</replica>
                <replica>United States</replica>
            </shard>
        </cluster_name_xxx>
        US datacenter configuration:
        <cluster_name_xxx>
            <shard>
                <replica>United States</replica>
                <replica>Japan</replica>
                <replica>China</replica>
            </shard>
        </cluster_name_xxx>
    ```
    3. Create the corresponding Distributed and MergeTree tables, for example:
    ```
        CREATE TABLE local(...) ENGINE = MergeTree(...)
        CREATE TABLE distributed(...) ENGINE = Distributed(cluster_name_xxx, default, local)
    ```
    4. Each region writes into the distributed table of its own datacenter.
    5. If https://github.com/yandex/ClickHouse/pull/4935 is available, background sends can be turned off on each instance in the datacenter:
    ```
      SYSTEM STOP DISTRIBUTED SENDS distributed
      --- At the same time, start a timed task based on the tolerable delay between zones, for example every 20 seconds:
      SYSTEM FLUSH DISTRIBUTED distributed;
    ```
Plan description:
    Each datacenter can query its own data immediately (within milliseconds); data from the other datacenters can be queried after 20 seconds (ignoring transmission time and cost). About high availability: according to each datacenter's configuration, available nodes are selected in order. For example, in the China datacenter (assuming the distributed and local tables are placed on different nodes, and some nodes in the datacenter are available): if the China replica is available, queries go to the local table of the China datacenter first; if the China datacenter is unavailable, they go to the local table of the Japan datacenter; and if both the China and Japan datacenters are unavailable, to the local table of the US datacenter.


Labels

pr-feature Pull request with new product feature


4 participants