Skip to content

Inconsistent behaviour of Distributed table engine and remote table function when called with cluster and with list of addresses. #4211

@code-of-kpp

Description

@code-of-kpp

Here is the relevant part of the config:

    <remote_servers>                                                                                                                                                              
        <bad>                                                                                                                     
            <shard>                                                                                                               
                <replica>                                                                                                         
                    <host>127.0.0.2</host>                                                                                        
                    <port>2</port>                                                                                                
                </replica>                                                                                                        
            </shard>                                                                                                              
        </bad> 
     </remote_servers>   

(I also turned off listening on 0.0.0.0 and ::)

So we have the "bad" cluster:

:) SELECT * FROM system.clusters

SELECT *
FROM system.clusters


┌─cluster─┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┐
│ bad     │         111127.0.0.2127.0.0.220 │ default │                  │
└─────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┘

1 rows in set. Elapsed: 8.030 sec. 

With default settings Clickhouse will try accessing this cluster 3 times:

:) SELECT hostName() FROM remote("bad", default.tmp)

SELECT hostName()
FROM remote(bad, default.tmp) 

Received exception from server (version 19.1.6):
Code: 279. DB::Exception: Received from click:9000, 172.19.0.18. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException

. 

0 rows in set. Elapsed: 0.003 sec.

Same thing happens if we will specify address(es) manually:

:) SELECT hostName() FROM remote('127.0.0.2', default.tmp)

SELECT hostName()
FROM remote('127.0.0.2', default.tmp) 

Received exception from server (version 19.1.6):
Code: 279. DB::Exception: Received from click:9000, 172.19.0.18. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException

. 

0 rows in set. Elapsed: 0.003 sec.

Let's change this setting:

:) SET connections_with_failover_max_tries = 5

SET connections_with_failover_max_tries = 5

Ok.

0 rows in set. Elapsed: 0.001 sec.

So now, in previous example we will have 5 attempts:

:) SELECT hostName() FROM remote('127.0.0.2', default.tmp)

SELECT hostName()
FROM remote('127.0.0.2', default.tmp) 

Received exception from server (version 19.1.6):
Code: 279. DB::Exception: Received from click:9000, 172.19.0.18. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:9000), e.what() = DB::NetException

. 

0 rows in set. Elapsed: 0.003 sec.

(note 5 Code: 210 lines)

But with cluster these (and some other) settings are ignored:

:) SELECT hostName() FROM remote("bad", default.tmp)

SELECT hostName()
FROM remote(bad, default.tmp) 

Received exception from server (version 19.1.6):
Code: 279. DB::Exception: Received from click:9000, 172.19.0.18. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException

. 

0 rows in set. Elapsed: 0.003 sec.

Specifying settings in query doesn't help:

:) SELECT hostName() FROM remote("bad", default.tmp) SETTINGS connections_with_failover_max_tries = 5

SELECT hostName()
FROM remote(bad, default.tmp) 
SETTINGS connections_with_failover_max_tries = 5

Received exception from server (version 19.1.6):
Code: 279. DB::Exception: Received from click:9000, 172.19.0.18. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused (127.0.0.2:2), e.what() = DB::NetException

. 

0 rows in set. Elapsed: 0.003 sec. 

This is reproducible in at least 5 or 6 latest -stable versions, something similar was first noticed in November 2017.
Consequences of this is that ENGINE=Distributed tables do not work and cannot be forced to work across multiple data centers in different parts of the world.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official release

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions