Skip to content

Fix excessive connect attempts with skip_unavailable_shards#26658

Merged
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:skip_unavailable_shards-excessive-attempts
Jul 21, 2021
Merged

Fix excessive connect attempts with skip_unavailable_shards#26658
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:skip_unavailable_shards-excessive-attempts

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Jul 21, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix excessive (x2) connect attempts with skip_unavailable_shards

Detailed description / Documentation draft:
Before this patch the query was sent from
RemoteBlockInputStream::readPrefix() and also from
RemoteBlockInputStream::read().

And since in case of skip_unavailable_shards=1 connection errors are
ignored, it tries to do x2 connect attempts.

Fix this, but removing RemoteBlockInputStream::readPrefix().

Fixes: #26511

P.S. marked as Improvement by intention, since this is not the common case and to avoid backports

Before this patch the query was sent from
RemoteBlockInputStream::readPrefix() and also from
RemoteBlockInputStream::read().

And since in case of skip_unavailable_shards=1 connection errors are
ignored, it tries to do x2 connect attempts.

Fix this, but removing RemoteBlockInputStream::readPrefix().

Fixes: ClickHouse#26511
@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 21, 2021
@alexey-milovidov alexey-milovidov self-assigned this Jul 21, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

01149_zookeeper_mutation_stuck_after_replace_partition - flaky.

@alexey-milovidov alexey-milovidov merged commit f958189 into ClickHouse:master Jul 21, 2021
@azat azat deleted the skip_unavailable_shards-excessive-attempts branch July 23, 2021 19:27
@den-crane
Copy link
Copy Markdown
Contributor

den-crane commented Apr 12, 2023

It seems the test is incorrect.
If one shard is available then Clickhouse makes *2 attempts to a bad shard

SELECT *
FROM remote('localhost,255.255.255.255', system.one)
SETTINGS connections_with_failover_max_tries = 1, skip_unavailable_shards = 1, send_logs_level = 'trace'

Connection failed at try №1, reason: Code: 210. DB::NetException: Net Exception: Network is unreachable (255.255.255.255:9000). (NETWORK_ERROR) (version 23.3.1.2823 (official build))

Connection failed at try №1, reason: Code: 210. DB::NetException: Net Exception: Network is unreachable (255.255.255.255:9000). (NETWORK_ERROR) (version 23.3.1.2823 (official build))

The test passes because there is only shard and it's unavailable.

SELECT *
FROM remote('255.255.255.255', system.one)
SETTINGS connections_with_failover_max_tries = 1, skip_unavailable_shards = 1, send_logs_level = 'trace'

Connection failed at try №1, reason: Code: 210. DB::NetException: Net Exception: Network is unreachable (255.255.255.255:9000). (NETWORK_ERROR) (version 23.3.1.2823 (official build))

@den-crane
Copy link
Copy Markdown
Contributor

den-crane commented Apr 12, 2023

SELECT *
FROM remote('10', system.one)
SETTINGS connections_with_failover_max_tries = 1, 
skip_unavailable_shards = 1, 
connect_timeout_with_failover_ms = 2000

Elapsed: 2.023 sec.
SELECT *
FROM remote('10,localhost', system.one)
SETTINGS connections_with_failover_max_tries = 1, 
skip_unavailable_shards = 1, 
connect_timeout_with_failover_ms = 2000

Elapsed: 4.042 sec.

azat added a commit to azat/ClickHouse that referenced this pull request Apr 14, 2023
In ClickHouse#26658 only one excessive connect had been removed, but due to
RemoteQueryExecutor will execute query again in read(), there is one
more, fix this by correctly mark connection in sendQuery()

But note, that there still will be excessive connections due to separate
connections for obtaining structure.

Follow-up for: ClickHouse#26658
Fixes: ClickHouse#48728
Signed-off-by: Azat Khuzhin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

skip_unavailable_shards=1 makes 6 tries if a shard is unavailable

4 participants