Retries in distributed queries if a server stopped responding during a query. #58380

@alexey-milovidov

Use case

A cluster has a dynamically changing number of replicas, and some replicas disappear during a running query.

Describe the solution you'd like

If an internal query hasn't returned any blocks of data yet (the query contains ORDER BY or GROUP BY, so it only starts to return the data near the end of its run time) but the connection was closed, reconnect to another replica and send the query again.
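A rough sketch of the proposed rule, in Python for brevity rather than in ClickHouse's own C++. All names here (`run_with_failover`, `send_query`, `ReplicaUnavailable`) are hypothetical and only illustrate the idea: a retry on another replica is attempted only while no data block has been handed to the caller.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Hypothetical error: every replica failed before returning any data."""


def run_with_failover(
    replicas: Sequence[str],
    send_query: Callable[[str], Iterable[bytes]],
) -> Iterator[bytes]:
    """Yield data blocks from the first replica that answers.

    `send_query` is an assumed callback that sends the query to one replica
    and iterates over the returned blocks, raising ConnectionError if the
    connection is closed.
    """
    last_error: Optional[BaseException] = None
    for replica in replicas:
        received_any_block = False
        try:
            for block in send_query(replica):
                received_any_block = True
                yield block
            return  # the query finished normally on this replica
        except ConnectionError as exc:
            if received_any_block:
                # Data has already been passed on, so resending the query
                # could duplicate rows. Give up instead of retrying.
                raise
            last_error = exc  # nothing returned yet: try the next replica
    raise ReplicaUnavailable("no replica returned data") from last_error
```

Queries with ORDER BY or GROUP BY fit this rule well because the first block arrives late, so most of the run time falls into the window where a retry is still safe.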

Caveats

The progress bar will be slightly wrong.

In some cases, the network connection hangs rather than being reset. Failover is harder in this case, but it is possible if we lower the socket read/write timeout and drop the connection when no progress packets arrive for a certain time. Alternatively, we can send "ping" packets while the query runs. We can also lower the TCP keep-alive timeout.
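Two of these options can be sketched briefly, again in Python and with hypothetical names (`ProgressWatchdog`, `enable_keepalive`); the timeout values are placeholders, not recommendations. The watchdog treats a connection as hung when no packet has arrived within a given time, and the helper turns on TCP keep-alive with shorter timings where the platform exposes them.

```python
import socket
import time
from typing import Callable


class ProgressWatchdog:
    """Hypothetical helper: flags a connection that has gone silent."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_packet_at = clock()

    def on_packet(self) -> None:
        # Call this for every packet received from the replica.
        self._last_packet_at = self._clock()

    def is_stalled(self) -> bool:
        # True once the silence has lasted longer than the idle timeout.
        return self._clock() - self._last_packet_at > self._idle_timeout


def enable_keepalive(sock: socket.socket, idle_s: int = 10,
                     interval_s: int = 5, probes: int = 3) -> None:
    """Enable TCP keep-alive; the fine-grained options are platform-dependent."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not available on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # option rejected by the OS; keep the system default
```

A stalled watchdog would then be handled the same way as a closed connection: drop it and, if no data block has been returned yet, resend the query to another replica.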

Additional context

This is especially useful for parallel replicas.

We can also have this option in clickhouse-client for normal queries.

Labels: feature, warmup task (The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.)