Cluster deployment is slow (0.25.2) #1782

@vitalik-petrov

Operator version 0.25.2 - Cluster deployment takes too much time
Clickhouse version

When deploying a ClickHouse cluster with three replicas I observe the following in the operator logs:

The host 0-0 is outside of the cluster
...
TIMEOUT reached

The host 0-1 is outside of the cluster
...
TIMEOUT reached

The host 0-2 is outside of the cluster
...
TIMEOUT reached

After these timeouts, the deployment proceeds normally.

I checked the source code to see exactly which query is used to verify a host’s membership in the cluster. In version 0.25.2:

func (s *ClusterSchemer) sqlHostInCluster(cluster string) string {
    return heredoc.Docf(`
        SELECT
            count()
        FROM
            system.clusters
        WHERE
            cluster='%s' AND is_local
        `,
        cluster,
    )
}

The problem in my case is that the query

SELECT count()
FROM system.clusters
WHERE cluster = 'dwh' AND is_local

returns 2

And the query

SELECT *
FROM system.clusters
WHERE cluster = 'dwh' AND is_local = 1;

returns:

chi-jdp-dwh-0-0,127.0.0.1
chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local,10.42.68.164

It seems that the second entry comes from /etc/hosts:

# Kubernetes-managed hosts file.
127.0.0.1  localhost
::1  localhost ip6-localhost ip6-loopback
fe00::0  ip6-localnet
fe00::0  ip6-mcastprefix
fe00::1  ip6-allnodes
fe00::2  ip6-allrouters
10.42.68.164  chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local  chi-jdp-dwh-0-0-0

# Entries added by HostAliases.
127.0.0.1  chi-jdp-dwh-0-0

The HostAliases entry cannot simply be dropped, because the operator always adds it when it builds the StatefulSet:

// stsSetupHostAliases
func (c *Creator) stsSetupHostAliases(statefulSet *apps.StatefulSet, host *api.Host) {
    // Ensure pod created by this StatefulSet has alias 127.0.0.1
    statefulSet.Spec.Template.Spec.HostAliases = []core.HostAlias{
        {
            IP: "127.0.0.1",
            Hostnames: []string{
                c.nm.Name(interfaces.NamePodHostname, host),
            },
        },
    }
    // Add hostAliases from PodTemplate if any
    if podTemplate, ok := host.GetPodTemplate(); ok {
        statefulSet.Spec.Template.Spec.HostAliases = append(
            statefulSet.Spec.Template.Spec.HostAliases,
            podTemplate.Spec.HostAliases...,
        )
    }
}

This looks like a bug: I have a quite basic ClickHouseInstallation manifest, and it takes ~15-20 minutes to deploy.
If there is a workaround, please let me know.
