Cluster deployment is slow (0.25.2) #1782
Description
Operator version 0.25.2 - Cluster deployment takes too much time
Clickhouse version
When deploying a ClickHouse cluster with three replicas I observe the following in the operator logs:
The host 0-0 is outside of the cluster
...
TIMEOUT reached
The host 0-1 is outside of the cluster
...
TIMEOUT reached
The host 0-2 is outside of the cluster
...
TIMEOUT reached
After these timeouts, the deployment proceeds normally.
I checked the source code to see exactly which query is used to verify a host’s membership in the cluster. In version 0.25.2:
func (s *ClusterSchemer) sqlHostInCluster(cluster string) string {
	return heredoc.Docf(`
		SELECT
			count()
		FROM
			system.clusters
		WHERE
			cluster='%s' AND is_local
		`,
		cluster,
	)
}
The problem in my case is that the query
SELECT count()
FROM system.clusters
WHERE cluster = 'dwh' AND is_local
returns 2.
And the query
SELECT *
FROM system.clusters
WHERE cluster = 'dwh' AND is_local = 1;
returns:
chi-jdp-dwh-0-0,127.0.0.1
chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local,10.42.68.164
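To make the effect concrete, here is a minimal Go sketch that replays the count over the two rows above. The `clusterRow` type and the `countLocal` helper are hypothetical and only mirror the columns used in the query. How the operator interprets the result is not shown in this issue, so both comparisons in `main` are assumptions for illustration, not the operator's actual logic.

```go
package main

import "fmt"

// clusterRow is a hypothetical, trimmed-down view of one system.clusters row.
type clusterRow struct {
	Cluster     string
	HostName    string
	HostAddress string
	IsLocal     bool
}

// countLocal mirrors the query
//   SELECT count() FROM system.clusters WHERE cluster='<cluster>' AND is_local
// over an in-memory slice of rows.
func countLocal(rows []clusterRow, cluster string) int {
	n := 0
	for _, r := range rows {
		if r.Cluster == cluster && r.IsLocal {
			n++
		}
	}
	return n
}

func main() {
	// The two rows reported in this issue for cluster 'dwh'.
	rows := []clusterRow{
		{Cluster: "dwh", HostName: "chi-jdp-dwh-0-0", HostAddress: "127.0.0.1", IsLocal: true},
		{Cluster: "dwh", HostName: "chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local", HostAddress: "10.42.68.164", IsLocal: true},
	}
	n := countLocal(rows, "dwh")
	fmt.Println("local entries:", n)
	// Assumption: a strict comparison would reject this host, a lenient one would accept it.
	fmt.Println("strict check (n == 1):", n == 1)
	fmt.Println("lenient check (n > 0):", n > 0)
}
```

If the check expects exactly one local entry, a result of 2 would explain why the host is reported as outside of the cluster until the timeout.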
It seems that the second entry comes from /etc/hosts:
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.42.68.164 chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local chi-jdp-dwh-0-0-0
# Entries added by HostAliases.
127.0.0.1 chi-jdp-dwh-0-0
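The following Go sketch shows one way to list which names in a hosts file map to an address that belongs to the pod itself. The helpers `parseHosts` and `localNames` are hypothetical names introduced here, and treating 127.0.0.1 plus the pod IP as "local" is an assumption that only approximates how ClickHouse computes `is_local`.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseHosts maps each hostname in an /etc/hosts-style text to its IP.
// Comments and blank lines are skipped; the first mapping seen for a name is kept.
func parseHosts(content string) map[string]string {
	names := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop trailing comment
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		for _, name := range fields[1:] {
			if _, seen := names[name]; !seen {
				names[name] = fields[0]
			}
		}
	}
	return names
}

// localNames returns, in sorted order, the names whose IP is in localIPs.
func localNames(hosts map[string]string, localIPs ...string) []string {
	local := make(map[string]bool, len(localIPs))
	for _, ip := range localIPs {
		local[ip] = true
	}
	var out []string
	for name, ip := range hosts {
		if local[ip] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Relevant lines of the hosts file shown in this issue.
	content := `
# Kubernetes-managed hosts file.
127.0.0.1 localhost
10.42.68.164 chi-jdp-dwh-0-0-0.chi-jdp-dwh-0-0.jdp-dev-01.svc.cluster.local chi-jdp-dwh-0-0-0
# Entries added by HostAliases.
127.0.0.1 chi-jdp-dwh-0-0
`
	hosts := parseHosts(content)
	// Assumption: loopback and the pod's own IP both count as local.
	for _, name := range localNames(hosts, "127.0.0.1", "10.42.68.164") {
		fmt.Println(name, "->", hosts[name])
	}
}
```

Both `chi-jdp-dwh-0-0` (via HostAliases) and the pod's fully qualified name appear in the result, which matches the two `is_local` rows above.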
The HostAliases entry cannot simply be dropped, because the operator always adds it when it builds the StatefulSet:
// stsSetupHostAliases
func (c *Creator) stsSetupHostAliases(statefulSet *apps.StatefulSet, host *api.Host) {
	// Ensure pod created by this StatefulSet has alias 127.0.0.1
	statefulSet.Spec.Template.Spec.HostAliases = []core.HostAlias{
		{
			IP: "127.0.0.1",
			Hostnames: []string{
				c.nm.Name(interfaces.NamePodHostname, host),
			},
		},
	}
	// Add hostAliases from PodTemplate if any
	if podTemplate, ok := host.GetPodTemplate(); ok {
		statefulSet.Spec.Template.Spec.HostAliases = append(
			statefulSet.Spec.Template.Spec.HostAliases,
			podTemplate.Spec.HostAliases...,
		)
	}
}
This looks like a bug: I have a quite basic ClickHouseInstallation manifest, and it takes ~15-20 minutes to deploy.
If there is a possible workaround, please let me know.