Remote clusters installed with istioctl appear to be incompatible in AWS when you use internal load balancers for remote Pilot #19639

@taylorreece

Bug description

I followed https://istio.io/docs/setup/install/multicluster/shared-vpn/ to install the Istio control plane in one Kubernetes cluster via istioctl, and Istio with the "remote" profile in a second, "remote" cluster in the same VPC. Both clusters are EKS clusters in AWS. Because Pilot runs in the main cluster and the Pilot pod may be scaled away or replaced at any time, I put an internal load balancer in front of Pilot so the remote cluster has a stable address for it, as recommended here: https://istio.io/docs/setup/install/multicluster/shared-vpn/#deployment-considerations .

This worked fine when I deployed Istio with "helm template". I would pass in the AWS load balancer endpoint with --set global.remotePilotAddress=internal-abc-123.us-east-1.elb.amazonaws.com (or similar), and the config maps would correctly contain discoveryAddress: internal-abc-123.us-east-1.elb.amazonaws.com:15011. However, istioctl appears to hard-code discoveryAddress: istio-pilot.istio-system:15011, regardless of whether values.global.remotePilotCreateSvcEndpoint is true or false, and regardless of whether you pass a --set values.global.remotePilotAddress argument.

Now, that's fine. I can set values.global.remotePilotCreateSvcEndpoint=true for GCP stacks, as the internal load balancer gets an IP address (as opposed to an FQDN). However, because AWS load balancers are addressed by FQDN, when istioctl tries to create a local Pilot service that points to the remote location, it errors out with:

Component Base install returned the following errors:
=====================================================
Error: error running kubectl: exit status 1

Error detail:


The Endpoints "istio-pilot" is invalid:  (repeated 1 times)
* subsets[0].addresses[0].ip: Invalid value: "internal-ab47790291d0f11eaa068028d79a51a3-1059610236.us-east-1.elb.amazonaws.com": must be a valid IP address, (e.g. 10.9.8.7) (repeated 1 times)
* subsets[0].addresses[0].ip: Invalid value: "internal-ab47790291d0f11eaa068028d79a51a3-1059610236.us-east-1.elb.amazonaws.com": must be a valid IP address (repeated 1 times)
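The rejection comes from the Endpoints object itself: each address's ip field only accepts a literal IP, so a load balancer hostname cannot pass. A minimal sketch of that distinction, using Python's standard ipaddress module as a stand-in for the API server's check (the helper name is_literal_ip is hypothetical, and this is not the Kubernetes validation code):

```python
import ipaddress


def is_literal_ip(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        # Hostnames such as ELB FQDNs end up here.
        return False


# An IP-style address, like the example in the error message.
print(is_literal_ip("10.9.8.7"))
# An AWS-style internal load balancer hostname.
print(is_literal_ip("internal-abc-123.us-east-1.elb.amazonaws.com"))
```

A check like this could decide up front whether remotePilotCreateSvcEndpoint=true is usable for a given remotePilotAddress.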

To work around that for now, I'm setting

--set values.global.remotePilotAddress=internal-ab47790291d0f11eaa068028d79a51a3-1059610236.us-east-1.elb.amazonaws.com
--set values.global.remotePilotCreateSvcEndpoint=false

and then applying a hackish ExternalName Service:

apiVersion: v1
kind: Service
metadata:
  name: istio-pilot
  namespace: istio-system
spec:
  type: ExternalName
  externalName: internal-ab47790291d0f11eaa068028d79a51a3-1059610236.us-east-1.elb.amazonaws.com
  sessionAffinity: None
  ports:
  - name: http-xds
    port: 15010
    protocol: TCP
    targetPort: 15010
  - name: https-xds
    port: 15011
    protocol: TCP
    targetPort: 15011

That seems like a bad long-term solution, though. It would be better if we either:

  1. Allowed values.global.remotePilotAddress to be an FQDN in istioctl rather than only an IP
  2. Had istioctl render the injector-mesh config map the way "helm template" does, so that discoveryAddress matches values.global.remotePilotAddress rather than a hard-coded local service.
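The behavior requested in option 2 could be sketched roughly like this. It only illustrates the desired outcome and is not istioctl's or the Helm chart's actual templating code; the function name and the port argument are assumptions, with 15011 taken from the values quoted above:

```python
# Hard-coded host that istioctl currently renders, per the report above.
DEFAULT_DISCOVERY_HOST = "istio-pilot.istio-system"


def discovery_address(remote_pilot_address: str = "", port: int = 15011) -> str:
    """Prefer the user-supplied remote Pilot address; fall back to the local service."""
    host = remote_pilot_address or DEFAULT_DISCOVERY_HOST
    return f"{host}:{port}"


print(discovery_address("internal-abc-123.us-east-1.elb.amazonaws.com"))
print(discovery_address())
```

With an address supplied, the result matches what "helm template" produced; with none, it falls back to the current hard-coded value.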

Thanks for your help!

Expected behavior
I would expect discoveryAddress to be the LoadBalancer's endpoint, as opposed to the istio-pilot.istio-system service that might not exist in the remote cluster.

Steps to reproduce the bug
Create a remote cluster via

istioctl manifest apply \
    --set profile=remote \
    --set autoInjection.enabled=true \
    --set values.global.remotePilotCreateSvcEndpoint=false \
    --set values.global.remotePilotAddress=internal-ab47790291d0f11eaa068028d79a51a3-1059610236.us-east-1.elb.amazonaws.com \
    --set gateways.enabled=false \
    --set values.global.mtls.enabled=true \
    --set values.security.selfSigned=false \
    --set values.global.controlPlaneSecurityEnabled=true

This generates a cluster that hard-codes discoveryAddress: istio-pilot.istio-system:15011, even though no local istio-pilot service is available. If you instead set values.global.remotePilotCreateSvcEndpoint=true above, you'll get the error listed in the bug description about IPs vs. FQDNs.
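To confirm which address actually got rendered, one option is to scan the config map or manifest text for discoveryAddress entries. A small sketch, assuming you have that YAML saved as a string; the helper name discovery_addresses and the sample text are illustrative only:

```python
import re
from typing import List

# Matches lines like "    discoveryAddress: host:port" anywhere in the text.
_DISCOVERY_RE = re.compile(r"^[ \t]*discoveryAddress:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def discovery_addresses(manifest_text: str) -> List[str]:
    """Return every discoveryAddress value found in the given YAML text."""
    return _DISCOVERY_RE.findall(manifest_text)


# Illustrative fragment only, not real istioctl output.
sample = """
mesh: |-
  defaultConfig:
    discoveryAddress: istio-pilot.istio-system:15011
"""

print(discovery_addresses(sample))
```

If the list contains istio-pilot.istio-system:15011 rather than the load balancer endpoint, the install is hitting the behavior described here.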

Version:

$ istioctl version --remote
client version: 1.4.2
control plane version: 1.4.2

How was Istio installed?: Via istioctl (see above)

Environment where bug was observed (cloud vendor, OS, etc)
AWS
