Description
Bug Report
The recently merged PR 992 introduces a breaking change for ingress controllers running in ingress mode. It requires every outbound request to have the l5d-dst-override header set, which is an unrealistically strict requirement that breaks many of the ingress traffic patterns supported by ingress controllers.
What is the issue?
In our specific case we are using gloo-edge (gateway) as the ingress controller, which supports non-Kubernetes upstreams (e.g. AWS Lambda functions, EC2 instances, ...) and also communicates with other gloo services without adding the l5d-dst-override header. Note that gloo-edge has a nice linkerd plugin which automatically adds the l5d-dst-override header to all requests targeting Kubernetes upstreams. However, it does not add that header for other kinds of upstreams (e.g. static upstreams, EC2 upstreams, ...), and it also does not add that header for requests between gloo services (e.g. envoy XDS requests, which are not part of the ingress traffic but are emitted by the gateway, which is meshed).
{"timestamp":"[ 995.334310s]","level":"WARN","fields":{"message":"Failed to proxy request: ingress-mode routing requires the l5d-dst-override header","client.addr":"10.176.9.124:43466"},"target":"linkerd_app_core::errors","spans":[{"orig_dst":"10.176.47.220:9977","name":"ingress"}],"threadId":"ThreadId(3)"}
Impact
Our gloo-edge ingress gateways no longer work. They are completely broken, and therefore ingress traffic in our cluster does not work.
Expected Behavior
- Outbound linkerd-proxy in ingress mode must accept requests which have no l5d-dst-override header set and use the host or authority header for the routing decision (as fallback). This will support use cases where
  - the ingress gateway might forward requests to external workloads (running outside Kubernetes)
  - the ingress gateway itself communicates with other services (e.g. envoy talking to XDS endpoints)
  - ...
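
To make the requested fallback concrete, here is a minimal sketch in Python. It is not the linkerd-proxy implementation (which is written in Rust); the function name `routing_target` and the lookup order of `:authority` before `host` are assumptions made purely for illustration.

```python
from typing import Mapping, Optional

# Header that ingress mode currently requires on every outbound request.
OVERRIDE_HEADER = "l5d-dst-override"


def routing_target(headers: Mapping[str, str]) -> Optional[str]:
    """Pick a routing target for an outbound request (hypothetical helper).

    Prefers the explicit override header and falls back to the request's
    authority/host when the override is absent.
    """
    # Header names are case-insensitive, so normalise them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Assumed order: explicit override first, then HTTP/2 ":authority",
    # then the HTTP/1.1 "host" header.
    for name in (OVERRIDE_HEADER, ":authority", "host"):
        value = normalized.get(name, "").strip()
        if value:
            return value

    # Nothing usable to route on.
    return None
```

With this behaviour, a request to a Kubernetes upstream that carries the override would still be routed by it, while a request to an external upstream or an XDS endpoint without the override would be routed by its host or authority instead of being rejected.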
How can it be reproduced?
- Install linkerd edge-21.5.1 with linkerd-proxy image ghcr.io/olix0r/l2-proxy:a01b8bd2.
- Use gloo-edge gateway meshed in ingress mode and see how it fails to get ready as it fails to contact the gloo and gloo-gateway services.
Logs, error output, etc
{"timestamp":"[ 995.334310s]","level":"WARN","fields":{"message":"Failed to proxy request: ingress-mode routing requires the l5d-dst-override header","client.addr":"10.176.9.124:43466"},"target":"linkerd_app_core::errors","spans":[{"orig_dst":"10.176.47.220:9977","name":"ingress"}],"threadId":"ThreadId(3)"}
linkerd check output
Linkerd core checks
===================
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
is running version 21.5.1 but the latest edge version is 21.5.2
see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
control plane running edge-21.5.1 but cli running edge-21.5.2
see https://linkerd.io/2/checks/#l5d-version-control for hints
Environment
- Kubernetes Version: 1.19.6
- Cluster Environment: EKS
- Host OS: Amazon Linux 2
- Linkerd version: edge-21.5.1 with linkerd-proxy image ghcr.io/olix0r/l2-proxy:a01b8bd2
Possible solution
Revert PR 992.
Additional context
I stumbled over this while testing a proposed fix for #6146 (comment).