PR #992 breaks ingress traffic  #6157

@alex-berger

Bug Report

The recently merged PR 992 introduces a breaking change for ingress controllers running in ingress mode. It requires every outbound request to have the l5d-dst-override header set, which is an unrealistically strict requirement that breaks many traffic patterns supported by ingress controllers.

What is the issue?

In our specific case we use gloo-edge (gateway) as the ingress controller. It supports non-Kubernetes upstreams (e.g. AWS Lambda functions, EC2 instances, ...) and also communicates with other gloo services without adding the l5d-dst-override header. Note that gloo-edge has a nice linkerd plugin which automatically adds the l5d-dst-override header to all requests targeting Kubernetes upstreams. However, it does not add that header for other kinds of upstreams (e.g. static upstreams, EC2 upstreams, ...), nor for requests between gloo services (e.g. envoy XDS requests, which are not part of the ingress traffic but are emitted by the meshed gateway).

{"timestamp":"[ 995.334310s]","level":"WARN","fields":{"message":"Failed to proxy request: ingress-mode routing requires the l5d-dst-override header","client.addr":"10.176.9.124:43466"},"target":"linkerd_app_core::errors","spans":[{"orig_dst":"10.176.47.220:9977","name":"ingress"}],"threadId":"ThreadId(3)"}

Impact

Our gloo-edge ingress gateways no longer work. They are completely broken, so ingress traffic in our cluster does not work.

Expected Behavior

  • The outbound linkerd-proxy in ingress mode must accept requests that have no l5d-dst-override header set and fall back to the host (or authority) header for the routing decision. This would support use cases where
    • the ingress gateway might forward requests to external workloads (running outside Kubernetes)
    • the ingress gateway itself communicates with other services (e.g. envoy talking to XDS endpoints)
    • ...
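
To make the requested fallback concrete, here is a minimal sketch in Python. It is not the proxy's actual code (linkerd-proxy is written in Rust); the function name `routing_target` and the plain header mapping are hypothetical and serve only to illustrate the lookup order asked for above.

```python
from typing import Mapping, Optional

OVERRIDE_HEADER = "l5d-dst-override"


def routing_target(headers: Mapping[str, str]) -> Optional[str]:
    """Pick the routing target for an outbound request (hypothetical helper)."""
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Prefer the explicit override when the ingress controller set it.
    override = lowered.get(OVERRIDE_HEADER)
    if override:
        return override

    # Fallback: the HTTP/2 :authority pseudo-header, then the HTTP/1.1 Host header.
    return lowered.get(":authority") or lowered.get("host") or None
```

With this order, requests to Kubernetes upstreams (where the gloo linkerd plugin adds the header) are routed exactly as today, while requests to static or EC2 upstreams and gateway-internal XDS requests would still get a target instead of being rejected.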

How can it be reproduced?

  • Install linkerd edge-21.5.1 with linkerd-proxy image ghcr.io/olix0r/l2-proxy:a01b8bd2.
  • Mesh the gloo-edge gateway in ingress mode and observe that it never becomes ready, because it cannot contact the gloo and gloo-gateway services.

Logs, error output, etc

{"timestamp":"[ 995.334310s]","level":"WARN","fields":{"message":"Failed to proxy request: ingress-mode routing requires the l5d-dst-override header","client.addr":"10.176.9.124:43466"},"target":"linkerd_app_core::errors","spans":[{"orig_dst":"10.176.47.220:9977","name":"ingress"}],"threadId":"ThreadId(3)"}

linkerd check output

Linkerd core checks
===================

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
    is running version 21.5.1 but the latest edge version is 21.5.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-21.5.1 but cli running edge-21.5.2
    see https://linkerd.io/2/checks/#l5d-version-control for hints

Environment

  • Kubernetes Version: 1.19.6
  • Cluster Environment: EKS
  • Host OS: Amazon Linux 2
  • Linkerd version: edge-21.5.1 with linkerd-proxy image ghcr.io/olix0r/l2-proxy:a01b8bd2

Possible solution

Revert PR 992.

Additional context

I stumbled over this while testing a proposed fix for #6146 (comment).
