Upgrading sometimes requires restarting linkerd-destination #6940

@alpeb

The policy controller currently doesn't detect changes in the certificate used for serving the validation webhook (see seanmonstar/warp#659).
As a result, when Linkerd is installed with webhookFailurePolicy=Fail, an upgrade that doesn't change the linkerd-destination manifest (and so doesn't restart it) still changes that cert, but the policy controller keeps serving the old one it loaded at startup, so any attempt to persist a Server fails.

Repro (tested with edge-21.9.3):

$ linkerd install --set webhookFailurePolicy=Fail | k apply -f -
$ linkerd upgrade | k apply -f -
$ k create ns emojivoto
$ cat << EOF | k apply -f -
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  namespace: emojivoto
  name: emoji-grpc
spec:
  podSelector:
    matchLabels:
      app: emoji-svc
  port: grpc
  proxyProtocol: gRPC
EOF

Error from server (InternalError): error when creating "emoji-policy.yml": Internal error occurred: failed calling webhook "linkerd-policy-validator.linkerd.io": Post "https://linkerd-policy-validator.linkerd.svc:443/?timeout=10s": x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate "linkerd-policy-validator.linkerd.svc")

Possible solutions:

  • Add a checksum/config annotation that ties restarts of the destination pod to changes in the destination-rbac.yaml file (after variable replacement), just like the injector does
  • Have the controller watch the cert directory and trigger a restart when it changes
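
For the first option, a minimal sketch of what the annotation could look like on the destination Deployment's pod template. It uses the standard Helm idiom of hashing a rendered template; the file name destination-rbac.yaml comes from this issue, while its location in the same chart's templates directory and the exact placement shown here are assumptions, not the actual chart contents.

```yaml
# Hypothetical fragment of the linkerd-destination Deployment template.
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        # Hash of the rendered destination-rbac.yaml (assumed to live in
        # this chart's templates directory). When the rendered file
        # changes, for example because the webhook cert was regenerated,
        # the annotation value changes and the pods are rolled.
        checksum/config: {{ include (print $.Template.BasePath "/destination-rbac.yaml") . | sha256sum }}
```

Because the hash is taken after variable replacement, an upgrade that leaves everything else in the manifest untouched but produces a new cert would still alter the pod template and trigger a restart.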
