Skip to content

Install MWC and VWC During Control Plane Installation#2796

Closed
ihcsim wants to merge 11 commits intomasterfrom
isim/install-webhook-config
Closed

Install MWC and VWC During Control Plane Installation#2796
ihcsim wants to merge 11 commits intomasterfrom
isim/install-webhook-config

Conversation

@ihcsim
Copy link
Contributor

@ihcsim ihcsim commented May 6, 2019

This PR introduces changes to install the proxy injector and profile validator MWC and VWC during the control plane installation.

Highlights

  1. Introduce the chart to install a new Secret named linkerd-ca
    1. In multi-stage install, this is added to the control-plane stage
  2. Change the proxy-injector and sp-validator to:
    1. Mount the linkerd-ca as a read-only volume
    2. Use the CA cert.pem and key.pem to generate a new x509 TLS cert (via the pkg/tls library)
  3. Preserve the cert.pem and key.pem of the linkerd-ca secret during upgrade (similar mechanism as linkerd-config)
  4. Introduce a new function to parse CA pem-encoded cert and key, converting the bytes into tls.CA.

This PR doesn't affect the existing identity mTLS.

There are no CLI flags for users to override the CA trust with custom cert and private key.

Test Cases

  1. CLI inject, CLI manual inject and auto-inject
  2. Profile validation
  3. Upgrade from previous older versions which don't have the new CA secret
  4. In HA mode with multiple proxy injector and profile validator replicas
  5. Multi-stage install

ToDo

  1. Not sure about the CA renewal workflow. Currently, the CA is created during the control plane installation only. CA renewal with install --ignore-cluster won't work for as it will affect the identity TLS assets.
    1. Once the CA is renewed, the proxy injector and profile validator must be restarted to pick up the new CA assets.
  2. Fix the upgrade --from-manifests integration tests.

Fixes #2176

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 6, 2019

@ihcsim ihcsim force-pushed the isim/install-webhook-config branch from 2ee6a45 to 6ef3eea Compare May 6, 2019 20:17
@ihcsim ihcsim force-pushed the isim/install-webhook-config branch from 6ef3eea to c56df00 Compare May 6, 2019 20:35
@l5d-bot
Copy link
Collaborator

l5d-bot commented May 6, 2019

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far, I haven't got into the meaty stuff yet, but here are some minor nits/tiolis ;-)

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes required minimal impact to the existing webhooks framework, which is awesome 👍

Regarding the certificate renewal, IIUC, the identity trust root suffers from the same problem, as both use tls.GenerateRootCAWithDefaults() which defaults to a year. Something we might wanna tackle simultaneously I guess.

You also need to add to the to-do list the removal of the code for generating the old configs dynamically. From what I see both hooks are still doing that, although from my testing I couldn't see that the webhook configs set during install were being overwritten when the webhooks started.

I also tested HA for the profile injector, using 2 replicas. For some reason only one of the pods gets used. I thought it could be the last injector being started overwriting the config and so grabbing all the work for itself, but like I said above I couldn't see the install config being overwritten...

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad to see this moving. i have some questions around multi-stage, and also how this relates to the existing identity system. also, i think @olix0r is best equipped to evaluate the install/upgrade lifecycle of the new certs/keys.

defaultIdentityIssuanceLifetime = 24 * time.Hour
defaultIdentityClockSkewAllowance = 20 * time.Second

caCommonName = "ca.linkerd.cluster.local"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about this name, should it be namespace-specific? also is there a reason this is different from most *.svc.cluster.local names, or is this convention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't put much thought into this name. LMK what's most appropriate. I basically follow the identity issuer's name convention; i.e. without the .svc (which makes sense, since there isn't a Service with this name).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think following the identity issuer's name makes sense. Note that the name varies by namespace.


Identity *installIdentityValues

CATrust *caTrustValues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little out of context, can you explain the motivation for separate KeyPEM/CertPem values for the MWC/VWC and the identify system values in installIdentityValues? or, taking this a step further, re-using the linkerd-identity-issuer secret?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my brief chat with @olix0r, we prefer to keep them separated, because they exist for different purposes. installIdentityValues, as we all know, is used for data plane mTLS and consist of fields like issuance life time, clock skew allowances etc. that aren't relevant to the CA. The CA is used to sign certs that will be used by webhooks and taps to authenticate with the k8s API Server.

Signed-off-by: Ivan Sim <[email protected]>
@ihcsim
Copy link
Contributor Author

ihcsim commented May 7, 2019

For some reason only one of the pods gets used.

That's surprising, because the Service in front of the proxy injector replicas should load balance the traffic between the replicas. I'll need to do some more testing on it.

You also need to add to the to-do list the removal of the code for generating the old configs dynamically.

Yeah, good point. I am kinda hesitant at first, because in the past, we did have users encountering outdated MWC after upgrading the version (before linkerd upgrade was introduced). With the introduction of linkerd upgrade, I think we can remove this code as you suggested.

Signed-off-by: Ivan Sim <[email protected]>
@l5d-bot
Copy link
Collaborator

l5d-bot commented May 7, 2019

Signed-off-by: Ivan Sim <[email protected]>
@alpeb
Copy link
Member

alpeb commented May 7, 2019

Re HA, for some reason k8s continues to use only one webhook, but if you kill it, the other one starts receiving requests, so this should be fine 👍

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 7, 2019

@ihcsim
Copy link
Contributor Author

ihcsim commented May 8, 2019

@alpeb Yeah, it's very strange that the k8s api server seems to only send requests to one replica. AFACIT, the Service is the common ClusterIP type.

If anything, it's good that even when some of the replicas entered into a crash loop, auto-inject still works.

k -n linkerd get po -l linkerd.io/control-plane-component=proxy-injector
NAME                                      READY   STATUS             RESTARTS   AGE
linkerd-proxy-injector-5f499c5bff-km9nk   1/2     CrashLoopBackOff   6          6m53s
linkerd-proxy-injector-5f499c5bff-pjjxn   1/2     CrashLoopBackOff   6          8m26s
linkerd-proxy-injector-5f964c49c4-5kq6f   2/2     Running            0          53m


$ k -n linkerd logs linkerd-proxy-injector-5f964c49c4-5kq6f proxy-injector   
...
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-24"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-24"
time="2019-05-08T03:50:44Z" level=info msg="received admission review request 74b7336b-7144-11e9-ab03-ac2175b64ffe"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-23"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-23"
time="2019-05-08T03:50:44Z" level=info msg="received admission review request 74b9e11e-7144-11e9-ab03-ac2175b64ffe"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-18"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-18"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-11"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-11"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-25"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-25"
time="2019-05-08T03:50:44Z" level=info msg="received admission review request 74b96859-7144-11e9-ab03-ac2175b64ffe"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-19"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-19"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-5"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-9"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-9"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-21"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-21"
time="2019-05-08T03:50:44Z" level=info msg="received pod/nginx-16"
time="2019-05-08T03:50:44Z" level=info msg="patch generated for: pod/nginx-16"

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 8, 2019

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments.

i hit an issue where the proxy-injector was not injecting proxies. this was on a 2-stage install, but not sure if that's related:

$ bin/linkerd install config | kubectl apply -f -
$ bin/linkerd install control-plane | kubectl apply -f -
$ bin/linkerd check
$ curl https://run.linkerd.io/emojivoto.yml | bin/linkerd inject - | kubectl apply -f -

...

$ bin/linkerd logs --control-plane-component proxy-injector
+ linkerd linkerd-proxy-injector-56459d9d47-6q4r8 › proxy-injector
+ linkerd linkerd-proxy-injector-56459d9d47-6q4r8 › linkerd-proxy
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector time="2019-05-08T10:52:44Z" level=info msg="running version git-82225e97"
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector time="2019-05-08T10:52:44Z" level=info msg="waiting for caches to sync"
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector time="2019-05-08T10:52:45Z" level=info msg="caches synced"
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector time="2019-05-08T10:52:45Z" level=info msg="listening at :8443"
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector time="2019-05-08T10:52:45Z" level=info msg="starting admin server on :9995"
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:11 http: TLS handshake error from 127.0.0.1:43100: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:12 http: TLS handshake error from 127.0.0.1:43104: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:12 http: TLS handshake error from 127.0.0.1:43116: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:13 http: TLS handshake error from 127.0.0.1:43130: remote error: tls: bad certificate

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
name: linkerd-proxy-injector-webhook-config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to name this (and analogous sp-validator) to include the namespace: linkerd-{{.Namespace}}-proxy-injector-webhook-config

we follow this pattern with ClusterRole[Binding]s because they're global resources, it conveys which control plane they are associated with.

or, does it become a problem to have multiple webhooks installed, particularly proxy injectors?

name: linkerd-proxy-injector
namespace: {{.Namespace}}
---
apiVersion: admissionregistration.k8s.io/v1beta1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily for this PR, but now that MWC/VWC are part of linkerd install config, we should probably update linkerd check config to validate successful install.

@ihcsim
Copy link
Contributor Author

ihcsim commented May 8, 2019

linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:11 http: TLS handshake error from 127.0.0.1:43100: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:12 http: TLS handshake error from 127.0.0.1:43104: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:12 http: TLS handshake error from 127.0.0.1:43116: remote error: tls: bad certificate
linkerd linkerd-proxy-injector-56459d9d47-6q4r8 proxy-injector 2019/05/08 10:55:13 http: TLS handshake error from 127.0.0.1:43130: remote error: tls: bad certificate

That's a regression from the change to move the MWC and VWC to the config stage. The CA cert and key are generated during both stages. Prior to the change, this has no effects on the config stage, because all the CA assets are only injected in the same control-plane stage. Will fix.

This ensures that CA secret is in-sync with the MWC and VWC trust anchors.
Since the MWC and VWC are cluster-scoped, they must be created in the 'config'
stage. Moving the CA secret to the 'config' stage is simpler than trying
to sync it with the MWC and VWC trust anchors.

Signed-off-by: Ivan Sim <[email protected]>
@ihcsim
Copy link
Contributor Author

ihcsim commented May 8, 2019

@ver @adleong To recap, this PR creates a new linkerd-ca secret during install. It has two fields, which correspond to the CA cert and private key. These are used by the webhooks to self-sign its leaf certificate. The CA cert is also duplicated in the MWC's and VWC's caBundle, which the k8s api server uses to verify the self-signed certs served by the webhooks. The concern that @siggy raised was around the duplicate CA cert in two resources.

@ver IIUC, you are saying that we don't need to persist the CA in the secret. During install and upgrade, we just (re-)create the CA cert, private key and self-signed certs. The implication is that the MWC and VWC must be re-created during upgrade, to use the new CA assets. Is that the high level idea?

@adleong
Copy link
Member

adleong commented May 8, 2019

Thanks for the recap, @ihcsim. My understanding is that on startup, each instance of the webhook needs to generate a signed certificate. To do this, it needs the CA certificate and CA key. This can be achieved by storing the CA cert and key in a secret (or by storing the CA key in a secret and providing the CA cert in an env var).

If, as @olix0r suggests, the CA key is thrown away after install, how does a new instance of the webhook get a signed certificate? Or are you suggesting that the webhook certificate should be generated and signed during install and all webhook replicas should share that cert?

@olix0r
Copy link
Member

olix0r commented May 8, 2019

I guess my point is that we don't need a CA at all... we only need a self-signed certificate. If we want to generate a root CA to generate the leaf certificates, then I don't see any value in retaining the root certificate.

During upgrade, we can just read the self-signed cert & key, as we do for the issuer certs. If you ever want to create new self-signed certs for these pods, we just generate a new key.

Maybe I'm getting hung up on use of the term "CA" -- are we just generating a self-signed cert? or are we generating a root CA that then signs leaf certificates? If the former, then I don't think there's anything to do but to store the keypair. If the latter, there's no need to store the root CA key; but also there's no real need to have the level of indirection, imo.

@olix0r
Copy link
Member

olix0r commented May 8, 2019

After a little thought, my suggestion is that we generate a key & self-signed cert for each webhook. If all of the webhooks share a single CA and have to access the root key, then that is effectively equivalent to having them all share a keypair (trust-wise). I don't think the level of indirection that a shared CA provides grants us any security benefits.

@adleong
Copy link
Member

adleong commented May 8, 2019

My assumption was that each webhook replica would have a distinct leaf keypair, but after talking with you, I see there's not really any value in that.

So I think where we're landing with this is that at install, we should generate a (self-signed) certificate and key at startup, put the certificate in the webhook config as the "CA", and distribute the certificate and key to the webhook instances.

@l5d-bot
Copy link
Collaborator

l5d-bot commented May 9, 2019

@ihcsim
Copy link
Contributor Author

ihcsim commented May 9, 2019

Superseded by #2806.

@ihcsim ihcsim closed this May 9, 2019
@ihcsim ihcsim deleted the isim/install-webhook-config branch May 21, 2019 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MutatingWebhookConfiguration objects to install YAML

6 participants