
Conversation

Contributor

@Argannor commented Apr 3, 2024

Description of your changes

When running crossplane (and providers/functions) on the host network, the need to configure their ports arises.

This PR has two parts:

  1. Make crossplane ports (service/webhook, health probe, metrics) configurable
  2. Allow DeploymentRuntimeConfig to override the default ports (metrics, webhook, grpc) in both Deployment and Service.

Additional PRs would be necessary to:

  • Adjust the documentation
  • Adjust the provider and function templates (to introduce configuration options for their ports)
  • Providers / Functions (to introduce configuration options for their ports)

Putting this up as a draft to gather your feedback. Please let me know what you think and how to proceed with the follow-ups.

Contributes to #5520

Crossplane Ports

The following configuration options (flags / environment variables) were added, with their defaults set to the current values:

  • --webhook-port=9443
  • --metrics-bind-address=:8080
  • --health-probe-bind-address=:8081

Additionally, these options were added to the helm chart values, defaulting to unset:

  • webhooks.port
  • metrics.port
  • readiness.port

Note that for the configuration options I chose to stick as closely as possible to the underlying API for increased flexibility (hence the bind-address choice), while the helm chart is more oriented towards a consistent user experience (hence ports for all 3 options). Let me know what you think!
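
For illustration, roughly how the chart values are meant to map onto the new flags (a sketch; the actual plumbing goes through env vars such as WEBHOOK_PORT in the chart templates, and the port values here are arbitrary):

webhooks:
  port: 9600    # -> --webhook-port=9600 (via the WEBHOOK_PORT env var)
metrics:
  port: 9601    # -> --metrics-bind-address=:9601
readiness:
  port: 9602    # -> --health-probe-bind-address=:9602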

Provider, Function, and Service Ports

The core idea is to allow the user to specify the ports in the DeploymentRuntimeConfig (DRC), by setting spec.deploymentTemplate.spec.template.spec.containers[0].ports and passing args to the container to tell the provider which ports to bind to. An example DRC can be found at https://github.com/jbw976/crossplane/blob/configurable-ports-review/provider-k8s.yaml.

Here's a brief summary of the changes:

  • DeploymentWithOptionalPodScrapeAnnotations uses the port number of a given metrics port if set in the DRC deployment template.
  • DeploymentRuntimeWithAdditionalPorts only adds ports not contained in the DRC deployment template (based on comparing their names).
  • ServiceWithAdditionalPorts only adds ports not contained in the DRC service template (based on comparing their names).
  • Set the Name attribute of webhook ports in services to webhook (to facilitate the name comparison).
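
To illustrate the name-based merge, a minimal sketch (the port number is made up): a DRC that declares a container port named metrics keeps its own entry, and the default metrics port is not added again:

spec:
  deploymentTemplate:
    spec:
      template:
        spec:
          containers:
            - name: package-runtime
              ports:
                - name: metrics        # same name as the default port,
                  containerPort: 19001 # so the default is not re-added
                  protocol: TCP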

Unit tests were added.


@Argannor requested review from a team and turkenh as code owners April 3, 2024 13:29
@Argannor requested a review from phisco April 3, 2024 13:29
value: "{{ .Values.webhooks.port }}"
{{- end}}
{{- if and .Values.webhooks.enabled .Values.webhooks.port }}
- name: "WEBHOOK_PORT"
Contributor

this looks like a repeat of the block above?

Contributor Author

Thank you, removed the duplicate

value: ":{{ .Values.metrics.port }}"
{{- end}}
{{- if .Values.readiness.port }}
- name: "HealthProbeBindAddress"
Contributor

HEALTH_PROBE_BIND_ADDRESS, to match the env defined in the command struct below?

Contributor Author

Changed it, thanks

- protocol: TCP
port: 9443
targetPort: 9443
port: {{ .Values.webhooks.port | default "9443" }}
Contributor

@ravilr commented Apr 16, 2024

the webhook Service port itself can remain 9443, without needing to be configurable, right?
Only the crossplane pod's webhook port (the Service's targetPort below) needs to be configurable.

Changing the Service port (which is IMO unnecessary) would require changes in other places, e.g.

in the crossplane initializer, which uses it to set up WebhookConfiguration resources.
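
In other words, a sketch of the suggestion (not the final template):

ports:
- protocol: TCP
  port: 9443                                              # stays fixed, so the initializer's
  targetPort: {{ .Values.webhooks.port | default 9443 }}  # WebhookConfigurations keep working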

Contributor Author

You're right @ravilr, I didn't think about this before, but the service port can remain the same. I will refactor this again, but I need your input on this, as the override makes it a bit awkward:

  • In the runtime config, the user can provide a service port that diverges from this
  • The runtime config just embeds the upstream corev1.ServiceSpec, so we can't really prevent this from happening

Would you prefer

  1. the runtime_defaults.go#serviceFromRuntimeConfig method to override it if present,
  2. a new default override ServiceWithPort(serviceName string, port int) to be added to the allOverrides method

I'm unsure which one is actually less surprising when reading the code. I think I'm leaning towards the second one

Contributor

@ravilr commented May 28, 2024

  1. In the case of the crossplane core CLI, the newly added --webhook-port arg already seems unambiguous: it captures the webhook server's listener port in the crossplane pod's core container. So I think the crossplane chart can continue hard-coding the webhook Service spec port to 9443 as before.

  2. In the case of the pkg DeploymentRuntimeConfig, the 2nd option from your proposal sounds good to me, since we already expect the DRC's serviceTemplate.spec override to match on the port name webhook.

cc @phisco @turkenh for the maintainers to weigh in.

Contributor Author

The chart now hard-codes 9443 as the service port, and an override was added to ensure the port is always 9443. The test case was adjusted accordingly.

@Argannor
Contributor Author

Thank you for the review. I'm currently on sick leave and will get back to you as soon as I'm able.

@Argannor
Contributor Author

Argannor commented Jun 2, 2024

@ravilr can you advise me on the failed checks please?

  1. Earthly +generate seems to detect changes on generated files; when running make reviewable locally, everything seems fine and no changes occur on my machine.

  2. The e2e test fails for realtime-compositions while trying to delete a resource; I suspect this has nothing to do with my changes.

@ravilr
Contributor

ravilr commented Jun 17, 2024

Changes LGTM, thanks. This still requires review from the maintainers. @Argannor, do you want to remove the Draft: prefix from the PR title?

Regarding the CI job failures, maybe the maintainers can help re-kick them.

@jbw976 linked an issue Jun 18, 2024 that may be closed by this pull request
@Argannor changed the title from Draft: feat/configurable ports to feat/configurable ports Jun 21, 2024
@Argannor requested review from a team and negz as code owners August 19, 2024 09:13
@Argannor requested review from jbw976 and lindblombr August 19, 2024 09:13
@Argannor force-pushed the feat/configurable-ports branch from 39b3b18 to 869d767 on August 19, 2024 09:23
@Argannor
Contributor Author

Rebased the branch on top of master to resolve merge conflicts. However, I can't get the E2E tests to run on my local machine; they also seem to fail on other pull requests, so I suspect the failures are not related to my changes.

Kindly requesting a review from one of the maintainers

@Argannor
Contributor Author

While deploying this change in our test cluster, I had to adjust the status.endpoint field of FunctionRevision resources to contain the correct port, as gRPC only does a DNS lookup and does not discover the configured target port.

With these changes and my PR to provider-kubernetes (crossplane-contrib/provider-kubernetes#301) I was able to:

  • run crossplane, provider-kubernetes, and function-patch-and-transform on the host network with custom ports
  • verify that the Kubernetes API server could reach the conversion webhook of provider-kubernetes
  • verify that Crossplane could communicate with function-patch-and-transform

all while running on EKS with Calico as the CNI.
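
For reference, the adjusted FunctionRevision status looked roughly like this (a sketch; revision names and ports are from my setup):

status:
  endpoint: dns:///function-patch-and-transform.crossplane-system:9612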

Looking forward to your feedback

ports:
- name: readyz
containerPort: 8081
containerPort: {{ .Values.readiness.port | default "8081" }}
Contributor

K8s spec expects this containerPort to be of Integer type: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#containerport-v1-core

Unquote the default as it was before; also, maybe test the helm template rendering to check whether .Values.readiness.port | int is required?

Contributor Author

Unquoted the value and verified that | int is not needed, even if the user specifies the value as a quoted string.
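
For reference, the template line presumably now renders the default as an integer, roughly:

containerPort: {{ .Values.readiness.port | default 8081 }}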

@jbw976
Member

jbw976 commented Oct 25, 2024

@Argannor did you happen to have an example of what a DeploymentRuntimeConfig would look like that "overrides the default ports (metrics, webhook, grpc) in both Deployment and Service"? You've probably already created one during your testing. I'm particularly interested in what it looks like after the addition of the serviceTemplate.spec that previously did not exist in the DeploymentRuntimeConfig API.

@Argannor
Contributor Author

Argannor commented Oct 25, 2024

@jbw976 sure thing, here you go:

# K8s provider
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-kubernetes
spec:
  package: "index.docker.io/crossplanecontrib/provider-kubernetes:v0.15.0"
  runtimeConfigRef:
    name: provider-kubernetes
---
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-kubernetes
spec:
  deploymentTemplate:
    spec:
      selector: { }
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            kubernetes.io/os: linux
          imagePullSecrets:
            - name: crossplane-workaround
          containers:
            - name: package-runtime
              args:
                - --webhook-port=9610
  serviceTemplate:
    spec:
      ports:
        - name: webhook
          port: 9200 # doesn't matter
          targetPort: 9610
          protocol: TCP

And more or less the same for functions as well:

apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: function-patch-and-transform
spec:
  package: xpkg.upbound.io/crossplane-contrib/function-patch-and-transform:v0.7.0
  runtimeConfigRef:
    name: function-patch-and-transform
---
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: function-patch-and-transform
spec:
  deploymentTemplate:
    spec:
      selector: { }
      template:
        spec:
          hostNetwork: true
          containers:
            - name: package-runtime
              args:
                - --address=:9612
  serviceTemplate:
    spec:
      ports:
        - name: webhook
          port: 9200 # doesn't matter
          targetPort: 9612
          protocol: TCP

And the values for the crossplane helm chart:

hostNetwork: true
dnsPolicy: "ClusterFirstWithHostNet"
webhooks:
  port: 9600
metrics:
  enabled: true
  port: 9601
readiness:
  port: 9602

Edit: removed irrelevant parameters and added values.yaml

Member

@jbw976 left a comment

Thank you @Argannor for your diligence on this feature and your patience waiting for it to be merged. And thank you @ravilr for your attention and reviews so far!

I took this for a test spin today and ran into some obstacles, so perhaps you can provide some insight into whether these are real issues or an artifact of the testing environment or manifests/config I'm using 😇

I checked out your feat/configurable-ports branch locally and I'm using earthly +hack to build/deploy Crossplane locally into a kind cluster with the following changes intended to capture the helm values you provided:

       .hack/helm install --create-namespace --namespace crossplane-system crossplane .hack/charts/crossplane-0.0.0-hack.tgz \
         --set "image.pullPolicy=Never,image.repository=crossplane-hack/crossplane,image.tag=hack" \
-        --set "args={--debug}"
+        --set "args={--debug}" \
+        --set "hostNetwork=true,dnsPolicy=ClusterFirstWithHostNet,webhooks.port=9600,metrics.enabled=true,metrics.port=9601,readiness.port=9602"

Crossplane seems to come up OK and is honoring those port configurations:

        ports:
        - containerPort: 9602
          name: readyz
          protocol: TCP
        - containerPort: 9601
          name: metrics
          protocol: TCP
        - containerPort: 9600
          name: webhooks
          protocol: TCP

However, I'm having trouble with the provider and function.

provider-kubernetes

The provider-kubernetes pod is in a CrashLoopBackOff with this in its logs:

crossplane-kubernetes-provider: error: unknown long flag '--webhook-port', try --help
  • Looks like it's missing your changes from crossplane-contrib/provider-kubernetes#301. Is a built provider with those changes published to a public repo somewhere that I could use?
  • Out of curiosity, what is the imagePullSecrets: - name: crossplane-workaround I see in your manifest?
  • the ports with a comment # doesn't matter - can those be dropped from the manifest completely then if the value doesn't matter? 🤔

function-patch-and-transform

The function pod is in a Pending state, and I see this event for the pod:

  Warning  FailedScheduling  4m54s (x5 over 34m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.

Is it possible that the manifests are specifying the wrong port here? Your manifest specifies webhook, but should it be grpc instead? Or is this an artifact of using a kind cluster for my testing? Not sure what the root cause of this port conflict is and what should be done to fix 🤔

Here's a snippet showing ports and args from kubectl get pod:

spec:
  containers:
  - args:
    - --address=:9612
    env:
    - name: TLS_SERVER_CERTS_DIR
      value: /tls/server
    image: xpkg.upbound.io/crossplane-contrib/function-patch-and-transform:v0.7.0
    imagePullPolicy: IfNotPresent
    name: package-runtime
    ports:
    - containerPort: 8080
      hostPort: 8080
      name: metrics
      protocol: TCP
    - containerPort: 9443
      hostPort: 9443
      name: grpc
      protocol: TCP

Let me know how I can improve my testing environment and config to more accurately evaluate this PR, thank you! 🙇‍♂️

@Argannor
Contributor Author

Hi @jbw976, thank you for testing these changes! I only used a "real" cluster to test the changes, as I'm currently unable to run kind on my machine.

Let me try to answer your questions:

provider-kubernetes

  • Unfortunately I don't have the changes in a public registry, but in a private one.
  • That's also the reason for the imagePullSecrets; I host both the modified kubernetes provider and crossplane in a private registry.
  • Honestly, I never tried to remove the port; it depends on whether the manifest will still pass validation

function-patch-and-transform

  • Regarding the port name: I introduced changes to name that port "webhook", as in the global webhookPortName, to be able to reliably address this port by name. I agree that grpc would be a better fit here, since that matches the actual use, and the port declared in the deployment is called that way too. On the main branch the service port does not have a name at all
  • Regarding the port-in-use error: I cannot reproduce it, since I'm unable to use kind at the moment. I checked the function code and couldn't find any ports opened by the function other than the grpc port itself. So either I'm missing something, or the port is already in use on your machine? Or maybe the fact that the deployment/pod declares the host port prompts kind to try to open it on the host as well, even though it isn't used. You can control those via the DeploymentRuntimeConfig as well (see below):
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: function-patch-and-transform
spec:
  deploymentTemplate:
    spec:
      selector: { }
      template:
        spec:
          hostNetwork: true
          containers:
            - name: package-runtime
              args:
                - --address=:9612
              ports:
                - containerPort: 9612
                  hostPort: 9612
                  name: grpc
                  protocol: TCP
                - containerPort: 9613
                  hostPort: 9613
                  name: metrics
                  protocol: TCP
  serviceTemplate:
    spec:
      ports:
        - name: webhook
          port: 9200 # doesn't matter
          targetPort: 9612
          protocol: TCP

@jbw976
Member

jbw976 commented Oct 29, 2024

Unfortunately I don't have the changes in a public registry, but in a private one.

I was able to build your branch of provider-kubernetes and publish it to a publicly accessible repo at xpkg.upbound.io/jaredorg/provider-kubernetes-portconfig:v0.0.2.

Honestly never tried to remove the port, depends on whether the manifest will still pass the validation

I can confirm that the port value is required, even if it isn't used 😇

The DeploymentRuntimeConfig "provider-kubernetes" is invalid: spec.serviceTemplate.spec.ports[0].port: Required value

The provider-kubernetes pod looks healthy and running, but I'm curious about the mapping of Service -> Pod ports.

Service:

  ports:
  - name: webhook
    port: 9443
    protocol: TCP
    targetPort: 9610

Pod:

    ports:
    - containerPort: 8080
      hostPort: 8080
      name: metrics
      protocol: TCP
    - containerPort: 9443
      hostPort: 9443
      name: webhook
      protocol: TCP

Is it an issue that the pod's webhook port still uses 9443 while the Service is pointing to 9610? Should the DeploymentRuntimeConfig we are using for our testing also include the ports in the deployment template?

In general, how were you testing to validate that the webhook port is accessible and functional? That could help my testing here too 😁

@jbw976
Member

jbw976 commented Oct 29, 2024

I also tested again with function-patch-and-transform, with nothing else installed in the cluster. It installed and got to a healthy state, so I have to imagine that the usage of hostNetwork with kind opens up ports on my host laptop and causes conflicts when more than one provider or function is installed that all want to open 9443...

Without specifying any ports in the deployment template (only specifying them in the service template), I expected the crossplane composition machinery to not be able to reach the function because the Service would not be pointing at the Pod's ports, i.e. the Service points at 9612 and the Pod declares 9443 and 8080:

❯ kcs get service function-patch-and-transform -ojsonpath='{.spec.ports}'
[{"name":"webhook","port":9443,"protocol":"TCP","targetPort":9612}]

❯ kcs get pod -l pkg.crossplane.io/function=function-patch-and-transform -ojsonpath='{.items[0].spec.containers[0].ports}'
[{"containerPort":8080,"hostPort":8080,"name":"metrics","protocol":"TCP"},{"containerPort":9443,"hostPort":9443,"name":"grpc","protocol":"TCP"}]

However, it still worked 😮 - this must be because we're passing --address=:9612 as an arg to the function pod, but I didn't think that would be enough. Can you help me understand why the connection would still work when the Service doesn't point at any ports declared by the Pod? 🤔

@Argannor
Contributor Author

Argannor commented Oct 29, 2024

Service to Pod port mapping

For me this was not an issue, and we decided to do it that way to keep the complexity of this change lower (see this discussion above), as modifying the service port would demand additional changes.

As to why this works, honestly I don't know. I verified it on our AWS EKS cluster with Calico, and the control plane (running outside of the overlay network provided by Calico) reached the container running in the host network on the pod's port. But I don't understand how that works. Probably some kind of kube-proxy/iptables magic going on there.

And the ports are not in conflict with each other, as each service gets its own IP address.

Testing done

In our test cluster (again AWS EKS + Calico), I deployed:

  • crossplane (built from this branch)
  • function-patch-and-transform (latest release)
  • provider-kubernetes (built from my branch over there)

All of them were running in the hostNetwork with ports manually adjusted (as outlined in my previous examples). Then I wrote a Composition for the function and saw that crossplane and the function could talk to each other and worked as expected.

For provider-kubernetes it was a bit more complicated: if you deploy an old version (pre-0.11.0, e.g. v0.10.0) and apply a simple Object manifest, it will be stored as the v1alpha1 variant. If you then update to a more recent version with the newer v1alpha2 version, the provider refuses to start (see this issue). With the changes from my provider-kubernetes branch plus crossplane from this branch, we can run it in the host network with a non-default port, the provider starts up correctly, and I observed that the conversion webhook was called by the control plane.

I have no idea how to replicate this with kind, as you would somehow need to create this split network, where workloads and control plane are not in the same network.

Regarding function-patch-and-transform and the port mystery

Kubernetes doesn't really care too much about the ports declared in a container, just like docker doesn't really care about the ports exposed in the Dockerfile:

List of ports to expose from the container. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Modifying this array with strategic merge patch may corrupt the data. For more information See kubernetes/kubernetes#108255. Cannot be updated.

Kubernetes API Reference Container Spec

So actually declaring them is a nice-to-have, and I would even say it depends on the concrete runtime whether a port gets opened or not, but I'm not sure about it. So now we know that kind creates a port forwarding for all declared ports; I'm not sure if the "normal" kubelet does the same.

Member

@jbw976 left a comment

This seems to be working with my testing in a local kind instance for scenarios where I set specific ports and where I don't specify anything at all - so it seems the plain vanilla scenarios haven't regressed as far as I can tell 💪

Documentation and examples will be important for this feature when it's merged - that will help folks adopt and use this very easily, e.g. the examples you provided me for my testing streamlined my usage.

I'm still a bit uncomfortable not knowing how the network mapping works, both in your AWS environment and my kind environment, even when the service target port doesn't match the pod/container port. Your explanation that Kubernetes doesn't really care too much about the ports declared in a container could be sufficient for our usage though 😇

I have a few comments included in this review, mostly around the API we're exposing and the usage of webhook vs grpc for Functions. Let me know what you think. Thanks for your continued efforts on this! 🙇‍♂️

ServiceWithSelectors(b.podSelectors()),
ServiceWithAdditionalPorts([]corev1.ServicePort{
	{
		Name: webhookPortName,
Member

this Service() function is used in the context of both providers and functions. For Functions, the port name is grpc, so always setting it to webhook could cause some confusion. Is there a better way to set this to the appropriate name the pod will be using?

Contributor Author

I will move the service overrides to the calling side, so that we can have different port names for functions (grpc) and providers (webhook)


endpointPort := servicePort
for _, port := range svc.Spec.Ports {
	if port.Name == webhookPortName {
Member

this further adds to the confusion of function webhook vs. grpc - the intent here is to find the grpc port for the function, but we're calling it a webhook port (i think 😇)

Contributor Author

As the port is now named grpc we can use grpcPortName here instead too :)

Ports: []corev1.ServicePort{
	{
		Name: webhookPortName,
		Port: int32(webhook.DefaultPort),
Member

is it odd that we are using webhook.DefaultPort for the comparison here, when our code is setting it explicitly using servicePort instead? those two values happen to be the same, but maybe it's still odd to use this for the comparison?

Contributor Author

whoops, yeah that's confusing and will hurt later. I'll change it.

Comment on lines 108 to 110
MetricsBindAddress string `default:":8080" env:"METRICS_BIND_ADDRESS" help:"The address the metrics server listens on." placeholder:"host:port"`

HealthProbeBindAddress string `default:":8081" env:"HEALTH_PROBE_BIND_ADDRESS" help:"The address the health probe endpoint binds to." placeholder:"host:port"`
Member

it's feeling a bit odd to expose these as addresses in the command line args, but as port numbers in the helm chart values. What do you think about consolidating them and only exposing as port numbers everywhere?

Is there a use case where the user would need to set the host also here in addition to the port? i'm not sure there is, but if there is, we'd probably need to make a change to the helm chart values later anyways to make that easier to set since we aren't exposing it there now.

My instinct is to just expose port numbers everywhere unless there's a clear use case for setting the host too 🤔

Contributor Author

There is no specific use case; I just used them as it was the more flexible approach and those were the arguments that had to be passed on. Changed it to port numbers only.

Contributor

@ravilr commented Oct 30, 2024

both the metrics and health bind-address flags are passed into / consumed by the k8s-sigs/controller-runtime Manager Options

the argument for keeping them as address strings is: some security-focused setups would not want to expose the metrics port on all interfaces, but only on 127.0.0.1:8080, and then front it with an rbac-proxy sidecar that requires auth for the Prometheus scrapers to reach it.
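
For example, such a setup would look roughly like this (a sketch; the sidecar wiring is omitted):

args:
  - --metrics-bind-address=127.0.0.1:8080   # metrics reachable only via loopback,
                                            # fronted by an authenticating rbac-proxy sidecar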

@Argannor
Contributor Author

Thank you for your thorough review! I agree that documentation is necessary for this scenario. Do you want me to add a PR for that as well? In that case I'd ask you to guide me where (which repository) to do that, and where to put the added docs (as in where in the documentation).

I addressed your comments above and adjusted the code accordingly, except for the comment about the webhook service port vs the webhook port, as I don't see the need for that (see my comment above).

@Argannor
Contributor Author

Sorry, forgot to address the check-diff question. No, I was wondering about that one too. Git doesn't detect any changes; besides, I ran earthly +reviewable before, so I'd assume everything should be up to date already.

Maybe a line-ending issue? I'm working on Windows + WSL, so it wouldn't be the first time line endings got messed up.

@ravilr
Contributor

ravilr commented Oct 30, 2024

Without specifying any ports in the deployment template (only specifying them in the service template), I expected the crossplane composition machinery to not be able to reach the function because the Service would not be pointing at the Pod's ports, i.e. the Service points at 9612 and the Pod declares 9443 and 8080

I'm still a bit uncomfortable not knowing how the networking mapping is working still, both in your AWS environment and my kind environment, even when the service target port doesn't match the pod/container port. Your explanation of Kubernetes doesn't really care too much about the ports declared in a container could be sufficient for our usage though

From https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#serviceport-v1-core :
the Service.spec.ports[].targetPort can be specified as a Number or name of the port to access on the pods targeted by the service. Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME. If this is a string, it will be looked up as a named port in the target Pod's container ports

So, if the targetPort in the Service is specified as the correct listening port number of the target pod's container, Service -> pod routing/forwarding doesn't rely on the target pod's container ports spec. Only when the targetPort in the Service is specified as a named port does it have to match the target pod's container ports spec for the lookup to succeed.
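
A sketch of the two variants (values illustrative):

# Numeric targetPort: traffic is forwarded to this port on the pod,
# regardless of what the pod declares in its container ports.
ports:
- name: webhook
  port: 9443
  targetPort: 9610

# Named targetPort: looked up in the target pod's container ports,
# so a container port named "webhook" must exist on the pod.
ports:
- name: webhook
  port: 9443
  targetPort: webhook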

Considering the confusion this might cause for users wanting to customize ports for their setups, and to keep the configuration surface simple, I'm wondering whether exposing spec.serviceTemplateSpec.spec in the DeploymentRuntimeConfig CRD is really needed, or whether it can be avoided?

If the pkg runtime Service builder methods are updated to use Service.spec.ports[].targetPort as a named port, then there won't really be any need to customize the ports in spec.serviceTemplateSpec.spec of the DeploymentRuntimeConfig resource. One would just have to specify them in spec.deploymentTemplate.spec.template.spec.containers[0.name==package-runtime].ports[] of the DeploymentRuntimeConfig resource.

Can you think of any other spec.serviceTemplateSpec.spec fields that folks would want to customize, other than the ports? If not, we could just not introduce it now, and update the builder to use targetPort as a named-port reference to the target pod container's port.

For example, to override a provider deployment ports, one would just need to specify:

apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-upjet-aws-rds
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          hostNetwork: true
          containers:
            - name: package-runtime
              ports:
                - containerPort: 18001
                  name: webhook
                  protocol: TCP
                - containerPort: 19001
                  name: metrics
                  protocol: TCP
              env:
                - name: WEBHOOK_PORT
                  value: "18001"
                - name: METRICS_BIND_ADDRESS
                  value: ":19001"

and the pkg runtime Service builder would just hardcode the targetPort to targetPort: webhook for providers and targetPort: grpc for functions in the Service resource it builds/renders.

@ravilr
Contributor

ravilr commented Oct 30, 2024

Also @Argannor, I don't think one has to run Function runtime pods in hostNetwork (hence no port customization is required there). Are you running on EKS and finding that the Crossplane pod in hostNetwork is not able to talk to a function gRPC pod running in the pod network?

In setups where the pod network uses a custom CNI in overlay mode (Calico overlay, Cilium overlay, etc.), pod-to-pod communications (including pod-in-hostNetwork to pod-in-overlay-network) work over the overlay network enabled by the respective CNI, so Crossplane pod -> function pod gRPC endpoints work seamlessly (with an appropriate NetworkPolicy[1], if one is being enforced). But the EKS API server doesn't have direct connectivity (a route) to the pods on the overlay network.

In the case of kube API server (or EKS managed control plane) -> provider conversion webhook (or crossplane core webhook) communications, running the provider (or crossplane core) in hostNetwork is required, and hence port customization to avoid port conflicts with other provider deployments.

In the case of Crossplane pod -> function gRPC communication, there is no kube API server / control plane involved, right?

[1] A NetworkPolicy allowing all ingress and egress on gRPC port 9443 in the crossplane-system namespace, where the crossplane core pod and the function pods are deployed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-functions-egress-ingress
  namespace: crossplane-system
spec:
  policyTypes:
  - Ingress
  - Egress
  podSelector: {}
  ingress:
  - ports:
    - port: 9443
      protocol: TCP
  egress:
  - ports:
    - port: 9443
      protocol: TCP

@Argannor
Contributor Author

Argannor commented Oct 31, 2024

Also @Argannor, i don't think one have to run Function runtime pods in hostNetwork(hence no ports customization required). Are you running on EKS and finding Crossplane-pod-in-hostNetwork not able to talk to -> function grpc pod running in pod network ?

You're perfectly right. I think I got confused by the fact that we needed to adjust the port in the FunctionRevision's status, as Crossplane (or rather gRPC) would try to communicate via the default port; it does not respect the ports of the service directly.

To verify that, I created a small gist using:

  • nginx (as the function)
  • a headless service
  • and busybox (as the client/crossplane) running on hostNetwork

The client container was able to first perform an nslookup to get the overlay-network IP and then reach the nginx.
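
A minimal sketch of that setup (names and images are illustrative, not the actual gist contents; the nginx pod carrying the app: fake-function label is omitted):

apiVersion: v1
kind: Service
metadata:
  name: fake-function
spec:
  clusterIP: None              # headless: DNS resolves straight to the nginx pod's overlay IP
  selector:
    app: fake-function
  ports:
    - port: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: client
spec:
  hostNetwork: true                    # stands in for the crossplane pod on the host network
  dnsPolicy: ClusterFirstWithHostNet   # needed so cluster DNS still resolves the service
  containers:
    - name: busybox
      image: busybox
      command: ["sh", "-c", "nslookup fake-function && wget -qO- http://fake-function"]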

In case of Crossplane pod -> function grpc communication, there is no kube APIServer/kube control-plane involved right ?

That's a good question...

The API server gets involved as soon as a function tries to manage multiple versions of its own CRDs and needs to provide custom versions. But that wouldn't happen via gRPC, so we would actually need to change the code again to accommodate an optional webhook port as well. Or am I mistaken here?

If the pkg runtime Service builder methods are updated to leverage the Service.spec.ports[].targetPort as named port, then there won't be really any need for customizing the ports in spec.serviceTemplateSpec.spec of DeploymentRuntimeConfig resource. One would have to just specify them in the spec.deploymentTemplate.spec.template.spec.containers[0.name==package-runtime].ports[] of the DeploymentRuntimeConfig resource.

Could you think of any other spec.serviceTemplateSpec.spec fields that folks would want to customize other than the ports ? If not, we could as well, not introduce/add it now, and update to use the targetPort as named port reference to target pod container's port.

I don't see any other use case, so using the port names directly could make it a less confusing solution, and reduce the verbosity of the necessary config. I like that suggestion.

@Argannor
Contributor Author

Update on using port names:

In order for gRPC to know which port the function is listening on, we need to set the port as part of FunctionRevision.status.endpoint (e.g. dns:///function-patch-and-transform.crossplane-system:9613). And since gRPC is not Kubernetes-aware, it needs to be a port number.

This makes things more difficult, as the service needs to be set up in the pre-hook and the deployment in the post-hook, but with the change to port names we need information from both sources (name and namespace from the service, port from the deployment). To avoid this dilemma we could:

  • either rely on the fact that the service is named after the package, and set the status during Post
  • or set the first part of the status dns:///<service-name>.<namespace-name> during Pre and add the port :<port> during Post

Do you have any better ideas? I don't particularly like the prospects of either of those, but the first option could be the lesser evil.

@ravilr
Contributor

ravilr commented Oct 31, 2024

But that wouldn't be happening via GRPC, so we would actually need to change the code again to accommodate for an optional webhook port as well. Or am I mistaken here?

I don't think Function runtime pods would ever have a webhook listener, only a gRPC port.

In order for GRPC to know which port the function is listening on we need to set the port as part of the FunctionRevision.status.endpoint

Since there is no need to run function runtime pods on hostNetwork, no port customization is required there. So servicePort == target pod containerPort == 9443 should just work, as it does right now before this PR's change, right?

@Argannor
Contributor Author

i don't think Function runtime pods would ever have webhook listener, only grpc port.

Who manages functions' CRDs? Could you please elaborate on your reasoning for me?

since there is no need to run function runtime pods on hostNetwork, hence no ports customization required. So, the servicePort==targetPod containerPort==9443 should just work, as it right now before this PR change, right ?

Sorry, I was too focused on making it work. You're right: if we don't allow customization, and it's never run on the host network (which shouldn't be required), we can hard-code it again. But then why is the port configurable in the first place (via the SDK)? And should we in this case also prevent customization of the container ports, since that would lead to an invalid configuration?

@ravilr
Contributor

ravilr commented Oct 31, 2024

But that wouldn't be happening via GRPC, so we would actually need to change the code again to accommodate for an optional webhook port as well. Or am I mistaken here?

i don't think Function runtime pods would ever have webhook listener, only grpc port.

Who manages CRDs of functions? Could you please elaborate your reasoning for me?

As of today, function input CRDs exist and get deployed through function packages to the crossplane cluster, just to capture the input schema of the function. They are never used for storing CRs of that kind in kube-api storage. As such, a conversion webhook for function input CRDs doesn't make sense?

cc @negz @jbw976 for the maintainers to clarify whether Function runtime pods would ever have a kube-apiserver -> webhook-listener-on-function-pod communication pattern.

since there is no need to run function runtime pods on hostNetwork, hence no ports customization required. So, the servicePort==targetPod containerPort==9443 should just work, as it right now before this PR change, right ?

Sorry I was too focused on making it work. You're right, if we don't allow for customization, and it's never run on the host network (which shouldn't be required) we can again hard code it. But then why is the port configurable in the first place (via the SDK) and should we in this case also prevent customization of the container ports, since that would lead to an invalid configuration?

Are you referring to the --address flag/arg of the function? I guess it is configurable to allow local development/testing to use a custom port, but I'm not sure if it is used for other purposes. @negz?

jbw976 added a commit to jbw976/crossplane that referenced this pull request Nov 4, 2024
@Argannor
Contributor Author

Argannor commented Nov 6, 2024

Regardless of the outcome of this discussion, I

  • reverted to hard coded grpc ports for functions
  • changed provider ports to name based ports
  • removed the configurability for service ports

and verified in our test cluster again that communication works between:

  • crossplane (host network) -> functions (cluster network)
  • API server (outside cluster network) -> providers (host network)

Let me know what you think about this.

@Argannor
Contributor Author

Excuse my persistence, but what do you think of the latest changes? Do these address your concerns?

@jbw976
Member

jbw976 commented Nov 15, 2024

@Argannor @ravilr great work continuing to push on this effort since the last time I reviewed. I had to focus on a few other things with the v1.18 release and Kubecon, but after Kubecon settles down I'm planning on taking another pass through this. I would love to get this merged early in the v1.19 milestone so it can bake a bit and definitely be included as a popular feature of 1.19! 🙇‍♂️

Member

@jbw976 left a comment

Thanks @Argannor @ravilr for the work on this PR since the last time I reviewed, as well as your patience. Let's see if we can make a push to get this in for v1.19! code freeze is currently Jan 28, so let's try to get this finished and have something to ship :shipit:

A few notes to share:

  • overall, i'm liking the simplicity that you've found in the moving parts and the surface area that is exposed. Thanks for the ideas @ravilr on what didn't actually need implementing (e.g. configurable function gRPC ports)
  • looks like @ravilr's idea to use named ports and completely avoid exposing a service template spec worked OK? I'm concluding that because I don't see the DeploymentRuntimeConfig API being touched at all anymore. I love that simplification so we don't have to expose an entire config object with lots of knobs.
  • Do you have some updated manifests that are representative of the testing you've been doing @Argannor? e.g. what you provided me in #5540 (comment) was really helpful for my testing, and now that the surface area / UX has been simplified a lot, it would be helpful if you can once again share what you're using to test so I can use that as a starting point for my testing 😉
    • your testing results look very encouraging so far for your setup! crossplane (host network) -> functions (cluster network) and API server (outside cluster network) -> providers (host network)
  • I'm personally only aware of the function address being configurable in the SDK and function template for local dev/test scenarios.

cc @negz @jbw976 to clarify on whether Function runtime pods would ever have Kube-apiserver->webhook-listener-on-function-pod communication call pattern.

As you all surmised, it's not expected that instances/resources will ever be created of the input CRDs that functions expose. Those CRDs were actually never intended to be installed into the control plane; the package manager unexpectedly found them and installs them. This is tracked in #5294. Therefore, I don't see any need to support webhook port configuration for functions, at least right now.

I'll take this for a test pass once you share some manifests @Argannor, then give the code another deeper look over. Just wanted to share these thoughts now as we're ramping up to get this included in v1.19! 🙏 💪

@ravilr
Contributor

ravilr commented Jan 17, 2025

@jbw976 if you're looking to take the latest changes (without the spec.serviceTemplate in DeploymentRuntimeConfig) for a ride, the DeploymentRuntimeConfig in provider-k8s.yaml from jbw976@08eb86f would have to be updated to the below:

---
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-kubernetes
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            kubernetes.io/os: linux
          imagePullSecrets:
            - name: crossplane-workaround
          containers:
            - name: package-runtime
              args:
                - --webhook-port=9610            # customize webhook port to hostNetwork 9610; default is 9443, if unset
                - --metrics-bind-address=":9611"    # customize metrics port to hostNetwork 9611; default is 8080, if unset
              ports:
                - containerPort: 9610  # should be set to same thing set in above arg
                  name: webhook
                  protocol: TCP    
                - containerPort: 9611 # should be set to same thing set in above arg
                  name: metrics
                  protocol: TCP               

As for the functions' runtimeConfig, there isn't a need to run function pods in hostNetwork, as crossplane pod -> function pod communications stay within the cluster network (all CNIs enable direct connectivity from a cluster node's hostNetwork to the cluster pod network), hence no need for any port customization. As such, the recommendation for folks running custom CNI deployments would be to continue using Crossplane's default rendered function runtime spec.

jbw976 added a commit to jbw976/crossplane that referenced this pull request Jan 30, 2025
jbw976 added a commit to jbw976/crossplane that referenced this pull request Jan 30, 2025
Member

@jbw976 left a comment

I was able to test this more thoroughly today and look through all the code as well. The behavior I'm expecting to see appears to all be working from my testing with a local kind cluster 🎉 You can see my testing notes in this gist:
https://gist.github.com/jbw976/0fb9c7070a541dffaa286883de13c26b

I've added some comments that are mostly around converging the code changes with the final design/solution we've landed on. Take a look at those and let me know what you think.

Code freeze for v1.19 is now Feb 4 on the community calendar and I'm definitely hoping we get this in! 🙇

@Argannor would you also be able to update the PR description to make sure it accurately captures the final code changes/design?

if tmpl.Metadata.Name != nil {
	svc.Name = *tmpl.Metadata.Name
}
if meta := tmpl.Metadata; meta != nil {
Member

i think this refactoring is still lingering from when we were allowing the entire ServiceTemplate spec also - now that we're back to just the metadata, we can probably put this function back to exactly how it was before. 0 changes 😇

want want
}{
"ProviderServiceNoRuntimeConfig": {
reason: "No overrides should result in a deployment with default values",
Member

this test case is slightly confusing for me because it says "no overrides", but still does pass overrides below, and it also seems to be mentioning provider things but then passes a service override for gRPC, which is a function thing.

Any clean-up you think you should do for this unit test?

@jbw976
Member

jbw976 commented Jan 30, 2025

@ravilr, thanks for the manifest provided in #5540 (comment)! To get it working, I only had to remove the quotes around - --metrics-bind-address=":9611". See https://github.com/jbw976/crossplane/blob/configurable-ports-review/provider-k8s.yaml for what I got working 😉

@Argannor
Contributor Author

Argannor commented Feb 1, 2025

Thank you @jbw976 for the review. Unfortunately I'm currently out with a fever and not sure I can incorporate your review feedback before the code freeze. Feel free to take over.

* expose crossplane ports for metrics, webhooks, and health probes
  in helm chart and CLI args
* allow providers/functions to have their ports configured through
  a DeploymentRuntimeConfig

Signed-off-by: Argannor <[email protected]>
Signed-off-by: Jared Watts <[email protected]>
@jbw976
Member

jbw976 commented Feb 4, 2025

@Argannor i'm sorry to hear you are feeling ill! I hope you recover soon 🙏

I've taken all your changes and squashed/rebased them to the latest on main, resolved all merge conflicts, and then made a couple small tweaks for my last review feedback. You'll still be the commit author of your changes. I've pushed that all to https://github.com/jbw976/crossplane/tree/argannor-configurable-ports.

I'm doing some more testing now and I'll look through the code once again, but if all that looks good, I'll force push to your branch and then merge this PR!

@jbw976 force-pushed the feat/configurable-ports branch from 96dab3d to a3978ae on February 4, 2025 15:24
Member

@jbw976 left a comment

I've finished my final testing, squash/rebased the changes, and I think this is good to go for v1.19! 🚀

Thanks again @Argannor for your diligence and patience on this PR which is a highly demanded feature, and thank you @ravilr for your thorough reviews and insights to simplify this implementation as well. 🙇‍♂️

@jbw976 merged commit 0ce3ce4 into crossplane:main Feb 4, 2025
17 of 18 checks passed
mbbush pushed a commit to crossplane-contrib/provider-upjet-aws that referenced this pull request May 1, 2025
Part of crossplane/crossplane#5520

As outlined in the above issue, when running Crossplane in certain environments (like EKS using Calico CNI/Cilium in overlay network), where the control plane resides outside the workload network, it is necessary to run webhooks in the host network. This calls for making the listening ports (webhook and metrics) of provider controllers configurable, to avoid port conflicts among different provider families' pods.

How has this code been tested
Deployed these changes along with the changes from feat/configurable ports crossplane/crossplane#5540, which allows customizing ports in DeploymentRuntimeConfig, and verified that a provider launched with these new flags, custom ports, and a corresponding DeploymentRuntimeConfig had conversion webhook communications working without errors.

Signed-off-by: ravilr <[email protected]>
Successfully merging this pull request may close these issues:

Running Crossplane on hostNetwork