Skip to content

Add helm chart for unitycatalog#895

Merged
roeap merged 6 commits intounitycatalog:mainfrom
sdwbgn:helm
May 27, 2025
Merged

Add helm chart for unitycatalog#895
roeap merged 6 commits intounitycatalog:mainfrom
sdwbgn:helm

Conversation

@sdwbgn
Copy link
Contributor

@sdwbgn sdwbgn commented Feb 11, 2025

PR Checklist

  • A description of the changes is added to the description of this PR.
  • If there is a related issue, make sure it is linked to this PR.
  • If you've fixed a bug or added code that should be tested, add tests! [not applicable]
  • If you've added or modified a feature, documentation in docs is updated

Description of changes

Bringing helm chart from - https://github.com/sdwbgn/unitycatalog-helm

Features:

  • Deploys UC Server and UI
  • UC Configuration from helm values
    • OAuth
    • Data sources configuration (S3, GCS, ADLS) with secret management
  • Persistence of data in PVCs

Relates: #433

tag: latest

image:
repository: unitycatalog/unitycatalog-ui
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image naming and versions will be modified when #698 is closed

@tnightengale
Copy link
Collaborator

@sdwbgn This is really exciting! Currently #698 only pushes the backend image. Is it your recommendation for hosting on k8s that we also create and push a unitycatalog-ui image?

@tnightengale
Copy link
Collaborator

tnightengale commented Feb 11, 2025

@sdwbgn This is really exciting! Currently #698 only pushes the backend image. Is it your recommendation for hosting on k8s that we also create and push a unitycatalog-ui image?

Or for example, would it be better to bundle the ui into the backend image? Looking for guidance as to how I can help here, as I am not very familiar with frontend/backend distinctions on k8s.

@tnightengale
Copy link
Collaborator

My instinct is that we should overhaul the current ui/Dockerfile to be production quality, build the react app, and then serve it using nginx in a second stage of a multistage Dockerfile. What are your thoughts @sdwbgn @jamieknight-db?

@sdwbgn
Copy link
Contributor Author

sdwbgn commented Feb 11, 2025

Is it your recommendation for hosting on k8s that we also create and push a unitycatalog-ui image?
Or for example, would it be better to bundle the ui into the backend image?

Oh, I thought both images will be published in that PR. Usually it is best practice to have images as decoupled as possible. IMO best approach would be to separate image for backend and frontend.

then serve it using nginx in a second stage of a multistage Dockerfile

In K8s deploying the app and serving it are usually separate. It's mostly common for Helm charts to deploy bare minimum - the app itself and service for accessing the app.
After that, for example, one of the options is to expose your service by changing Service type to NodePort (to expose raw port on Kubernetes compute node) or LoadBalancer (to use your cloud provider's solution for ALB/NLB).
I think we could also integrate add Ingress to the chart to use it with controllers, like nginx-ingress.

@dennyglee
Copy link
Contributor

@sdwbgn @tnightengale - will you be able to make it to the next contributors meeting #902 so we can discuss the best approach for this?

Copy link
Contributor

@ognis1205 ognis1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be preferable to support TLS.

@tnightengale
Copy link
Collaborator

#902

@dennyglee I start a new role that week, so unclear if I will have a conflict. But will endeavor to attend!

@sdwbgn
Copy link
Contributor Author

sdwbgn commented Mar 4, 2025

I thought it would be preferable to support TLS.

Makes sense! Currently we can achieve that by terminating TLS/SSL connection on NLB or other ingress. Additionally we might want to consider support for TLS in UC itself.

@dennyglee dennyglee requested a review from roeap March 4, 2025 17:09
@dennyglee
Copy link
Contributor

Adding @roeap to review!

Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think the Helm chart looks good, however I do have some questions around how we manage storage - and please excuse if I am completely off track here :)

IIRC, in a StatefulSet, each Pod is assigned dedicated storage (PV & PVC) isolated from all other pods. So when using it for the file based storage, we may end up with each replica effectively having its own version of the on-disk database that would be always out of sync?

For the configuration - since we render it on ever pod-start - i think we can just use an ephemeral volume and avoid creating a PV?

If the above is in-fact true, we may want to consider using a Deployment and creating a separate RedWriteMany PV/PVC that all pods can mount, or force all pods on the same node via affinities somehow? Or, since on-disk storage seems most suited for evaluation deployments, just limit it to one replica in that case? This would then allow a wider range of storages, since I believe only a limited sub-set supports RWMany ..

I was a little suprised by the storage credentials, but I guess this is simply due to the fact, that the Credentials api surface is not implemented yet, so there is no dynamic handling of credentials possible?

Lastly, does it make sense to also include the creation of an ingress (and/or gateway) object that users can optionally enable?

Comment on lines +178 to +181
{{- if not (and (eq .Values.db.type "file") .Values.db.fileConfig.persistence.enabled) }}
- name: db-volume
emptyDir: {}
{{- end }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a trivial question, as I am not familiar with hibernate. In case we are not using a file based db, is the volume mount even necessary? i.e. should we have the condition on the volume / volume mount rather then mounting an ephemeral volume if we are not using the file-based db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will fix that!
I will keep db volume mounts for "file" type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added postgresql DB support, fixed it

Comment on lines +104 to +107
{{- with .Values.server.statefulset.livenessProbe }}
livenessProbe:
{{- toYaml . | nindent 12 }}
{{- end }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My stance of liveliness probes has evolved over time and may not yet be fully stable :) - that said, personally I do find the reasoning here convincing and we may decide not to specify a liveliness probe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will remove default livenessProbe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +114 to +116
volumes: []

volumeMounts: []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we prefix these with extra to indicate we are always mounting some volumes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +85 to +87
{{- define "unitycatalog.server.apiEndpoint" -}}
http://{{ include "unitycatalog.server.fullname" . }}:{{ .Values.server.service.port }}
{{- end }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This template initially confused me a bit, as I was thinking of an external endpoint. Since we are using it only once would it make sense to remove the template and just inline this where we use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +213 to +222
{{- if .Values.server.config.persistence.enabled }}
- metadata:
name: config-volume
spec:
accessModes: {{ .Values.server.config.persistence.accessModes }}
resources:
requests:
storage: {{ .Values.server.config.persistence.size }}
storageClassName: {{ .Values.server.config.persistence.storageClassName }}
{{- end }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong, but i do believe init containers run for every pod that starts, since we are rendering the config always do we need a dedicated volume for this, or could we just use an ephemeral volume with a lifetime tied to the pod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a need for that due to JWT private key creation by UC. I moved it to Secret resource and removed the need for config persistence (but I need to test it further).

- metadata:
name: db-volume
spec:
accessModes: {{ .Values.db.fileConfig.persistence.accessModes }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are supporting file based dbs, can we even allow all access modes here, or do we require it to be ReadWriteMany?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider file-based DB to be single replica solution, hence default value is [ "ReadWriteOnce" ]. ReadWriteMany implementations in most clouds are not perfectly reliable, have quite a lot of limitations (performance, latency, race conditions) and overall cover very specific use cases.

Setting multiple accessModes is incredibly rare case, I never seen a case in production where that would be a requirement. If that is really needed, it currently can be overridden in values.yaml, but as for the default value ReadWriteOnce should be enough.

@captainnemo27
Copy link

captainnemo27 commented Mar 20, 2025

i think cannot go to prodution :< , i am trying but get a lot of missing config

db:
type: file
# Supported values: file
# TODO: add support for postgres and mysql

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pull request ❤️ Cannot wait!
I assume in future this Postgres will help avoid using Persistent Volume, then can support multiple replicas (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added postgresql support, tested simple case of multi-replica UC (without auth) and it seems to work

@ognis1205
Copy link
Contributor

Any updates on this?

@dennyglee
Copy link
Contributor

@roeap Can you review when you get the chance? Thanks!

@sdwbgn
Copy link
Contributor Author

sdwbgn commented Apr 20, 2025

Sorry for long wait, I am currently working on current comments, I will push new revision today or tomorrow.

@sdwbgn
Copy link
Contributor Author

sdwbgn commented Apr 21, 2025

Published next iteration

  • Added HTTPRoute and Ingress support
    • UI/API requests routing on same hosts based on prefix (/api prefix for API, everything else to UI pods)
    • Allows for disabling proxy for API requests in UI (required for using Ingress / HTTPRoute)
  • Added PostgreSQL support
    • Allows for multi-replica deployments
  • Refactored config creation to remove need for persisting config values in separate volume
  • Generated docs using helm-docs
    • Added precommit check to keep it in-sync
  • Using deployment instead of statefulset

Currently, I have tested following:

  • Set up UC with PostgreSQL and 2 replicas for both UI and API
  • Set up HTTPRoute with public DNS record and exposed to internet
    • Used Envoy Gateway as Gateway API provider
    • Restricted access using EnvoySecurityPolicy (basic-auth)
  • Set up SSL certificate using cert-manager and referenced in helm chart

What's left to test:

  • Test S3/ADLS/GCS credential management
  • External auth - could be broken during refactoring of secrets and keys

Overall there is still need for proper UI docker image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of values.yaml for prod-like deployment with 2 replicas, PostgreSQL and routing using HTTPRoute

@unfrgivn
Copy link

@sdwbgn Thanks for pushing on this and also adding Postgres support. Any interest in adding the typical array config for extraContainers inside the Deployment template so sidecars like SQL proxy can be added?

@unfrgivn
Copy link

@sdwbgn A couple items on the Gateway implementation. I didn't dig into that template specifically since I wouldn't use that and I don't think the Gateway object belongs in the chart. My simple proposal would be to change the default enabled: false when enabling the HTTPRoute. The Gateway API is great and I hope support for it starts appearing in more charts, but there's too many different implementations and dependencies on the cluster setup and CSP that will change how it's implemented.

Typically I add my routes and backendpolicy k8s objects via Kustomize on top of Helm. So if I were to use this I would disable the Gateway/HTTPRoute in the chart, but then it will try to use the frontend proxy in the Deployment. My second choice would be to enable the HTTPRoute using the extraParentRefs, disable the Gateway creation, and use Kustomize to patch the hosts and add another route to enforce HTTPS redirect.

And a minor correction, the chart docs say that httpRoute.host is an array but it's implemented in the template as a string value that gets added to the spec.hostnames array.

I am happy to collaborate more on this if you would like, but I'm also not trying to throw a bunch of changes into this PR as I know the goal is to get a first iteration merged.

@sdwbgn
Copy link
Contributor Author

sdwbgn commented May 12, 2025

@unfrgivn Thanks for your comments!

Any interest in adding the typical array config for extraContainers inside the Deployment template so sidecars like SQL proxy can be added?

Nice idea, added.

I didn't dig into that template specifically since I wouldn't use that and I don't think the Gateway object belongs in the chart.

I gave it a thought, and I agree with you, removed it and left only parentRefs

And a minor correction, the chart docs say that httpRoute.host is an array but it's implemented in the template as a string value that gets added to the spec.hostnames array.

Fixed the doc type, thanks!

Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdwbgn - looking great!

Since this is quite a complex thing with many nuances I expect we may need to iterate once this gets into the wild, but this is expected :).

One more update from main an need to fix the DCO, gut I think good to go.

Signed-off-by: Vasilii Bulatov <[email protected]>
Signed-off-by: Vasilii Bulatov <[email protected]>
Copy link

@Gerrit-K Gerrit-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sdwbgn sorry for randomly chiming in here with a review. I was evaluating unity with my team and we were happy to see that there might be an official helm chart just around the corner. Unfortunately, while trying it out, we stumbled over a few things that we couldn't work around - most of which was related to security enforcements of our target cluster. I figured I might comment some of them here already, but please don't see this as a blocker for the PR.
Thanks for your work!

- /bin/sh
- -c
- |-
apk add --no-cache envsubst openssl curl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing packages at runtime requires to run the container with root privileges, which is not recommended. Instead, it would be advisable to either use off-the-shelf images or provide a custom pre-built image that contains the required tools. (same applies for the create-users-job)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a separate PR with that - #1019. Will update the chart as soon as we merge it.

#
# [Kubernetes docs](https://kubernetes.io/docs/concepts/policy/security-context/)
#
securityContext:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general suggestion, I would propose to add a few settings by default and let users decide whether they need to extend/override the securityContext. A sensible default could be:

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 100
  runAsGroup: 101
  fsGroup: 101
  seccompProfile: { type: RuntimeDefault }

securityContext:
  allowPrivilegeEscalation: false
  capabilities: { drop: [ALL] }

At least the runAs... settings are useful so that the users don't need to inspect the image to get the user and group IDs to run the container with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Currently, due to the alpine images and package installation, it's not possible to drop root permissions. I have prepared a PR with adding those packages to main UC image - #1019. After it's merged, I will drop bare alpine image and usage of apk, and we can use these defaults.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick reply! Sounds reasonable 👍
Looking forward to the further improvements

@sdwbgn
Copy link
Contributor Author

sdwbgn commented May 25, 2025

@Gerrit-K Thanks for your suggestions! They are exactly what was missing for making the deployment more secure. I've addressed the certificate issue and securityContext for initContainer of UI. For the init image and default securityContext I want to make a separate change, after PR #1019 is merged.

@roeap roeap merged commit ad29ac4 into unitycatalog:main May 27, 2025
11 checks passed
@sdwbgn sdwbgn deleted the helm branch May 28, 2025 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants