Add Docker image for Unity Catalog#18

Closed
pflooky wants to merge 21 commits into unitycatalog:main from data-catering:main

pflooky commented Jun 14, 2024

Some small changes are included in this PR to help create a working Docker image for Unity Catalog:

  • Add Dockerfile
  • Add script to run build and push for Docker images
  • Add GitHub action to build and upload Docker image
  • Add uc-docker script for CLI to work straight away on the Docker image
  • Allow the server.properties and log4j2.properties file paths to be set via environment variables or system properties
  • Fix banner having two v's
  • Change --format to --output in the documentation
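
The configurable-path item in the list above could look roughly like the following in a start script. This is a minimal sketch, not the PR's implementation: `server_properties_path` is a hypothetical helper, and the fallback `etc/conf/server.properties` is an assumption based on the `/etc/conf` directory mentioned in the review. `SERVER_PROPERTIES_FILE` is the variable name set in the PR's Dockerfile.

```shell
#!/bin/bash
# Hypothetical helper: print the server properties path.
# Uses SERVER_PROPERTIES_FILE when it is set and non-empty, otherwise falls
# back to a repository-relative default (the default is an assumption).
server_properties_path() {
  printf '%s\n' "${SERVER_PROPERTIES_FILE:-etc/conf/server.properties}"
}

# Show which file would be loaded in the current environment.
echo "Loading server properties from: $(server_properties_path)"
```

Inside the container the Dockerfile's `ENV` line would supply the value; outside it, the default keeps the existing behaviour unchanged.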

Contributor

tdas commented Jun 15, 2024

Thank you for this PR. A Docker image would be great. I am going to play with this as soon as I get a chance.

as well as provide a convenient way to explore the content of any UC server implementation.

### Prerequisites
- Docker

Contributor

You still have to clone this repo, so the two options have to go under the instructions for cloning the repo.

Author

You don't need to clone the repo if you run the docker run ... command. It pulls the image from Docker Hub and starts running straight away.

Contributor

Can you elaborate a bit more on this? What is the full docker run ... command?
If we are adding a completely new way to run these steps, then we need to write down the exact steps that people can copy, paste, and run, leaving nothing to assumption.

Also, I am not as familiar with Docker as I want to be. I am guessing that this docker run ... command downloads the image from Docker Hub? Who is going to publish to Docker Hub?
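
For reference, the full command under discussion appears in this PR's README changes. The sketch below only prints it for a given tag so it can be copied and pasted. `uc_run_command` is a hypothetical helper, the image name `unitycatalog/unitycatalog`, the tag `0.1.0`, and port 8081 are taken from that README snippet, and whether the image is actually published under that name is the open question in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the docker run command for a given image tag.
# Defaults to 0.1.0, the tag used in the README snippet of this PR.
uc_run_command() {
  image_tag="${1:-0.1.0}"
  printf 'docker run -d -i --name unitycatalog -p 8081:8081 unitycatalog/unitycatalog:%s\n' "$image_tag"
}

# Print the command rather than executing it, so nothing is pulled by accident.
uc_run_command "0.1.0"
```

Running the printed command requires Docker and a published image with that tag; Docker pulls the image automatically when it is not present locally.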

@@ -0,0 +1,32 @@
#!/bin/bash

Contributor

Could you add some comments in the file explaining what it does?

@@ -0,0 +1,32 @@
#!/bin/bash

Contributor

Does this script have to be in the root of the repo? Keeping the root of the repo uncluttered is better.

@stoffeastrom

Would love to see a Docker image for this as well, but I'm wondering how datacatering is affiliated with unitycatalog. Since the image is built in this repo, wouldn't it be better if it were published under unitycatalog?

dennyglee mentioned this pull request Jun 18, 2024

Author

pflooky commented Jun 18, 2024

> Would love to see a docker image for this as well but I'm wondering how datacatering is affiliated to unitycatalog? Since the image is built in this repo wouldn't it be better if it's published under unitycatalog?

I was using datacatering to test the docker image creation. Happy to change to unitycatalog.

When this PR is merged, someone will have to add DOCKER_HUB_USER and DOCKER_HUB_TOKEN to the GitHub Actions secrets for the GitHub action to work. The credentials need permission to upload to unitycatalog.
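
A build-and-push script could fail early when those secrets are absent. The following is a sketch under assumptions, not the script from this PR: both function names are hypothetical, and only `DOCKER_HUB_USER` and `DOCKER_HUB_TOKEN` come from the comment above.

```shell
#!/bin/bash
# Hypothetical pre-flight check: print the names of missing Docker Hub
# credentials, separated by spaces (prints an empty line when none are missing).
missing_docker_hub_secrets() {
  missing=""
  [ -n "${DOCKER_HUB_USER:-}" ] || missing="$missing DOCKER_HUB_USER"
  [ -n "${DOCKER_HUB_TOKEN:-}" ] || missing="$missing DOCKER_HUB_TOKEN"
  # Strip the leading space added by the first append.
  printf '%s\n' "${missing# }"
}

# Hypothetical login step: refuse to run without credentials.
docker_hub_login() {
  missing="$(missing_docker_hub_secrets)"
  if [ -n "$missing" ]; then
    echo "Missing secrets: $missing" >&2
    return 1
  fi
  # Pass the token on stdin so it does not appear in the process list.
  printf '%s' "$DOCKER_HUB_TOKEN" | docker login --username "$DOCKER_HUB_USER" --password-stdin
}
```

Calling `docker_hub_login` before the push step would make a missing secret show up as a clear error instead of a failed upload.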


ENV SERVER_PROPERTIES_FILE=/opt/app/etc/conf/server.properties
ENV SERVER_JOG4J_CONFIGURATION_FILE=/opt/app/etc/conf/server.log4j2.properties
ENV CLI_JOG4J_CONFIGURATION_FILE=/opt/app/etc/conf/cli.log4j2.properties

Contributor

Where are these ENV variables used? What's the point of assigning them?

WORKDIR /opt/app

COPY server/target/unitycatalog-server-assembly.jar unitycatalog-server.jar
COPY examples/cli/target/unitycatalog-cli-assembly.jar unitycatalog-cli.jar

Contributor

What's the purpose of the CLI jar in Docker? We are not running the CLI in Docker.


### Run the UC Server
```shell
docker run -d -i --name unitycatalog -p 8081:8081 unitycatalog/unitycatalog:0.1.0
```

Contributor

Will this docker run step only work with published versions of the server, like 0.1?
My goal was for these steps to be runnable by anyone with the current unpublished code as soon as they clone this repo.

We could document how to build the container locally, and afterwards also execute it locally.
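
A local build-and-run flow might look like the sketch below. It rests on several assumptions: the Dockerfile sits at the repository root, the two jars copied by the Dockerfile have already been built, and the tag `unitycatalog:local` and both function names are hypothetical. The jar paths and the port mapping are taken from the snippets quoted in this review.

```shell
#!/bin/bash
# Jar paths copied by the PR's Dockerfile, relative to the repository root.
SERVER_JAR="server/target/unitycatalog-server-assembly.jar"
CLI_JAR="examples/cli/target/unitycatalog-cli-assembly.jar"

# Succeeds only when both jars exist under the given repository root.
jars_present() {
  repo_root="${1:-.}"
  [ -f "$repo_root/$SERVER_JAR" ] && [ -f "$repo_root/$CLI_JAR" ]
}

# Build an image from the local checkout and start it.
# The image tag is hypothetical; the Dockerfile location is an assumption.
build_and_run_local() {
  repo_root="${1:-.}"
  if ! jars_present "$repo_root"; then
    echo "Build the server and CLI jars first" >&2
    return 1
  fi
  docker build -t unitycatalog:local "$repo_root" &&
    docker run -d -i --name unitycatalog -p 8081:8081 unitycatalog:local
}
```

Calling `build_and_run_local .` from the repository root would then run unpublished code without pulling anything from Docker Hub.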

@@ -0,0 +1,26 @@
#!/bin/bash

Contributor

What is the point of this file?

return apiClient;
}

private String getLogPropertiesFilePath() {

Contributor

Why do we need to define these properties? Why not keep it simple and always load the configurations from /etc/conf?

Contributor

tdas left a comment

Overall, thank you for your attempt to dockerize the whole thing. But there are a lot of changes in this PR whose purpose I don't understand. Could you please bear with me and answer my questions, so that we understand what we are merging and what will be needed to maintain it?

Is there a reason why this is a separate script and not part of the YAML pipeline file?

pflooky requested a review from tdas July 2, 2024 02:33

pdbg commented Jul 21, 2024

+1

Nice to have a dockerized version of UC.

@LeonardoSanBenitez

+1
Thanks for the initiative.
I'm just getting started with UC outside Databricks, and this Docker image is great to have.

haogang pushed a commit that referenced this pull request Aug 1, 2024
# Description of changes

For this PR I took an alternative approach to #18 and #22 and created a
Dockerfile with build and start scripts that require minimal
intervention and interaction with the codebase.

The only change to the codebase is in `.gitignore`, where I added
`.DS_Store`, which can be helpful in the future for contributors using
macOS.

PR #22 is a great start but maybe oversimplified.

PR #18 has good thought put into it, but I wanted to stay close to the
recommended way of running Unity Catalog as outlined in the project's
README. I tried not to fiddle directly with the jars and use the
provided `/bin/start-uc-server` to run the catalog. With this approach
the Dockerfile remains focused on building the environment and any
changes to how the environment should run can be made in the future
inside the `start-uc-server` script rather than the Dockerfile.

# Rationale of the PR

This pull request introduces a way to run Unity Catalog using Docker
containers. It provides a Dockerfile that builds the necessary
environment and separate bash scripts for building and starting the
catalog. This simplifies the process for users by requiring minimal
interaction with the codebase itself. The included README provides
detailed instructions on how to use these scripts to build and run the
Unity Catalog container.

> [!NOTE]
> The `README.md` contains two API calls that create an external and a
> managed table.
> These APIs do not work yet because the catalog does not support them
> yet.

Signed-off-by: Jean Boutros <[email protected]>

---------

Signed-off-by: Jean Boutros <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Denny Lee <[email protected]>

Collaborator

creechy commented Aug 8, 2024

This duplicates PR #116, which included Docker functionality.

creechy closed this Aug 8, 2024
dennyglee pushed a commit that referenced this pull request Sep 2, 2024
tdas pushed a commit that referenced this pull request Sep 5, 2024
rtyler pushed a commit to rtyler/unitycatalog that referenced this pull request Sep 5, 2024
kevinzwang pushed a commit to kevinzwang/unitycatalog that referenced this pull request Oct 10, 2024
7 participants