Spark Jupyter getting started docker compose#295
Conversation
I want to make sure this is something we want to do before proceeding to add more to the PR. cc @collado-mike / @flyrain

Makes sense to me. Thanks @kevinjqliu! Do we have any docs for its usage? We may add docs if not.

@flyrain yep, I'll have a README in here, similar to the Trino one.

Sounds good. We will need these docs to be on the Polaris doc site, like https://polaris.apache.org/docs/overview/. I couldn't find Trino's docs there, so this may involve publishing and linking the docs. cc @jbonofre

I see, this is the README for Trino. I'll add a similar README for Spark. As a follow-up, we can change the Polaris docs to refer to these guides: https://polaris.apache.org/docs/quickstart

This looks good to me. We should change the name of the compose file to just

@collado-mike makes sense, will do. I have a question on Slack about being unable to assume the role.
Force-pushed from e8f2187 to 92a2ad5.
r? @flyrain @RussellSpitzer @collado-mike. Also opened #319 to update the Polaris doc site once this is merged.
getting-started/spark/README.md
Outdated
I'm a bit conflicted about this doc. It feels like it doesn't really teach the reader anything about Polaris, although it does give you a really fast way to get bootstrapped.
Yes, I'll admit this README is a filler for now, a way to get Spark & Polaris up and running quickly.
I wonder if it might be easier to use the CLI here
Might be, if you want a spark-shell. I think the Jupyter notebook does a good job of explaining a lot of the concepts.
Sorry, I meant the polaris CLI instead of using curl
Ah, I don't know how to use the Polaris CLI, so I just copied directly from https://github.com/apache/polaris/blob/main/regtests/run_spark_sql.sh
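For context, the curl-based bootstrap in run_spark_sql.sh boils down to two HTTP calls: fetch an OAuth token with the root credentials, then create a catalog via the management API. A minimal Python sketch of the two request payloads (the endpoint paths and field names mentioned in the comments are assumptions based on the regtests script, not an authoritative API description):

```python
# Hedged sketch of the two bootstrap requests made with curl in
# regtests/run_spark_sql.sh. Endpoint paths and field names are assumptions.

def token_request(client_id: str, client_secret: str) -> dict:
    """Form body assumed for POST /api/catalog/v1/oauth/tokens
    (client_credentials grant using the root principal credentials)."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",
    }

def create_catalog_payload(name: str, base_location: str) -> dict:
    """JSON body assumed for POST /api/management/v1/catalogs,
    creating a catalog backed by the local file system."""
    return {
        "catalog": {
            "name": name,
            "type": "INTERNAL",
            "properties": {"default-base-location": base_location},
            "storageConfigInfo": {
                "storageType": "FILE",
                "allowedLocations": [base_location],
            },
        }
    }
```

A CLI or notebook would serialize these dicts as JSON and send them with the usual HTTP machinery; the shape is the point, not the transport.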
Force-pushed from e097bb3 to b75d998.
The md check intermittently shows

It's OK to remove the link for now since we're transitioning to Hugo.

@flyrain just had to run the CI a few times; it's unrelated to this change.
Force-pushed from b75d998 to 3eda72b.
Force-pushed from 3eda72b to 48a9f00.
.github/workflows/check-md-link.yml
Outdated

This PR moved `notebooks/` from the top-level directory into the `getting-started/` directory.
flyrain left a comment
Thanks @kevinjqliu for working on it. LGTM overall. Left some comments and questions.
getting-started/spark/README.md
Outdated
Nit: could we be more explicit that it starts with an in-memory metastore?
getting-started/spark/README.md
Outdated
local catalog -> a catalog backed by the local file system?
I'm not entirely sure if we need this file. Could we handle everything directly within the notebook, like the other operations in SparkPolaris.ipynb? Would it simplify things if we moved the operations there?
We could, but I think it's a good idea to separate infra code (this script) from application code (the notebook).
We could initialize the catalog in the notebook as well; I feel it's more flexible that way. For example, you don't have to worry about an env variable for the catalog name. But I'm OK with either one. Not a blocker for me.
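The env-variable concern discussed above can be sketched in a couple of lines: if a setup script creates the catalog, the notebook has to agree on its name, typically read from the environment with a fallback. The variable name here is hypothetical, purely for illustration:

```python
import os

# Hypothetical illustration of the trade-off above: when a separate script
# initializes the catalog, the notebook reads the agreed-upon catalog name
# from an environment variable (CATALOG_NAME is a made-up name, not from
# this PR). Initializing in the notebook itself avoids this coupling.
def resolve_catalog_name(default: str = "polaris_demo") -> str:
    return os.environ.get("CATALOG_NAME", default)
```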
getting-started/spark/README.md
Outdated
There is another way to try Spark with Polaris without Docker; it's not a blocker, we can add it later.
Thanks for the review @flyrain, addressed your comments.

We cannot merge any PR until #374 is merged.

Thanks for the heads up, I'll rebase once that PR is merged.
Force-pushed from a0f6c9a to 797fabb.
@flyrain took your advice, moved

Thanks a lot for working on it, @kevinjqliu! Thanks all for the review.
Description
This PR moves the `docker-compose-jupyter.yml` file (and the `notebooks/` directory), formerly in the top-level directory, into the `getting-started/spark/` folder. The purpose is to unify the "getting started" guides into the same directory.
Fixes #110
How Has This Been Tested?
1. Open the `SparkPolaris.ipynb` Jupyter notebook.
2. Grab the `root principal credentials` from the Polaris service and replace them in the notebook cell.
3. Run all cells in the notebook.
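Behind those steps, the notebook essentially wires the root principal credentials into Spark's Iceberg REST catalog settings. A hedged sketch of the conf keys involved (key names follow Iceberg's Spark catalog conventions; the URI and credential values here are placeholders, not the notebook's actual values):

```python
# Sketch of the Spark conf a notebook might set to talk to Polaris through
# the Iceberg REST catalog. Key names follow Iceberg's Spark integration;
# the default URI and the credential format are assumptions/placeholders.
def polaris_spark_conf(catalog_name: str, client_id: str, client_secret: str,
                       uri: str = "http://polaris:8181/api/catalog") -> dict:
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # Root principal credentials, as "client_id:client_secret"
        f"{prefix}.credential": f"{client_id}:{client_secret}",
        f"{prefix}.warehouse": catalog_name,
    }
```

Each key/value pair would be passed to the `SparkSession` builder via `.config(key, value)`.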