Pegasus Workflows on
OLCF - Summit
George Papadimitriou
[email protected]
http://pegasus.isi.edu
Outline
• Kubernetes/OpenShift
  • What is Kubernetes (Specs, Pods, Services)
  • Why use Kubernetes in HPC
  • OpenShift at OLCF
  • Pegasus Deployment on OpenShift at OLCF
• How to Deploy
  • Prerequisites
  • Instructions
• Demo
  • Pegasus Workflow on Summit
https://panorama360.github.io 2
Kubernetes
Kubernetes: Brief Overview
• Kubernetes is an open-source platform for running
and coordinating containerized applications across
a cluster of machines.
• It can be useful for:
  • Orchestrating containers across multiple hosts
  • Controlling and automating deployments
  • Scaling containerized applications on the fly
  • And more…
• Key objects in the Kubernetes architecture are:
  • Master: Controls the Kubernetes nodes and assigns tasks to them
  • Node: Performs the assigned tasks
  • Pod: A group of one or more containers deployed on a single node
  • Replication Controller: Controls how many copies of a pod should be running
  • Service: Allows pods to be reached from the outside world
  • Kubelet: Runs on each node and starts the defined containers
Reference:
https://www.redhat.com/en/topics/containers/what-is-kubernetes
Kubernetes: Configuring Objects
• Within Kubernetes, specification files describe the
applications, services and objects being deployed
• Specification files can be written in YAML or JSON
format and can be used to:
  • Deploy Pods
  • Create and mount volumes
  • Expose services, etc.
Reference:
https://kubernetes.io/docs/tasks/configure-pod-container/
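As a rough illustration of what such a specification contains, the sketch below builds a minimal Pod specification as a Python dictionary and prints it as JSON. The pod name, image, and command (`example-pod`, `busybox`, `sleep 3600`) are placeholders assumed for this example; they are not taken from the Pegasus templates.

```python
import json


def make_pod_spec(name, image, command):
    """Return a minimal Kubernetes Pod specification as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                # A Pod may list several containers; this sketch uses one.
                {"name": name, "image": image, "command": command}
            ]
        },
    }


# Placeholder values, assumed for illustration only.
pod_spec = make_pod_spec("example-pod", "busybox", ["sleep", "3600"])
print(json.dumps(pod_spec, indent=2))
```

Saved to a file, output of this shape is the kind of input that `oc create -f <file>` expects.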
Kubernetes: Pods
• A Pod is the basic execution unit of a Kubernetes
application
• Pods represent processes running on the cluster
• One can have one or multiple containers running
within a Pod.
• Networking: Each Pod is assigned a unique IP
address within the cluster
• Storage: A Pod can specify a set of shared storage
Volumes. Volumes persist data and allow Pods to
maintain state between restarts.
• Lifecycle: A Pod runs on its assigned cluster node
until its container(s) exit or it is removed for some
other reason (e.g. the user deletes it).
References:
https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
https://kubernetes.io/docs/concepts/workloads/pods/pod/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
https://kubernetes.io/docs/concepts/storage/volumes/
Kubernetes: Services
• A Service provides an abstract way to expose an
application running on a set of Pods as a network
service to clients inside or outside the cluster
• Since Pods are ephemeral, Services give users a
stable way to reach the backend applications
• Service types are:
  • ClusterIP: Exposes the service on a cluster-internal IP
  • NodePort: Exposes the service on each Node’s IP at a static port
  • LoadBalancer: Exposes the service externally through a load balancer
  • ExternalName: Maps the service to a DNS name by returning a CNAME record
Reference:
https://kubernetes.io/docs/concepts/services-networking/service/
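To make the NodePort type more concrete, here is a minimal sketch of a Service specification built as a Python dictionary. The service name, the `app` label used as selector, and the port numbers are hypothetical values chosen for illustration; a real deployment would use its own.

```python
import json


def make_nodeport_service(name, selector, port, node_port):
    """Return a minimal NodePort Service specification as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            # Traffic is routed to Pods whose labels match this selector.
            "selector": selector,
            "ports": [
                {"port": port, "targetPort": port, "nodePort": node_port}
            ],
        },
    }


# Hypothetical values, assumed for illustration only.
service_spec = make_nodeport_service(
    "example-service", {"app": "example"}, port=8080, node_port=30080
)
print(json.dumps(service_spec, indent=2))
```

The selector is what ties the Service to its Pods: a Pod that is restarted and receives a new IP is still reachable as long as it carries the matching label.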
Kubernetes: Why it can be useful in HPC
• Running services on login nodes can be cumbersome (building from scratch, compiling all
dependencies, etc.) and is sometimes prohibited by the system administrators.
• Keeping an application or service up to date is easier
• Assist workflow execution
  • Create submission environments
  • Handle data movement and job submissions
  • Automation and reproducibility
• Create collaborative web portals
  • Jupyter Notebooks
  • Workflow design (e.g. Wings)
• Streaming data
  • Consuming
  • Publishing
Kubernetes (OpenShift) at OLCF
• OLCF has deployed OpenShift, a distribution of Kubernetes
developed by Red Hat
• OpenShift provides a command line and a web interface to
manage your Kubernetes objects (pods, deployments,
services, storage etc.)
• OLCF’s deployment has automation mechanisms that allow
users to submit jobs to the batch system and access the
shared file systems (NFS, GPFS)
• All containers run as an automation user that is tied to a
project
Reference:
https://www.olcf.ornl.gov/wp-content/uploads/2017/11/2018UM-Day3-Kincl.pdf
Kubernetes (OpenShift) at OLCF: Pegasus Deployment
Kubernetes at OLCF: Pegasus Deployment - Advantages
• Pegasus workflow environments at OLCF have been simplified.
• Using the Kubernetes cluster at OLCF, we can deploy Pegasus submit nodes as
services, within a few seconds.
• The deployment uses HTCondor’s BOSCO SSH-style submission through the DTNs to
submit jobs to the SLURM and LSF batch schedulers.
• This approach allows a single workflow to be configured to use all of OLCF’s
resources. For example, it can execute transfers on the DTNs, run simulations and heavy
processing on Summit, and then do lightweight post-processing steps on Rhea.
How to Deploy
We will follow the tutorial: https://pegasus.isi.edu/tutorial/summit/tutorial_setup.php
How to Deploy: Prerequisites
• Pegasus Kubernetes Templates for OLCF:
  • https://github.com/pegasus-isi/pegasus-olcf-kubernetes
• OpenShift’s Origin Client:
  • https://github.com/openshift/origin/releases
• A working RSA Token to access OLCF’s systems
• An automation user for OLCF’s systems
• Allocation on OLCF’s OpenShift Cluster (https://marble.ccs.ornl.gov)
How to Deploy: Useful Origin Client Commands
• oc login: authenticates against a cluster and acquires an access token
• oc status: returns/prints the status of your deployments
• oc describe: shows details of a specific resource
• oc create: creates a Kubernetes resource from specification
• oc start-build: initiates the creation of a container image
• oc logs: returns/prints the Kubernetes log for a resource
• oc exec: executes a command in a container
• oc delete: deletes a resource
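The sketch below shows how a few of these subcommands are typically combined into command lines. It only builds and prints the commands; nothing is executed. The specification file and pod name (`example-pod.yml`, `example-pod`) are hypothetical placeholders, not the names used by the Pegasus templates.

```python
import shlex


def oc(*args):
    """Build an `oc` command line as an argument list (nothing is executed)."""
    return ["oc", *args]


# Hypothetical resource names, assumed for illustration only.
SPEC_FILE = "example-pod.yml"
POD_NAME = "example-pod"

commands = [
    oc("status"),                             # overview of the current project
    oc("create", "-f", SPEC_FILE),            # create a resource from a spec file
    oc("describe", "pod", POD_NAME),          # details of one resource
    oc("logs", POD_NAME),                     # log output of the pod
    oc("exec", POD_NAME, "--", "hostname"),   # run a command in the container
    oc("delete", "pod", POD_NAME),            # remove the pod
]

for cmd in commands:
    print(shlex.join(cmd))
```

To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)` after a successful `oc login`.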
How to Deploy: Pegasus - Kubernetes Templates
• bootstrap.sh: Generates a customized Dockerfile and Kubernetes pod and service specifications for
your deployment.
• Specs/pegasus-submit-build.yml: Contains the Kubernetes build specification for the pegasus-olcf
image.
• Specs/pegasus-submit-service.yml: Contains the Kubernetes service specification that can be used
to spawn a NodePort service that exposes the HTCondor Gridmanager Service running in your
submit pod to the outside world.
• Specs/pegasus-submit-pod.yml: Contains the Kubernetes pod specification that can be used to
spawn a pegasus/condor pod that has access to Summit's GPFS filesystem and its batch
scheduler.
How to Deploy: Customize Templates
In bootstrap.sh update the section "ENV Variables For User and Group"
with your automation user's name, id, group name, group id and the
Gridmanager Service Port, which must be in the range 30000-32767.
Replace the highlighted text:
• USER: with the username of your automation user (e.g. csc001_auser)
• USER_ID: with the user id of your automation user (e.g. 20001)
• USER_GROUP: with the project name your automation user belongs
to (e.g. csc001)
• USER_GROUP_ID: with the project group id your automation user
belongs to (e.g. 10001)
• GRIDMANAGER_SERVICE_PORT: with the Kubernetes NodePort port
number the Gridmanager Service should use (e.g. 32752)
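Before running the script, it can help to sanity-check the chosen values. The following is a minimal sketch of such a check, assuming the five settings are collected in a dictionary; the `check_settings` helper is hypothetical and not part of the Pegasus templates. The example values are the ones given in the list above.

```python
NODEPORT_RANGE = range(30000, 32768)  # 30000-32767 inclusive


def check_settings(settings):
    """Return a list of problems found in the bootstrap.sh values."""
    problems = []
    for key in ("USER", "USER_GROUP"):
        if not settings.get(key):
            problems.append(f"{key} must not be empty")
    for key in ("USER_ID", "USER_GROUP_ID", "GRIDMANAGER_SERVICE_PORT"):
        if not isinstance(settings.get(key), int):
            problems.append(f"{key} must be an integer")
    port = settings.get("GRIDMANAGER_SERVICE_PORT")
    if isinstance(port, int) and port not in NODEPORT_RANGE:
        problems.append("GRIDMANAGER_SERVICE_PORT must be in 30000-32767")
    return problems


example = {
    "USER": "csc001_auser",
    "USER_ID": 20001,
    "USER_GROUP": "csc001",
    "USER_GROUP_ID": 10001,
    "GRIDMANAGER_SERVICE_PORT": 32752,
}
print(check_settings(example))  # prints []
```

A port inside the range can still be rejected by the cluster if another service already uses it, so this check only catches values that could never work.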
Execute Script:
How to Deploy: Acquire an Access Token (Step 1)
How to Deploy: Build the Container Image (Step 2)
Create a new build and build the image:
Trace the progress of the build:
How to Deploy: Start the Kubernetes Service (Step 3)
Start a Kubernetes Service that will expose your pod’s services:
Note: If this step fails, go back to bootstrap.sh, change the
service port number, and execute the script again.
Then resume from this step; there is no need to rebuild the container image.
How to Deploy: Start the Pegasus Pod (Step 4)
Start a Kubernetes Pod with Pegasus and HTCondor:
Log on to the Pod:
How to Deploy: Configuring for Batch Submissions (Step 5)
If this is the first time you are bringing up the Pegasus container in Kubernetes, you need to
configure it for batch submissions.
In the shell you obtained in the previous step, execute:
Note: This script installs some additional files needed to operate on OLCF and prepares the environment
on the DTNs by installing BOSCO.
How to Deploy: Check the status of the deployment
If all goes well you should see something similar to this in your terminal:
How to Deploy: Deleting the Pod and the Service
Deleting the Pod:
Deleting the Service:
Deleting the container image:
Demo Workflow
We will follow the tutorial: https://pegasus.isi.edu/tutorial/summit/tutorial_submitting_wf.php
Acknowledgements
Special thanks to the OLCF staff who helped us make this deployment
happen!
Jason Kincl Valentine Anantharaj Jack Wells
[email protected] [email protected] [email protected]
• GitHub:
https://github.com/Panorama360
• Website:
https://panorama360.github.io
George Papadimitriou
Computer Science PhD Student
University of Southern California
email: [email protected]
https://panorama360.github.io/
Pegasus est. 2001
Automate, recover, and debug scientific computations.
Get Started
Pegasus Website: http://pegasus.isi.edu
Users Mailing List: [email protected]
Support: [email protected]
Pegasus Online Office Hours: https://pegasus.isi.edu/blog/online-pegasus-office-hours/
Held bi-monthly on the second Friday of the month, where we address user
questions and apprise the community of new developments