
HPC Kubernetes Deployment Guide

oc create -f Specs/pegasus-submit-service.yml
This will create a Kubernetes Service that exposes the Gridmanager port on a NodePort. You can check that the service has started with:
oc get services
and see the NodePort it is listening on. This port can now be used to submit jobs to your Pegasus workflow environment.


Pegasus Workflows on OLCF - Summit

George Papadimitriou
[email protected]

http://pegasus.isi.edu
Outline

• Kubernetes/OpenShift
  • What is Kubernetes (Specs, Pods, Services)
  • Why use Kubernetes in HPC
  • OpenShift at OLCF
  • Pegasus Deployment on OpenShift at OLCF
• How to Deploy
  • Prerequisites
  • Instructions
• Demo
  • Pegasus Workflow on Summit

https://panorama360.github.io 2
Kubernetes

Kubernetes: Brief Overview

• Kubernetes is an open-source platform for running and coordinating containerized applications across a cluster of machines.
• It can be useful for:
  • Orchestrating containers across multiple hosts
  • Controlling and automating deployments
  • Scaling containerized applications on the fly
  • And more…
• Key objects in the Kubernetes architecture are:
  • Master: Controls the Kubernetes nodes and assigns tasks to them
  • Node: Performs the assigned tasks
  • Pod: A group of one or more containers deployed on a single node
  • Replication Controller: Controls how many copies of a pod should be running
  • Service: Allows pods to be reached from the outside world
  • Kubelet: Runs on the nodes and starts the defined containers

Reference:
https://www.redhat.com/en/topics/containers/what-is-kubernetes

Kubernetes: Configuring Objects

• Within Kubernetes, specification files describe the applications, services and objects being deployed

• Specification files can be written in YAML or JSON and can be used to:
  • Deploy Pods
  • Create and mount volumes
  • Expose services, etc.

Reference:
https://kubernetes.io/docs/tasks/configure-pod-container/
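
As a rough illustration of what such a specification file can look like, the sketch below writes a minimal Pod spec in YAML from the shell. The pod name, label, image and command are placeholders chosen for this example; they are not taken from the Pegasus templates.

```shell
# Write a minimal, hypothetical Pod specification to the given path.
# All names below (example-pod, app=example, busybox) are illustrative placeholders.
write_pod_spec() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    app: example
spec:
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
EOF
}

# Usage: write_pod_spec example-pod.yml && oc create -f example-pod.yml
```

The same object could be written in JSON; YAML is shown only because the templates used later in this deck are .yml files.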

Kubernetes: Pods
• A Pod is the basic execution unit of a Kubernetes application
  • Pods represent processes running on the cluster
  • A Pod can run one or multiple containers

• Networking: Each Pod is assigned a unique IP address within the cluster

• Storage: A Pod can specify a set of shared storage Volumes. Volumes persist data and allow Pods to maintain state between restarts.

• Lifecycle: A Pod runs on its assigned cluster node until its container(s) exit or it is removed for some other reason (e.g. a user deletes it).

References:
https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/
https://kubernetes.io/docs/concepts/workloads/pods/pod/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
https://kubernetes.io/docs/concepts/storage/volumes/

Kubernetes: Services
• A Service provides an abstract way to expose an application running on a set of Pods as a network service to the rest of the world
• Since Pods are ephemeral, Services give users a stable way to access the backend applications
• Service types are:
  • ClusterIP: Exposes the service on a cluster-internal IP
  • NodePort: Exposes the service on each Node’s IP at a static port
  • LoadBalancer: Exposes the service externally and load-balances it
  • ExternalName: Maps the service to a DNS name by returning a CNAME record

Reference:
https://kubernetes.io/docs/concepts/services-networking/service/
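
To make the NodePort type concrete, here is a small sketch that writes a NodePort Service spec from the shell. The service name, the selector label and the port 11000 are placeholders invented for this example; only the general shape of the spec is meant to carry over.

```shell
# Write a hypothetical NodePort Service specification.
#   $1: output path
#   $2: node port to expose on every node (Kubernetes default range: 30000-32767)
# example-service, app=example and port 11000 are illustrative placeholders.
write_nodeport_service_spec() {
  cat > "$1" <<EOF
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  type: NodePort
  selector:
    app: example
  ports:
    - port: 11000
      targetPort: 11000
      nodePort: $2
EOF
}

# Usage: write_nodeport_service_spec example-service.yml 32752
```

The selector ties the Service to Pods carrying the matching label, which is what lets clients keep a stable endpoint while the Pods behind it come and go.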

Kubernetes: Why it can be useful in HPC

• Running services on login nodes can be cumbersome (build from scratch, compile all dependencies, etc.) and is sometimes prohibited by the system administrators.
• Keeping an application or service up to date is easier
• Assist workflow execution
• Create submission environments
• Handle data movement and job submissions
• Automation and Reproducibility
• Create collaborative web portals
• Jupyter Notebooks
• Workflow Design (e.g. Wings)
• Streaming Data
• Consuming
• Publishing

Kubernetes (OpenShift) at OLCF

• OLCF has deployed OpenShift, a distribution of Kubernetes developed by Red Hat

• OpenShift provides a command line and a web interface to manage your Kubernetes objects (pods, deployments, services, storage, etc.)

• OLCF’s deployment has automation mechanisms that allow users to submit jobs to the batch system and access the shared file systems (NFS, GPFS)

• All containers run as an automation user that is tied to a project

Reference:
https://www.olcf.ornl.gov/wp-content/uploads/2017/11/2018UM-Day3-Kincl.pdf

Kubernetes (OpenShift) at OLCF: Pegasus Deployment

Kubernetes at OLCF: Pegasus Deployment - Advantages

• Pegasus workflow environments at OLCF have been simplified.

• Using the Kubernetes cluster at OLCF, we can deploy Pegasus submit nodes as services within a few seconds.

• The deployment uses HTCondor’s BOSCO SSH-style submissions through the DTNs to submit jobs to the SLURM and LSF batch schedulers.

• This approach allows a single workflow to be configured to use all of OLCF’s resources, e.g. execute transfers on the DTNs, run simulations and heavy processing on Summit, and then do lightweight post-processing steps on Rhea.

How to Deploy
We will follow the tutorial: https://pegasus.isi.edu/tutorial/summit/tutorial_setup.php

How to Deploy: Prerequisites

• Pegasus Kubernetes Templates for OLCF:
  • https://github.com/pegasus-isi/pegasus-olcf-kubernetes
• OpenShift’s Origin Client:
  • https://github.com/openshift/origin/releases
• A working RSA Token to access OLCF’s systems
• An automation user for OLCF’s systems
• Allocation on OLCF’s OpenShift Cluster (https://marble.ccs.ornl.gov)

How to Deploy: Useful Origin Client Commands

• oc login: authenticates against a cluster and acquires an access token
• oc status: returns/prints the status of your deployments
• oc describe: shows details of a specific resource
• oc create: creates a Kubernetes resource from specification
• oc start-build: initiates the creation of a container image
• oc logs: returns/prints the Kubernetes log for a resource
• oc exec: executes a command in a container
• oc delete: deletes a resource

How to Deploy: Pegasus - Kubernetes Templates

• bootstrap.sh: Generates a customized Dockerfile and Kubernetes pod and service specifications for your deployment.

• Specs/pegasus-submit-build.yml: Contains the Kubernetes build specification for the pegasus-olcf image.

• Specs/pegasus-submit-service.yml: Contains the Kubernetes service specification that can be used to spawn a NodePort service that exposes the HTCondor Gridmanager Service running in your submit pod to the outside world.

• Specs/pegasus-submit-pod.yml: Contains the Kubernetes pod specification that can be used to spawn a pegasus/condor pod that has access to Summit’s GPFS filesystem and its batch scheduler.

How to Deploy: Customize Templates

In bootstrap.sh update the section "ENV Variables For User and Group"
with your automation user's name, id, group name, group id and the
Gridmanager Service Port, which must be in the range 30000-32767.

Replace the highlighted text:

• USER: with the username of your automation user (e.g. csc001_auser)
• USER_ID: with the user id of your automation user (e.g. 20001)
• USER_GROUP: with the project name your automation user belongs to (e.g. csc001)
• USER_GROUP_ID: with the project group id your automation user belongs to (e.g. 10001)
• GRIDMANAGER_SERVICE_PORT: with the Kubernetes NodePort port number the Gridmanager Service should use (e.g. 32752)

Execute Script:
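
The screenshot for this step is not preserved. Presumably the script is run from the root of the template repository as ./bootstrap.sh, which is an assumption here. Before running it, the port-range constraint on GRIDMANAGER_SERVICE_PORT can be checked with a small helper; valid_nodeport is a hypothetical name used for this sketch and is not part of the templates.

```shell
# Hypothetical helper: succeed only if $1 is an integer in 30000-32767,
# the NodePort range required for GRIDMANAGER_SERVICE_PORT.
valid_nodeport() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

# Usage (assumed invocation of the script):
#   valid_nodeport 32752 && ./bootstrap.sh
```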

How to Deploy: Acquire an Access Token (Step 1)
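
The screenshot for this step is not preserved. A minimal sketch, assuming the cluster URL listed in the prerequisites: oc login takes the server URL and prompts for credentials, after which the client holds an access token. The wrapper name olcf_login and the OLCF_OPENSHIFT_URL variable are hypothetical conveniences for this example.

```shell
# Hypothetical wrapper: log in to the OLCF OpenShift cluster.
# Extra arguments (e.g. -u <username>) are passed through to `oc login`.
olcf_login() {
  oc login "${OLCF_OPENSHIFT_URL:-https://marble.ccs.ornl.gov}" "$@"
}

# Usage: olcf_login -u <your_olcf_username>
```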

How to Deploy: Build the Container Image (Step 2)
Create a new build and build the image:
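
The commands on this slide are not preserved. The sketch below shows one plausible sequence: create the build from the build specification, then start it with the generated Dockerfile. The build name pegasus-olcf is inferred from the image name mentioned in the templates, and the Dockerfile path Docker/Dockerfile is an assumption; adjust both to what bootstrap.sh actually generates.

```shell
# Hypothetical helper: register the build and start it from a local Dockerfile.
#   $1: build name      (assumed default: pegasus-olcf)
#   $2: Dockerfile path (assumed default: Docker/Dockerfile)
build_image() {
  oc create -f Specs/pegasus-submit-build.yml &&
    oc start-build "${1:-pegasus-olcf}" --from-file="${2:-Docker/Dockerfile}"
}

# Usage: build_image
```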

How to Deploy: Build the Container Image (Step 2)
Trace the progress of the build:
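
The command on this slide is not preserved. One way to follow a build, sketched under the assumption that the build configuration is named pegasus-olcf, is to stream the logs of its latest build with oc logs:

```shell
# Hypothetical helper: stream the log of the latest build of a build config.
#   $1: build config name (assumed default: pegasus-olcf)
trace_build() {
  oc logs -f "bc/${1:-pegasus-olcf}"
}

# Usage: trace_build
```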

How to Deploy: Start the Kubernetes Service (Step 3)
Start a Kubernetes Service that will expose your pod’s services:
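
The screenshot for this step is not preserved, but the document summary gives the two commands involved: create the service from its specification, then list services to see the NodePort it listens on. The wrapper name start_service is a hypothetical convenience for this sketch.

```shell
# Create the NodePort service from its spec, then list services to
# confirm it started and to see the port it is listening on.
start_service() {
  oc create -f Specs/pegasus-submit-service.yml &&
    oc get services
}

# Usage: start_service
```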

Note: If this step fails, go back to bootstrap.sh, change the service port number and execute the script again.
Then resume from this step; there is no need to rebuild the container.

How to Deploy: Start the Pegasus Pod (Step 4)
Start a Kubernetes Pod with Pegasus and HTCondor:
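
The command on this slide is not preserved. A minimal sketch, assuming the pod is created from the pod specification listed among the templates and then checked with oc get pods; the wrapper name start_pod is hypothetical.

```shell
# Create the Pegasus/HTCondor pod from its spec, then list pods to
# check that it reaches the Running state.
start_pod() {
  oc create -f Specs/pegasus-submit-pod.yml &&
    oc get pods
}

# Usage: start_pod
```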

Log on to the Pod:
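
The command on this slide is not preserved. A shell inside a running pod is typically opened with oc exec; the pod name pegasus-submit below is an assumption, so use the name reported by oc get pods.

```shell
# Hypothetical helper: open an interactive shell in the submit pod.
#   $1: pod name (assumed default: pegasus-submit)
pod_shell() {
  oc exec -it "${1:-pegasus-submit}" -- /bin/bash
}

# Usage: pod_shell
```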

How to Deploy: Configuring for Batch Submissions (Step 5)
If this is the first time you are bringing up the Pegasus container in Kubernetes, you need to configure it for batch submissions.

In the shell you opened in the previous step, execute:

Note: This script installs some additional files needed to operate on OLCF, and prepares the environment
on the DTNs, by installing BOSCO.

How to Deploy: Check the status of the deployment
If all goes well you should see something similar to this in your terminal:

How to Deploy: Deleting the Pod and the Service

Deleting the Pod:

Deleting the Service:

Deleting the container image:
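
The commands on this slide are not preserved. One way to tear things down without guessing resource names is to delete by specification file, since oc delete -f removes whatever objects a file defines. This sketch assumes the same spec files used for creation; whether the build spec also covers the image resources depends on the templates, and the function names are hypothetical.

```shell
# Hypothetical helpers: remove the objects defined in each spec file.
delete_pod()     { oc delete -f Specs/pegasus-submit-pod.yml; }
delete_service() { oc delete -f Specs/pegasus-submit-service.yml; }
delete_build()   { oc delete -f Specs/pegasus-submit-build.yml; }

# Usage: delete_pod && delete_service && delete_build
```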

Demo Workflow
We will follow the tutorial: https://pegasus.isi.edu/tutorial/summit/tutorial_submitting_wf.php

Acknowledgements

Special thanks to the OLCF staff who helped us make this deployment happen!

Jason Kincl Valentine Anantharaj Jack Wells


[email protected] [email protected] [email protected]

• GitHub:
https://github.com/Panorama360

• Website:
https://panorama360.github.io

George Papadimitriou
Computer Science PhD Student
University of Southern California

email: [email protected]

Pegasus est. 2001
Automate, recover, and debug scientific computations.

Pegasus Website
http://pegasus.isi.edu

Get Started

Users Mailing List
[email protected]

Support
[email protected]

Pegasus Online Office Hours
https://pegasus.isi.edu/blog/online-pegasus-office-hours/
Held bi-monthly on the second Friday of the month, where we address user questions and apprise the community of new developments.
