Distributed & Parallel Computing Cluster
Patrick McGuigan
mcguigan@[Link]
2/12/04
DPCC Background
NSF funded Major Research Instrumentation (MRI) grant
Goals
Personnel
  PI
  Co-PIs
  Senior Personnel
  Systems Administrator
DPCC Goals
Establish a regional Distributed and Parallel Computing Cluster at UTA (DPCC@UTA)
An inter-departmental and inter-institutional facility
Facilitate collaborative research that requires large-scale storage (tera- to petabytes), high-speed access (gigabit or more), and mega processing (100s of processors)
DPCC Research Areas
Data mining / KDD
  Association rules, graph mining, stream processing, etc.
High Energy Physics
  Simulation, moving towards a regional DØ center
Dermatology / skin cancer
  Image database, lesion detection and monitoring
Distributed computing
  Grid computing, PICO
Networking
  Non-intrusive network performance evaluation
Software Engineering
  Formal specification and verification
Multimedia
  Video streaming, scene analysis
Facilitate collaborative efforts that need high-performance computing
DPCC Personnel
PI: Dr. Chakravarthy
Co-PIs: Drs. Aslandogan, Das, Holder, Yu
Senior Personnel: Paul Bergstresser, Kaushik De, Farhad Kamangar, David Kung, Mohan Kumar, David Levine, Jung-Hwan Oh, Gergely Zaruba
Systems Administrator: Patrick McGuigan
DPCC Components
Establish a distributed memory cluster (150+ processors)
Establish a symmetric or shared-memory multiprocessor system
Establish a large shareable high speed storage (100s of terabytes)
DPCC Cluster as of 2/1/2004
Located in 101 GACB
Inauguration 2/23 as part of E-Week
5 racks of equipment + UPS
Photos (scaled for presentation)
DPCC Resources
97 machines
  81 worker nodes
  2 interactive nodes
  10 IDE-based RAID servers
  4 nodes supporting the Fibre Channel SAN
50+ TB storage
  4.5 TB in each IDE RAID
  5.2 TB in the FC SAN
DPCC Resources (continued)
1 Gb/s network interconnections
  Core switch
  Satellite switches
1 Gb/s SAN network
UPS
DPCC Layout
[Layout diagram: worker nodes node1-node81 hang off Linksys 3512 satellite switches (plus a 100 Mb/s switch) that feed a Foundry FastIron 800 core switch; the core also connects the IDE RAID servers raid1-raid10 (6 TB each), two campus uplinks, and open ports for expansion; gfsnode1-gfsnode3 and a lock server reach the 5.2 TB ArcusII FC array through a Brocade 3200 switch.]
DPCC Resource Details
Worker nodes
Dual Xeon processors
  32 machines @ 2.4 GHz
  49 machines @ 2.6 GHz
2 GB RAM
IDE storage
  32 machines @ 60 GB
  49 machines @ 80 GB
Redhat 7.3 Linux (2.4.20 kernel)
DPCC Resource Details (cont.)
RAID server
  Dual Xeon processors (2.4 GHz)
  2 GB RAM
  4 RAID controllers
    2-port controller (qty 1): mirrored OS disks (2 x 40 GB)
    8-port controllers (qty 3): RAID5 with hot spare (24 x 250 GB disks)
  NFS used to serve storage to the worker nodes
DPCC Resource Details (cont.)
FC SAN
RAID5 array: 42 x 142 GB FC disks
FC switch
3 GFS nodes
  Dual Xeon (2.4 GHz)
  2 GB RAM
  Global File System (GFS)
  Serve storage to the cluster via NFS
1 GFS lock server
Using DPCC
Two nodes available for interactive use
  [Link]
  [Link]
More nodes are likely to be added to support other services (Web, DB access)
Access through SSH (version 2 client)
  Freeware Windows clients are available ([Link])
File transfers through SCP/SFTP (see the sketch below)
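A minimal access sketch, assuming an OpenSSH-style client; <user> is your account name and <interactive-node> stands for one of the interactive node hostnames listed above (elided here):

  # Log in to an interactive node (SSH protocol version 2)
  ssh -2 <user>@<interactive-node>

  # Copy a file to your home directory on the cluster
  scp mydata.tar.gz <user>@<interactive-node>:~/

  # Or transfer files interactively
  sftp <user>@<interactive-node>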
Using DPCC (continued)
User quotas are not yet implemented on home directories; be sensible in your usage.
Large data sets will be stored on the RAIDs (requires coordination with the sys admin).
All storage is visible to all nodes.
Getting Accounts
Have your supervisor request an account
The account will be created
Bring ID to 101 GACB to receive your password
  Keep your password safe
[Link]
[Link]
Log in to any interactive machine
  Use the yppasswd command to change your password (see the example below)
If you forget your password
  See me in my office (with ID) and I will reset your password
  Call or e-mail me and I will reset your password to the original password
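For example, once logged in to an interactive node, the cluster (NIS) password is changed with yppasswd; it prompts for the old password and then the new one twice:

  # Change your NIS password from any interactive node
  yppasswd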
User environment
Default shell is bash
  Change it with ypchsh
Customize the user environment using startup files
  .bash_profile (login session)
  .bashrc (non-login)
Customize with statements like:
  export <variable>=<value>
  source <shell file>
Much more information in the bash man page; a sketch follows below
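A minimal ~/.bash_profile sketch along these lines; the MPICH path comes from the MPI slides later in this talk, and the EDITOR setting is just an illustrative assumption:

  # ~/.bash_profile - read by login shells
  # Pull in the non-login settings too
  if [ -f ~/.bashrc ]; then
      source ~/.bashrc
  fi

  # Put the MPICH tools on the search path
  export PATH=$PATH:/usr/local/mpich-1.2.5/bin

  # Example of a custom variable
  export EDITOR=vim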
Program development tools
GCC 2.96
  C
  C++
  Java (gcj)
  Objective C
  Chill
Java
  JDK: Sun J2SDK 1.4.2
(Compile examples are sketched below)
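As a quick sketch (file names are hypothetical), compiling with the system GCC and the Sun JDK looks like:

  # C and C++ with GCC 2.96
  gcc -O2 -o hello hello.c
  g++ -O2 -o hello_cpp hello.cpp

  # Java with the Sun J2SDK 1.4.2
  javac Hello.java
  java Hello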
Development tools (cont.)
Python
  python = version 1.5.2
  python2 = version 2.2.2
Perl
  Version 5.6.1
Flex, Bison, gdb
If your favorite tool is not available, we'll consider adding it!
Batch Queue System
OpenPBS
Server runs on the master
pbs_mom runs on worker nodes
Scheduler runs on the master
Jobs can be submitted from any interactive node
User commands (a typical session is sketched below)
  qsub - submit a job for execution
  qstat - determine status of a job, queue, or server
  qdel - delete a job from the queue
  qalter - modify attributes of a job
Single queue (workq)
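Putting the user commands together, a typical session might look like this sketch (the script name and job ID are placeholders):

  # Submit a job script to the default queue (workq)
  qsub myjob.sh

  # Check the status of the server, the queue, and a specific job
  qstat
  qstat <job id>

  # Change an attribute of a queued job, e.g. its name
  qalter -N newname <job id>

  # Remove the job from the queue
  qdel <job id>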
PBS qsub
qsub used to submit jobs to PBS
A job is represented by a shell script
Shell script can alter environment and proceed with execution
Script may contain embedded PBS directives
Script is responsible for starting parallel jobs (not PBS)
Hello World
[mcguigan@master pbs_examples]$ cat helloworld
echo Hello World from $HOSTNAME
[mcguigan@master pbs_examples]$ qsub helloworld
[Link]
[mcguigan@master pbs_examples]$ ls
helloworld  helloworld.e15795  helloworld.o15795
[mcguigan@master pbs_examples]$ more helloworld.o15795
Hello World from [Link]
Hello World (continued)
Job ID is returned from qsub
Default attributes allow job to run
  1 node
  1 CPU
  [Link] CPU time
Standard out and standard error streams are returned
Hello World (continued)
Environment of job
Defaults to login shell (override with #! or the -S switch)
Login environment variable list with PBS additions:
  PBS_O_HOST, PBS_O_QUEUE, PBS_O_WORKDIR, PBS_ENVIRONMENT, PBS_JOBID, PBS_JOBNAME, PBS_NODEFILE, PBS_QUEUE
Additional environment variables may be transferred using the -v switch (see the sketch below)
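For example (a sketch; the variable name DATASET, the data path, and the script name are hypothetical), a value can be passed into the job with -v and combined with the PBS variables above:

  # Submit, passing an extra variable into the job's environment
  qsub -v DATASET=/data11/mydata myjob.sh

  # myjob.sh
  #!/bin/sh
  cd $PBS_O_WORKDIR                  # directory the job was submitted from
  echo "Job $PBS_JOBID in queue $PBS_QUEUE"
  echo "Input data: $DATASET"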
PBS Environment Variables
PBS_ENVIRONMENT
PBS_JOBCOOKIE
PBS_JOBID
PBS_JOBNAME
PBS_MOMPORT
PBS_NODENUM
PBS_O_HOME
PBS_O_HOST
PBS_O_LANG
PBS_O_LOGNAME
PBS_O_MAIL
PBS_O_PATH
PBS_O_QUEUE
PBS_O_SHELL
PBS_O_WORKDIR
PBS_QUEUE=workq
PBS_TASKNUM=1
qsub options
Output streams
  -e (error output path)
  -o (standard output path)
  -j (join error + output as either output or error)
Mail options
  -m [aben] when to mail (abort, begin, end, none)
  -M who to mail
Name of job
  -N (15 printable characters max, first character alphabetic)
Which queue to submit job to
  -q [name] (unimportant for now)
Environment variables
  -v pass specific variables
  -V pass all environment variables of qsub to the job
Additional attributes
  -W specify dependencies
A combined command line is sketched below
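Combining several of these options on one command line (the script name, output file, and e-mail address are placeholders):

  qsub -N myjob -j oe -o myjob.out \
       -m ae -M someone@uta.edu \
       -q workq -V myjob.sh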
Qsub options (continued)
-l switch used to specify needed resources
Number of nodes: nodes=x
Number of processors: ncpus=x
CPU time: cput=hh:mm:ss
Walltime: walltime=hh:mm:ss
See man page for pbs_resources
Hello World
qsub -l nodes=1 -l ncpus=1 -l cput=[Link] -N helloworld -m a -q workq helloworld

Options can be included in the script:
#PBS -l nodes=1
#PBS -l ncpus=1
#PBS -m a
#PBS -N helloworld2
#PBS -l cput=[Link]
echo Hello World from $HOSTNAME
qstat
Used to determine status of jobs, queues, server
  $ qstat
  $ qstat <job id>
Switches
  -u <user>  list jobs of user
  -f         provides extra output
  -n         provides nodes given to job
  -q         status of the queue
  -i         show idle jobs
qdel & qalter
qdel used to remove a job from a queue
qdel <job ID>
qalter used to alter attributes of currently queued job
qalter <job id> attributes (similar to qsub)
Processing on a worker node
All RAID storage is visible to all nodes
  /dataxy where x is the RAID ID and y is the volume (1-3)
  /gfsx where x is the GFS volume (1-3)
/scratch
  Local storage on each worker node
Data-intensive applications should copy input data (when possible) to /scratch for manipulation and copy results back to RAID storage, as in the sketch below
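A job-script sketch of that pattern; the input path on raid1 volume 1 (/data11) and the application name myapp are hypothetical:

  #!/bin/sh
  #PBS -N stage_example
  #PBS -l nodes=1
  #PBS -l ncpus=1

  # Stage input from shared RAID storage to fast local scratch
  mkdir -p /scratch/$PBS_JOBID
  cp /data11/myproject/input.dat /scratch/$PBS_JOBID/

  # Run the application against the local copy
  cd /scratch/$PBS_JOBID
  ~/bin/myapp input.dat output.dat

  # Copy results back to RAID storage and clean up
  cp output.dat /data11/myproject/
  cd /
  rm -rf /scratch/$PBS_JOBID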
Parallel Processing
MPI installed on interactive + worker nodes
MPICH 1.2.5
Path: /usr/local/mpich-1.2.5
Asking for multiple processors (worker nodes are dual-CPU):
  -l nodes=x -l ncpus=2x
Parallel Processing (continued)
PBS node file created when job executes
Available to job via $PBS_NODEFILE
Used to start processes on remote nodes
  mpirun
  rsh
Using node file (example job)
#!/bin/sh
#PBS -m n
#PBS -l nodes=3:ppn=2
#PBS -l walltime=[Link]
#PBS -j oe
#PBS -o [Link]
#PBS -N helloworld_mpi

NN=`cat $PBS_NODEFILE | wc -l`
echo "Processors received = "$NN
echo "script running on host `hostname`"
cd $PBS_O_WORKDIR
echo
echo "PBS NODE FILE"
cat $PBS_NODEFILE
echo
/usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld
MPI
Shared memory vs. message passing
MPI
  C-based library to allow programs to communicate
  Each cooperating execution is running the same program image
  Different images can do different computations based on the notion of rank
  MPI primitives allow for construction of more sophisticated synchronization mechanisms (barrier, mutex)
helloworld.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size;
    char host[256];
    int val;

    val = gethostname(host, 255);
    if ( val != 0 ) {
        strcpy(host, "UNKNOWN");
    }
    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    printf( "Hello world from node %s: process %d of %d\n", host, rank, size );
    MPI_Finalize();
    return 0;
}
Using MPI programs
Compiling
  $ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c
Executing
  $ /usr/local/mpich-1.2.5/bin/mpirun <options> helloworld
Common options:
  -np           number of processes to create
  -machinefile  list of nodes to run on
Resources for MPI
[Link]
MPI documentation
[Link]
Links to various tutorials
Parallel programming course