Grid Computing
How It Started
While helping to build/integrate a diverse range of distributed
applications, the same problems kept showing up over and over
again.
Too hard to keep track of authentication data (ID/password)
across institutions
Too hard to monitor system and application status across
institutions
Too many ways to submit jobs
Too many ways to store & access files and data
Too many ways to keep track of data
Too easy to leave dangling resources lying around (robustness)
Why Grids?
A biochemist exploits 10,000 computers to screen 100,000
compounds in an hour
1,000 physicists worldwide pool resources for petaop analyses
of petabytes of data
Civil engineers collaborate to design, execute, & analyze
shake table experiments
Climate scientists visualize, annotate, & analyze terabyte
simulation datasets
An emergency response team couples real time data, weather
model, population data
A multidisciplinary analysis in aerospace couples code and data
A home user invokes architectural design functions at an
application service provider
An application service provider purchases cycles from compute
cycle providers
Scientists working for a multinational soap company design a new
product
A community group pools members' PCs to analyze alternative
designs for a local road
What is a Grid?
An infrastructure that couples:
Computers (PCs, workstations, clusters, traditional supercomputers, and even
laptops, notebooks, mobile computers, PDAs, and so on)
Software (e.g., ASPs renting expensive special purpose applications on
demand)
Catalogued Data/Databases (e.g., transparent access to human genome
database)
Special Instruments (e.g., radio telescopes: SETI@Home searching for life in the
galaxy, Astrophysics@Swinburne for pulsars)
People/collaborators
and offers simple, consistent, dependable, & pervasive access across
(local/wide-area) networks to present them as a unified integrated resource.
Ian Foster's Grid Checklist (2002)
A Grid is a system that:
Coordinates resources that are not subject to centralized
control
Users reside within different administrative domains
Uses standard, open, general-purpose protocols and
interfaces
Delivers non-trivial qualities of service
Components can be used in a coordinated way to deliver combined
services that are appreciably greater than the sum of the individual
components
Bill Johnston's Definition (2002)
A Grid is an environment that provides access and management for the whole
range of computing resources needed to solve complex computing and data
handling problems
It is a well understood and standardized set of services that provide uniform
access to a large number of diverse and distributed resources, together with
several critical auxiliary services for resource discovery and secure
communication based on authenticated, global identity.
Resource discovery
Resource scheduling
Uniform computing access
Uniform data access
Asynchronous information sources
Authentication, delegation, and secure communication
Identity certificate management
System management and access
Grid computing
is many things to many people
At its core, it's about
Sharing computing resources between organisations
Enabling more complex and demanding applications by
providing widespread access to powerful computers and
storage
Integrating existing systems together
Grid: Resource-Sharing Environment
Users:
1000s from 10s of institutions
Well-established communities
Resources:
Computers, data, instruments,
storage, applications
Owned/administered by
institutions
Applications: data- and compute-
intensive processing
Approach: common infrastructure
Which Grid Technologies Exist?
SETI@home / distributed.net / BOINC
Globus
Condor
Legion / Avaki
Unicore
Types of grid computing
Service Oriented Architecture (SOA)
Job submission (supercomputer access)
Cycle stealing
Service Oriented Architecture (SOA)
Applications are exposed as services, which
provide a well-defined interface and are accessed
through standard protocols
Clients use remote procedure calls to access these
services
Service Oriented Architecture and Grid Computing
An SOA application is a composition of services
A service is the atomic unit of an SOA
Services encapsulate a business process
Service providers register themselves
Service use involves: Find, Bind, Execute
The most well-known instance is Web Services
SOA Actors
Service Provider
Provides a stateless, location transparent business service
Service Registry
Allows service consumers to locate service providers that meet
required criteria
Service Consumer
Uses service providers to complete business processes
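The three actors and the Find, Bind, Execute cycle can be sketched in a few lines. This is a minimal illustration only: the in-memory `ServiceRegistry`, the capability name `text.word_count`, and the `consume` helper are assumptions made for the example, not part of any real SOA or Web Services toolkit, where the registry and the services would sit behind network protocols.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry; a real one would be a network service."""
    _providers: Dict[str, List[Callable[..., object]]] = field(default_factory=dict)

    def register(self, capability: str, endpoint: Callable[..., object]) -> None:
        # Service providers register themselves under a capability name.
        self._providers.setdefault(capability, []).append(endpoint)

    def find(self, capability: str) -> List[Callable[..., object]]:
        # Find: return every provider that advertises the capability.
        return list(self._providers.get(capability, []))


def word_count(text: str) -> int:
    """A stateless example service: the result depends only on the input."""
    return len(text.split())


def consume(registry: ServiceRegistry, capability: str, *args: object) -> object:
    providers = registry.find(capability)      # Find
    if not providers:
        raise LookupError(f"no provider for {capability!r}")
    service = providers[0]                     # Bind (late, at call time)
    return service(*args)                      # Execute


registry = ServiceRegistry()
registry.register("text.word_count", word_count)
result = consume(registry, "text.word_count", "grids couple distributed resources")
print(result)
```

The consumer never names `word_count` directly; it only knows the capability it needs, which is what makes late binding and location transparency possible.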
Benefits of SOA
SOA is platform agnostic
Client doesn't need to know how service is implemented
Service doesn't need to know how client is implemented
SOA is vendor independent
Based on open standards: no lock-in
All SOA vendors support the same standards to enable interoperability
SOA is widely supported
Many companies are getting behind it
Being adopted widely in commercial and scientific organisations
Business Benefits
Focus on Business Domain solutions
Leverage Existing Infrastructure
Agility
Technical Benefits
Loose Coupling
Autonomous Service
Location Transparency
Late Binding
Job submission
Many organisations have large supercomputers (SMP or clusters) that they want
users to be able to submit jobs to
This can be achieved by installing middleware on each supercomputer which
interfaces to the local job queue
e.g. Globus GRAM - allows users to submit to job queues such as PBS, LSF,
etc.
Users submit jobs to a superscheduler which manages a higher level queue
and dispatches jobs to resources
The grid middleware handles tasks such as copying files to and from the
execution node, monitoring job progress, and abstracts the details of these away
from clients
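The dispatch step of a superscheduler can be sketched as follows. Everything here is an assumption for illustration: the `Resource` and `Superscheduler` classes, the resource names, and the "queued jobs per CPU" wait estimate are invented, and the plain Python list merely stands in for a local queue such as PBS or LSF. It is not how Globus GRAM or any particular scheduler is implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A cluster or SMP machine fronted by middleware (all fields hypothetical)."""
    name: str
    cpus: int
    queue: List[str] = field(default_factory=list)  # stands in for a PBS/LSF queue

    def estimated_wait(self) -> float:
        # Crude estimate: queued jobs per CPU. Real schedulers use far richer data.
        return len(self.queue) / self.cpus


class Superscheduler:
    def __init__(self, resources: List[Resource]) -> None:
        self.resources = resources

    def submit(self, job_id: str, min_cpus: int = 1) -> str:
        # Keep only resources large enough for the job.
        eligible = [r for r in self.resources if r.cpus >= min_cpus]
        if not eligible:
            raise ValueError(f"no resource offers {min_cpus} CPUs")
        # Dispatch to the resource expected to start the job soonest.
        target = min(eligible, key=Resource.estimated_wait)
        target.queue.append(job_id)
        return target.name


resources = [
    Resource("cluster-a", cpus=64, queue=["j1", "j2", "j3", "j4"]),
    Resource("smp-b", cpus=16, queue=["j5"]),
    Resource("cluster-c", cpus=128, queue=["j6", "j7"]),
]
scheduler = Superscheduler(resources)
placement = scheduler.submit("job-42", min_cpus=32)
print(placement)
```

The client only calls `submit`; which local queue the job lands in is decided by the superscheduler, which is the abstraction the slide describes.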
Job submission
[Figure: clients submit jobs to a superscheduler, which dispatches them to two clusters and an SMP machine]
Benefits of Job Submission Grids
Users do not have to worry about differences between job submission systems
running on different resources
Superschedulers make it possible to automatically find resources that will
execute the job quicker
A user submits a job to a grid, it runs, and they get the results back later
Job submission can be implemented on top of SOA by providing a service with
methods for submitting and monitoring jobs, as well as notifying clients of
failures or completion
e.g. Globus MMJFS provides a web service interface to allow users to submit
jobs
Cycle stealing
The use of large numbers of desktop PCs to run embarrassingly
parallel applications
A master node coordinates execution and hands out tasks to
workers
The worker process on each machine polls the master for work to
do, and then executes the tasks as they become ready
Worker detects when the machine is being used by a user and
suspends/aborts the active task
This model is inherently fault tolerant; if a machine dies or a task is
aborted it can just be sent to another worker
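The polling and re-queueing behaviour described above can be sketched in a single process. This is a simplified model under stated assumptions: the `Master` class, the `worker_step` function, and the `owner_active` flag (which simulates a user sitting at the desktop) are hypothetical names, and a real system such as BOINC or Condor would run workers on separate machines and communicate over the network.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class Master:
    """Hands out independent tasks and re-queues any that a worker aborts."""

    def __init__(self, tasks: Dict[int, int]) -> None:
        self.inputs = tasks
        self.pending: Deque[int] = deque(tasks)   # task ids waiting for a worker
        self.results: Dict[int, int] = {}

    def request_work(self) -> Optional[int]:
        # Workers poll this; None means nothing is pending right now.
        return self.pending.popleft() if self.pending else None

    def report_done(self, task_id: int, value: int) -> None:
        self.results[task_id] = value

    def report_aborted(self, task_id: int) -> None:
        # Fault tolerance: an aborted task simply goes back in the queue.
        self.pending.append(task_id)


def worker_step(master: Master, compute: Callable[[int], int], owner_active: bool) -> None:
    """One polling cycle of a worker; owner_active simulates a user at the desktop."""
    task_id = master.request_work()
    if task_id is None:
        return
    if owner_active:
        master.report_aborted(task_id)   # give the machine back to its user
    else:
        master.report_done(task_id, compute(master.inputs[task_id]))


master = Master({0: 2, 1: 3, 2: 4})
# Hypothetical activity trace: the second polling cycle hits a busy machine.
for busy in [False, True, False, False]:
    worker_step(master, lambda x: x * x, owner_active=busy)
print(master.results)
```

Because the tasks are independent (embarrassingly parallel), the aborted task can be rerun by any worker without affecting the others.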
Cycle stealing
[Figure: one master coordinating four workers]
Benefits of cycle stealing
Organisations can use their existing infrastructure to run
computationally demanding applications
No need to invest in large SMP systems or clusters
Large-scale internet projects can get free computing
power
provided they can convince users to donate CPU time
e.g. SETI@Home
Cheap supercomputing
Generally easy to deploy
Three Uses of Grid Computing
Computational grids
Data grids
Collaborative grids
Types of Grid
Information grids: These provide efficient and
simple access to data without worries about
platform, location, or performance.
Compute grids: These exploit the processing power
of a distributed collection of systems.
Service grids: These provide scalability and reliability
across different servers by establishing
simulated instances of grid services.
GRID ARCHITECTURE
The Grid Problem
Flexible, secure, coordinated resource sharing among
dynamic collections of individuals, institutions, and
resources
Enable communities (virtual organizations) to share
geographically distributed resources as they pursue
common goals -- assuming the absence of
central location,
central control,
omniscience,
existing trust relationships.
Elements of the Problem
Resource sharing
Computers, storage, sensors, networks,
Sharing always conditional: issues of trust, policy,
negotiation, payment,
Coordinated problem solving
Beyond client-server: distributed data analysis,
computation, collaboration,
Dynamic, multi-institutional virtual orgs
Community overlays on classic org structures
Large or small, static or dynamic
Grid Architecture
Identifies system components, specifies purpose and function,
indicates interaction
Effective VO operation requires sharing relationships
Interoperability is the core issue
In a networked environment, it means common protocols
Ensure general-purpose mechanisms for interaction
Grid architecture is a protocol architecture
Negotiate, establish, and manage sharing relationships
Standard protocols ~ standard services
A service is defined by its protocol.
Grid Architecture (Layered)
(By Analogy to Internet Architecture)
Grid layer        Internet protocol layer
Application       Application
Collective        Application
Resource          Application
Connectivity      Transport, Internet
Fabric            Link
Fabric Layer: Controlling things locally: Access to, & control of,
resources
protocols and interfaces that provide access to the resources
Connectivity Layer: Talking to things: communication (Internet
protocols) & security
core protocols required for grid-specific network transactions
core grid security protocol
Resource Layer: Sharing single resources: negotiating access,
controlling use
protocols required to initiate and control sharing of local
resources
Collective Layer: Coordinating multiple resources: ubiquitous
infrastructure services, app-specific distributed services
protocols that provide system oriented capabilities for wide scale
deployment
Application Layer
protocols and services that are targeted toward a specific
application or a class of applications
From Theory to Practice
Building a Grid (in Practice)
Building a Grid system or application is currently an exercise in
software integration.
Define user requirements
Derive system requirements or features
Survey existing components
Identify useful components
Develop components to fit into the gaps
Integrate the system
Deploy and test the system
Maintain the system during its operation
This should be done iteratively, with many loops in the flow.
Protocols, Services, and APIs Occur at Each Level
Applications
Languages/Frameworks
Collective Service APIs and SDKs
Collective Service Protocols
Collective Services
Resource APIs and SDKs
Resource Service Protocols
Resource Services
Connectivity APIs
Connectivity Protocols
Local Access APIs and Protocols
Fabric Layer
Grid Architecture (Service-oriented)
Applications
Grid Architected Services
Web Services (Extended Web Services)
Security, Filesystems, Database, Directory, Messaging
Servers, Storage, Network
Grid Architected Services
Domain Specific Services
Grid Program Execution Services, Grid Core Services, Grid Data Services
1 - Service Management
2 - Service Communication
3 - Policy Management
4 - Security
1. Provisioning & deploying components, collecting and
exchanging data
2. Supports basic methods for grid services to
communicate with each other
3. General framework for creation, negotiation, and
management of policies
4. Support, integrate and unify different security models to
Manageability
The ability of a resource to be managed
Manageability interfaces support common operations (control and monitor)
Manageability standards specify standard interfaces
Problem:
Existing interfaces are generally resource-specific
Almost impossible to add standard interfaces to legacy resources
New standards may require additional interfaces
Solution:
Common standards
Based on Service orientation, integration and virtualization.
Service orientation
Software services
A service provides some capability to its clients through message exchanges
represent the physical manageable entities
understand the unique interfaces for the entities they represent
implement applicable standard interfaces
Integration
Applications encapsulated in services become integrable building blocks
The management process
Manager invokes the operation (service's standard interface)
Service performs operation on managed entity (resource's unique interface)
Service returns result to manager (through the standard interface)
Problem
Need a common way to implement service
Solution: Web Services
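The management process above (manager calls a standard interface, the service translates to the resource's unique interface) is essentially an adapter. The sketch below illustrates that idea in plain Python rather than Web Services; the `ManageableService` interface with only `status` and `restart`, the `LegacyDisk` resource, and its `spin_up`/`spin_down` methods are all hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class ManageableService(ABC):
    """Standard manageability interface (hypothetical: status and restart only)."""

    @abstractmethod
    def status(self) -> str: ...

    @abstractmethod
    def restart(self) -> None: ...


class LegacyDisk:
    """Stand-in for a resource with its own unique interface."""

    def __init__(self) -> None:
        self.spinning = True

    def spin_down(self) -> None:
        self.spinning = False

    def spin_up(self) -> None:
        self.spinning = True


class DiskService(ManageableService):
    """Represents the disk and translates standard calls into disk-specific ones."""

    def __init__(self, disk: LegacyDisk) -> None:
        self._disk = disk

    def status(self) -> str:
        return "up" if self._disk.spinning else "down"

    def restart(self) -> None:
        self._disk.spin_down()
        self._disk.spin_up()


def manager_restart_all(services: List[ManageableService]) -> List[str]:
    # The manager sees only the standard interface, never the resource itself.
    for service in services:
        service.restart()
    return [service.status() for service in services]


disk = LegacyDisk()
disk.spin_down()
statuses = manager_restart_all([DiskService(disk)])
print(statuses)
```

Adding a new kind of resource then means writing one more service class; the manager code does not change, and the legacy resource itself needs no new interface.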
Virtualization
[Figure: a MANAGER reaches SERVICE PROVIDERS (disks, computers, telescopes, other web services) through COMMON INTERFACES; the service providers reach the PHYSICAL RESOURCES (cluster, blades, mainframe) through RESOURCE SPECIFIC INTERFACES]
Resource Management
What needs to be managed: Resources
Physical resources (computer, disks, databases, networks, scientific instruments).
Logical resources (jobs, executing applications, complex workflows etc.).
What is the Goal
Resources must be available and meet performance criteria.
What is Management:
The process of locating various types of capability, arranging for their use, utilizing them and
monitoring their state.
Maintenance of resources and environment
Monitoring their state and performance
Reacting to internal and external changes in resource or its environment
Initiating routine operations: initialization, start/stop and tuning
Traditional Resource Management
Batch schedulers, workflow engines, operating systems
Designed and operated under the assumption that:
They have complete control over a resource
They can implement the mechanisms and policies needed for effective use of that resource in
isolation
This is not the case for Grid Resource management
Separate administrative domains
Resource Heterogeneity
Lack of control and different policies
Grid Resource Management
What is Grid Resource Management?
Identifying application requirements, resource specification
Matching resources to applications
Allocating/scheduling and monitoring those resources and applications over time in order to run
as effectively as possible.
Challenges in Grid Resource Management
Resources are heterogeneous in nature
Processors, disks, data, networks, other services.
Application has to compete for resources
Lack of available data about current systems, needs of users, resource owners and administrators
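The step "matching resources to applications" can be illustrated with a small matchmaking sketch. It is a toy model under stated assumptions: the `ResourceOffer` and `JobRequest` fields, the `allows_external` policy flag, the site names, and the "least loaded wins" ranking are all invented for the example and do not reproduce any specific grid scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceOffer:
    name: str
    cpus: int
    memory_gb: int
    load: float                # 0.0 (idle) to 1.0 (saturated), as last reported
    allows_external: bool      # owner policy: may users from other domains run here?


@dataclass
class JobRequest:
    cpus: int
    memory_gb: int
    external_user: bool


def match(job: JobRequest, offers: List[ResourceOffer]) -> Optional[ResourceOffer]:
    """Return the least-loaded offer that satisfies the job and the owner's policy."""
    candidates = [
        o for o in offers
        if o.cpus >= job.cpus
        and o.memory_gb >= job.memory_gb
        and (o.allows_external or not job.external_user)
    ]
    # Rank by load; None signals that no resource currently matches.
    return min(candidates, key=lambda o: o.load, default=None)


offers = [
    ResourceOffer("site-a", cpus=32, memory_gb=128, load=0.2, allows_external=False),
    ResourceOffer("site-b", cpus=16, memory_gb=64, load=0.7, allows_external=True),
    ResourceOffer("site-c", cpus=8, memory_gb=16, load=0.1, allows_external=True),
]
chosen = match(JobRequest(cpus=12, memory_gb=32, external_user=True), offers)
print(chosen.name if chosen else "no match")
```

Note that the least-loaded machine overall is not selected here: it is too small, and the next-best site is closed to external users. Heterogeneity and per-owner policy both narrow the choice before any ranking happens.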
Grid Resource Management Issues
Authentication (once).
Specify (code, resources, etc.).
Discover resources.
Negotiate authorization, acceptable use, Cost, etc.
Acquire resources.
Schedule Jobs.
Initiate computation.
Steer computation.
Access remote data-sets.
Collaborate with results.
Account for usage.
Sources of Complexity in Grid Resource Management
No single administrative control.
No single ownership policy:
Each resource owner has their own policies or scheduling
mechanisms;
Users must honour them (particularly external Grid users).
Heterogeneity of resources.
Dynamic availability: resources may appear and disappear.