
CLOUD COMPUTING Scheme : 2022

Course Name : CLOUD COMPUTING

Course Code : BAD515C

Course outcome (Course Skill Set)

At the end of the course, the student will be able to:

CO1: Describe various cloud computing platforms and service providers.

CO2: Illustrate the significance of various types of virtualization.

CO3: Identify the architecture, delivery models, and industrial platforms for cloud computing-based applications.

CO4: Analyze the role of security aspects in cloud computing.

CO5: Demonstrate cloud applications in various fields using suitable cloud platforms.

Module-1
Distributed System Models and Enabling Technologies: Scalable Computing Over the
Internet, Technologies for Network Based Systems, System Models for Distributed and Cloud
Computing, Software Environments for Distributed Systems and Clouds, Performance, Security
and Energy Efficiency.


1.1 Scalable Computing Over the Internet

1. Introduction to Scalable Computing:

• Definition: Scalable computing refers to the capability of a computing system to handle increasing workloads by adding resources, such as more servers or storage, particularly over the Internet.
• Historical Context: Over the past 60 years, computing technology has evolved from
single, centralized mainframes to highly distributed systems capable of performing
complex, large-scale computations.

2. Evolution of Computing Platforms:

• Machine Architecture:
o Transition from mainframes (centralized) to personal computers (PCs) and,
eventually, to server farms and data centers.
o Emergence of multicore processors, GPUs, and specialized accelerators like TPUs
(Tensor Processing Units) for handling specific tasks like AI and machine
learning.
• Operating Systems:
o From simple, single-user OS (like MS-DOS) to complex, multi-user and multi-
tasking OS (like UNIX, Linux, Windows Server).
o Introduction of virtualization technologies that allow multiple operating systems
to run on a single physical machine.
• Network Connectivity:
o Early standalone computers evolved into networked environments, leading to the
development of the Internet.
o Transition from local area networks (LANs) to wide area networks (WANs) and
global connectivity through the Internet.
o Advent of high-speed networking technologies (fiber optics, 5G) that enable
faster data transfer and lower latency.

3. Paradigm Shift to Parallel and Distributed Computing:

• Definition of Distributed Computing:


o A computing model where multiple computers (nodes) work together to perform
tasks. Nodes can be geographically dispersed and connected via the Internet.
o Unlike centralized systems, distributed computing splits tasks into smaller sub-
tasks that run concurrently, allowing for greater efficiency and speed.

Characteristics:

• Parallelism: Multiple tasks are executed simultaneously.


• Scalability: Systems can grow by adding more nodes or resources.
• Fault Tolerance: System resilience is improved by having multiple nodes; failure of one
node doesn’t necessarily cause the entire system to fail.
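To make these characteristics concrete, here is a minimal Python sketch (illustrative only, not part of the original notes) in which a workload is split into independent sub-tasks that run concurrently on local worker processes; in a real distributed system the workers would be separate networked nodes.

```python
# Illustrative sketch: split a task into sub-tasks and run them concurrently.
from multiprocessing import Pool

def process_chunk(chunk):
    # stand-in for real work, e.g. aggregating one shard of a data set
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]      # split the workload into 4 sub-tasks
    with Pool(processes=4) as pool:              # workers stand in for cluster nodes
        partials = pool.map(process_chunk, chunks)
    print(sum(partials))                         # combine the partial results
```

If one worker fails, only its chunk needs to be recomputed, which hints at how fault tolerance is achieved in real distributed frameworks.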

4. Applications of Modern Distributed Systems:

• Data-Intensive Applications:
o Big Data Analytics: Large-scale data processing systems like Hadoop and Spark.
o AI and Machine Learning: Distributed training of machine learning models across
multiple nodes.
• Network-Centric Applications:
o Content Delivery Networks (CDNs): Distribute content (e.g., videos, images)
closer to users to improve loading speeds.
o Online Gaming: Real-time multiplayer gaming that requires low latency and high
data transfer rates.
• Social Media Platforms:
o Highly scalable systems handling billions of user interactions, real-time updates,
and large datasets.

5. Impact on Society:

• Quality of Life Enhancements:


o Real-time communication (e.g., video conferencing, social media).
o Access to vast information and digital services (e.g., online education,
telemedicine).
• Business and Industry Transformation:
o E-commerce platforms that scale to handle millions of transactions
simultaneously.
o Digital transformation in industries like finance, healthcare, and logistics through
AI and data analytics.

1.1.1 The Age of Internet Computing

1. Introduction to the Age of Internet Computing:

• Overview: The Internet has become an integral part of daily life for billions of people
worldwide. This explosion in Internet usage has created a massive demand for computing
resources that can handle large-scale, concurrent user activities.
• Shift in Computing Needs: Traditional high-performance computing (HPC)
benchmarks, like the Linpack Benchmark, are becoming less relevant as the focus shifts
from purely computational performance to managing vast amounts of data and numerous
simultaneous tasks.

2. Limitations of Traditional High-Performance Computing (HPC):

• HPC Overview:
o Designed primarily for scientific and engineering tasks that require significant
computational power, such as simulations, modeling, and complex calculations.
o Measured using benchmarks like Linpack, which focuses on floating-point
operations per second (FLOPS).

• Why Linpack is No Longer Sufficient:
o Linpack measures peak computational capability but does not adequately reflect
the needs of modern Internet-based applications.
o It doesn’t account for factors like data transfer speed, system responsiveness, or
the ability to handle a high number of concurrent tasks.

3. Rise of High-Throughput Computing (HTC):

• Definition of HTC:
o High-Throughput Computing focuses on maximizing the total number of tasks
completed over a given time rather than just peak performance.
o Utilizes parallel and distributed computing technologies to process numerous
independent tasks simultaneously, ideal for large-scale Internet applications.
• Key Characteristics:
o Scalability: Ability to add more nodes or resources to handle increased
workloads.
o Concurrency: Designed to manage many tasks or users at once, critical for
services like web hosting, cloud storage, and social media platforms.
o Data-Driven Performance: Emphasizes data processing speed, latency
reduction, and high input/output operations per second (IOPS).
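To make the throughput metric concrete, the hedged Python sketch below measures tasks completed per second rather than FLOPS; the task body and worker count are illustrative assumptions.

```python
# Illustrative sketch: measure throughput (tasks/second) rather than FLOPS.
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    return sum(range(10_000))     # stand-in for one independent task (e.g., a web request)

n_tasks = 2_000
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(handle_request, range(n_tasks)))
elapsed = time.perf_counter() - start
print(f"throughput = {n_tasks / elapsed:.0f} tasks/second")
```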

1.1.1.1 The Platform Evolution

1. Overview of Computing Generations:

• Computing technology has evolved through five distinct generations, with each
generation bringing new advancements that reshaped how we use computers. Each
generation lasted approximately 10 to 20 years, often overlapping with the next.
• The evolution reflects the increasing complexity, capability, and accessibility of
computing systems, moving from centralized mainframes to highly distributed and
interconnected systems.

2. Generations of Computer Technology:

• First Generation (1950-1970): Mainframes


o Key Technologies: Large, powerful mainframe computers such as IBM 360 and
CDC 6400.
o Users: Primarily used by large businesses, government organizations, and
research institutions for complex computations and data processing.
o Characteristics: Expensive, room-sized machines with centralized processing
capabilities; required specialized environments and personnel.
• Second Generation (1960-1980): Minicomputers
o Key Technologies: Lower-cost minicomputers like the DEC PDP-11 and VAX
Series.
o Users: More accessible to small businesses, universities, and research labs.
o Characteristics: Smaller, less expensive, and easier to use than mainframes;
popular for departmental computing and scientific research.
• Third Generation (1970-1990): Personal Computers (PCs)
o Key Technologies: Personal computers powered by VLSI (Very Large Scale
Integration) microprocessors, like the Intel 8080 and Apple II.
o Users: Became widely available to individual consumers, businesses, and
educational institutions.
o Characteristics: Marked the transition from centralized computing to personal
and individual use; sparked the home computing revolution.
• Fourth Generation (1980-2000): Portable and Pervasive Devices
o Key Technologies: Laptops, handheld devices, and the early rise of mobile
phones.
o Users: Both personal and professional users, expanding into new areas such as
mobile computing and wireless communication.
o Characteristics: Computing became portable and pervasive, with devices
connecting through both wired and wireless networks.

• Fifth Generation (1990-present): HPC and HTC Systems
o Key Technologies: High-Performance Computing (HPC) clusters, grids, and
cloud computing platforms.
o Users: Broad usage from individual consumers to large-scale enterprises,
supporting applications like web services, data analytics, and AI.
o Characteristics: Shift toward leveraging shared resources, distributed computing,
and massive data handling capabilities through clusters, grids, and cloud
environments.

3. Transition from HPC to HTC:

• HPC Systems:
o Evolution: Supercomputers with massively parallel processors (MPPs) have been
gradually replaced by clusters of cooperative computers.
o Clusters: Composed of homogeneous compute nodes physically connected in
close proximity, providing high-speed communication and shared resources.
o Focus: Primarily used for scientific simulations, modeling, and tasks requiring
immense computational power.
• HTC Systems:
o HTC Overview: Focuses on completing a large number of tasks, often
independent, over a distributed network of nodes. Prioritizes throughput over peak
performance.
o Peer-to-Peer (P2P) Networks: Enable distributed file sharing and content
delivery applications. These systems are built over many globally distributed
client machines, emphasizing a decentralized approach.
o Applications: HTC systems are extensively used in cloud computing, web
services, and P2P platforms, where the emphasis is on handling vast amounts of
data and numerous concurrent user requests.

4. Key Technologies Driving the Evolution:

• Clustering: The use of clusters, which are groups of interconnected computers that work
together as a single system to perform computational tasks efficiently.
• Grid Computing: Extends the concept of clusters to geographically dispersed networks,
forming computational grids that pool resources from multiple locations.
• P2P Networks: Decentralized networks that allow direct sharing of resources among
peers, often used in applications like file sharing, content delivery, and collaborative
computing.
• Cloud Computing: Provides scalable and on-demand access to computing resources
over the Internet, facilitating both HTC and HPC applications with flexible, pay-as-you-
go models.

5. Trends in Platform Evolution:

• Integration of HPC and HTC Systems: While traditionally separate, there is a growing
overlap as HPC tasks are increasingly executed on cloud platforms, and HTC systems
incorporate high-performance elements for specific workloads.
• Rise of Data-Centric Computing: Emphasis is shifting from purely computational tasks
to data-driven approaches, leveraging vast datasets and advanced analytics to drive
insights and decisions.
• Emergence of Web Services and APIs: Many modern applications are built on services
that integrate easily with other platforms, allowing for more complex, distributed, and
interoperable computing environments.

6. Future Directions:

• Edge Computing: Bringing computing resources closer to the data source, enhancing
real-time processing and reducing latency, especially important for IoT applications.
• Quantum Computing: Potentially revolutionary, offering exponentially greater
processing power for certain types of complex problems that are infeasible for classical
computers.

• AI and Machine Learning Integration: Increasingly embedded within both HPC and
HTC systems to optimize resource allocation, automate tasks, and enhance decision-
making processes.

1.1.1.2 High-Performance Computing (HPC)

1. Overview of HPC:

• Definition: HPC refers to the use of powerful supercomputers and computing clusters to
solve complex computational problems that require immense processing speed.
• Focus on Speed: For years, HPC systems have prioritized raw speed performance,
measured in floating-point operations per second (FLOPS).
• Evolution of Performance:
o Early 1990s: HPC systems operated at speeds measured in gigaflops (Gflops).

o 2010s: Performance improved dramatically, reaching petaflops (Pflops) levels,
driven by the need for faster computation in scientific, engineering, and
manufacturing sectors.

2. Benchmarking HPC Systems:

• Linpack Benchmark: The primary metric for measuring HPC performance, focusing on
the system’s ability to solve large sets of linear equations.
• Top 500 Supercomputers: A widely recognized list ranking the world's most powerful
computing systems based on Linpack benchmark results.

3. Limited Supercomputer User Base:

• Despite their capabilities, supercomputers serve a niche market, with fewer than 10% of
all computer users accessing these systems.
• Primary Users: Mostly scientists, engineers, and researchers working on complex
simulations, data analysis, and modeling tasks.

4. Broader Computing Landscape:

• Mainstream Computing Needs: The majority of users rely on desktop computers, laptops, and large servers for everyday tasks like Internet searches, data processing, and market-driven applications.
• Shift Toward Accessible Computing: While HPC systems remain essential for
specialized tasks, most computing demands are met by more accessible and widely
available systems, reflecting the diverse needs of global users.

1.1.1.3 High-Throughput Computing (HTC)

1. Overview of HTC:

• Definition: HTC emphasizes completing a large number of tasks efficiently over a given
period, focusing on high-flux computing to manage vast workloads.

• Shift from HPC to HTC: While HPC prioritizes speed for complex scientific tasks,
HTC focuses on handling numerous tasks simultaneously, particularly in web services
and large-scale Internet applications.

2. Key Characteristics of HTC:

• High Throughput: Measures system performance by the number of tasks completed per
unit of time, rather than peak computational speed.
• Scalability: Designed to support millions of users and tasks concurrently, making it ideal
for applications like Internet searches and cloud-based services.

3. Challenges Addressed by HTC:

• Cost Efficiency: HTC systems aim to reduce operational costs in data centers by
optimizing resource utilization.
• Energy Savings: Focus on minimizing power consumption through efficient hardware
and software design, critical for large-scale, high-demand environments.
• Security and Reliability: Enhances data protection and system stability, crucial for
enterprise-level computing and data management.

4. Applications of HTC:

• Web Services: Manages high-traffic websites and online applications, ensuring responsiveness and reliability.
• Batch Processing: Efficiently handles large-scale data processing tasks, such as financial
transactions, data analytics, and content delivery networks.

5. Broader Impact:

• HTC systems are increasingly important for meeting the computing demands of everyday
users, bridging the gap between specialized high-performance computing and general-
purpose, market-driven applications.


1.1.1.4 Three New Computing Paradigms

1. Service-Oriented Architecture (SOA):

• Definition: SOA is a framework that allows different services to communicate and work
together over a network, enabling Web 2.0 applications.
• Impact: Facilitates modular, interoperable software that can be easily integrated and
reused, making it essential for modern web services.
• Details Covered: SOA will be explored in detail in Chapter 5.

2. Cloud Computing:

• Definition: Cloud computing uses virtualization to provide scalable, on-demand computing resources over the Internet, treating data centers as a single computing unit.
• Key Quotes:
o John Gage (1984): "The network is the computer."
o David Patterson (2008): "The data center is the computer."
o Rajkumar Buyya: "The cloud is the computer."
• Evolution: Clouds are often seen as evolved clusters or grids with enhanced
virtualization capabilities, supporting massive data processing tasks from social
networks, traditional Internet, and IoT.
• Details Covered: In-depth analysis of cloud computing will be provided in Chapters 4, 6,
and 9.

3. Internet of Things (IoT):

• Definition: IoT connects physical devices using technologies like RFID, GPS, and
sensors, allowing them to collect and exchange data.
• Impact: Transforms how devices interact with each other and with users, integrating
cyber-physical systems (CPS) into everyday life.
• Details Covered: IoT and CPS will be discussed in Chapter 9.

Evolution of Distributed Computing:

• Historical Context:
o The Internet’s inception in 1969 set the stage for the growth of networked
computing, envisioned as a utility similar to electricity or telephone services.
• Blurring Lines:
o The distinctions between clusters, grids, P2P systems, and clouds are increasingly
blurred, as these technologies integrate and evolve.
• Future Trends:
o Clouds are expected to process vast datasets from the traditional Internet, social
media, and IoT, pushing the boundaries of distributed and cloud computing
models.

1.1.1.5 Computing Paradigm Distinctions

Key Concepts & Definitions

1. Centralized Computing:
o Definition: All computer resources (processors, memory, storage) are centralized
in a single physical system.
o Characteristics:
▪ Resources are fully shared within one integrated operating system.
▪ Common in data centers and supercomputers.
▪ Tightly coupled hardware components.
o Use Cases: Often used in parallel, distributed, and cloud computing.
2. Parallel Computing:
o Definition: A computing paradigm where multiple processors work
simultaneously on different parts of a task.
o Characteristics:
▪ Processors can be tightly coupled (centralized shared memory) or loosely
coupled (distributed memory).
▪ Interprocessor communication via shared memory or message passing.

▪ Programs running on such systems are known as parallel programs, and writing them is called parallel programming.
o Use Cases: High-performance computing applications, complex simulations.
3. Distributed Computing:
o Definition: Involves a collection of autonomous computers that communicate
over a network to achieve a common goal.
o Characteristics:
▪ Each system has private memory, and communication is accomplished
through message passing.
▪ Programs running in such environments are called distributed programs.
▪ Writing software for these systems is termed distributed programming.
o Use Cases: Internet applications, large-scale data processing (e.g., Hadoop).
4. Cloud Computing:
o Definition: A flexible computing paradigm that can use centralized or distributed
resources to provide scalable services over the internet.
o Characteristics:
▪ Cloud resources can be physical or virtualized and may operate across
centralized or distributed data centers.
▪ May involve parallel or distributed computing.
▪ Often seen as a form of utility computing or service computing.
o Use Cases: On-demand services (e.g., SaaS, IaaS, PaaS), web hosting, enterprise
applications.

1.1.1.6 Distributed System Families

Key Concepts & Definitions

1. P2P Networks:
o Definition: Peer-to-peer (P2P) networks consist of distributed nodes (machines)
that communicate and share resources without relying on a centralized server.
o Example: File-sharing systems like BitTorrent.
o Scale: Can involve millions of client machines working simultaneously.

2. Computational Grids (Data Grids):
o Definition: Networks of distributed clusters designed for resource sharing
(computational power, data sets) across wide-area networks.
o Characteristics:
▪ Resource sharing occurs across server clusters.
▪ Often used in scientific applications like CERN’s grid for particle physics.
o Scale: Can connect hundreds of server clusters to form a massive grid
infrastructure.
3. Cloud Computing:
o Definition: The provision of scalable, on-demand computational resources and
services via the internet.
o Characteristics:
▪ Built with server clusters and virtualized resources at large data centers.
▪ Can involve distributed or centralized architectures.
▪ Provides infrastructure, platform, and software as services (IaaS, PaaS,
SaaS).
o Example: Amazon Web Services (AWS), Google Cloud.
o Scale: Experimental cloud clusters with thousands of processing nodes have been
deployed.
4. HPC (High-Performance Computing):
o Definition: Systems designed to maximize computational speed and efficiency,
often involving parallel computing on a large scale.
o Example: Supercomputers used for simulations in science and engineering.
5. HTC (High-Throughput Computing):
o Definition: Systems designed for processing many tasks that require significant
computing power but are less dependent on synchronization between nodes.
o Example: Large-scale data processing in business, such as in e-commerce and
social networks.

Design Objectives for HPC and HTC Systems

1. Efficiency:
o HPC: Maximize the utilization of resources by exploiting parallelism.
o HTC: Focus on job throughput, optimizing data access, and power efficiency
(throughput per watt).
2. Dependability:
o Ensure reliability and self-management across the system, even in failure
conditions.
o Provide Quality of Service (QoS) assurances at the application and system levels.
3. Adaptation:
o Support billions of job requests across massive data sets.
o Efficiently manage virtualized cloud resources under varying workloads and
service models.
4. Flexibility:
o Distributed systems must be capable of running both HPC (scientific and
engineering) and HTC (business) applications effectively.

1.1.2 Scalable Computing Trends and New Paradigms

Several predictable technology trends continue to shape the evolution of scalable computing
applications. Designers and programmers attempt to forecast the capabilities of future systems to
meet the growing demand for distributed and parallel processing.

Key Trends in Scalable Computing:

• Moore’s Law: The observation that the number of transistors on a chip doubles approximately every 18 months, leading to a corresponding increase in processing speed.

• Gilder’s Law: The assertion that network bandwidth has roughly doubled every year.
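A back-of-envelope projection of these two trends (a sketch that simply assumes the quoted doubling periods of 18 months and 12 months, respectively):

```python
# Hedged projection of the two "laws" quoted above.
def moore(value, years, doubling_period=1.5):
    """Processor capability doubling roughly every 18 months."""
    return value * 2 ** (years / doubling_period)

def gilder(bandwidth, years, doubling_period=1.0):
    """Network bandwidth doubling roughly every year."""
    return bandwidth * 2 ** (years / doubling_period)

print(moore(1.0, 6))    # ~16x processor capability after 6 years
print(gilder(1.0, 6))   # ~64x network bandwidth after 6 years
```

The comparison illustrates why network performance has historically outpaced CPU performance, a point revisited in Section 1.2.3.4.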


1.1.2.1 Degrees of Parallelism


Key Concepts & Definitions

1. Bit-Level Parallelism (BLP):


o Definition: The earliest form of parallelism, BLP improves processing speed by
transitioning from bit-serial processing to word-level processing.
2. Instruction-Level Parallelism (ILP):

• Definition: The ability of a processor to execute multiple instructions concurrently rather than sequentially.
• Technologies:
o Pipelining: Breaking instruction execution into stages so that multiple
instructions can be processed at the same time.
o Superscalar Computing: Allows multiple instructions to be issued and executed
in parallel.
o Very Long Instruction Word (VLIW): Bundles multiple operations into a single
instruction.
o Multithreading: Multiple threads of execution are processed simultaneously by a
single CPU.

3. Data-Level Parallelism (DLP):

• Definition: Involves performing the same operation on multiple data points simultaneously, which is common in applications that involve large data sets.
• Technologies:
o Single Instruction, Multiple Data (SIMD): Executes a single instruction across
multiple data points.
o Vector Machines: Use vector or array instructions to process large data sets in
parallel.
• Hardware and Compiler Support: DLP requires specialized hardware and compiler
assistance to fully optimize performance.
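A hedged illustration of data-level parallelism using NumPy, whose array operations dispatch to vectorized (SIMD-capable) kernels; the array size and the operation itself are arbitrary choices for the example.

```python
# Illustrative sketch: the same operation applied across a whole array (DLP)
# versus an element-by-element scalar loop.
import numpy as np

x = np.arange(100_000, dtype=np.float32)

# scalar: one element at a time
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 2.0 * x[i] + 1.0

# data-parallel: one instruction conceptually applied to many data points
y_simd = 2.0 * x + 1.0

assert np.allclose(y_loop, y_simd)
```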

4. Task-Level Parallelism (TLP):

• Definition: Involves parallel execution of different tasks or threads on multiple cores or processors.
• Technologies: Enabled by multicore processors and chip multiprocessors (CMPs).

5. Job-Level Parallelism (JLP):

• Definition: The parallel execution of large, independent jobs across multiple computing
nodes or machines in a distributed environment.
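The sketch below illustrates task-level parallelism: two different, independent tasks submitted to separate worker processes (the tasks themselves are arbitrary examples). At job level, the same pattern applies with whole jobs dispatched to different machines.

```python
# Illustrative sketch of task-level parallelism: distinct tasks run concurrently.
from concurrent.futures import ProcessPoolExecutor
import math

def estimate_pi(n):            # task A: a numerical estimate (Leibniz series)
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n))

def count_primes(limit):       # task B: a completely different computation
    return sum(1 for n in range(2, limit)
               if all(n % d for d in range(2, math.isqrt(n) + 1)))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(estimate_pi, 200_000)
        f2 = pool.submit(count_primes, 50_000)
        print(f1.result(), f2.result())
```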

1.1.2.2 Innovative Applications


Both HPC and HTC systems desire transparency in many application aspects. For example, data
access, resource allocation, process location, concurrency in execution, job replication, and
failure recovery should be made transparent to both users and system management. Table 1.1
highlights a few key applications that have driven the development of parallel and distributed
systems over the years. These applications spread across many important domains in science,
engineering, business, education, health care, traffic control, Internet and web services, military,
and government applications.


1.1.2.3 The Trend toward Utility Computing


Figure 1.2 identifies major computing paradigms to facilitate the study of distributed systems
and their applications. These paradigms share some common characteristics.

First, they are all ubiquitous in daily life. Reliability and scalability are two major design
objectives in these computing models.

Second, they are aimed at autonomic operations that can be self-organized to support dynamic
discovery.

Finally, these paradigms are composable with QoS and SLAs (service-level agreements). These
paradigms and their attributes realize the computer utility vision.

Utility computing focuses on a business model in which customers receive computing resources
from a paid service provider.


1.1.2.4 The Hype Cycle of New Technologies


Any new and emerging computing and information technology may go through a hype cycle, as
illustrated in Figure 1.3. This cycle shows the expectations for the technology at five different
stages. The expectations rise sharply from the trigger period to a high peak of inflated
expectations. Through a short period of disillusionment, the expectation may drop to a valley and
then increase steadily over a long enlightenment period to a plateau of productivity. The number
of years for an emerging technology to reach a certain stage is marked by special symbols. The
hollow circles indicate technologies that will reach mainstream adoption in two years. The gray
circles represent technologies that will reach mainstream adoption in two to five years. The solid
circles represent those that require five to 10 years to reach mainstream adoption, and the
triangles denote those that require more than 10 years. The crossed circles represent technologies
that will become obsolete before they reach the plateau. The hype cycle in Figure 1.3 shows the
technology status as of August 2010.


1.1.3 The Internet of Things and Cyber-Physical Systems


In this section, we will discuss two Internet development trends: the Internet of Things [48] and
cyber-physical systems. These evolutionary trends emphasize the extension of the Internet to
everyday objects.

1.1.3.1 The Internet of Things

The Internet of Things (IoT) refers to the networked interconnection of everyday objects and
devices, introduced in 1999 at MIT. Unlike the traditional Internet, which connects machines or
web pages, IoT enables the connection of physical objects through technologies like RFID and
GPS. With IPv6, there are enough IP addresses to uniquely identify all objects on Earth,
supporting the tracking of up to 100 trillion items.

IoT allows for communication in three patterns: human-to-human (H2H), human-to-thing (H2T),
and thing-to-thing (T2T). These connections can occur anytime and anywhere, facilitating
dynamic interactions between people and devices. Although still in its early stages, IoT is
expected to grow into a global network of interconnected objects, with cloud computing
supporting efficient and intelligent exchanges. The IoT aims to create a "smart Earth" with
intelligent cities, clean water, efficient transportation, and more, although achieving this vision
will take time.

1.1.3.2 Cyber-Physical Systems (CPS)

Cyber-physical systems (CPS) integrate computational processes ("cyber") with physical objects
and environments. They combine the "3C" technologies—computation, communication, and
control—into intelligent feedback systems that bridge the physical and digital worlds. While the
Internet of Things (IoT) focuses on networking physical objects, CPS emphasizes the interaction
between the virtual and physical worlds, often through applications like virtual reality (VR). CPS
has the potential to revolutionize how we engage with the physical world, much like the Internet
transformed virtual interactions.


1.2 TECHNOLOGIES FOR NETWORK-BASED SYSTEMS


With the concept of scalable computing under our belt, it’s time to explore hardware, software,
and network technologies for distributed computing system design and applications. In
particular, we will focus on viable approaches to building distributed operating systems for
handling massive parallelism in a distributed environment.

1.2.1 Multicore CPUs and Multithreading Technologies


Consider the growth of component and network technologies over the past 30 years. They are crucial to
the development of HPC and HTC systems. In Figure 1.4, processor speed is measured in millions of
instructions per second (MIPS) and network bandwidth is measured in megabits per second (Mbps) or
gigabits per second (Gbps). The unit GE refers to 1 Gbps Ethernet bandwidth.

1.2.1.1 Advances in CPU Processors

Modern CPUs have evolved into multicore architectures, featuring dual, quad, or more
processing cores, each capable of executing multiple instruction threads. These processors
exploit instruction-level parallelism (ILP) and task-level parallelism (TLP). Multicore
processors, like Intel’s i7 and AMD’s Opteron, include private L1 caches and shared L2 caches,
with potential future integration of L3 caches. Many-core GPUs, with hundreds to thousands of
cores, also leverage data-level parallelism (DLP). While processor speed has drastically
increased over the years, clock rates have hit a limit near 5 GHz due to power and heat
limitations, requiring innovation in chip design.


1.2.1.2 Multicore CPU and Many-Core GPU Architectures

The future of multicore CPUs is expected to see an increase in core counts from tens to
potentially hundreds, but their ability to exploit massive data-level parallelism (DLP) is limited
by memory wall issues. This has led to the rise of many-core GPUs, which feature hundreds of
thin cores designed for high performance. Both IA-32 and IA-64 architectures are incorporated
into commercial CPUs, with x86 processors increasingly utilized in high-performance computing
(HPC) and high-throughput computing (HTC) systems. The trend shows a shift from RISC
processors to multicore x86 and many-core GPU systems in top supercomputers. Future
developments may include asymmetric or heterogeneous chip multiprocessors that integrate both
fat CPU cores and thin GPU cores on the same chip.

1.2.1.3 Multithreading Technology


Consider in Figure 1.6 the dispatch of five independent threads of instructions to four pipelined
data paths (functional units) in each of the following five processor categories, from left to right:
a four-issue superscalar processor, a fine-grain multithreaded processor, a coarse-grain multithreaded processor, a two-core CMP, and a simultaneous multithreaded (SMT) processor. The
superscalar processor is single-threaded with four functional units. Each of the three
multithreaded processors is four-way multithreaded over four functional data paths. In the dual-
core processor, assume two processing cores, each a single-threaded two-way superscalar
processor. Instructions from different threads are distinguished by specific shading patterns for

instructions from five independent threads. Typical instruction scheduling patterns are shown
here. Only instructions from the same thread are executed in a superscalar processor. Fine-grain
multithreading switches the execution of instructions from different threads per cycle. Coarse-grain multithreading executes many instructions from the same thread for quite a few cycles
before switching to another thread. The multicore CMP executes instructions from different
threads completely. The SMT allows simultaneous scheduling of instructions from different
threads in the same cycle. These execution patterns closely mimic an ordinary program. The
blank squares correspond to no available instructions for an instruction data path at a particular
processor cycle. More blank cells imply lower scheduling efficiency. The maximum ILP or
maximum TLP is difficult to achieve at each processor cycle. The point here is to demonstrate
your understanding of typical instruction scheduling patterns in these five different micro-
architectures in modern processors.


1.2.2 GPU Computing to Exascale and Beyond

• Definition: A GPU (Graphics Processing Unit) is a graphics coprocessor that offloads graphics tasks from the CPU.
• Historical Context: The first GPU, NVIDIA's GeForce 256, was introduced in 1999 and could process over 10 million polygons per second.
• Core Count: Modern GPUs have hundreds of processing cores, compared to traditional
CPUs, like the Xeon X5670, which has only six cores.
• Architecture: GPUs use a throughput architecture, executing many concurrent threads
slowly, as opposed to a single long thread quickly.
• Growth in Popularity: There is increasing interest in parallel GPUs and GPU clusters as
alternatives to CPUs due to their higher parallelism.
• General-Purpose Computing: GPGPU (General-Purpose computing on GPUs) has
emerged in the high-performance computing (HPC) field.
• CUDA Model: NVIDIA's CUDA programming model supports the use of GPUs in HPC
applications.
• Future Exploration: Subsequent discussions will focus on GPU clusters for massively
parallel computing.

1.2.2.1 How GPUs Work

• Historical Function: Early GPUs served as coprocessors connected to CPUs.


• Modern Architecture: NVIDIA GPUs now feature 128 cores on a single chip, allowing
for significant computational power.
• Thread Handling: Each GPU core can manage eight threads, enabling up to 1,024
threads to run concurrently, showcasing true massive parallelism.

Optimization Differences:

• CPU: Optimized for low-latency operations and caches.


• GPU: Optimized for high throughput with explicit management of on-chip memory.

• Versatile Applications: Modern GPUs are not limited to graphics and video encoding;
they are extensively used in high-performance computing (HPC) systems and
supercomputers.
• Floating-Point Operations: GPUs are designed to handle a large number of floating-
point operations in parallel, offloading data-intensive calculations from the CPU.
• Widespread Use: GPUs are prevalent in various devices, including mobile phones, game
consoles, embedded systems, PCs, and servers.

• HPC Applications: NVIDIA’s CUDA Tesla and Fermi architectures are used in GPU
clusters for parallel processing of large floating-point datasets.

1.2.2.2 GPU Programming Model

• GPU Programming Model Figure 1.7 shows the interaction between a CPU and GPU in
performing parallel execution of floating-point operations concurrently.
• The CPU is the conventional multicore processor with limited parallelism to exploit.
• The GPU has a many-core architecture that has hundreds of simple processing cores
organized as multiprocessors. Each core can have one or more threads.
• Essentially, the CPU’s floating-point kernel computation role is largely offloaded to the
many-core GPU. The CPU instructs the GPU to perform massive data processing.
• The bandwidth must be matched between the on-board main memory and the on-chip
GPU memory.
• This process is carried out in NVIDIA’s CUDA programming using the GeForce 8800 or
Tesla and Fermi GPUs.
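As a hedged example of this CPU-to-GPU offload pattern, the sketch below uses Numba's CUDA interface (an assumption; the notes do not prescribe a specific toolkit) to launch a simple element-wise kernel on the GPU from host code. It assumes a CUDA-capable GPU and the numba package.

```python
# Illustrative sketch: the CPU (host) offloads a data-parallel kernel to the GPU (device).
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)                 # absolute index of this GPU thread
    if i < x.shape[0]:
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](2.0, x, y, out)   # host instructs the GPU to do the work
cuda.synchronize()                                  # wait for the device before reading results
print(out[:3])
```

Note how the host only launches the kernel and sizes the grid; the many thin GPU cores each handle one array element, matching the offload described above.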


1.2.2.3 Power Efficiency of the GPU

Bill Dally of Stanford University considers power and massive parallelism as the major benefits
of GPUs over CPUs for the future. By extrapolating current technology and computer
architecture, it was estimated that 60 Gflops/watt per core is needed to run an exaflops system
(see Figure 1.10). Power constrains what we can put in a CPU or GPU chip. Dally has estimated
that the CPU chip consumes about 2 nJ/instruction, while the GPU chip requires about 200 pJ/instruction, roughly one-tenth the energy per instruction of the CPU. The CPU is optimized for latency in caches and memory, while the GPU is optimized for throughput with explicit management of on-chip memory. Figure 1.9 compares the CPU and GPU in their performance/power ratio measured in Gflops/watt per core. In 2010, the GPU had a value of 5 Gflops/watt at the core level, compared with less than 1 Gflop/watt per CPU core.
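A quick sanity check of these figures (a hedged calculation using only the numbers quoted above; the exaflops estimate assumes the per-core efficiency target applies system-wide, which is a simplification):

```python
# Energy per instruction, as estimated by Dally.
cpu_energy = 2e-9        # ~2 nJ per instruction (CPU)
gpu_energy = 200e-12     # ~200 pJ per instruction (GPU)
print(gpu_energy / cpu_energy)                    # 0.1 -> roughly one-tenth the energy

# Power implied by the exaflops target at the quoted efficiency.
exaflops = 1e18                                   # floating-point operations per second
efficiency = 60e9                                 # 60 Gflops per watt
print(round(exaflops / efficiency / 1e6, 1), "MW")  # ~16.7 MW for the whole system
```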

1.2.3 Memory, Storage, and Wide-Area Networking

1.2.3.1 Memory Technology


The upper curve in Figure 1.10 illustrates the growth of DRAM chip capacity, which increased
from 16KB in 1976 to 64GB in 2011, indicating a fourfold capacity increase every three years.
In contrast, memory access times have not improved significantly, leading to a worsening
memory wall problem as processors continue to speed up. For hard drives, capacity rose from
260MB in 1981 to 250GB in 2004, with the Seagate Barracuda XT hard drive reaching 3TB in
2011—an approximate tenfold increase in capacity every eight years. The anticipated increase in

disk array capacities will likely outpace these trends. As processors become faster and memory
capacity expands, the gap between them widens, potentially exacerbating the memory wall issue
and limiting future CPU performance.
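A hedged back-of-envelope check of the DRAM growth rate quoted above:

```python
# Rough check: 16 KB (1976) to 64 GB (2011) implies roughly a 4x increase every 3 years.
dram_1976, dram_2011 = 16e3, 64e9        # capacities in bytes
years = 2011 - 1976
overall_growth = dram_2011 / dram_1976   # = 4,000,000x overall
per_3_years = overall_growth ** (3 / years)
print(round(per_3_years, 1))             # ~3.7, i.e. approximately fourfold every three years
```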

1.2.3.2 Disks and Storage Technology

Beyond 2011, disk drives and disk arrays surpassed 3TB in capacity. The lower curve in Figure
1.10 reflects a seven-order-of-magnitude growth in disk storage over 33 years. Flash memory
and solid-state drives (SSDs) have also grown rapidly, significantly impacting the future of high-
performance computing (HPC) and high-throughput computing (HTC) systems. SSDs have a
relatively low mortality rate, with each block capable of handling between 300,000 and 1 million
write cycles, allowing them to last several years even with heavy write usage. SSDs and flash
memory will provide significant speed improvements in many applications.

However, large system development will eventually be constrained by factors such as power
consumption, cooling, and packaging. Power consumption increases linearly with clock

frequency and quadratically with applied voltage, meaning clock rates can't be increased
indefinitely. Lower voltage supplies are increasingly necessary. In a talk at the University of
Southern California, Jim Gray remarked, "Tape units are dead, disks are tape units, flashes are
disks, and memory are caches now," highlighting the shifting future of storage technology. As of
2011, SSDs remain too expensive to fully replace traditional disk arrays in the storage market.

1.2.3.3 System-Area Interconnects

In small clusters, the nodes are typically interconnected via an Ethernet switch or a local area network
(LAN). Figure 1.11 illustrates that LANs are commonly used to connect client hosts to large servers. A
storage area network (SAN) connects servers to network storage systems such as disk arrays, while
network-attached storage (NAS) links client hosts directly to disk arrays. All three types of networks—
LAN, SAN, and NAS—are often found in large clusters built using commercial network components. For
smaller clusters without distributed storage, a setup can be created using a multiport Gigabit Ethernet
switch and copper cables to connect the machines. These network types are commercially available and
widely used.

1.2.3.4 Wide-Area Networking

The lower curve in Figure 1.10 highlights the rapid growth of Ethernet bandwidth, which
increased from 10 Mbps in 1979 to 1 Gbps in 1999, and reached 40-100 Gbps by 2011.
Predictions suggested that 1 Tbps network links could become available by 2013. In 2006,
Berman, Fox, and Hey reported network link bandwidths of 1,000 Gbps for international

connections, 1,000 Gbps for national links, 100 Gbps for organizational networks, 10 Gbps for
optical desktops, and 1 Gbps for copper desktop connections.

Network performance was reported to double every year, a rate that outpaces Moore’s law for
CPU performance, which doubles every 18 months. This trend indicates that more computers
will be used concurrently, leading to the development of massively distributed systems.
According to the IDC 2010 report, both InfiniBand and Ethernet were predicted to remain the
dominant interconnect technologies in the high-performance computing (HPC) arena. Most data
centers have adopted Gigabit Ethernet to interconnect their server clusters.

1.2.4 Virtual Machines and Virtualization Middleware

A conventional computer uses a single operating system (OS) image, leading to a rigid structure
that tightly couples application software with specific hardware. This makes it difficult for
software to run on different machines with varying instruction sets or OS environments. Virtual
machines (VMs) solve these issues by improving resource utilization, application flexibility,
software management, and security.

For building large clusters, grids, and cloud environments, significant computing, storage, and
networking resources must be virtualized and aggregated to form a unified system image. Cloud
computing, in particular, relies on the dynamic virtualization of processors, memory, and I/O. Key
concepts like VMs, virtual storage, and virtual networking, along with their virtualization software, are
essential for operating modern large-scale systems. Figure 1.12 visually represents different VM
architectures.


1.2.4.1 Virtual Machines

In Figure 1.12, the host machine contains physical hardware, such as an x86 desktop running
Windows OS (as shown in part (a)). A Virtual Machine (VM) can be created on any hardware
system, with virtual resources managed by a guest OS to run specific applications. Between the
VM and the host, a middleware layer known as the Virtual Machine Monitor (VMM) is required.

1. Native VM (Bare-metal VM):


o Shown in Figure 1.12(b), this setup uses a hypervisor (VMM) running in
privileged mode to directly manage the hardware (CPU, memory, I/O). The guest
OS (e.g., Linux) runs on top of the hypervisor. This approach, known as bare-
metal, does not require the host OS to manage the hardware. An example is the
XEN hypervisor.
2. Host VM:
o Figure 1.12(c) illustrates a host VM setup where the VMM runs in non-privileged
mode. In this case, the host OS is not modified, and the VMM operates on top of
it.

3. Dual-mode VM:
o As shown in Figure 1.12(d), part of the VMM operates at the user level and
another part at the supervisor level. This may require some modification to the
host OS.

Multiple VMs can be run on the same hardware system, offering hardware independence for the
OS and applications. A VM can run on a different OS than the host, providing portability and
flexibility for running applications across various platforms.

1.2.4.2 VM Primitive Operations

The VMM provides the VM abstraction to the guest OS. With full virtualization, the
VMM exports a VM abstraction identical to the physical machine so that a standard OS such as
Windows 2000 or Linux can run just as it would on the physical hardware. Low-level VMM
operations are indicated by Mendel Rosenblum [41] and illustrated in Figure 1.13.

• First, the VMs can be multiplexed between hardware machines, as shown in Figure 1.13(a).

• Second, a VM can be suspended and stored in stable storage, as shown in Figure 1.13(b).

• Third, a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure 1.13(c).

• Finally, a VM can be migrated from one hardware platform to another, as shown in Figure 1.13(d).

These VM operations enable a VM to be provisioned to any available hardware platform. They also provide flexibility in porting distributed application executions. Furthermore, the VM approach significantly enhances the utilization of server resources: multiple server functions can be consolidated on the same hardware platform to achieve higher system efficiency. This eliminates server sprawl by deploying systems as VMs that can be moved transparently across the shared hardware. With this approach, VMware claimed that server utilization could be increased from its current 5–15 percent to 60–80 percent.
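The toy sketch below maps the four primitive operations onto code. It is purely hypothetical: the class and function names are invented for illustration and do not correspond to any real hypervisor API.

```python
# Hypothetical sketch of the VM primitive operations described above.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    state: str = "running"
    host: str = "host-A"

def suspend(vm, storage):
    """Suspend the VM and store its state in stable storage (Figure 1.13(b))."""
    vm.state = "suspended"
    storage[vm.name] = vm

def resume(vm, target_host):
    """Resume or provision a suspended VM on a (possibly new) host (Figure 1.13(c))."""
    vm.host, vm.state = target_host, "running"

def migrate(vm, target_host):
    """Migrate a running VM from one hardware platform to another (Figure 1.13(d))."""
    vm.host = target_host            # state-transfer details omitted

storage = {}
vm = VM("web-01")
suspend(vm, storage)
resume(vm, "host-B")
migrate(vm, "host-C")
print(vm)
```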

1.2.4.3 Virtual Infrastructures

Physical resources for compute, storage, and networking at the bottom of Figure 1.14 are mapped to the applications embedded in various VMs at the top. Hardware and software are then separated. Virtual infrastructure is what connects resources to distributed applications. It is a dynamic mapping of system resources to specific applications. The result is decreased costs and increased efficiency and responsiveness. Virtualization for server consolidation and containment is a good example of this.


1.2.5 Data Center Virtualization for Cloud Computing


Cloud architecture is built with commodity hardware and network devices. Almost all cloud
platforms choose the popular x86 processors. Low-cost terabyte disks and Gigabit Ethernet are
used to build data centers. Data center design emphasizes the performance/price ratio over speed
performance alone. In other words, storage and energy efficiency are more important than sheer speed performance.

1.2.5.1 Data Center Growth and Cost Breakdown

A large data center may be built with thousands of servers. Smaller data centers are typically
built with hundreds of servers. The cost to build and maintain data center servers has increased
over the years. According to a 2009 IDC report (see Figure 1.14), typically only 30 percent of data center costs goes toward purchasing IT equipment (such as servers and disks), 33 percent is attributed to the chiller, 18 percent to the uninterruptible power supply (UPS), 9 percent to computer room air conditioning (CRAC), and the remaining 7 percent to power distribution, lighting, and transformer costs. Thus, about 60 percent of the cost to run a data center is allocated to management and maintenance. The server purchase cost did not increase much with time. The cost of electricity and cooling did increase from 5 percent to 14 percent in 15 years.

1.2.5.2 Low-Cost Design Philosophy

High-end switches or routers may be too cost-prohibitive for building data centers. Thus, using
high-bandwidth networks may not fit the economics of cloud computing. Given a fixed budget,
commodity switches and networks are more desirable in data centers. Similarly, using
commodity x86 servers is more desired over expensive mainframes. The software layer handles
network traffic balancing, fault tolerance, and expandability. Currently, nearly all cloud
computing data centers use Ethernet as their fundamental network technology.


1.2.5.3 Convergence of Technologies

Cloud computing is driven by the convergence of four key technological areas:

1. Hardware Virtualization and Multi-core Chips: These enable dynamic configurations


in the cloud by providing efficient use of hardware resources.
2. Utility and Grid Computing: They form the foundational infrastructure for cloud
services, supporting distributed computing across networks.
3. Service-Oriented Architecture (SOA), Web 2.0, and Mashups: These technologies
integrate diverse platforms and push the cloud to offer more flexible, scalable services.
4. Autonomic Computing and Data Center Automation: These advancements improve
the management and operation of cloud data centers, making them more automated and
self-sufficient.

Data Deluge: Jim Gray highlighted the challenge of managing and analyzing the massive influx
of data from sensors, experiments, simulations, archives, and the web. This "data deluge"
demands new tools for data preservation, movement, access, and analysis, including scalable file
systems, databases, algorithms, workflows, and visualization techniques.

Impact on Science: The shift towards data-centric science (e-science) is creating a new
paradigm of discovery through data-intensive technologies. Cloud computing enables the capture
and analysis of vast data sets, supporting interdisciplinary research across fields like biology,
chemistry, physics, and social sciences.

MapReduce: At the platform level, the MapReduce programming model allows for easy data
parallelism and fault tolerance, which is essential for handling large-scale data processing in the
cloud. Iterative MapReduce extends these capabilities to support more complex data mining
algorithms, crucial for scientific applications.
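A hedged, minimal word-count example in the MapReduce style is shown below (pure Python, not a real Hadoop or Spark job; distribution across nodes, shuffling over the network, and fault tolerance are omitted):

```python
# Illustrative MapReduce-style word count: map emits (word, 1) pairs,
# shuffle groups pairs by key, reduce sums the counts per key.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud is the computer", "the data center is the computer"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(shuffle(pairs)))   # e.g. {'the': 4, 'cloud': 1, ...}
```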

Convergence of Data-Intensive Science and Multicore Computing: Cloud computing, data-intensive science, and multicore technologies are revolutionizing computing by creating large clusters of commodity hardware (e.g., many-core GPU clusters). This convergence enables the

transformation of raw data into machine wisdom, shaping the future of computing architectures
and programming models.

1.3 SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING

Distributed and cloud computing systems consist of numerous autonomous computer nodes,
interconnected via Storage Area Networks (SANs), Local Area Networks (LANs), or Wide Area
Networks (WANs) in a hierarchical manner. Modern networking technology allows a few LAN
switches to easily connect hundreds of machines into a working cluster. WANs can connect
multiple local clusters to form a larger "cluster of clusters," creating massive systems with
potentially millions of computers connected to edge networks.

These large-scale systems are considered highly scalable, capable of achieving web-scale
connectivity both physically and logically. Table 1.2 classifies massive systems into four
categories:

1. Clusters
2. Peer-to-Peer (P2P) Networks
3. Computing Grids
4. Internet Clouds

These systems can range from hundreds to millions of computers, with nodes participating
collaboratively or cooperatively. The classification also considers various technical and
application aspects, highlighting how these systems function collectively to achieve distributed
computing tasks at different scales.


1.3.1 Clusters of Cooperative Computers

A computing cluster consists of interconnected stand-alone computers which work cooperatively as a single integrated computing resource. In the past, clustered computer systems have demonstrated impressive results in handling heavy workloads with large data sets.

1.3.1.1 Cluster Architecture

Figure 1.15 shows a cluster architecture consisting of server nodes connected through a low-latency, high-bandwidth interconnection network. This network can be a Storage Area Network
(SAN) like Myrinet or a Local Area Network (LAN) like Ethernet. To scale the cluster with
more nodes, the interconnection can be built hierarchically using multiple levels of Gigabit
Ethernet, Myrinet, or InfiniBand switches.

The architecture can be extended through SAN, LAN, or even Wide Area Networks (WANs) to
create larger clusters. These clusters are typically connected to the Internet via a Virtual Private
Network (VPN) gateway, which identifies the cluster through its IP address.

In most clusters, the node computers are loosely coupled, meaning each node's resources are
independently managed by its own operating system (OS), resulting in multiple system images.
Each autonomous node operates under its OS, so the cluster doesn't share a single system image,
but instead, the nodes work together while retaining individual OS control.

1.3.1.2 Single-System Image

A Single-System Image (SSI) is an ideal concept in cluster design, as noted by Greg Pfister. The
goal of SSI is to merge multiple system images into one cohesive unit, allowing users to interact
with the cluster as if it were a single machine.

Key Features of SSI:

1. Unified Resource Management: SSI enables sharing of CPUs, memory, and I/O across
all nodes in the cluster, presenting them as an integrated resource.
2. User Transparency: It creates an illusion for users, who see the cluster as one powerful
system rather than a collection of independent computers.

3. Middleware or Cluster OS: To achieve SSI, cluster designers seek specific operating
systems or middleware solutions that can manage and coordinate resources across the
cluster effectively.

Without SSI, a cluster with multiple system images functions merely as a group of independent
computers, lacking the seamless integration that SSI provides. This integrated approach enhances
the usability and performance of clustered systems, making them more effective for
computational tasks.

1.3.1.3 Hardware, Software, and Middleware Support

In the context of high-performance computing (HPC) clusters, particularly those known as Massively Parallel Processors (MPPs), several components are essential for effective design and functionality:

1. Building Blocks:
o Computer Nodes: These can be PCs, workstations, servers, or Symmetric
Multiprocessing (SMP) systems.
o Communication Software: Essential software such as PVM (Parallel Virtual Machine) or MPI (Message Passing Interface) facilitates communication among nodes (see the MPI sketch at the end of this section).
o Network Interface Cards: Each node requires a network interface card to
connect with other nodes.
2. Operating System:
o Most HPC clusters operate under Linux OS, which is favored for its performance
and flexibility in managing resources.
3. High-Bandwidth Interconnection:
o Nodes are interconnected using high-speed networks like Gigabit Ethernet,
Myrinet, or InfiniBand, which enable efficient data transfer between nodes.
4. Middleware Support:
o Specialized middleware is necessary to implement Single-System Image (SSI) or
ensure High Availability (HA).

o This middleware enables features such as distributed shared memory (DSM),
allowing users to share memory across distributed nodes despite the multiple
images inherent in distributed memory systems.
5. Cluster Operations:
o Clusters can run both sequential and parallel applications, but special parallel
environments are required to manage and utilize cluster resources effectively.
o Achieving SSI features can be costly and complex, leading many clusters to
remain loosely coupled.
6. Virtual Clusters:
o With virtualization technologies, it is possible to dynamically create multiple
virtual clusters based on user demand.

This combination of hardware, software, and middleware is crucial for building efficient,
scalable, and user-friendly HPC clusters that can handle complex computational tasks.
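As referenced in the building-blocks list above, the following is a minimal message-passing sketch using mpi4py (an assumption; the notes name MPI but not a specific language binding). It presumes an MPI runtime is installed and would typically be launched with something like `mpiexec -n 4 python hello_mpi.py`.

```python
# Illustrative sketch of node-to-node message passing in a cluster.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's id (one per node/core)
size = comm.Get_size()        # number of cooperating processes

if rank == 0:
    # the master gathers a partial result from every worker
    total = sum(comm.recv(source=worker) for worker in range(1, size))
    print(f"sum of ranks 1..{size - 1} = {total}")
else:
    comm.send(rank, dest=0)   # each worker sends its contribution to node 0
```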

1.3.1.4 Major Cluster Design Issues

Unfortunately, a cluster-wide OS for complete resource sharing is not available yet. Middleware
or OS extensions were developed at the user space to achieve SSI at selected functional levels.
Without this middleware, cluster nodes cannot work together effectively to achieve cooperative
computing. The software environments and applications must rely on the middleware to achieve
high performance. The cluster benefits come from scalable performance, efficient message
passing, high system availability, seamless fault tolerance, and cluster-wide job management, as
summarized in Table 1.3.


1.3.2 Grid Computing Infrastructures

In the past 30 years, users have experienced a natural growth path from Internet to web and grid
computing services. An Internet service such as the Telnet command enables a local computer to connect to a remote computer. A web service such as HTTP enables remote access to web pages. Grid computing is envisioned to allow close interaction among applications running
on distant computers simultaneously. The evolution from Internet to web and grid services is
certainly playing a major role in this growth.

1.3.2.1 Computational Grids
