
Parallel and Distributed Computing

Difference Between Parallel and Distributed Computing


Why use parallel and distributed systems?
Why not use them?

A distributed system is a model in which distributed applications run on multiple computers linked by a communications network. Such systems are also called loosely coupled systems, because each processor has its own local memory and processing units. LOCUS and MICROS are examples of distributed operating systems.
Advantages of Distributed Systems:
 Scalability: Distributed systems can be easily scaled by adding more computers to the
network.
 Fault Tolerance: Distributed systems can recover from failures by redistributing work to other
computers in the network.
 Geographical Distribution: Distributed systems can be geographically distributed, allowing
for better performance and resilience.

Disadvantages of Distributed Systems:


 Complexity: Distributed systems are more complex to design and maintain compared to
single computer systems.
 Communication Overhead: Communication between computers in a distributed system adds
overhead and can impact performance.
 Security: Distributed systems are more vulnerable to security threats, as the communication
between computers can be intercepted and compromised.
Parallel systems are designed to speed up the execution of programs by dividing a program into multiple fragments and processing these fragments at the same time. Flynn classified computer systems into four types based on parallelism in the instruction and data streams (a small code illustration follows the list):
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data streams (SIMD)
3. Multiple instruction streams, single data stream (MISD)
4. Multiple instruction streams, multiple data streams (MIMD)
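To make the SISD/SIMD distinction concrete, here is a minimal Python sketch (our own illustration, not from the text; NumPy's vectorized operations merely stand in for hardware SIMD execution, and the function names are ours):

```python
import numpy as np

data = np.arange(100_000, dtype=np.float64)

# SISD style: one instruction is applied to one data element per step.
def scale_sisd(a, factor):
    out = np.empty_like(a)
    for i in range(len(a)):        # scalar loop: a single data stream
        out[i] = a[i] * factor
    return out

# SIMD style: one instruction is applied to many data elements at once.
def scale_simd(a, factor):
    return a * factor              # vectorized: multiple data streams

assert np.allclose(scale_sisd(data, 2.0), scale_simd(data, 2.0))
```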
Advantages of Parallel Systems:
 High Performance: Parallel systems can execute computationally intensive tasks more quickly
compared to single processor systems.
 Cost Effective: Parallel systems can be more cost-effective compared to distributed systems,
as they do not require additional hardware for communication.

Disadvantages of Parallel Systems:


 Limited Scalability: Parallel systems have limited scalability as the number of processors or
cores in a single computer is finite.
 Complexity: Parallel systems are more complex to program and debug compared to single
processor systems.
 Synchronization Overhead: Synchronization between processors in a parallel system can add
overhead and impact performance.

Difference Between Parallel Systems and Distributed Systems:

1. Parallel systems process data simultaneously to increase the computational speed of a computer system. Distributed systems run applications on multiple computers linked by communication lines.

2. Parallel systems make simultaneous use of multiple computing resources, which can include a single computer with multiple processors. A distributed system consists of a number of computers that are connected and managed so that they share the job-processing load among the various computers distributed over the network.

3. In parallel systems, tasks are performed with a faster process. In distributed systems, tasks are performed with a slower process.

4. Parallel systems are multiprocessor systems. In distributed systems, each processor has its own memory.

5. Parallel systems are also known as tightly coupled systems. Distributed systems are also known as loosely coupled systems.

6. Parallel systems have close communication among more than one processor. Distributed systems communicate with one another through various communication lines, such as high-speed buses or telephone lines.

7. Parallel systems share a memory, clock, and peripheral devices. Distributed systems do not share memory or a clock, in contrast to parallel systems.

8. In parallel systems, all processors share a single master clock for synchronization. In distributed systems, there is no global clock; various synchronization algorithms are used instead.

9. Examples of parallel systems: high-performance computing clusters, Beowulf clusters. Examples of distributed systems: Hadoop, MapReduce, Apache Cassandra.

Speedup and Amdahl's Law


Amdahl’s Law, proposed by Gene Amdahl in 1967, explains the theoretical speedup of a program
when part of it is improved or parallelized. It is widely used in parallel computing to predict the
benefits of using multiple processors.
The main idea is that the speedup of a system is limited by the portion of the program that cannot be
parallelized (the sequential part).
Key Terms
 Speedup (S):
Performance improvement gained by enhancement.
S = Old Execution Time / New Execution Time
 Fraction Enhanced (P):
The proportion of the program that can be parallelized (0 < P < 1).
 Number of Processors (N):
The number of parallel units used for execution.
Formula
S = 1 / ((1 - P) + P/N)
 (1 - P): sequential portion (cannot be parallelized).
 P/N: parallel portion divided among N processors.
Maximum Speedup
 If the number of processors is unlimited (N → ∞):
S_max = 1 / (1 - P)
 This means the non-parallelizable fraction sets the performance limit.
 If P = 1 (100% parallelizable), theoretical speedup is infinite (not realistic).
Example
Suppose a program spends 20% (P = 0.2) of its time in parallelizable work, and we use 5 processors
(N = 5):
S = 1 / ((1 - P) + P/N)
S = 1 / ((1 - 0.2) + 0.2/5)
S = 1 / (0.8 + 0.04)
S = 1 / 0.84
S ≈ 1.19
The system improves by only 19%, showing that the 80% sequential part is the bottleneck.
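This calculation is easy to script. Below is a minimal Python sketch (the function name amdahl_speedup is our own, not from any library) that reproduces the example and the N → ∞ limit:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The example above: 20% parallelizable work on 5 processors.
print(round(amdahl_speedup(0.2, 5), 2))          # 1.19

# As N grows, the sequential 80% caps speedup at 1 / (1 - 0.2) = 1.25.
print(round(amdahl_speedup(0.2, 1_000_000), 2))  # 1.25
```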
Advantages
 Provides a clear upper bound on performance.
 Helps identify bottlenecks in programs.
 Useful in guiding hardware/software design decisions.
Disadvantages
 Assumes the sequential part is fixed (in practice, it can sometimes be optimized).
 Assumes processors are identical, not always true in heterogeneous systems.
 Ignores real-world factors like communication, synchronization, and load balancing overhead.

Hardware architectures:
Multiprocessors (shared memory)
1. Multiprocessor: A multiprocessor is a computer system in which two or more central processing units (CPUs) share full access to a common RAM. The main objective of using a multiprocessor is to boost the system's execution speed, with other objectives being fault tolerance and application matching. There are two types of multiprocessors: shared memory multiprocessors and distributed memory multiprocessors. In a shared memory multiprocessor, all the CPUs share the common memory, but in a distributed memory multiprocessor, every CPU has its own private memory.
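In software terms, the shared-memory model corresponds to multiple threads operating on the same address space. The sketch below (our own illustrative example, using only Python's standard threading module) shows shared memory in action and why access to it must be synchronized:

```python
import threading

counter = 0                      # shared memory: visible to every thread
lock = threading.Lock()          # guards the shared counter

def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        with lock:               # without the lock, updates can be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)                   # 400000, thanks to the lock
```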
The interconnection among two or more processors and shared memory is done with one of three methods:

1)Time shared common bus

2)Multiport memories

3)Crossbar switch network

1)Time shared common bus

[Figure: Time-shared common bus]

As the name indicates, this method contains a single shared bus through which all processors and memory units communicate.

If CPU-1 is interacting with a memory unit over the common shared bus, all other processors must stay idle, since there is only one bus available for communication.

Advantage:

 Simple to implement.

 Because there is only a single common bus, the implementation cost is very low.

Disadvantage:

 The data transfer rate is low, since only one processor can use the bus at a time.


2)Multiport memories

Unlike the shared common bus method, this design contains a separate bus for each processor to communicate with the memory modules.

Suppose CPU-1 wants to interact with memory module 1; then port MM1 is enabled. Similarly, if CPU-4 wants to interact with memory module 4, port MM4 is enabled. Hence all the processors can communicate in parallel. If more than one CPU requests the same memory module at the same time, priority is given in the order CPU-1, CPU-2, CPU-3, CPU-4.

[Figure: Multiport memories architecture]

3)Crossbar switch network


Here, unlike in multiport memories, a switch is installed between each memory unit and CPU. The switch is responsible for deciding whether to pass a request through to a particular memory module, based on the request made.

[Figure: Crossbar switch network]

Advantage:

 High data throughput rate.

Disadvantage:

 Complex to implement as more switches involved.

 Costlier to implement.

Applications of Multiprocessor -

1. As a uniprocessor, such as single instruction, single data stream (SISD).

2. As a multiprocessor, such as single instruction, multiple data stream (SIMD), which is usually
used for vector processing.
3. Multiple series of instructions in a single perspective, such as multiple instruction, single data
stream (MISD), which is used for describing hyper-threading or pipelined processors.

4. Inside a single system for executing multiple, individual series of instructions in multiple
perspectives, such as multiple instruction, multiple data stream (MIMD).

Benefits of using a Multiprocessor -

 Enhanced performance.

 Multiple applications.

 Multi-tasking inside an application.

 High throughput and responsiveness.

 Hardware sharing among CPUs.

Advantages:

Improved performance: Multiprocessor systems can execute tasks faster than single-processor
systems, as the workload can be distributed across multiple processors.

Better scalability: Multiprocessor systems can be scaled more easily than single-processor systems,
as additional processors can be added to the system to handle increased workloads.

Increased reliability: Multiprocessor systems can continue to operate even if one processor fails, as
the remaining processors can continue to execute tasks.

Reduced cost: Multiprocessor systems can be more cost-effective than building multiple single-
processor systems to handle the same workload.

Enhanced parallelism: Multiprocessor systems allow for greater parallelism, as different processors
can execute different tasks simultaneously.

Disadvantages:

Increased complexity: Multiprocessor systems are more complex than single-processor systems, and
they require additional hardware, software, and management resources.

Higher power consumption: Multiprocessor systems require more power to operate than single-
processor systems, which can increase the cost of operating and maintaining the system.

Difficult programming: Developing software that can effectively utilize multiple processors can be
challenging, and it requires specialized programming skills.
Synchronization issues: Multiprocessor systems require synchronization between processors to
ensure that tasks are executed correctly and efficiently, which can add complexity and overhead to the
system.

Limited performance gains: Not all applications can benefit from multiprocessor systems, and some
applications may only see limited performance gains when running on a multiprocessor system.

2. Multicomputer: A multicomputer system is a computer system with multiple processors that are
connected together to solve a problem. Each processor has its own memory and it is accessible by that
particular processor and those processors can communicate with each other via an interconnection
network.

As a multicomputer supports message passing between processors, it is possible to divide a task among the processors so they complete it together. Hence, a multicomputer can be used for distributed computing. It is more cost-effective and easier to build a multicomputer than a multiprocessor.
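The private-memory, message-passing model can be sketched in software with separate processes and an explicit channel. The example below (our own illustration, using Python's standard multiprocessing module) splits a summation between two workers that share no state and exchange results only through a queue:

```python
import multiprocessing as mp

def worker(task_id, numbers, results):
    # Each process has its own private memory; the only way to share
    # the partial sum with other processes is to send it as a message.
    results.put((task_id, sum(numbers)))

if __name__ == "__main__":
    data = list(range(100))
    results = mp.Queue()
    # Divide the task between two "nodes" of the multicomputer.
    p1 = mp.Process(target=worker, args=(1, data[:50], results))
    p2 = mp.Process(target=worker, args=(2, data[50:], results))
    p1.start(); p2.start()
    total = sum(results.get()[1] for _ in range(2))
    p1.join(); p2.join()
    print(total)   # 4950
```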

Difference between multiprocessor and Multicomputer:

1. A multiprocessor is a system with two or more central processing units (CPUs) that is capable of performing multiple tasks, whereas a multicomputer is a system with multiple processors attached via an interconnection network to perform a computation task.

2. A multiprocessor system is a single computer that operates with multiple CPUs, whereas a multicomputer system is a cluster of computers that operate as a single computer.

3. Construction of a multicomputer is easier and more cost-effective than that of a multiprocessor.

4. In a multiprocessor system, programming tends to be easier, whereas in a multicomputer system, programming tends to be more difficult.

5. A multiprocessor supports parallel computing; a multicomputer supports distributed computing.

Advantages:

Improved performance: Multicomputer systems can execute tasks faster than single-computer
systems, as the workload can be distributed across multiple computers.

Better scalability: Multicomputer systems can be scaled more easily than single-computer systems,
as additional computers can be added to the system to handle increased workloads.

Increased reliability: Multicomputer systems can continue to operate even if one computer fails, as
the remaining computers can continue to execute tasks.

Reduced cost: Multicomputer systems can be more cost-effective than building a single large
computer system to handle the same workload.

Enhanced parallelism: Multicomputer systems allow for greater parallelism, as different computers
can execute different tasks simultaneously.

Disadvantages:

Increased complexity: Multicomputer systems are more complex than single-computer systems, and
they require additional hardware, software, and management resources.

Higher power consumption: Multicomputer systems require more power to operate than single-
computer systems, which can increase the cost of operating and maintaining the system.

Difficult programming: Developing software that can effectively utilize multiple computers can be
challenging, and it requires specialized programming skills.

Synchronization issues: Multicomputer systems require synchronization between computers to ensure that tasks are executed correctly and efficiently, which can add complexity and overhead to the system.

Network latency: Multicomputer systems rely on a network to communicate between computers, and network latency can impact system performance.

Networks of workstations (distributed memory)


A network of workstations (NOW) is a distributed memory parallel computing system that connects
multiple high-performance computers to achieve high processing power at a low cost. Unlike a central
supercomputer, a NOW uses commodity hardware and software, making it a cost-effective alternative
for computationally intensive tasks.

Key concepts of a NOW

 Distributed memory: Each workstation in the network has its own private memory. For workstations to access remote data, they must communicate with other nodes over a high-speed network (see the sketch after this list).

 Parallel computing: Computational tasks are broken down and executed simultaneously
across multiple workstations to improve performance. Programming for this requires a
strategy for distributing and moving data efficiently.

 High-speed network: Workstations are connected by high-speed networks, such as Ethernet, which have become faster and more affordable over time.
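Here is that sketch (our own example, using only Python's standard library; the port number is an arbitrary assumption for a local demo). One "workstation" owns a value in its private memory, and another can only see it by asking over TCP:

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007    # assumed free port for this local demo

def server():
    # This "workstation" owns the data in its private memory.
    data = b"remote result: 42"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen(1)
        conn, _ = s.accept()
        with conn:
            conn.sendall(data)     # sharing means sending over the network

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                    # crude wait so the server is listening

# Another "workstation" must ask over the network to see that data.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect((HOST, PORT))
    print(c.recv(1024).decode())   # remote result: 42
```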

Primary benefits

Implementing a network of workstations offers significant advantages for businesses and researchers.

 Enhanced performance: High-performance hardware, including multi-core CPUs and advanced GPUs, enables faster processing for demanding applications like 3D rendering, scientific simulations, and software development.

 Cost-effectiveness: Utilizing commodity hardware and software makes parallel processing capabilities more accessible and affordable compared to specialized supercomputers.

 Centralized management and security: IT administrators can manage software updates, user permissions, and security settings for all workstations from a central location. Centralized storage and backups also reduce the risk of data loss.

 Improved collaboration: Workstations enable seamless file sharing, access to centralized databases, and real-time collaboration on cloud-based documents. For remote workers, a Virtual Private Network (VPN) can provide secure access.

 Scalability: Network capacity can be increased by adding more workstations as needed, allowing the system to grow with the organization's needs.

Modern implementations
The concept of networked workstations has evolved with advancements in technology, particularly
with the rise of virtualization and cloud computing.

 Desktop workstations: These are powerful, on-premise PCs that connect to local servers and
network peripherals.

 Virtual workstations: Workstation environments are hosted on a central server, and users
access them remotely using thin clients or less powerful machines. This centralizes
management, improves security, and pools resources for more efficient use.

 Cloud-based workstations: Similar to virtual workstations, these environments are hosted on a cloud platform like Amazon WorkSpaces or Microsoft Windows 365. This model offers scalable, on-demand resources accessible from anywhere with an internet connection.

Hardware and software components

To set up a modern network of workstations, several key components are necessary.

 Workstation hardware: High-end processors (CPU), ample memory (RAM), powerful graphics cards (GPU), and fast storage (SSD) to meet professional demands.

 Network devices: A high-quality core switch to connect servers and workstations, along with
routers to manage data traffic.

 Operating system: Choices include Windows, macOS, or specialized Linux distributions, depending on the software requirements.

 Connectivity: High-speed Ethernet cables (Cat5e, Cat6) are typical for wired connections,
while Wi-Fi provides mobility for laptops and mobile workstations.

 Networking protocols: Standard protocols include TCP/IP for communication, DHCP for
dynamic IP assignment, and DNS for name resolution. File and print sharing often rely on
protocols like SMB or NFS.

Clusters (latest variation).


Introduction:
Cluster computing is a collection of tightly or loosely connected computers that work together so that they act as a single entity. The connected computers execute operations all together, thus creating the impression of a single system. The clusters are generally connected through fast local area networks (LANs).

[Figure: Cluster computing]

Why is Cluster Computing important?

1. Cluster computing offers a relatively inexpensive, unconventional alternative to large server or mainframe computer solutions.

2. It meets the demand for critical content and processing services in a faster way.

3. Many organizations and IT companies are implementing cluster computing to augment their
scalability, availability, processing speed and resource management at economic prices.

4. It ensures that computational power is always available.

5. It provides a single general strategy for the implementation and application of parallel high-
performance systems independent of certain hardware vendors and their product decisions.

[Figure: A simple cluster computing layout]

Types of Cluster computing :


1. High performance (HP) clusters:
HP clusters use computer clusters and supercomputers to solve advanced computational problems. They are used to perform functions that need nodes to communicate as they carry out their jobs. They are designed to take advantage of the parallel processing power of several nodes.

2. Load-balancing clusters:

Incoming requests are distributed among several nodes running similar programs or holding similar content. This prevents any single node from receiving a disproportionate share of the work. This type of distribution is generally used in a web-hosting environment (a minimal round-robin sketch follows below).
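Here is that sketch: a toy round-robin dispatcher in Python (our own illustration; real load balancers also weigh node health and current load):

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin dispatcher: spreads requests across nodes."""
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def route(self, request):
        node = next(self._cycle)   # pick the next node in turn
        return f"{request} -> {node}"

balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
for req in ["req-A", "req-B", "req-C", "req-D"]:
    print(balancer.route(req))     # req-D wraps around to node-1
```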

3. High Availability (HA) Clusters :

HA clusters are designed to maintain redundant nodes that can act as backup systems in case any failure occurs. They provide consistent computing services for things like business activities, complicated databases, customer services such as e-commerce websites, and network file distribution. They are designed to give customers uninterrupted data availability.
Classification of Cluster :

1. Open Cluster :

Every node needs an IP address, and the nodes are accessed only through the internet or web. This type of cluster raises heightened security concerns.

2. Closed Cluster:

The nodes are hidden behind a gateway node, which provides increased protection. They need fewer IP addresses and are good for computational tasks.
Cluster Computing Architecture :

 It is designed with an array of interconnected individual computers and the computer systems
operating collectively as a single standalone system.

 It is a group of workstations or computers working together as a single, integrated computing resource connected via high-speed interconnects.

 A node – either a single-processor or multiprocessor system having memory, input and output functions, and an operating system.

 Two or more nodes are connected on a single line or every node might be connected
individually through a LAN connection.

[Figure: Cluster computing architecture]

Components of a Cluster Computer :

1. Cluster Nodes

2. Cluster Operating System


3. The switch or node interconnect

4. Network switching hardware

[Figure: Cluster components]

Advantages of Cluster Computing :

1. High Performance :

These systems offer better, enhanced performance compared to mainframe computer networks.

2. Easy to manage :

Cluster Computing is manageable and easy to implement.

3. Scalable :

Resources can be added to the clusters accordingly.

4. Expandability :

Computer clusters can be expanded easily by adding additional computers to the network. Cluster computing can also incorporate additional resources or networks into the existing computer system.

5. Availability :
The other nodes remain active when one node fails and can function as a proxy for the failed node. This ensures enhanced availability.

6. Flexibility :

It can be upgraded to a superior specification, or additional nodes can be added.


Disadvantages of Cluster Computing :

1. High cost :

Cluster computing is not very cost-effective, owing to its expensive hardware and design.

2. Problem in finding fault :

It is difficult to find which component has a fault.

3. More space is needed :

Infrastructure needs may grow, as more servers are required for management and monitoring.
Applications of Cluster Computing :

 Various complex computational problems can be solved.

 It can be used in the applications of aerodynamics, astrophysics and in data mining.

 Weather forecasting.

 Image Rendering.

 Various e-commerce applications.

 Earthquake Simulation.

 Petroleum reservoir simulation.

Software architectures:
In today's world, people of all ages, from children to adults, use smartphones, laptops, computers, and PDAs to solve both simple and complex tasks online using various software programs. To the user, everything may seem simple and easy to use. And that's the point of good software: to provide high-quality services in a user-friendly environment.

This overall abstraction makes any software product look simple and easy for the user. But behind the scenes, building a complex software application involves complex processes comprising a number of elements, of which coding is just one part of the puzzle. After a business analyst gathers the business requirements, the development team begins working on the Software Requirement Specification (SRS).
This is followed by steps like testing, acceptance, deployment, and maintenance. Every software development process follows a sequence of steps that make up the Software Development Life Cycle (SDLC). In the design phase of the SDLC, the software architecture is defined and documented. This article delves into the importance of software architecture within the SDLC.

What is Software Architecture?

Software architecture defines the fundamental organization of a system; more simply, it defines a structured solution. It determines how the various components of a software system are assembled, how they relate to one another, and how they communicate. Essentially, it serves as a blueprint for the application and a foundation for the development team to build upon.

Software architecture defines a number of things, which in turn make many parts of the software development process easier:

 System structure: The organization and arrangement of components.

 System behavior: The expected functionality and performance.

 Component relationships: How different parts of the system interact.

 Communication structure: The way components communicate with each other.

 Stakeholder balance: Meeting the needs and expectations of all stakeholders.

 Team structure: How the development team is organized and coordinated.

 Early design decisions: Making important choices early on to guide development.

Key Characteristics of Software Architecture

Software architecture is a multifaceted concept, and architects often categorize its characteristics
based on various factors such as operation, requirements, and structure. Here are some important
characteristics to consider:

Operational Architecture Characteristics

 Availability: The system should be accessible when needed.

 Performance: The system should meet performance goals such as speed and responsiveness.

 Reliability: The system should work consistently without failure.

 Fault tolerance: The system should gracefully handle errors and failures.
 Scalability: The system should be able to handle increasing loads without performance
degradation.

Structural Architecture Characteristics

 Configurability: The ability to configure the system according to needs.

 Extensibility: The system should be easily extendable to add new features.

 Supportability: The ease with which the system can be maintained and supported.

 Portability: The system should work across different environments.

 Maintainability: The system should be easy to update and fix over time.

Cross-Cutting Architecture Characteristics

 Accessibility: Ensuring the system is usable by a wide range of people, including those with
disabilities.

 Security: Protecting the system from unauthorized access and data breaches.

 Usability: Ensuring the system is easy to use and intuitive for users.

 Privacy: Protecting users’ sensitive information.

 Feasibility: The system should be realistic to develop within the constraints.

SOLID principles of Software Architecture

Each letter of the word SOLID stands for one principle of software architecture.
The SOLID principles are key guidelines for creating well-structured, maintainable software architectures. They are followed to avoid product strategy mistakes; a software architecture should adhere to the SOLID principles to avoid architectural or developmental failure.

[Figure: The S.O.L.I.D. principles]

 Single Responsibility Principle - Each service or module should have only one responsibility or purpose.

 Open-Closed Principle - Software modules should be open for extension but closed for modification, keeping them independent and expandable.

 Liskov Substitution Principle - Objects or services should be interchangeable with their subtypes without altering the correctness of the system.

 Interface Segregation Principle - Software should be divided into microservices (or interfaces) in such a way that there are no redundancies.

 Dependency Inversion Principle - Higher-level modules should not depend on lower-level modules, and changes at a higher level should not affect the lower levels.

A short code sketch illustrating two of these principles follows.
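As a toy illustration (our own example, not from the text): the high-level ReportService depends on an abstract Storage interface rather than a concrete file class (Dependency Inversion), and formatting and saving live in separate classes (Single Responsibility):

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstraction that high-level code depends on (Dependency Inversion)."""
    @abstractmethod
    def save(self, text: str) -> None: ...

class FileStorage(Storage):
    def __init__(self, path: str):
        self.path = path
    def save(self, text: str) -> None:
        with open(self.path, "w") as f:
            f.write(text)

class ReportFormatter:
    """One job only: format the report (Single Responsibility)."""
    def format(self, data: dict) -> str:
        return "\n".join(f"{k}: {v}" for k, v in data.items())

class ReportService:
    """High-level module: depends on Storage, not on FileStorage."""
    def __init__(self, formatter: ReportFormatter, storage: Storage):
        self.formatter, self.storage = formatter, storage
    def publish(self, data: dict) -> None:
        self.storage.save(self.formatter.format(data))

ReportService(ReportFormatter(), FileStorage("report.txt")).publish({"status": "ok"})
```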

Why is Software Architecture Important?

Software architecture belongs to the design phase of the software development life cycle and is one of the first steps in the software development process. Proceeding to software development without an architecture is like building a house without first designing its architecture. Software architecture is therefore an important part of software application development.
From a technical and developmental point of view, the reasons software architecture is important include:

 Optimizes quality attributes: Architects select quality attributes to focus on, such as
performance and scalability.

 Facilitates early prototyping: Architecture allows for early prototypes to be built, offering
insight into system behavior.

 Component-based development: Systems are often built using components, which makes
them easier to develop, test, and maintain.

 Adapts to changes: Architecture helps manage and integrate changes smoothly throughout
the development process.

Besides all this, software architecture also matters for many other factors, such as software quality, reliability, maintainability, supportability, and performance.

Advantages of Software Architecture

Good software architecture provides several advantages:

 Solid foundation: It lays the groundwork for a successful project, guiding development.

 Improved performance: A well-designed architecture can improve system efficiency.

 Reduced costs: Efficient development practices reduce costs over time.

 Scalability and flexibility: The system can adapt to future changes or demands.

Disadvantages of Software Architecture

While software architecture is essential, it does come with some challenges:

 Tooling and standardization: Obtaining the right tools and maintaining consistent standards
can sometimes be a challenge.

 Uncertain predictions: It's not always possible to predict the success of a project based
solely on its architecture.

Threads and shared memory



Processes and message passing

Distributed shared memory (DSM)

Distributed shared data (DSD)

Possible research and project topics

Parallel Algorithms

Concurrency and synchronization

Data and work partitioning

Common parallelization strategies

Granularity

Load balancing, Examples: parallel search, parallel sorting, etc.

Shared-Memory Programming:

Threads, Pthreads, Locks and semaphores

Distributed-Memory Programming:

Message Passing, MPI, PVM.

Other Parallel Programming Systems

Distributed shared memory

Aurora: Scoped behavior and abstract data types

Enterprise: Process templates

Research Topics
