Lecture 3
Exploring the Architecture of Parallel Computing
Parallel computing architecture involves the simultaneous execution of
multiple computational tasks to enhance performance and efficiency. This
tutorial provides an in-depth exploration of parallel computing architecture,
including its components, types, and real-world applications.

Components of Parallel Computing Architecture


In parallel computing, the architecture comprises essential components such as
processors, memory hierarchy, interconnects, and software stack. These
components work together to facilitate efficient communication, data
processing, and task coordination across multiple processing units.
Understanding the roles and interactions of these components is crucial for
designing and optimizing parallel computing systems.

Processors
Processors are the hardware units responsible for executing instructions and
performing computations in parallel computing systems. Different types of
processors, such as CPUs, GPUs, and APUs, offer varying degrees of
parallelism and computational capability.

Central Processing Units (CPU)

• Multi-core CPUs: These CPUs integrate multiple processing cores onto a
  single chip, allowing parallel execution of tasks. Each core can
  independently execute instructions, enabling higher performance and
  efficiency in multi-threaded applications.
• Multi-threaded CPUs: Multi-threaded CPUs support the simultaneous
  execution of multiple threads within each core. This feature enhances
  throughput and responsiveness by overlapping the execution of multiple
  tasks, particularly in applications with parallelizable workloads (see
  the sketch after this list).
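
As a minimal illustration of multi-threaded execution, the following C
sketch (the file name threads.c and the thread count are illustrative) uses
POSIX threads to run the same function concurrently; on a multi-core CPU the
operating system can schedule each thread onto its own core:

    /* threads.c -- minimal POSIX-threads sketch; compile with: gcc threads.c -lpthread */
    #include <stdio.h>
    #include <pthread.h>

    #define NUM_THREADS 4   /* illustrative thread count */

    void *work(void *arg) {
        long id = (long)arg;
        /* Each thread executes this function concurrently. */
        printf("thread %ld running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, work, (void *)i);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);   /* wait for every thread to finish */
        return 0;
    }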



Graphical Processing Units (GPU)

• Stream processors: GPUs consist of numerous stream processors, also
  known as shader cores, responsible for executing computational tasks in
  parallel. These processors are optimized for data-parallel operations and
  are particularly well suited to graphics rendering, scientific computing,
  and machine learning tasks.
• CUDA cores: CUDA (Compute Unified Device Architecture) cores are
  specialized processing units found in NVIDIA GPUs. They are designed to
  execute parallel tasks written for the CUDA platform and its application
  programming interface (API), and they offer high throughput and
  efficiency for parallel processing workloads.

Accelerated Processing Units (APU)

• CPU cores: Accelerated Processing Units (APUs) integrate both CPU and
  GPU cores on a single chip. The CPU cores within APUs are responsible
  for general-purpose computing tasks, such as executing application code,
  handling system operations, and managing memory.
• GPU cores: Alongside the CPU cores, APUs also include GPU cores
  optimized for parallel computation and graphics processing. These GPU
  cores provide accelerated performance for tasks such as image rendering,
  video decoding, and parallel computing workloads.

Memory Hierarchy
The memory hierarchy comprises several levels of memory, including registers,
cache memory, main memory (RAM), and secondary storage (disk). Effective
management of this hierarchy is crucial for optimizing data access and
minimizing latency in parallel computing systems.

Registers

• General-purpose registers: Registers directly accessible by the CPU
  cores for storing temporary data and intermediate computation results.



• Special-purpose registers: Registers dedicated to specific functions,
  such as the program counter, stack pointer, and status flags, which are
  essential for CPU operations and control flow.

Cache Memory

• L1 Cache: Level 1 cache is located closest to the CPU cores, offering
  fast access to frequently used data and instructions.
• L2 Cache: Level 2 cache sits between the L1 cache and main memory,
  providing larger storage capacity at slightly slower access speeds.
• L3 Cache: Level 3 cache is shared among multiple CPU cores, offering a
  larger cache size and serving as a shared resource that improves data
  locality and reduces memory access latency (see the sketch after this
  list).
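
As a minimal sketch of data locality, the two C functions below (the names
and the array size N are illustrative) sum the same matrix; the row-major
version walks memory sequentially and is served mostly from cache, while the
column-major version strides across cache lines and incurs far more misses:

    /* locality.c -- why access order matters in the cache hierarchy */
    #include <stdio.h>

    #define N 1024
    static double a[N][N];

    /* Row-major traversal: consecutive accesses fall on the same cache line. */
    double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column-major traversal: each access jumps N*8 bytes, touching a new
       cache line almost every time and causing far more cache misses. */
    double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_rows(), sum_cols());   /* same result, different speed */
        return 0;
    }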

Main Memory (RAM)

• Dynamic RAM (DRAM): Main memory modules composed of dynamic
  random-access memory cells, used for storing program instructions and
  data during program execution.
• Static RAM (SRAM): Caches and buffer memory within the memory
  hierarchy, offering faster access speeds and lower latency compared to
  DRAM.
• Video RAM (VRAM): Dedicated memory on GPUs used for storing
  textures, framebuffers, and other graphical data required for rendering
  images and videos. VRAM enables high-speed access to graphics data
  and enhances the performance of GPU-accelerated applications.

Secondary Storage (Disk)

• Hard Disk Drives (HDDs): Magnetic storage devices used for long-term
  data storage and retrieval in parallel computing systems. HDDs provide
  high-capacity storage but slower access speeds than main memory.
• Solid State Drives (SSDs): Flash-based storage devices that offer faster
  access speeds and lower latency than HDDs. SSDs are commonly used as
  secondary storage in parallel computing systems to improve I/O
  performance and reduce data access latency.

Interconnects
Interconnects facilitate communication and data transfer between processors
and memory units in parallel computing systems. High-speed interconnects,
such as buses, switches, and networks, enable efficient data exchange among
processing elements.

Buses

• System Bus: Connects the CPU, memory, and other internal components
  within a computer system. It facilitates communication and data transfer
  between these components.
• Memory Bus: A dedicated bus for transferring data between the CPU and
  main memory (RAM). It ensures fast and efficient access to memory
  resources.
• I/O Bus: An input/output bus that connects peripheral devices, such as
  storage devices, network interfaces, and accelerators, to the CPU and
  memory in a parallel computing system.

Switches

• Crossbar Switches: High-performance switches that provide multiple
  paths for data transmission between input and output ports. They enable
  simultaneous communication between multiple pairs of devices,
  improving bandwidth and reducing latency.
• Packet Switches: Switches that forward data in discrete packets based
  on destination addresses. They manage network traffic efficiently by
  dynamically allocating bandwidth and prioritizing packets according to
  quality of service (QoS) parameters.



Networks

• Ethernet: A widely used networking technology for local area networks
  (LANs) and wide area networks (WANs). It employs Ethernet cables and
  switches to transmit data packets between devices within a network.
• InfiniBand: A high-speed interconnect technology commonly used in
  high-performance computing (HPC) environments. It offers low-latency,
  high-bandwidth communication between computing nodes in clustered
  systems.
• Fibre Channel: A storage area network (SAN) technology that enables
  high-speed data transfer between servers and storage devices over fiber
  optic cables. It provides reliable and scalable connectivity for enterprise
  storage solutions.

Software Stack
The software stack consists of programming models, libraries, and operating
systems tailored for parallel computing. Parallel programming models, such as
MPI (Message Passing Interface) and OpenMP (Open Multi-Processing), provide
abstractions for expressing parallelism and coordinating tasks across
processors.

Parallel Programming Models

• Message Passing Interface (MPI): A standardized and widely used
  parallel programming model for distributed-memory systems. MPI
  enables communication and coordination between parallel processes
  running on different nodes of a parallel computing system (a minimal
  example appears in the distributed-memory section below).
• Open Multi-Processing (OpenMP): A parallel programming API designed
  for shared-memory systems. OpenMP simplifies parallel programming by
  providing directives for specifying parallel regions, loop parallelization,
  and thread synchronization (see the sketch after this list).
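
A minimal OpenMP sketch in C, assuming a compiler with OpenMP support (the
file name omp_sum.c is illustrative). A single directive is enough to
distribute the loop iterations across threads and combine their partial
results safely:

    /* omp_sum.c -- minimal OpenMP sketch; compile with: gcc -fopenmp omp_sum.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const int n = 1000000;
        double sum = 0.0;

        /* The directive splits the iterations across the available threads;
           reduction(+:sum) gives each thread a private partial sum and adds
           them together safely at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++) {
            sum += (double)i;
        }

        printf("sum = %.0f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }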



Parallel Libraries

• CUDA (Compute Unified Device Architecture): A parallel computing
  platform and programming model developed by NVIDIA for GPU-
  accelerated computing. CUDA enables developers to harness the
  computational power of NVIDIA GPUs for general-purpose parallel
  processing tasks (see the kernel sketch after this list).
• OpenCL (Open Computing Language): An open standard for parallel
  programming across CPUs, GPUs, and other accelerators. OpenCL allows
  developers to write parallel programs that execute efficiently on
  heterogeneous computing platforms.
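
A minimal CUDA sketch; CUDA C is a small extension of C, and the example
assumes an NVIDIA GPU with the CUDA toolkit installed (the file name
vecadd.cu is illustrative). Each GPU thread adds one pair of array elements:

    /* vecadd.cu -- minimal CUDA C sketch; compile with: nvcc vecadd.cu */
    #include <stdio.h>

    __global__ void vec_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n) c[i] = a[i] + b[i];                  /* one element per thread */
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);   /* unified memory, visible to CPU and GPU */
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;   /* enough blocks to cover n */
        vec_add<<<blocks, threads>>>(a, b, c, n);   /* launch the kernel */
        cudaDeviceSynchronize();                    /* wait for the GPU to finish */

        printf("c[0] = %f\n", c[0]);                /* expect 3.0 */
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }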

Operating Systems for Parallel Computing

• Linux: Linux-based operating systems, such as CentOS, Ubuntu, and Red
  Hat Enterprise Linux, are widely used in parallel computing environments
  due to their scalability, stability, and support for high-performance
  computing (HPC) clusters.
• Windows Server: The Microsoft Windows Server operating system
  provides support for parallel computing workloads through features such
  as Windows HPC Server and the Windows Subsystem for Linux (WSL).
• HPC-specific OS distributions: Specialized operating system
  distributions tailored for HPC environments, such as CentOS HPC, Rocks
  Cluster Distribution, and SUSE Linux Enterprise Server for HPC. These
  distributions offer optimized configurations and tools for parallel
  computing applications.

Types of Parallel Computing Architectures


Parallel computing encompasses various architectures tailored to exploit
concurrency and enhance computational efficiency. These architectures,
including shared-memory, distributed-memory, and hybrid systems, offer
distinct approaches to harnessing parallelism.

Shared-Memory Architecture: In shared-memory architecture, multiple
processors share access to a common memory space. This architecture
simplifies communication and data sharing among processors but requires
mechanisms for synchronization and mutual exclusion to prevent data hazards.
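
A minimal sketch of that synchronization requirement, assuming OpenMP (the
file name race.c is illustrative): many threads increment one shared
counter; without protection the concurrent increments race and lose updates,
and the atomic directive makes each increment indivisible:

    /* race.c -- mutual exclusion on shared memory; compile with: gcc -fopenmp race.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long counter = 0;

        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++) {
            /* 'atomic' serializes just this update, preventing lost increments. */
            #pragma omp atomic
            counter++;
        }

        printf("counter = %ld\n", counter);   /* 1000000 with the atomic in place */
        return 0;
    }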

Shared Memory Parallelism

From a hardware perspective, a shared-memory parallel architecture is a
computer in which a common physical memory is accessible to a number of
physical processors. The two basic types of shared-memory architecture are
Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA).
Today, the most common form of UMA architecture is the Symmetric
Multiprocessor (SMP) machine, which consists of multiple identical processors
with an equal level of access, and equal access time, to the shared memory.
The most common form of NUMA architecture, by contrast, is a machine built
by interlinking a number of SMPs; it is characterized by the fact that the
access time to different memory locations may vary for a given processor.

Distributed-Memory Architecture: Distributed-memory architecture
comprises multiple independent processing units, each with its own memory
space. Communication between processors is achieved through message
passing over a network. This architecture offers scalability and fault tolerance
but requires explicit data distribution and communication protocols.
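
A minimal message-passing sketch, assuming an MPI implementation such as
MPICH or Open MPI (the file name mpi_ping.c is illustrative). Because memory
is not shared, the data travels explicitly over the interconnect:

    /* mpi_ping.c -- compile with mpicc, run with: mpirun -np 2 ./a.out */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */

        if (rank == 0) {
            int value = 42;
            /* Rank 0 sends one integer to rank 1 as an explicit message. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }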



Hybrid Architectures: Hybrid architectures combine elements of both shared-
memory and distributed-memory systems. These architectures leverage the
benefits of shared-memory parallelism within individual nodes and distributed-
memory scalability across multiple nodes, making them suitable for a wide
range of applications.
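
A minimal hybrid sketch, assuming both MPI and OpenMP are available (the
file name hybrid.c is illustrative): typically one MPI process runs per node
for the distributed-memory level, while OpenMP threads exploit the shared
memory inside each node:

    /* hybrid.c -- compile with: mpicc -fopenmp hybrid.c */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each MPI process spawns a team of threads over its node's shared memory. */
        #pragma omp parallel
        printf("MPI rank %d, OpenMP thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }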



Real-World Applications
Real-world applications of parallel computing span diverse domains, from
scientific simulations to big data analytics and high-performance computing.
Parallel computing architectures enable efficient processing and analysis of
large datasets, sophisticated simulations, and complex computational tasks.

Scientific Simulations and Modeling: Parallel computing architectures are
widely used in scientific simulation and modeling tasks, such as weather
forecasting, computational fluid dynamics, and molecular dynamics
simulations.

Big Data Analytics: Parallel computing architectures power big data analytics
platforms, enabling processing and analysis of large datasets in distributed
environments. Applications include data mining, machine learning, and
predictive analytics.

High-Performance Computing (HPC): High-performance computing relies
on parallel computing architectures to solve computationally intensive
problems, including simulations, numerical analysis, and optimization tasks.

Image and Signal Processing: Parallel computing architectures are employed
in image and signal processing applications, such as image recognition, video
compression, and digital signal processing, to achieve real-time performance
and efficiency.

Parallel computing architecture offers a powerful framework for accelerating
computational tasks and solving complex problems efficiently. By
understanding the components, types, and real-world applications of parallel
computing architecture, developers and architects can design and deploy
scalable, high-performance computing systems across various domains.
