Lecture 3
Exploring the Architecture of Parallel Computing
Parallel computing architecture involves the simultaneous execution of
multiple computational tasks to enhance performance and efficiency. This
tutorial provides an in-depth exploration of parallel computing architecture,
including its components, types, and real-world applications.

Components of Parallel Computing Architecture


In parallel computing, the architecture comprises essential components such as
processors, memory hierarchy, interconnects, and software stack. These
components work together to facilitate efficient communication, data
processing, and task coordination across multiple processing units.
Understanding the roles and interactions of these components is crucial for
designing and optimizing parallel computing systems.

Processors
Processors are the hardware units responsible for executing instructions and
performing computations in parallel computing systems. Different types of
processors, such as CPUs, GPUs, and APUs, offer varying degrees of
parallelism and computational capability.

Central Processing Units (CPU)

• Multi-core CPUs: These CPUs integrate multiple processing cores onto a
  single chip, allowing parallel execution of tasks. Each core can
  independently execute instructions, enabling higher performance and
  efficiency in multi-threaded applications.
• Multi-threaded CPUs: Multi-threaded CPUs support the simultaneous
  execution of multiple threads within each core. This feature enhances
  throughput and responsiveness by overlapping the execution of multiple
  tasks, particularly in applications with parallelizable workloads (see
  the sketch after this list).
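
As a minimal illustration of multi-threaded execution, the following C
sketch (the file name threads.c and the thread count are illustrative) uses
POSIX threads to run the same function concurrently; on a multi-core CPU the
operating system can schedule each thread onto its own core:

    /* threads.c -- minimal POSIX-threads sketch; compile with: gcc threads.c -lpthread */
    #include <stdio.h>
    #include <pthread.h>

    #define NUM_THREADS 4   /* illustrative thread count */

    void *work(void *arg) {
        long id = (long)arg;
        /* Each thread executes this function concurrently. */
        printf("thread %ld running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, work, (void *)i);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);   /* wait for every thread to finish */
        return 0;
    }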



Graphical Processing Units (GPU)

• Stream processors: GPUs consist of numerous stream processors, also
  known as shader cores, responsible for executing computational tasks in
  parallel. These processors are optimized for data-parallel operations and
  are particularly well suited to graphics rendering, scientific computing,
  and machine learning tasks.
• CUDA cores: CUDA (Compute Unified Device Architecture) cores are
  specialized processing units found in NVIDIA GPUs. They are designed to
  execute parallel tasks written for the CUDA platform and its application
  programming interface (API), and they offer high throughput and
  efficiency for parallel processing workloads.

Accelerated Processing Units (APU)

• CPU cores: Accelerated Processing Units (APUs) integrate both CPU and
  GPU cores on a single chip. The CPU cores within APUs are responsible
  for general-purpose computing tasks, such as executing application code,
  handling system operations, and managing memory.
• GPU cores: Alongside the CPU cores, APUs also include GPU cores
  optimized for parallel computation and graphics processing. These GPU
  cores provide accelerated performance for tasks such as image rendering,
  video decoding, and parallel computing workloads.

Memory Hierarchy
The memory hierarchy comprises several levels of memory, including registers,
cache memory, main memory (RAM), and secondary storage (disk). Effective
management of this hierarchy is crucial for optimizing data access and
minimizing latency in parallel computing systems.

Registers

• General-purpose registers: Registers directly accessible by the CPU
  cores for storing temporary data and intermediate computation results.



• Special-purpose registers: Registers dedicated to specific functions,
  such as the program counter, stack pointer, and status flags, which are
  essential for CPU operations and control flow.

Cache Memory

• L1 Cache: Level 1 cache is located closest to the CPU cores, offering
  fast access to frequently used data and instructions.
• L2 Cache: Level 2 cache sits between the L1 cache and main memory,
  providing larger storage capacity at slightly slower access speeds.
• L3 Cache: Level 3 cache is shared among multiple CPU cores, offering a
  larger cache size and serving as a shared resource that improves data
  locality and reduces memory access latency (see the sketch after this
  list).
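
As a minimal sketch of data locality, the two C functions below (the names
and the array size N are illustrative) sum the same matrix; the row-major
version walks memory sequentially and is served mostly from cache, while the
column-major version strides across cache lines and incurs far more misses:

    /* locality.c -- why access order matters in the cache hierarchy */
    #include <stdio.h>

    #define N 1024
    static double a[N][N];

    /* Row-major traversal: consecutive accesses fall on the same cache line. */
    double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column-major traversal: each access jumps N*8 bytes, touching a new
       cache line almost every time and causing far more cache misses. */
    double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_rows(), sum_cols());   /* same result, different speed */
        return 0;
    }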

Main Memory (RAM)

• Dynamic RAM (DRAM): Main memory modules composed of dynamic
  random-access memory cells, used for storing program instructions and
  data during program execution.
• Static RAM (SRAM): Caches and buffer memory within the memory
  hierarchy, offering faster access speeds and lower latency compared to
  DRAM.
• Video RAM (VRAM): Dedicated memory on GPUs used for storing
  textures, framebuffers, and other graphical data required for rendering
  images and videos. VRAM enables high-speed access to graphics data
  and enhances the performance of GPU-accelerated applications.

Secondary Storage (Disk)

• Hard Disk Drives (HDDs): Magnetic storage devices used for long-term
  data storage and retrieval in parallel computing systems. HDDs provide
  high-capacity storage but slower access speeds than main memory.
• Solid State Drives (SSDs): Flash-based storage devices that offer faster
  access speeds and lower latency than HDDs. SSDs are commonly used as
  secondary storage in parallel computing systems to improve I/O
  performance and reduce data access latency.

Interconnects
Interconnects facilitate communication and data transfer between processors
and memory units in parallel computing systems. High-speed interconnects,
such as buses, switches, and networks, enable efficient data exchange among
processing elements.

Buses

• System Bus: Connects the CPU, memory, and other internal components
  within a computer system. It facilitates communication and data transfer
  between these components.
• Memory Bus: A dedicated bus for transferring data between the CPU and
  main memory (RAM). It ensures fast and efficient access to memory
  resources.
• I/O Bus: An input/output bus that connects peripheral devices, such as
  storage devices, network interfaces, and accelerators, to the CPU and
  memory in a parallel computing system.

Switches

• Crossbar Switches: High-performance switches that provide multiple
  paths for data transmission between input and output ports. They enable
  simultaneous communication between multiple pairs of devices,
  improving bandwidth and reducing latency.
• Packet Switches: Switches that forward data in discrete packets based
  on destination addresses. They manage network traffic efficiently by
  dynamically allocating bandwidth and prioritizing packets according to
  quality of service (QoS) parameters.



Networks

• Ethernet: A widely used networking technology for local area networks
  (LANs) and wide area networks (WANs). It employs Ethernet cables and
  switches to transmit data packets between devices within a network.
• InfiniBand: A high-speed interconnect technology commonly used in
  high-performance computing (HPC) environments. It offers low-latency,
  high-bandwidth communication between computing nodes in clustered
  systems.
• Fibre Channel: A storage area network (SAN) technology that enables
  high-speed data transfer between servers and storage devices over fiber
  optic cables. It provides reliable and scalable connectivity for enterprise
  storage solutions.

Software Stack
The software stack consists of programming models, libraries, and operating
systems tailored for parallel computing. Parallel programming models, such as
MPI (Message Passing Interface) and OpenMP (Open Multi-Processing), provide
abstractions for expressing parallelism and coordinating tasks across
processors.

Parallel Programming Models

• Message Passing Interface (MPI): A standardized and widely used
  parallel programming model for distributed-memory systems. MPI
  enables communication and coordination between parallel processes
  running on different nodes of a parallel computing system (a minimal
  example appears in the distributed-memory section below).
• Open Multi-Processing (OpenMP): A parallel programming API designed
  for shared-memory systems. OpenMP simplifies parallel programming by
  providing directives for specifying parallel regions, loop parallelization,
  and thread synchronization (see the sketch after this list).
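
A minimal OpenMP sketch in C, assuming a compiler with OpenMP support (the
file name omp_sum.c is illustrative). A single directive is enough to
distribute the loop iterations across threads and combine their partial
results safely:

    /* omp_sum.c -- minimal OpenMP sketch; compile with: gcc -fopenmp omp_sum.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const int n = 1000000;
        double sum = 0.0;

        /* The directive splits the iterations across the available threads;
           reduction(+:sum) gives each thread a private partial sum and adds
           them together safely at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++) {
            sum += (double)i;
        }

        printf("sum = %.0f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }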



Parallel Libraries

• CUDA (Compute Unified Device Architecture): A parallel computing
  platform and programming model developed by NVIDIA for GPU-
  accelerated computing. CUDA enables developers to harness the
  computational power of NVIDIA GPUs for general-purpose parallel
  processing tasks (see the kernel sketch after this list).
• OpenCL (Open Computing Language): An open standard for parallel
  programming across CPUs, GPUs, and other accelerators. OpenCL allows
  developers to write parallel programs that execute efficiently on
  heterogeneous computing platforms.
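
A minimal CUDA sketch; CUDA C is a small extension of C, and the example
assumes an NVIDIA GPU with the CUDA toolkit installed (the file name
vecadd.cu is illustrative). Each GPU thread adds one pair of array elements:

    /* vecadd.cu -- minimal CUDA C sketch; compile with: nvcc vecadd.cu */
    #include <stdio.h>

    __global__ void vec_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n) c[i] = a[i] + b[i];                  /* one element per thread */
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);   /* unified memory, visible to CPU and GPU */
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;   /* enough blocks to cover n */
        vec_add<<<blocks, threads>>>(a, b, c, n);   /* launch the kernel */
        cudaDeviceSynchronize();                    /* wait for the GPU to finish */

        printf("c[0] = %f\n", c[0]);                /* expect 3.0 */
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }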

Operating Systems for Parallel Computing

• Linux: Linux-based operating systems, such as CentOS, Ubuntu, and Red
  Hat Enterprise Linux, are widely used in parallel computing environments
  due to their scalability, stability, and support for high-performance
  computing (HPC) clusters.
• Windows Server: The Microsoft Windows Server operating system
  provides support for parallel computing workloads through features such
  as Windows HPC Server and the Windows Subsystem for Linux (WSL).
• HPC-specific OS distributions: Specialized operating system
  distributions tailored for HPC environments, such as CentOS HPC, Rocks
  Cluster Distribution, and SUSE Linux Enterprise Server for HPC. These
  distributions offer optimized configurations and tools for parallel
  computing applications.

Types of Parallel Computing Architectures


Parallel computing encompasses various architectures tailored to exploit
concurrency and enhance computational efficiency. These architectures,
including shared-memory, distributed-memory, and hybrid systems, offer
distinct approaches to harnessing parallelism.

Shared-Memory Architecture: In shared-memory architecture, multiple
processors share access to a common memory space. This architecture
simplifies communication and data sharing among processors but requires
mechanisms for synchronization and mutual exclusion to prevent data hazards.
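
A minimal sketch of that synchronization requirement, assuming OpenMP (the
file name race.c is illustrative): many threads increment one shared
counter; without protection the concurrent increments race and lose updates,
and the atomic directive makes each increment indivisible:

    /* race.c -- mutual exclusion on shared memory; compile with: gcc -fopenmp race.c */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long counter = 0;

        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++) {
            /* 'atomic' serializes just this update, preventing lost increments. */
            #pragma omp atomic
            counter++;
        }

        printf("counter = %ld\n", counter);   /* 1000000 with the atomic in place */
        return 0;
    }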

Shared Memory Parallelism

From a hardware perspective, a shared-memory parallel architecture is a
computer in which a common physical memory is accessible to a number of
physical processors. The two basic types of shared-memory architecture are
Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA).
Today, the most common form of UMA architecture is the Symmetric
Multiprocessor (SMP) machine, which consists of multiple identical processors
with an equal level of access, and equal access time, to the shared memory.
The most common form of NUMA architecture, by contrast, is a machine built
by interlinking a number of SMPs; it is characterized by the fact that the
access time to different memory locations may vary for a given processor.

Distributed-Memory Architecture: Distributed-memory architecture
comprises multiple independent processing units, each with its own memory
space. Communication between processors is achieved through message
passing over a network. This architecture offers scalability and fault tolerance
but requires explicit data distribution and communication protocols.
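
A minimal message-passing sketch, assuming an MPI implementation such as
MPICH or Open MPI (the file name mpi_ping.c is illustrative). Because memory
is not shared, the data travels explicitly over the interconnect:

    /* mpi_ping.c -- compile with mpicc, run with: mpirun -np 2 ./a.out */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */

        if (rank == 0) {
            int value = 42;
            /* Rank 0 sends one integer to rank 1 as an explicit message. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }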



Hybrid Architectures: Hybrid architectures combine elements of both shared-
memory and distributed-memory systems. These architectures leverage the
benefits of shared-memory parallelism within individual nodes and distributed-
memory scalability across multiple nodes, making them suitable for a wide
range of applications.
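
A minimal hybrid sketch, assuming both MPI and OpenMP are available (the
file name hybrid.c is illustrative): typically one MPI process runs per node
for the distributed-memory level, while OpenMP threads exploit the shared
memory inside each node:

    /* hybrid.c -- compile with: mpicc -fopenmp hybrid.c */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each MPI process spawns a team of threads over its node's shared memory. */
        #pragma omp parallel
        printf("MPI rank %d, OpenMP thread %d\n", rank, omp_get_thread_num());

        MPI_Finalize();
        return 0;
    }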



Real-World Applications
Real-world applications of parallel computing span diverse domains, from
scientific simulations to big data analytics and high-performance computing.
Parallel computing architectures enable efficient processing and analysis of
large datasets, sophisticated simulations, and complex computational tasks.

Scientific Simulations and Modeling: Parallel computing architectures are
widely used in scientific simulation and modeling tasks, such as weather
forecasting, computational fluid dynamics, and molecular dynamics
simulations.

Big Data Analytics: Parallel computing architectures power big data analytics
platforms, enabling processing and analysis of large datasets in distributed
environments. Applications include data mining, machine learning, and
predictive analytics.

High-Performance Computing (HPC): High-performance computing relies
on parallel computing architectures to solve computationally intensive
problems, including simulations, numerical analysis, and optimization tasks.

Image and Signal Processing: Parallel computing architectures are employed
in image and signal processing applications, such as image recognition, video
compression, and digital signal processing, to achieve real-time performance
and efficiency.

Parallel computing architecture offers a powerful framework for accelerating
computational tasks and solving complex problems efficiently. By
understanding the components, types, and real-world applications of parallel
computing architecture, developers and architects can design and deploy
scalable, high-performance computing systems across various domains.
