1) Serial Computation
• Serial computation means executing one instruction at a time, step by step, on a single CPU.
• Runs on one computer with one CPU.
• Breaks the problem into small steps.
• Executes the steps one after another.
• Processes only one instruction at any given moment.
2) Types of Parallelism
i) Bit-Level Parallelism:
• Depends on the processor's word size.
• If the processor has a smaller word size, tasks are broken into multiple steps.
• Example: An 8-bit processor needs 2 steps to process a 16-bit number, but a 16-bit
processor can do it in one step.
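A rough sketch of this idea in Python (illustrative only; the function name and the numbers are made up): adding two 16-bit values with only 8-bit operations takes two additions plus a carry, whereas a 16-bit processor would do it in one step.

```python
# Sketch: adding two 16-bit numbers when the hardware can only add 8 bits at a time.
def add16_on_8bit(a: int, b: int) -> int:
    # Step 1: add the low bytes and record the carry.
    low = (a & 0xFF) + (b & 0xFF)
    carry = low >> 8
    # Step 2: add the high bytes plus the carry from step 1.
    high = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((high & 0xFF) << 8) | (low & 0xFF)   # result truncated to 16 bits

print(hex(add16_on_8bit(0x1234, 0x0FCC)))  # 0x2200, same as (0x1234 + 0x0FCC) & 0xFFFF
```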
ii) Instruction-Level Parallelism:
• Executes multiple instructions in one CPU cycle.
• The processor decides how many instructions can run at the same time.
• Helps speed up execution by running instructions in parallel within a single core.
iii) Task Parallelism:
• Breaks a big task into smaller tasks.
• Different processors (or cores/threads) work on different tasks at the same time.
• Example: In a project, each person works on a separate part, finishing the work faster (see the sketch below).
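A minimal sketch of task parallelism in Python (the three task names are placeholders; any independent pieces of work would do): different threads carry out different parts of the job at the same time.

```python
# Minimal sketch of task parallelism: independent tasks run on separate threads.
import threading

def download():       # placeholder task: fetch input data
    print("downloading input data")

def compute():        # placeholder task: process the data
    print("computing partial results")

def write_report():   # placeholder task: produce output
    print("writing the report")

threads = [threading.Thread(target=t) for t in (download, compute, write_report)]
for t in threads:
    t.start()         # all three tasks proceed concurrently
for t in threads:
    t.join()          # wait for every task to finish
```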
3) Distributed Computing:
• A distributed system is a group of computers that work together as one.
• They communicate over a network, share data and tasks, and help complete a common
goal efficiently.
Applications: Internet & web services, Cloud computing, Social media platforms, Financial systems
5) Inter-Process Communication (IPC):
• IPC is essential in modern operating systems.
• It allows processes to exchange data and synchronize their actions.
• There are two primary modes of IPC: Shared Memory and Message Passing.
Message Passing Model
• In this model, processes communicate by exchanging messages using send and receive
operations over a communication line.
• Example: A client sends data to a server for modification, and the server sends the
response back.
Advantages: Simple to implement, No explicit synchronization, Works well in distributed systems
Disadvantages: Slower communication, Higher kernel involvement, Higher resource consumption
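A minimal sketch of the message-passing model using Python's multiprocessing Pipe (the pipe stands in for a real network link, and the "modification" is just uppercasing the text): the client sends data, the server changes it and sends the response back.

```python
# Sketch of the message-passing model: a client sends data, the server modifies
# it and sends the result back. A pipe plays the role of the communication line.
from multiprocessing import Process, Pipe

def server(conn):
    data = conn.recv()          # receive the client's message
    conn.send(data.upper())     # "modify" the data and send the response
    conn.close()

if __name__ == "__main__":
    client_end, server_end = Pipe()
    p = Process(target=server, args=(server_end,))
    p.start()
    client_end.send("hello")    # client -> server
    print(client_end.recv())    # server -> client: "HELLO"
    p.join()
```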
Shared Memory Model
• In this system, processes communicate by reading from and writing to a shared memory
region created by one process.
• Requires synchronization (e.g., using semaphores or mutexes) to avoid data conflicts.
• Example: Web browsers and web servers using shared caching or memory.
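A minimal sketch of the shared-memory model with Python's multiprocessing (the counter and loop count are arbitrary): two processes update one shared value, and a lock provides the synchronization mentioned above.

```python
# Sketch of the shared-memory model: two processes increment one shared counter,
# using a lock (mutex) so the updates do not conflict.
from multiprocessing import Process, Value, Lock

def worker(counter, lock):
    for _ in range(1000):
        with lock:              # synchronization to avoid a data race
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)     # integer placed in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock)) for _ in range(2)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(counter.value)        # 2000 with the lock; unpredictable without it
```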
Advantages: Fast communication, Efficient for large data transfers, Less kernel involvement
Disadvantages: Complex synchronization, Security risks, Limited to a single machine (not suitable for distributed systems)
Difference b/w MP and MC:
Feature | Multiprocessor Architecture | Multicomputer Architecture
Structure | Multiple processors share one memory | Multiple computers with separate memory communicate via a network
Memory Access | Shared memory for all processors | Each computer has its own memory
Communication | Uses shared memory | Uses message passing
Cost & Complexity | Cheaper and simpler | More expensive and complex
Network Type | Dynamic | Static
Speed | Faster execution | Comparatively slower
Scalability | Limited scalability | Highly scalable
Fault Tolerance | Failure affects all processors | Failure in one node doesn't impact others
Examples | SMP (Symmetric Multiprocessing) | Cloud computing, grid computing
6) Interconnection Networks for Parallel Processors
Interconnection networks enable efficient communication among processors in parallel computing
systems.
They are broadly categorized into:
1) Static Interconnection Networks (Fixed Connections)
These networks have a fixed layout; connections between nodes do not change during execution.
• Unidirectional:
Data flows in one direction only.
• Bidirectional:
Data flows in both directions, allowing two-way communication.
Types of Static Networks:
a) Fully (Completely) Connected Network
• Each node is directly connected to every other node.
• Advantages: Very high bandwidth, Very low latency, Efficient and fast data transfer
• Disadvantages:Expensive and complex to implement (requires many links)
b) Limited Connection Network
It is a partial network that connects each node to only a few others.
Characteristics:
• More cost-effective and less complex than a fully connected network
• Supports optimized communication patterns
• Scales better than a fully connected network
• Has lower bandwidth and reduced fault tolerance
Examples of Limited Connection Networks: Linear Array Network, Ring Network
➤ Linear Array Network
• Nodes (systems) are connected in a single row
• Each node (except endpoints) is connected to two neighbors
• If one system fails, all systems after it may stop working (in unidirectional setups)
➤ Ring Network
• Extension of the linear array where the last node connects back to the first
• Forms a closed loop
• Each node has two neighbors, allowing communication in both directions
• Better fault tolerance than linear arrays (especially if bidirectional)
2) Dynamic Interconnection Network (Flexible Connections)
Dynamic interconnection networks reconfigure connections based on communication needs during
execution.
They use switching mechanisms to establish communication paths dynamically, offering:
• Flexibility, scalability, fault tolerance, and better adaptability
Common dynamic network topologies used in parallel computing include Mesh and Hypercube.
a) Mesh Topology
• A 2D mesh extends a linear array into two dimensions.
• Each internal node (i.e., every node not on an edge) connects to four neighbors:
up, down, left, right
• Nodes are identified using coordinates, forming a grid structure.
• Wraparound links can be added to connect edge nodes, forming a torus structure for
improved connectivity.
Examples:
2D Mesh without wraparound:
Each node connects only to adjacent nodes in the grid.
i.e., no edge-to-edge connection.
2D Mesh with wraparound (Torus):
Edges are connected, so nodes at the edge also connect across to the opposite side, improving
fault tolerance and reducing communication time.
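A small Python sketch (the function name is illustrative) that lists a node's neighbors in an R × C mesh, with and without wraparound, showing the grid and torus variants described above:

```python
# Sketch: neighbours of node (r, c) in an R x C mesh, with optional wraparound (torus).
def mesh_neighbors(r, c, R, C, wraparound=False):
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    result = []
    for dr, dc in steps:
        nr, nc = r + dr, c + dc
        if wraparound:
            result.append((nr % R, nc % C))      # edges wrap to the opposite side
        elif 0 <= nr < R and 0 <= nc < C:
            result.append((nr, nc))              # edge nodes simply have fewer links
    return result

print(mesh_neighbors(0, 0, 4, 4))                  # corner node: 2 neighbours
print(mesh_neighbors(0, 0, 4, 4, wraparound=True)) # torus: 4 neighbours
```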
b) 3D Mesh (Hypercube Extension)
• A 3D mesh extends the 2D mesh into three dimensions.
• Each internal node connects to 6 neighbors: front, back, left, right, top, and bottom.
• This structure is scalable and widely used in high-performance computing for:
o Scientific simulations
o Image processing
o Large-scale data analysis
Advantages of Mesh/Hypercube Networks:
• Scalable and modular
• Fault-tolerant (alternative paths available)
• Efficient for structured applications (e.g., matrix operations)
Limitations:
• Meshes have lower bandwidth compared to hypercube networks
• Longer communication paths (especially in large networks)
• Complexity increases with scale
Hypercube Network
A Hypercube (also called an n-cube) is a highly structured and scalable interconnection network
used in parallel computing.
• A hypercube with p nodes has 2 nodes along each dimension and n = log₂ p dimensions.
• Each node is connected to n other nodes (one neighbor per dimension).
• Nodes are labeled with binary numbers of length n.
Construction of a Hypercube:
• 0-Dimensional Hypercube:A single node
• 1-D Hypercube: Two nodes connected by one link (like a line).
• 2-D Hypercube: Formed by connecting two 1-D hypercubes.
• 3-D Hypercube: Formed by connecting two 2-D hypercubes.
• General Rule:
o An n-dimensional hypercube is formed by linking corresponding nodes
of two (n−1)-dimensional hypercubes.
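A minimal Python sketch of the labeling rule: a node's neighbors in an n-dimensional hypercube are exactly the labels that differ from it in one bit.

```python
# Sketch: in an n-dimensional hypercube, a node's neighbours are the nodes whose
# binary labels differ from it in exactly one bit.
def hypercube_neighbors(node: int, n: int):
    return [node ^ (1 << d) for d in range(n)]   # flip each of the n bits in turn

n = 3                                            # 3-D hypercube: 2**3 = 8 nodes
for node in range(2 ** n):
    nbrs = hypercube_neighbors(node, n)
    print(f"{node:0{n}b} -> {[format(x, f'0{n}b') for x in nbrs]}")
```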
9) Omega Network
• Omega network connects processors to memory
• Each stage follows the same pattern, with N/2 switch boxes, each having 4 states (straight, exchange, upper broadcast, lower broadcast).
• It has log₂ N stages of 2×2 switches.
• It uses the perfect shuffle exchange (a left circular shift of the address bits) for interconnection, shown below for N = 8:
000 -> 000 -> 000 -> 000
001 -> 010 -> 100 -> 001
010 -> 100 -> 001 -> 010
011 -> 110 -> 101 -> 011
100 -> 001 -> 010 -> 100
101 -> 011 -> 110 -> 101
110 -> 101 -> 011 -> 110
111 -> 111 -> 111 -> 111
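A small Python sketch of the perfect shuffle as a left circular shift of the 3-bit address, reproducing the table above:

```python
# Sketch: the perfect shuffle used between omega-network stages is a left
# circular shift of the n-bit node address (here n = 3, so N = 8 nodes).
def perfect_shuffle(addr: int, n: int) -> int:
    msb = (addr >> (n - 1)) & 1                    # take the most significant bit...
    return ((addr << 1) & ((1 << n) - 1)) | msb    # ...and rotate it to the end

n = 3
for addr in range(2 ** n):
    a1 = perfect_shuffle(addr, n)
    a2 = perfect_shuffle(a1, n)
    print(f"{addr:03b} -> {a1:03b} -> {a2:03b} -> {perfect_shuffle(a2, n):03b}")
```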
10) Butterfly Networks
A scalable interconnection network used in parallel computing for efficient data routing and
processing.
• The pattern of connections between stages resembles butterfly wings; the regular structure gives predictable performance
• Has multiple stages with systematically connected nodes
• Each node connects to 2 nodes in the next stage for fast data transfer
• Logarithmic distance between nodes reduces latency
• Ensures every input reaches every output through specific routes
• Indirect topology with no processor nodes; it consists of (n + 1) ranks of switching nodes
Routing Algorithms:
i) Deterministic Routing – Fixed paths for consistent performance
ii) Adaptive Routing – Adjusts path to avoid congestion
iii) Minimal Routing – Selects the shortest path for efficiency
iv) Non-minimal Routing – Uses longer paths to avoid congestion
v) Fault-Tolerant Routing – Ensures data delivery despite failures
Communication Patterns:
• One-to-One, One-to-All, All-to-One, All-to-All, Many-to-Many, Pipelined communication
Advantages: Scalable, High efficiency, Low latency, Fault tolerance
Disadvantages: Complex design, High cost, Maintenance issues, Limited flexibility
Q1: 4x4 Mesh Network – Calculate Links, Diameter, Bisection Width, and Arc Connectivity
A 4×4 mesh has 16 nodes arranged in 4 rows and 4 columns.
1. Total Number of Links:
• Each internal node connects to up to 4 neighbors (up, down, left, right)
• In a 4×4 mesh:
• Horizontal links: 4 rows × (4 - 1) = 4 × 3 = 12
• Vertical links: 4 columns × (4 - 1) = 4 × 3 = 12
• Total links = 12 + 12 = 24
2. Diameter:
• Diameter = Maximum shortest distance between any two nodes
• In a 4×4 grid:
Diameter = (rows - 1) + (columns - 1) = 3 + 3 = 6
3. Bisection Width:
• Minimum number of links that need to be removed to divide the network into two equal
halves
• For 4×4 mesh: Bisection width = 4
4. Arc Connectivity:
• Minimum number of arcs (edges) that need to be removed to disconnect the network
• For a 2D mesh, arc connectivity = 2 (equal to the minimum node degree: a corner node has only 2 links)
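The same counts, generalized to an R × C mesh in a short Python sketch (assuming R, C ≥ 2 and that the bisection cut is made across an even-length dimension):

```python
# Sketch: link count, diameter, bisection width and arc connectivity of an R x C mesh
# without wraparound.
def mesh_metrics(R, C):
    links = R * (C - 1) + C * (R - 1)        # horizontal + vertical links
    diameter = (R - 1) + (C - 1)             # opposite corners are farthest apart
    bisection = min(R, C)                    # links cut when splitting down the middle
    arc_connectivity = 2                     # removing a corner's two links disconnects it
    return links, diameter, bisection, arc_connectivity

print(mesh_metrics(4, 4))   # (24, 6, 4, 2), matching the values above
```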
Q2: Flynn’s Taxonomy – Diagram
Flynn’s taxonomy classifies computer architectures based on the number of instruction and data
streams:
• SISD: Traditional single-core computer
• SIMD: GPUs, vector processors
• MISD: Rare, used in fault tolerance
• MIMD: Multiprocessors, distributed systems
Q4: 6-Node Ring and Linear Network
Linear Network (6 nodes):
• Links: 5
• Diameter: 5
• Bisection Width: 1
• Arc Connectivity: 1
Ring Network (6 nodes):
• Links: 6 (one extra to close the ring)
• Diameter: ⌊n/2⌋ = ⌊6/2⌋ = 3
• Bisection Width: 2
• Arc Connectivity: 2
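A quick Python sketch of the same four quantities for an n-node linear array and ring (assuming n is even so the ring splits into two equal halves):

```python
# Sketch: metrics of an n-node linear array and an n-node ring.
def linear_metrics(n):
    return {"links": n - 1, "diameter": n - 1, "bisection": 1, "arc_connectivity": 1}

def ring_metrics(n):
    return {"links": n, "diameter": n // 2, "bisection": 2, "arc_connectivity": 2}

print(linear_metrics(6))   # links 5, diameter 5, bisection width 1, arc connectivity 1
print(ring_metrics(6))     # links 6, diameter 3, bisection width 2, arc connectivity 2
```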
Q5: Parallel Compare-Exchange vs. Compare-Split
Compare-Exchange:
• Two processors compare their values and exchange if needed.
• Used in bitonic sort or odd-even transposition sort.
Compare-Split:
• Each processor holds a sorted block of data; the two processors exchange blocks, merge them, and split the result so one keeps the lower half and the other the upper half.
• Used in sample sort or merge-based parallel sorting.
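A minimal Python sketch of the two primitives (blocks are plain lists; the function names are illustrative):

```python
# Sketch: the two primitives side by side. Each "processor" is just a Python value/list.
def compare_exchange(a, b):
    # Each processor holds one value; the smaller value ends up on the left processor.
    return (a, b) if a <= b else (b, a)

def compare_split(block_a, block_b):
    # Each processor holds a sorted block; after exchanging blocks, the left
    # processor keeps the smaller half and the right keeps the larger half.
    merged = sorted(block_a + block_b)
    half = len(block_a)
    return merged[:half], merged[half:]

print(compare_exchange(9, 4))                  # (4, 9)
print(compare_split([1, 5, 8], [2, 3, 9]))     # ([1, 2, 3], [5, 8, 9])
```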
Q6: Parallel Algorithm Models (with Examples)
1. Parallel Random Access Machine (PRAM)
a. Example: Prefix sums
2. Data Parallel Model
a. Example: Matrix-vector multiplication using SIMD
3. Task Parallel Model
a. Example: Different threads performing I/O, computation, and logging
4. Message Passing Model
a. Example: MPI-based distributed matrix multiplication
5. Shared Memory Model
a. Example: OpenMP for parallel loops
6. Pipeline Model
a. Example: Image processing stages (input → filter → compress)
Q7: Prefix Sums on 8-node Hypercube
• Each node has an initial value.
• Log₂(8) = 3 steps required for prefix sum.
• In each step, nodes exchange data with neighbors differing in one bit and update their prefix
sum.
• Step-by-step:
• Step 1: exchange with the neighbor differing in bit 0 (labels differ by 1)
• Step 2: exchange with the neighbor differing in bit 1 (labels differ by 2)
• Step 3: exchange with the neighbor differing in bit 2 (labels differ by 4)
Result: Each node now holds the sum of the values of all nodes with a smaller or equal label (the prefix sum in label order), as simulated below.
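A single-machine simulation sketch in Python (the function name is illustrative): each "node" keeps a running prefix and the total of its current subcube, and in step d it combines with the neighbor whose label differs in bit d.

```python
# Sketch: simulating the hypercube prefix sum. Each node keeps a running prefix
# and the total of its current subcube; in step d it "exchanges" the subcube
# total with the neighbour that differs in bit d.
def hypercube_prefix_sum(values):
    p = len(values)                       # must be a power of two (8 here)
    d = p.bit_length() - 1                # log2(p) steps
    prefix = list(values)
    total = list(values)
    for step in range(d):
        new_total = total[:]
        for node in range(p):
            partner = node ^ (1 << step)              # neighbour differing in bit `step`
            new_total[node] = total[node] + total[partner]
            if partner < node:                        # partner holds the "earlier" subcube
                prefix[node] += total[partner]
        total = new_total
    return prefix

print(hypercube_prefix_sum([3, 1, 4, 1, 5, 9, 2, 6]))
# [3, 4, 8, 9, 14, 23, 25, 31]
```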
Q8: Multiprocessor vs. Multicomputer Systems
Feature | Multiprocessor | Multicomputer
Memory | Shared memory | Distributed memory
Communication | Through shared variables | Message passing via network
Speed | Faster (less latency) | Slower due to network overhead
Cost | Higher (tight integration) | Scalable and cost-effective
Scalability | Limited | Highly scalable
Example | SMP (Symmetric Multiprocessing) | Cluster, Grid, Cloud
Q9: Parallel Execution Time (Two Computers Adding n Numbers)
• Step 1: Computer 1 holds all n numbers
• Step 2: Sends n/2 numbers to Computer 2
• Step 3: Both compute partial sums in parallel
• Step 4: Computer 2 sends its partial sum back to Computer 1, which adds the two results
• Parallel time ≈ time to send n/2 numbers + (n/2 − 1) additions (done in parallel on both machines) + time to send one result + 1 final addition, i.e., Θ(n) overall, since transferring n/2 values dominates.
Q10: Parallel Runtime for Adding n Numbers with m Processors
Assumptions:
• n is a power of 2
• Using a tree-based reduction pattern
• Each process adds its n/m share and participates in log₂(m) reduction steps
• Separate send() and recv() calls are used in each step
• Parallel runtime: T_p ≈ (n/m − 1) local additions + log₂(m) × (one send/recv + one addition), i.e., Θ(n/m + log m); a simulation sketch follows.
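A simulation sketch in Python under the assumptions above (m divides n; names are illustrative): each "process" sums its local share, then partial sums are combined pairwise over log₂(m) steps.

```python
# Sketch: simulating the tree-based reduction. Each of the m "processes" first adds
# its n/m share serially, then pairs combine partial sums over log2(m) steps.
import math

def tree_reduce_sum(numbers, m):
    n = len(numbers)
    share = n // m                                     # assume m divides n
    partial = [sum(numbers[i*share:(i+1)*share]) for i in range(m)]
    steps = int(math.log2(m))
    for step in range(steps):
        stride = 1 << step
        for rank in range(0, m, 2 * stride):
            partial[rank] += partial[rank + stride]    # recv() + add on the lower rank
    return partial[0]                                  # rank 0 holds the final sum

print(tree_reduce_sum(list(range(16)), 4))   # 120
# Runtime model: T_p ≈ (n/m - 1) additions + log2(m) * (one send/recv + one addition)
```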