SMAG-J ENTERPRISES
MODULE-1
1. Classification of Parallel Computers
Parallel computers are systems that use multiple processors to perform
computations simultaneously. These are generally classified based on:
A. Flynn’s Taxonomy
Flynn's classification divides systems based on the number of instruction and
data streams:
Type   Instruction Stream   Data Stream   Example
SISD   Single               Single        Traditional CPU
SIMD   Single               Multiple      GPUs, vector processors
MISD   Multiple             Single        (Rare) fault-tolerant systems
MIMD   Multiple             Multiple      Multi-core systems, clusters
Diagram: Flynn's Taxonomy
2. SIMD (Single Instruction Multiple Data) Systems
Definition:
SIMD architectures execute one instruction on multiple data elements
simultaneously.
Features:
• Good for data-parallel tasks such as image processing and scientific simulation.
• Used in GPUs, vector processors.
Example:
Matrix operations, image filters.
Diagram:
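A minimal sketch of the SIMD idea in C is shown below. The array, size, and scale factor are illustrative placeholders; the "#pragma omp simd" directive (OpenMP 4.0+) asks the compiler to vectorize the loop so that one instruction processes several data elements at once.

/* Sketch: one operation applied across a whole array of pixel values
   (a simple brightness adjustment). Array contents are placeholders. */
#include <stdio.h>

#define N 8

int main(void) {
    float pixels[N] = {10, 20, 30, 40, 50, 60, 70, 80};

    #pragma omp simd                    /* same instruction, multiple data elements */
    for (int i = 0; i < N; i++) {
        pixels[i] = pixels[i] * 1.5f;   /* e.g. a brightness adjustment */
    }

    for (int i = 0; i < N; i++) {
        printf("%.1f ", pixels[i]);
    }
    printf("\n");
    return 0;
}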
3. MIMD (Multiple Instruction Multiple Data) Systems
Definition:
Each processor works independently, executing different instructions on
different data.
Types:
• Tightly Coupled: Shared memory multiprocessors.
• Loosely Coupled: Distributed systems or clusters.
Examples:
• Supercomputers.
• Cloud computing environments.
Diagram:
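To illustrate the MIMD idea at a small scale, the sketch below runs two threads that execute different instructions on different data, as in a shared-memory MIMD system. The function and variable names are illustrative, not from the notes.

/* Minimal MIMD-style sketch: two threads run *different* code on *different* data. */
#include <stdio.h>
#include <pthread.h>

void* sum_task(void* arg) {              /* thread 1: sums an integer array */
    int* data = (int*)arg;
    int sum = 0;
    for (int i = 0; i < 4; i++) sum += data[i];
    printf("sum = %d\n", sum);
    return NULL;
}

void* scale_task(void* arg) {            /* thread 2: scales a different, double array */
    double* data = (double*)arg;
    for (int i = 0; i < 4; i++) data[i] *= 2.0;
    printf("scaled first element = %.1f\n", data[0]);
    return NULL;
}

int main(void) {
    int    ints[4]    = {1, 2, 3, 4};
    double doubles[4] = {0.5, 1.5, 2.5, 3.5};
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sum_task, ints);
    pthread_create(&t2, NULL, scale_task, doubles);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}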
4. Interconnection Networks
Used to connect processors and memory units.
Types:
• Bus-based: All processors share a common bus.
• Crossbar switch: Full connectivity between processors and memory.
• Multistage (Omega, Butterfly): Indirect connection using switches.
• Hypercube, Mesh: Scalable topologies for large systems.
Diagram Example – 2D Mesh Network:
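Since the figure cannot be reproduced here, the sketch below shows how the neighbours of a node in a 2D mesh can be computed from its row/column position. The 4x4 grid size and row-major numbering are assumptions for illustration.

/* Sketch: neighbours of a node in an R x C 2D mesh (no wrap-around).
   Nodes are numbered row-major: id = row * C + col. */
#include <stdio.h>

#define R 4
#define C 4

void print_neighbours(int id) {
    int row = id / C, col = id % C;
    printf("node %d:", id);
    if (row > 0)      printf(" up=%d",    id - C);   /* one row up */
    if (row < R - 1)  printf(" down=%d",  id + C);   /* one row down */
    if (col > 0)      printf(" left=%d",  id - 1);   /* one column left */
    if (col < C - 1)  printf(" right=%d", id + 1);   /* one column right */
    printf("\n");
}

int main(void) {
    for (int id = 0; id < R * C; id++) {
        print_neighbours(id);
    }
    return 0;
}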
5. Cache Coherence
Definition:
Maintaining a consistent view of data stored in multiple caches of processors.
Another way to define it:
Cache Coherence refers to the consistency of data stored in multiple caches
that share the same memory space. In multi-core or multi-processor systems,
each processor typically has its own cache, and problems arise when one
processor updates a memory location that others have also cached.
Problem Statement:
How do we ensure that all processors see the most recent value of a shared
variable?
Why Is Cache Coherence Important?
In parallel computing, especially in shared-memory architectures, maintaining
data correctness is vital:
• Avoids stale reads (a processor using old data).
• Prevents data races.
• Ensures correct execution order.
Example Scenario
Let’s say we have:
• Variable X initialized to 0.
• Two processors: P1 and P2, each with its own cache.
Initial State
Main Memory: X = 0
Cache P1: X = 0
Cache P2: X = 0
P1 Updates X = 10 (in its cache)
Main Memory: X = 0
Cache P1: X = 10
Cache P2: X = 0
If P2 reads X, it gets stale data (0), which violates coherence.
Types of Cache Coherence Problems
Problem               Description
Stale Data            Reading old values that do not reflect recent writes
False Sharing         Two processors write to different variables in the same cache block
Write Propagation     A write in one cache should propagate to other caches or main memory
Transaction Ordering  Writes should appear in a global sequence across all processors
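The false-sharing entry in the table above can be made concrete with a small sketch: two threads update different counters that happen to sit in the same cache line. The struct layout and the 64-byte line size are assumptions for illustration.

/* Sketch of a layout prone to false sharing: two counters packed next to
   each other likely share one cache line, so two threads updating them
   "independently" still cause coherence traffic. The padded variant shows
   one common fix (not used below). */
#include <stdio.h>
#include <pthread.h>

struct shared_counters {
    long a;                          /* updated only by thread 1 */
    long b;                          /* updated only by thread 2, but same cache line as a */
};

struct padded_counters {
    long a;
    char pad[64 - sizeof(long)];     /* push b into the next cache line */
    long b;
};

struct shared_counters g = {0, 0};

void* bump_a(void* arg) { for (int i = 0; i < 1000000; i++) g.a++; return NULL; }
void* bump_b(void* arg) { for (int i = 0; i < 1000000; i++) g.b++; return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", g.a, g.b);   /* results are correct, but slowed by false sharing */
    return 0;
}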
Cache Coherence Protocols
1. Write-Invalidate Protocol
• When a processor writes, all other cached copies are invalidated.
• Only one processor has a valid copy at a time.
Simple and bandwidth efficient.
Example:
P1 writes X = 5 → invalidate X in all other caches.
2. Write-Update Protocol
• When a processor writes, it broadcasts the new value to all caches.
Ensures everyone gets the new value immediately.
Higher bandwidth usage.
3. MESI Protocol (Most common)
A widely used protocol with 4 states per cache line:
State   Meaning
M       Modified  → This cache holds modified data not yet written back to memory
E       Exclusive → This cache has the only valid copy (matches memory)
S       Shared    → Cache has valid data; other caches may have it too
I       Invalid   → The cache line is not valid
Transitions occur when a processor reads or writes to data, and other caches
respond or update accordingly.
MESI State Diagram (Textual Description):
• Read Miss → Bus Read → Shared or Exclusive
• Write → Invalidate others → Modified
• Others read → Downgrade from Modified to Shared
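One simplified way to see these transitions in code is as a state machine for a single cache line. The sketch below is a toy model: the event names and the transition rules (e.g. a read miss always going to Shared rather than Shared-or-Exclusive) are simplifications, not a full protocol implementation.

/* Toy model of MESI transitions for one cache line in one cache. */
#include <stdio.h>

typedef enum { M, E, S, I } mesi_t;

typedef enum {
    LOCAL_READ, LOCAL_WRITE,   /* this processor accesses the line       */
    BUS_READ,   BUS_WRITE      /* another processor's access is observed */
} event_t;

mesi_t next_state(mesi_t s, event_t e) {
    switch (e) {
    case LOCAL_READ:
        return (s == I) ? S : s;           /* read miss -> fetch as Shared (simplified) */
    case LOCAL_WRITE:
        return M;                          /* write -> Modified, others get invalidated */
    case BUS_READ:
        return (s == M || s == E) ? S : s; /* another reader -> downgrade to Shared     */
    case BUS_WRITE:
        return I;                          /* another writer -> our copy becomes Invalid */
    }
    return s;
}

int main(void) {
    const char* name[] = {"M", "E", "S", "I"};
    mesi_t s = I;
    event_t trace[] = {LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_WRITE};
    for (int i = 0; i < 4; i++) {
        mesi_t n = next_state(s, trace[i]);
        printf("event %d: %s -> %s\n", i, name[s], name[n]);
        s = n;
    }
    return 0;
}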
6. Shared-Memory vs. Distributed-Memory
Feature             Shared Memory             Distributed Memory
Memory access       Global memory space       Local memory per processor
Communication       Through shared variables  Through message passing
Programming model   Easy (OpenMP, Pthreads)   Complex (MPI, RPC)
Scalability         Limited                   Highly scalable
Diagram:
Shared Memory:
Distributed Memory:
7. Coordinating Processes/Threads in Parallel Computing
Why is Coordination Needed?
When multiple threads or processes work on shared tasks or data,
coordination ensures:
• Correctness (avoid data races)
• Synchronization (maintain order)
• Deadlock prevention
• Efficient resource usage
Common Problems in Parallel Programs
Problem          Description
Race Condition   Two or more threads access shared data at the same time, causing incorrect results
Deadlock         Two threads wait for each other to release resources
Livelock         Threads change state in response to each other but make no progress
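The deadlock entry above can be illustrated with the classic pattern of two threads acquiring two mutexes in opposite order. The lock and function names below are illustrative, and the program is intended to hang when the deadlock occurs, so it is a demonstration rather than something to reuse.

/* Classic deadlock sketch: thread 1 locks A then B, thread 2 locks B then A.
   If each grabs its first lock before the other's second, both wait forever.
   Acquiring both mutexes in the same global order avoids this. */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void* worker1(void* arg) {
    pthread_mutex_lock(&lock_a);
    sleep(1);                       /* widen the window for the deadlock */
    pthread_mutex_lock(&lock_b);    /* waits for thread 2 ...            */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

void* worker2(void* arg) {
    pthread_mutex_lock(&lock_b);
    sleep(1);
    pthread_mutex_lock(&lock_a);    /* ... which is waiting for thread 1 */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker1, NULL);
    pthread_create(&t2, NULL, worker2, NULL);
    pthread_join(t1, NULL);         /* with the opposite lock order, this may never return */
    pthread_join(t2, NULL);
    printf("finished (no deadlock this run)\n");
    return 0;
}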
Key Synchronization Tools
Tool                Purpose                                                  Example
Mutex (Lock)        Only one thread can enter a critical section             pthread_mutex_t
Semaphore           Allows a limited number of threads to access a resource  sem_t
Barrier             All threads must reach a point before continuing         #pragma omp barrier
Condition Variable  Threads wait for a specific condition to become true     pthread_cond_t
Thread Coordination – Code Examples
Example 1: Using Mutex to Avoid Race Conditions (Pthreads in C)
Problem: Two threads increment a shared counter.
#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock;

/* Each thread adds 100000 to the shared counter, one locked increment at a time. */
void* increment(void* arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_mutex_init(&lock, NULL);
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&lock);
    printf("Final Counter: %d\n", counter); // Should be 200000
    return 0;
}
Without the mutex, the two threads' increments can interleave and updates get lost, so the final count may be less than 200000 (a race condition).
Example 2: Using Barrier (OpenMP in C)
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        printf("Thread %d is doing initial work\n", id);

        // Synchronize all threads here
        #pragma omp barrier

        printf("Thread %d is doing final work after barrier\n", id);
    }
    return 0;
}
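The table of synchronization tools also lists condition variables (pthread_cond_t); a minimal sketch of their use is shown below. The "ready" flag, the producer/consumer structure, and the message are illustrative.

/* Sketch: one thread waits on a condition variable until another thread
   sets a shared flag and signals it. */
#include <stdio.h>
#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int ready = 0;

void* producer(void* arg) {
    pthread_mutex_lock(&lock);
    ready = 1;                        /* make the condition true */
    pthread_cond_signal(&cond);       /* wake the waiting thread */
    pthread_mutex_unlock(&lock);
    return NULL;
}

void* consumer(void* arg) {
    pthread_mutex_lock(&lock);
    while (!ready) {                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    }
    printf("Condition met, consumer continues\n");
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}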
8. Shared-Memory Programming
Definition:
All processors access a common memory space.
Languages/Tools:
• OpenMP
• POSIX Threads
Features:
• Easier to program.
• Efficient for small-scale systems.
Code Snippet (OpenMP):
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    a[i] = b[i] + c[i];
}
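For completeness, a self-contained version of the loop above might look like the sketch below. The array size and values are placeholders; compile with an OpenMP-capable compiler (e.g. gcc -fopenmp).

/* Self-contained shared-memory example: threads share a, b, c and the
   loop iterations are divided among them. */
#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    int a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {     /* initialize inputs */
        b[i] = i;
        c[i] = 10 * i;
    }

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }

    for (int i = 0; i < N; i++) {
        printf("a[%d] = %d\n", i, a[i]);
    }
    return 0;
}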
9. Distributed-Memory Programming
Definition:
Each processor has its own private memory and communicates via messages.
Languages/Tools:
• MPI (Message Passing Interface)
• Remote Procedure Calls (RPC)
Advantages:
• High scalability
• Suitable for large clusters or cloud systems
Code Snippet (MPI):
MPI_Send(&data, count, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Recv(&buffer, count, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
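A minimal, self-contained sketch built around these two calls is shown below; it assumes exactly two ranks, and the message contents are placeholders.

/* Minimal MPI sketch: rank 0 sends one integer to rank 1.
   Run with at least two processes, e.g.: mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, data = 42, buffer = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* dest = 1, tag = 0 */
    } else if (rank == 1) {
        MPI_Recv(&buffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* source = 0, tag = 0 */
        printf("Rank 1 received %d\n", buffer);
    }

    MPI_Finalize();
    return 0;
}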