
SMAG-J ENTERPRISES

Email: [email protected] | Phone: 9392680519

MODULE-1
1. Classification of Parallel Computers
Parallel computers are systems that use multiple processors to perform
computations simultaneously. These are generally classified based on:
A. Flynn’s Taxonomy
Flynn's classification divides systems based on the number of instruction and
data streams:

Type | Instruction Stream | Data Stream | Example
SISD | Single | Single | Traditional CPU
SIMD | Single | Multiple | GPUs, vector processors
MISD | Multiple | Single | (Rare) fault-tolerant systems
MIMD | Multiple | Multiple | Multi-core systems, clusters

Diagram: Flynn's Taxonomy (figure not included).


2. SIMD (Single Instruction Multiple Data) Systems


Definition:
SIMD architectures execute one instruction on multiple data elements
simultaneously.
Features:
• Well suited to data-parallel tasks such as image processing and scientific simulation.
• Used in GPUs and vector processors.
Examples:
Matrix operations, image filters (see the sketch below).
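
As an added illustration (not part of the original notes), the short program below applies the same addition to every element of two arrays. A SIMD unit can carry out several of these element-wise additions with a single vector instruction; OpenMP's simd directive, used here, is one way to request that explicitly, and the array names are invented for the example.

#include <stdio.h>

#define N 8

int main(void) {
    float a[N], b[N], c[N];

    /* Initialize the input vectors. */
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    /* Every iteration applies the same operation to different data;
       with the simd directive (or compiler auto-vectorization) several
       elements can be processed by one vector instruction. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}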

3. MIMD (Multiple Instruction Multiple Data) Systems


Definition:
Each processor works independently, executing different instructions on
different data.
Types:
• Tightly Coupled: Shared memory multiprocessors.
• Loosely Coupled: Distributed systems or clusters.


Examples:
• Supercomputers.
• Cloud computing environments.

4. Interconnection Networks
Used to connect processors and memory units.
Types:
• Bus-based: All processors share a common bus.
• Crossbar switch: Full connectivity between processors and memory.
• Multistage (Omega, Butterfly): Indirect connection using switches.
• Hypercube, Mesh: Scalable topologies for large systems.
Diagram Example – 2D Mesh Network:
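
As an added illustration of how such topologies are addressed (not part of the original notes), the small sketch below prints the neighbours of a node in a hypercube: node i is connected to every node whose label differs from i in exactly one bit.

#include <stdio.h>

/* In a d-dimensional hypercube, node i is linked to the d nodes whose
   labels differ from i in exactly one bit: neighbour k is i XOR (1 << k). */
int main(void) {
    int d = 3;      /* 3-dimensional hypercube: 8 nodes, labelled 0..7 */
    int node = 5;   /* example node, binary 101 */

    printf("Neighbours of node %d in a %d-cube:", node, d);
    for (int k = 0; k < d; k++)
        printf(" %d", node ^ (1 << k));
    printf("\n");   /* prints: 4 7 1 */
    return 0;
}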


5. Cache Coherence
Definition:
Maintaining a consistent view of data stored in multiple caches of processors.
Put another way:
Cache Coherence refers to the consistency of data stored in multiple caches
that share the same memory space. In multi-core or multi-processor systems,
each processor typically has its own cache, and problems arise when one
processor updates a memory location that others have also cached.

Problem Statement:
How do we ensure that all processors see the most recent value of a shared
variable?

Why Is Cache Coherence Important?


In parallel computing, especially in shared-memory architectures, maintaining
data correctness is vital:
• Avoids stale reads (a processor using old data).
• Prevents data races.
• Ensures correct execution order.

Example Scenario
Let’s say we have:
• Variable X initialized to 0.
• Two processors: P1 and P2, each with its own cache.

Initial State
Main Memory: X = 0


Cache P1: X = 0
Cache P2: X = 0

P1 Updates X = 10 (in its cache)


Main Memory: X = 0
Cache P1: X = 10
Cache P2: X = 0
If P2 reads X, it gets stale data (0), which violates coherence.

Types of Cache Coherence Problems

Problem | Description
Stale Data | A processor reads old values that do not reflect recent writes.
False Sharing | Two processors write to different variables that share the same cache block.
Write Propagation | A write in one cache must propagate to other caches or to main memory.
Transaction Ordering | Writes must appear in the same global order to all processors.

Cache Coherence Protocols

1. Write-Invalidate Protocol
• When a processor writes, all other cached copies are invalidated.
• Only one processor has a valid copy at a time.

Advantage: simple and bandwidth-efficient.


Example:
P1 writes X = 5 → invalidate X in all other caches.

2. Write-Update Protocol
• When a processor writes, it broadcasts the new value to all caches.

Advantage: every cache sees the new value immediately.

Disadvantage: higher bandwidth usage.

3. MESI Protocol (Most common)


A widely used protocol with 4 states per cache line:

State | Meaning
M | Modified → this cache holds a modified (dirty) copy not yet written back to memory
E | Exclusive → this cache holds the only copy, and it matches main memory
S | Shared → the cache holds a valid copy; other caches may hold it too
I | Invalid → the cache line holds no valid data

Transitions occur when a processor reads or writes to data, and other caches
respond or update accordingly.
MESI State Diagram (Textual Description):
• Read Miss → Bus Read → Shared or Exclusive
• Write → Invalidate others → Modified
• Others read → Downgrade from Modified to Shared
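
The short sketch below (an illustrative simplification added here, not part of the original notes) models these transitions for a single cache line as seen from one cache. Real protocols also handle write-backs, data transfers and bus arbitration, which are omitted.

#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;
typedef enum { LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_WRITE } mesi_event;

/* Next state of one cache line in one cache, given an event.
   other_copies_exist tells whether another cache already holds the line. */
mesi_state next_state(mesi_state s, mesi_event e, int other_copies_exist) {
    switch (e) {
    case LOCAL_READ:   /* a read miss fetches the line; a hit keeps the state */
        return (s == INVALID) ? (other_copies_exist ? SHARED : EXCLUSIVE) : s;
    case LOCAL_WRITE:  /* writing invalidates other copies (write-invalidate) */
        return MODIFIED;
    case BUS_READ:     /* another cache reads: Modified/Exclusive drop to Shared */
        return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
    case BUS_WRITE:    /* another cache writes: our copy becomes Invalid */
        return INVALID;
    }
    return s;
}

int main(void) {
    mesi_state s = INVALID;
    s = next_state(s, LOCAL_READ, 0);   /* read miss, no sharers  -> EXCLUSIVE */
    s = next_state(s, LOCAL_WRITE, 0);  /* local write            -> MODIFIED  */
    s = next_state(s, BUS_READ, 0);     /* another cache reads it -> SHARED    */
    printf("final state: %d (0=I, 1=S, 2=E, 3=M)\n", s);  /* prints 1 */
    return 0;
}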


6. Shared-Memory vs. Distributed-Memory


Feature | Shared Memory | Distributed Memory
Memory access | Global memory space | Local memory per processor
Communication | Through shared variables | Through message passing
Programming model | Easier (OpenMP, Pthreads) | More complex (MPI, RPC)
Scalability | Limited | Highly scalable

Diagrams: Shared Memory and Distributed Memory organization (figures not included).


7. Coordinating Processes/Threads in Parallel Computing


Why is coordination needed?
When multiple threads or processes work on shared tasks or data,
coordination ensures:

• Correctness (avoid data races)


• Synchronization (maintain order)

• Deadlock prevention
• Efficient resource usage

Common Problems in Parallel Programs

Problem | Description
Race condition | Two or more threads access shared data at the same time, causing incorrect results.
Deadlock | Two or more threads each wait for the other to release a resource.
Livelock | Threads keep changing state in response to each other but make no progress.

Key Synchronization Tools

Tool | Purpose | Example
Mutex (lock) | Only one thread at a time can enter a critical section | pthread_mutex_t
Semaphore | Allows a limited number of threads to access a resource | sem_t
Barrier | All threads must reach a point before any may continue | #pragma omp barrier
Condition variable | Threads wait for a specific condition to become true | pthread_cond_t

Thread Coordination – Code Examples

Example 1: Using Mutex to Avoid Race Conditions (Pthreads in C)


Problem: Two threads increment a shared counter.

#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock;

void* increment(void* arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_mutex_init(&lock, NULL);

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&lock);
    printf("Final Counter: %d\n", counter); // Should be 200000

    return 0;
}
Without the mutex, the final count is usually less than 200000 because the unprotected increments race with each other.

Example 2: Using Barrier (OpenMP in C)


#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();
        printf("Thread %d is doing initial work\n", id);

        // Synchronize all threads here
        #pragma omp barrier

        printf("Thread %d is doing final work after barrier\n", id);
    }
    return 0;
}
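
Example 3 (additional sketch): Using a Semaphore (POSIX, C)

The table of synchronization tools above also lists semaphores (sem_t), but the notes include no example for them. The sketch below is an added illustration, assuming a POSIX environment: a counting semaphore initialized to 2 lets at most two of the four worker threads use the resource at the same time. Compile with -pthread.

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t slots;                          /* counting semaphore, initial value 2 */

void* worker(void* arg) {
    int id = *(int*)arg;
    sem_wait(&slots);                 /* acquire a slot; blocks if none free */
    printf("Thread %d is using the resource\n", id);
    sem_post(&slots);                 /* release the slot */
    return NULL;
}

int main() {
    pthread_t t[4];
    int ids[4] = {0, 1, 2, 3};

    sem_init(&slots, 0, 2);           /* 0 = shared between threads, not processes */

    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    sem_destroy(&slots);
    return 0;
}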

8. Shared-Memory Programming


Definition:
All processors access a common memory space.
Languages/Tools:
• OpenMP
• POSIX Threads
Features:
• Easier to program.
• Efficient for small-scale systems.
Code Snippet (OpenMP):
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    a[i] = b[i] + c[i];
}
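
The snippet above is only a fragment. A minimal complete program built around it is sketched below; the arrays a, b and c and the size n come from the fragment, while the sample input values are added here purely for illustration. Compile with -fopenmp.

#include <stdio.h>

int main(void) {
    const int n = 8;
    int a[8], b[8], c[8];

    for (int i = 0; i < n; i++) {   /* fill in some sample input data */
        b[i] = i;
        c[i] = 10 * i;
    }

    /* The loop iterations are divided among the OpenMP threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
    }

    for (int i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}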

9. Distributed-Memory Programming
Definition:
Each processor has its own private memory and communicates via messages.
Languages/Tools:
• MPI (Message Passing Interface)
• Remote Procedure Calls (RPC)
Advantages:
• High scalability
• Suitable for large clusters or cloud systems
Code Snippet (MPI):
MPI_Send(&data, count, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Recv(&buffer, count, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
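
These calls only work inside an initialized MPI program. The sketch below wraps them in a minimal complete program in which process 0 sends one integer to process 1; the variable names and the two-process setup are assumptions added here, and the program must be launched with at least two processes (for example, mpirun -np 2 ./a.out).

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, data = 42, buffer = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process 0 sends one integer to process 1 with tag 0. */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Process 1 receives it and prints the value. */
        MPI_Recv(&buffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %d from process 0\n", buffer);
    }

    MPI_Finalize();
    return 0;
}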
