PARALLEL COMPUTER MODELS
A Comprehensive Overview: Theoretical Concepts and Practical Considerations
Architecture | Parallelism | Performance Models
Types of Parallelism

ILP: Instruction-Level Parallelism
Multiple instructions in flight per CPU
Pipelining; superscalar processors
EXAMPLE: Modern Intel/AMD CPUs

DLP: Data-Level Parallelism
Same operation applied to multiple data elements
Vector processors; SIMD units in GPUs/CPUs
EXAMPLE: Adding arrays element-wise
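The element-wise addition example can be made concrete with a short Python/NumPy sketch (the array sizes and values are hypothetical): one logical add is applied across whole arrays, which the library can dispatch to vectorized, SIMD-capable loops.

    import numpy as np

    # Data-level parallelism: one operation over many elements.
    a = np.arange(1_000_000, dtype=np.float64)
    b = np.ones_like(a)

    # A single vectorized add; SIMD units may be used under the hood.
    c = a + b

    # Equivalent scalar loop, for contrast:
    # for i in range(len(a)): c[i] = a[i] + b[i]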
TLP: Thread/Process-Level Parallelism
Independent threads or processes
Requires multiple cores
Distributed systems
EXAMPLE: Web server handling many clients

Task-Level Parallelism
Problem split into distinct tasks
Different operations per task
Heterogeneous processing
EXAMPLE: Compiler stages: lexical analysis, syntax analysis, optimization
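As a minimal sketch of thread-level parallelism (assuming Python's standard concurrent.futures module), independent requests are handled by a pool of worker threads, in the spirit of the web-server example above; handle_request and its inputs are hypothetical.

    from concurrent.futures import ThreadPoolExecutor

    def handle_request(client_id: int) -> str:
        # Hypothetical per-client work; each call is independent of the others.
        return f"response for client {client_id}"

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Worker threads process separate clients concurrently.
        results = list(pool.map(handle_request, range(8)))

    print(results)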
Flynn's Taxonomy
Classification based on instruction and data streams:
SISD: Single Instruction, Single Data
Sequential execution
No parallelism
Traditional von Neumann machine
EXAMPLE: Old uniprocessors

SIMD: Single Instruction, Multiple Data
Same instruction applied to multiple data items
Data-parallel tasks
Vector processing
EXAMPLE: GPUs, vector processors

MISD: Multiple Instruction, Single Data
Multiple instructions on the same data
Rare in practice
Fault-tolerant systems
EXAMPLE: Space shuttle flight control

MIMD: Multiple Instruction, Multiple Data
Multiple instructions on multiple data streams
Most flexible model
Widely used today
EXAMPLE: Modern clusters, multi-core CPUs
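To make the SIMD/MIMD distinction concrete, here is a rough Python sketch (assuming NumPy and the standard multiprocessing module; the worker functions are hypothetical): the SIMD-style case applies one instruction stream across many data elements, while the MIMD-style case runs different instruction streams on different data in separate processes.

    import numpy as np
    from multiprocessing import Process

    # SIMD-style: one instruction stream, many data elements.
    data = np.arange(8)
    print(data * data)   # the same multiply applied to every element

    # MIMD-style: independent instruction streams on independent data.
    def count_words(text):
        print("words:", len(text.split()))

    def sum_numbers(nums):
        print("sum:", sum(nums))

    if __name__ == "__main__":
        p1 = Process(target=count_words, args=("multiple instruction streams",))
        p2 = Process(target=sum_numbers, args=([1, 2, 3, 4],))
        p1.start(); p2.start()
        p1.join(); p2.join()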
Semantic Attributes
How parallel programs behave and interact with the system:
Communication
Shared Memory: common memory access
Message Passing: private memory plus explicit messages

Granularity
Fine-grain: small tasks
Coarse-grain: large tasks
Communication frequency varies with granularity

Synchronization
Coordination between tasks
Methods: barriers, locks, semaphores

Determinism
Deterministic: same result on every run
Non-deterministic: results may vary between runs

Concurrency Control
Prevents race conditions
Tools: critical sections, mutexes, monitors
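As a minimal sketch of these tools (assuming Python's standard threading module), the lock below turns the increment into a critical section; without it, threads updating the shared counter can race and lose updates.

    import threading

    counter = 0
    lock = threading.Lock()

    def add_many(n: int) -> None:
        global counter
        for _ in range(n):
            with lock:        # critical section: one thread at a time
                counter += 1

    threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()

    print(counter)  # 400000 with the lock; may be lower if the lock is removed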
Message Passing
Explicit communication
Each process has private memory
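A minimal message-passing sketch, assuming Python's standard multiprocessing module rather than a distributed library such as MPI: the worker keeps its own private memory and communicates with the parent only through explicit queue messages.

    from multiprocessing import Process, Queue

    def worker(inbox: Queue, outbox: Queue) -> None:
        # The worker sees only the messages it receives, not the parent's memory.
        x = inbox.get()
        outbox.put(x * 2)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        inbox.put(21)          # explicit send
        print(outbox.get())    # explicit receive: prints 42
        p.join()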
Performance Attributes
Key metrics for evaluating parallel systems:
Execution Time
Time to complete a task
The most fundamental metric

Speedup
Tserial / Tparallel
Ideal: linear in the number of processors

Efficiency
Speedup / Processors
Maximum: 1 (100%)

Scalability
Ability to maintain efficiency as the system grows
Strong vs. weak scaling

Throughput
Tasks completed per unit time
Focus of batch processing

Latency
Delay between task start and finish
Critical for real-time systems
Chart: Speedup vs. Processors

Speedup Formula
S = Tserial / Tparallel
Ratio of serial to parallel execution time

Efficiency Formula
E = S / N
Ratio of speedup to the number of processors
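As a quick worked example of these two formulas (the timings are hypothetical), a job that takes 100 s serially and 20 s on 8 processors has S = 100/20 = 5 and E = 5/8 = 0.625, i.e. 62.5% efficiency. A small Python sketch:

    def speedup(t_serial: float, t_parallel: float) -> float:
        return t_serial / t_parallel

    def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
        return speedup(t_serial, t_parallel) / n_procs

    # Hypothetical timings: 100 s serial, 20 s on 8 processors.
    print(speedup(100.0, 20.0))        # 5.0
    print(efficiency(100.0, 20.0, 8))  # 0.625 -> 62.5% efficiency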
Abstract Machine Models
Theoretical models for studying parallel computing:
PRAM: Parallel Random Access Machine
Unlimited processors
Shared memory
Idealized model
VARIANTS:
EREW: Exclusive read, exclusive write
CREW: Concurrent read, exclusive write
CRCW: Concurrent read, concurrent write

LogP: Realistic Communication Model
Distributed memory
Network characteristics
Communication overhead
PARAMETERS:
L: Latency
o: Overhead
g: Gap (minimum interval between messages)
P: Number of processors
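Under the usual reading of the LogP parameters, a single small point-to-point message costs roughly o + L + o time (send overhead, network latency, receive overhead), and a processor can inject at most one message every g time units. A rough back-of-envelope sketch in Python, with hypothetical parameter values:

    def logp_message_time(L: float, o: float) -> float:
        # One small message: send overhead + network latency + receive overhead.
        return o + L + o

    def logp_pipelined_sends(k: int, L: float, o: float, g: float) -> float:
        # k back-to-back messages from one sender, assuming g >= o:
        # the last message is injected at (k - 1) * g and then takes o + L + o.
        return (k - 1) * g + logp_message_time(L, o)

    # Hypothetical parameters (microseconds): L = 5, o = 1, g = 2.
    print(logp_message_time(5.0, 1.0))              # 7.0
    print(logp_pipelined_sends(10, 5.0, 1.0, 2.0))  # 25.0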
BSP: Bulk Synchronous Parallel
Superstep execution
Local computation
Barrier synchronization
SUPERSTEPS:
1. Local computation
2. Communication
3. Barrier synchronization

Dataflow: Data-Driven Execution
Execution driven by data availability
Not by program order
Exposes natural parallelism
APPLICATIONS:
Functional programming
Streaming systems
Dataflow architectures
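A minimal BSP-style sketch, assuming Python's standard threading module: each worker performs local computation, communicates its result (here simply by writing into a shared list), and then waits at a barrier before anyone proceeds. The worker logic and data are hypothetical.

    import threading

    N_WORKERS = 4
    barrier = threading.Barrier(N_WORKERS)
    partial = [0] * N_WORKERS

    def worker(rank: int) -> None:
        # Superstep phase 1: local computation.
        partial[rank] = sum(range(rank * 100, (rank + 1) * 100))
        # Phase 2: communication (writing to the shared list above).
        # Phase 3: barrier synchronization before results are read.
        barrier.wait()
        if rank == 0:
            print("total:", sum(partial))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()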
Summary
Key Concepts
Parallelism types: ILP, DLP, TLP, task-level
Flynn's taxonomy: SISD, SIMD, MISD, MIMD
Semantic attributes: granularity, communication, synchronization, determinism

Practical Considerations
Memory requirements can be complicated by data replication and additional arrays
Algorithm compatibility varies across architectures
Turnaround time is more important to users than speedup alone
Benchmarking challenges include workload variations and hardware limitations
Performance Metrics
Speedup, efficiency, scalability, throughput

Abstract Models
PRAM, LogP, BSP, Dataflow

Key Formulas
S = Ts / Tp (speedup)
E = S / N (efficiency)
Amdahl's Law
Gustafson's Law
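The two named laws can be stated and evaluated in a short sketch (the parameter values below are hypothetical): Amdahl's Law gives speedup S(N) = 1 / ((1 - p) + p/N) for a fixed-size problem with parallel fraction p, while Gustafson's Law gives scaled speedup S(N) = N - (1 - p)(N - 1) when the problem grows with the machine.

    def amdahl_speedup(p: float, n: int) -> float:
        # Fixed problem size; p is the parallelizable fraction of the work.
        return 1.0 / ((1.0 - p) + p / n)

    def gustafson_speedup(p: float, n: int) -> float:
        # Scaled problem size; p is the parallel fraction on n processors.
        return n - (1.0 - p) * (n - 1)

    # Hypothetical case: 90% parallel code on 16 processors.
    print(amdahl_speedup(0.9, 16))     # ~6.4
    print(gustafson_speedup(0.9, 16))  # 14.5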
"Time to solution is more important than linear speedup
for most users"