Parallel Computing and BigData

1. Difference between SIMD and MIMD Architectures


1 SIMD (Single Instruction, Multiple Data): One instruction stream applied to many data elements in
lockstep.
2 MIMD (Multiple Instruction, Multiple Data): Multiple independent processors execute different
instructions on different data.
3 Control: SIMD uses a single control unit; MIMD has multiple independent control units.
4 Synchrony: SIMD executes the same instruction simultaneously; MIMD executes asynchronously.
5 Use cases: SIMD for data-parallel tasks (e.g., vector math, image filtering); MIMD for task-parallel or
general-purpose multiprocessing (OS, servers).
6 Programming: SIMD is simpler for uniform data-parallel patterns; MIMD requires explicit concurrency control (threads, MPI).
7 Hardware: SIMD hardware is simpler; MIMD requires richer interconnect and coherence mechanisms.
8 Real-world SIMD examples: GPUs (shader cores), SIMD instruction sets such as Intel AVX and ARM NEON (see the sketch after this list).
9 Advantages of SIMD: High throughput, energy-efficient for data-parallel loops.
10 Real-world MIMD examples: Multi-core CPUs (Linux servers), distributed-memory clusters.
11 Advantages of MIMD: Flexibility for heterogeneous tasks, good for task-level parallelism.
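
A minimal sketch of the contrast using Intel AVX intrinsics (the function names and loop are illustrative, not from the source): the scalar loop issues one addition per instruction, while the AVX version adds eight floats per instruction, which is the lockstep SIMD pattern described above.

#include <immintrin.h>   // Intel AVX intrinsics (compile with -mavx)

/* Scalar version: one addition per instruction. */
void add_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* SIMD version: one AVX instruction adds eight floats in lockstep. */
void add_avx(const float *a, const float *b, float *c, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // load 8 elements
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_add_ps(va, vb);    // 8 additions in one instruction
        _mm256_storeu_ps(c + i, vc);          // store 8 results
    }
    for (; i < n; i++)                        // scalar tail for leftover elements
        c[i] = a[i] + b[i];
}

A MIMD counterpart would instead run independent threads or processes (e.g., Pthreads or MPI), each with its own instruction stream.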

2. Classification of Parallel Computers


1 By memory organization: Shared-memory (UMA, NUMA), Distributed-memory (clusters), Hybrid.
2 By control/instruction streams (Flynn’s taxonomy): SISD, SIMD, MISD, MIMD.
3 By coupling: Tightly coupled (multicore), Loosely coupled (clusters).
4 By programming model: Shared memory (threads, OpenMP), Distributed memory (MPI),
Data-parallel (GPU, MapReduce).
5 Architecture examples:
6 • Shared-memory multiprocessor: CPUs connected via coherence fabric to shared memory.
7 • Distributed-memory cluster: Each node has CPU+RAM, connected via a network.

3. Synchronization Mechanism in Shared-Memory Systems


1 Race condition: Occurs when multiple threads access shared data without synchronization.
2 Common mechanisms: Mutex, Spinlock, Semaphores, Barriers, Atomic operations.
3 Mutex ensures only one thread enters a critical section at a time (see the sketch after this list).
4 Barriers synchronize all threads at a certain point.
5 Atomic operations provide lock-free synchronization.
6 Best practices: Keep critical sections short, avoid deadlocks, prefer lock-free methods when possible.
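
A minimal Pthreads sketch of the mutex mechanism above (the thread count and loop bounds are illustrative): without the lock, the shared increment would be a race condition.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    // enter critical section
        counter++;                    // without the lock this is a race condition
        pthread_mutex_unlock(&lock);  // leave critical section
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);   // always 400000 with the mutex
    return 0;
}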

4. Impact of Cache Coherence in Shared-Memory Systems


1 Cache coherence ensures consistency across multiple caches.
2 Positive impact: Transparent programming model, faster access with caches.
3 Negative impact: Coherence traffic, false sharing, scalability limits.
4 Protocols: Write-invalidate (MESI), Write-update.
5 Mitigation: Align or pad variables to cache-line boundaries, use thread-local data, minimize sharing (see the sketch below).
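
A minimal sketch of the padding mitigation, assuming 64-byte cache lines (the struct name is illustrative):

/* Each thread updates its own counter; padding puts every counter on its own
   (assumed 64-byte) cache line, so one thread's writes do not invalidate
   another thread's cached copy, avoiding false sharing. */
struct padded_counter {
    long value;
    char pad[64 - sizeof(long)];   // fill out the rest of the cache line
};

struct padded_counter counters[8];   // counters[i] is private to thread i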

5. Routing in a 4×4 Mesh


1 Mesh topology: Nodes connected to up/down/left/right neighbors.
2 Routing algorithm: XY routing – move along X, then Y.
3 Example: (0,0) → (3,0) → (3,3): 3 hops along X, then 3 along Y, 6 hops total (see the sketch after this list).
4 Alternative: YX routing or adaptive routing to reduce congestion.
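
A small sketch of the hop-count calculation under XY routing (the function name is illustrative); in a mesh, the hop count is the Manhattan distance between source and destination.

/* XY routing: travel along the X dimension first, then along Y. */
int xy_hops(int sx, int sy, int dx, int dy) {
    int x_hops = (dx > sx) ? dx - sx : sx - dx;   // hops in X
    int y_hops = (dy > sy) ? dy - sy : sy - dy;   // hops in Y
    return x_hops + y_hops;                       // (0,0) -> (3,3): 3 + 3 = 6
}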

6. Shared-Memory vs Distributed-Memory Systems


1 UMA: Equal memory access latency (e.g., small SMPs).
2 NUMA: Latency depends on locality (e.g., multi-socket servers).
3 Distributed-memory: Each node has private memory, uses message passing (MPI).
4 Pros/cons: Shared memory – simple but limited scalability; Distributed – scalable but complex
programming.

7. Performance of Interconnection Networks


1 Bus: Low cost, poor scalability.
2 Crossbar: High bandwidth, expensive, O(N²) hardware.
3 Ring: Moderate cost, O(N) latency.
4 Mesh: Scalable, moderate latency, used in NoCs and large systems.

8. Scalability Calculation Example


1 Tserial = 40s, Tparallel = 10s, p=8.
2 Speedup S = 40/10 = 4.
3 Efficiency E = 4/8 = 0.5 (50%).
4 Interpretation: Only half of the available processor capacity is utilized (see the sketch below).
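
A minimal sketch of the calculation, using the values from the example above:

#include <stdio.h>

int main(void) {
    double t_serial = 40.0, t_parallel = 10.0;
    int p = 8;
    double speedup = t_serial / t_parallel;   // S = 40 / 10 = 4
    double efficiency = speedup / p;          // E = 4 / 8  = 0.5
    printf("Speedup = %.2f, Efficiency = %.0f%%\n", speedup, efficiency * 100.0);
    return 0;
}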

9. Simple CUDA Kernel for Vector Addition


1 Kernel: Each thread computes one element C[i] = A[i] + B[i].
2 Threads organized into blocks (e.g., 256 threads per block).
3 Blocks organized into grid: (n + threadsPerBlock - 1) / threadsPerBlock.
4 Guard condition ensures threads outside the valid range do not execute (a minimal kernel sketch follows).
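
A minimal CUDA sketch of the kernel and launch described above (device allocation and host-to-device copies are omitted; dA, dB, dC are assumed to be device pointers set up with cudaMalloc/cudaMemcpy):

__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)                                       // guard: skip threads past the end
        C[i] = A[i] + B[i];                          // one element per thread
}

/* Host-side launch: 256 threads per block, enough blocks to cover n elements. */
void launch_vecAdd(const float *dA, const float *dB, float *dC, int n) {
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(dA, dB, dC, n);
}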

10. Amdahl’s Law Example


1 Parallel fraction = 80%, Serial = 20%, p=8.
2 Speedup S = 1 / (0.2 + 0.8/8) = 3.33.
3 Even with 8 processors, speedup is limited by the 20% serial part (see the sketch below).
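
A minimal sketch of the Amdahl's law calculation above:

#include <stdio.h>

/* Amdahl's law: S(p) = 1 / (f_serial + f_parallel / p). */
int main(void) {
    double f_parallel = 0.8, f_serial = 0.2;
    int p = 8;
    double speedup = 1.0 / (f_serial + f_parallel / p);   // 1 / (0.2 + 0.1) = 3.33
    printf("Speedup with %d processors = %.2f\n", p, speedup);
    return 0;
}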

11. Example Program with Serial/Parallel Analysis


1 Tserial=24 ms, Tparallel=4 ms, p=8.
2 Observed speedup = 24/4 = 6.
3 Serial fraction ≈ 4.76%, parallel fraction ≈ 95.2% (estimated from the observed speedup; see the sketch below).
4 Program design: Partition workload, balance threads, minimize synchronization.
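
The 4.76% figure is consistent with the experimentally determined serial fraction (Karp-Flatt metric) e = (1/S - 1/p) / (1 - 1/p); a minimal sketch, assuming that formula:

#include <stdio.h>

int main(void) {
    double S = 24.0 / 4.0;   // observed speedup = 6
    int p = 8;
    double e = (1.0 / S - 1.0 / p) / (1.0 - 1.0 / p);   // ~ 0.0476
    printf("Serial fraction ~ %.2f%%, parallel ~ %.2f%%\n", e * 100.0, (1.0 - e) * 100.0);
    return 0;
}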

12. Scalability in MIMD Systems


1 Strong scalability: Runtime decreases with more processors for fixed problem size.
2 Weak scalability: Runtime remains constant with proportional increase in workload and processors.
3 Factors: Communication, synchronization, load balancing, cache coherence.
4 Strategies: Reduce communication, improve locality, balance load.

13. Message Passing vs Shared Memory


1 MPI: Explicit communication, scalable to large clusters, overhead from message latency.
2 Shared-memory (Pthreads/OpenMP): Easier programming, limited scalability due to memory
bandwidth and coherence issues.
3 Hybrid MPI+OpenMP combines the strengths of both for multi-node parallelism (see the sketch below).
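
A minimal hybrid sketch (the summation workload and variable names are illustrative): MPI distributes slices of the work across ranks/nodes, OpenMP threads share memory within each rank, and MPI_Reduce combines the partial results.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long N = 1000000;
    long chunk = N / size;                        // assume N divisible by size for brevity
    long lo = rank * chunk + 1, hi = lo + chunk - 1;

    double local = 0.0, total = 0.0;
    #pragma omp parallel for reduction(+:local)   // shared-memory parallelism within a node
    for (long i = lo; i <= hi; i++)
        local += (double)i;

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);  // message passing
    if (rank == 0) printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}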

14. Flynn’s Taxonomy


1 SISD: Single Instruction Single Data (uniprocessor).
2 SIMD: Single Instruction Multiple Data (vector processors, GPUs).
3 MISD: Multiple Instruction Single Data (rare, fault-tolerance pipelines).
4 MIMD: Multiple Instruction Multiple Data (multicore, clusters).

15. Vector Processors


1 Operate on entire vectors with one instruction.
2 Use vector registers to hold multiple data elements.
3 Highly pipelined functional units for throughput.
4 Efficient for scientific computing and linear algebra.
5 Limitations: Less efficient for irregular data access patterns.

16. Cache Coherence Approaches


1 Snooping-based coherence: Caches monitor bus and invalidate/update on writes (e.g., MESI protocol).
2 Directory-based coherence: Directory tracks sharers and manages invalidations/updates.
3 Snooping is simple but not scalable; Directory scales better but requires extra storage.

17. Scalability in MIMD Systems (Expanded)


1 Granularity: Fine-grained vs coarse-grained tasks.
2 Communication: All-to-all is costly, nearest-neighbor scales better.
3 Topology: Low-diameter networks improve scalability.
4 Software: Efficient runtime, thread management, load balancing.
5 Metrics: Speedup, efficiency, isoefficiency function.

18. Assumptions and Rules for I/O in Parallel Programs


1 Shared file systems may serialize access.
2 I/O is slower than memory operations.
3 Best practices: Batch I/O, use collective I/O (MPI-IO), avoid many small writes, use parallel file systems, and manage consistency with locks (a minimal MPI-IO sketch follows).
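
A minimal collective-I/O sketch using MPI-IO (the file name out.dat and buffer size are illustrative): each rank writes its own disjoint block of one shared file in a single collective call, letting the I/O layer batch and reorder the requests.

#include <mpi.h>

#define COUNT 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[COUNT];
    for (int i = 0; i < COUNT; i++) buf[i] = rank;   // each rank's own data

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);  // disjoint region per rank
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE, MPI_STATUS_IGNORE);  // collective write
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}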

19. GPU Programming Overview


1 GPUs contain many cores optimized for data-parallel workloads.
2 Programming models: CUDA, OpenCL, Vulkan compute, high-level libraries.
3 Execution model: Threads organized into blocks and grids.
4 Memory hierarchy: Global, shared, registers, caches.
5 Performance: Maximize occupancy, coalesce memory accesses, avoid warp divergence.
6 Use cases: Linear algebra, deep learning, image processing.

20. Performance of MIMD Systems


1 Factors: Processor speed, memory hierarchy, interconnect, cache coherence, synchronization, I/O.
2 Metrics: Speedup, efficiency, throughput, latency, scalability.
3 Optimizations: Improve locality, reduce synchronization, use efficient communication, optimize
scheduling.

Technologies Used in Big Data Environments


1 In-Memory Analytics: Stores data in RAM for faster access and insights.
2 In-Database Processing: Analytics performed inside the database, avoiding export overhead.
3 Symmetric Multiprocessor System (SMP): Multiple processors share memory and I/O under one OS.
4 Massively Parallel Processing (MPP): Independent processors with local memory working in parallel.
5 Parallel vs Distributed Systems: Parallel = tightly coupled, Distributed = loosely coupled.
6 Shared Nothing Architecture: No shared memory or disk among processors; contrasts with Shared
Memory and Shared Disk.
