Student Friendly Notes Module2

2.4.5 GPU Programming
GPUs work with a CPU host that manages memory, I/O, and program startup. They can run
thousands of threads, which execute in SIMD (Single Instruction, Multiple Data) groups. Each GPU
processor has small, fast caches and a block of shared memory. Performance drops when threads in
the same SIMD group take different branches, because the threads not on the branch currently being
executed sit idle. The hardware scheduler manages threads efficiently, but programmers must still
minimize branching.
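
A minimal CUDA sketch of this branching problem (the kernel name, array, and launch sizes are illustrative, not from the notes): even- and odd-indexed threads take different branches, so each SIMD group executes both paths in turn while the non-matching threads idle.

// Sketch of branch divergence: even and odd threads take different
// branches, so the hardware runs the two paths one after the other
// while the threads not on the current path sit idle.
__global__ void divergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (i % 2 == 0)
            x[i] = 2.0f * x[i];      // even-indexed threads
        else
            x[i] = x[i] + 1.0f;      // odd-indexed threads
    }
}

int main(void) {
    const int n = 1024;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    divergent<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaDeviceSynchronize();

    cudaFree(d_x);
    return 0;
}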

2.4.6 Programming Hybrid Systems


Hybrid systems combine a shared-memory API (within a node) with a distributed-memory API (between
nodes). They are mainly used in high-performance applications such as scientific simulations.
Development is more complex, so many programmers prefer a single distributed-memory API across the
whole system for simplicity. Hybrid models are powerful but harder to write and maintain.
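
A minimal hybrid sketch under the usual assumption of MPI for the distributed-memory part and OpenMP for the shared-memory part (compile, for example, with an MPI wrapper plus an OpenMP flag; the program itself is illustrative):

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

// Hybrid sketch: MPI runs one process per node (distributed memory),
// OpenMP runs several threads inside each process (shared memory).
int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        // threads share the process's memory; processes communicate via MPI
        printf("process %d, thread %d\n", rank, omp_get_thread_num());
    }

    MPI_Finalize();
    return 0;
}
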
2.5.1 Input and Output in MIMD Systems
Parallel I/O involves multiple processes or threads accessing the console, disks, or other devices at
the same time. Most parallel programs do little I/O, but when several processes or threads call
printf or scanf, the behavior is nondeterministic. To avoid confusion, usually only one process or
thread handles input, and when many processes or threads print together their output may appear in a
mixed or jumbled order.
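
A minimal MPI sketch of the usual convention (the use of process 0 and the variable name n are illustrative): only one process reads stdin and then shares the value, while output from all processes may interleave unpredictably.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 // only one process touches stdin
        printf("Enter n: ");
        fflush(stdout);
        scanf("%d", &n);
    }
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   // share the input

    // every process printing at once gives nondeterministic ordering
    printf("process %d got n = %d\n", rank, n);

    MPI_Finalize();
    return 0;
}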

2.5.2 Input and Output on GPUs
In GPU programming, the host CPU usually performs all input and output. GPU threads can write to
stdout during debugging, but the order of their output is unpredictable. GPU threads have no access
to stdin, stderr, or secondary storage, so I/O is centralized on the CPU.
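
A small CUDA sketch of debug printing from device code (the kernel name and launch configuration are illustrative); the lines may appear in any order, and all other I/O stays on the host:

#include <cstdio>

__global__ void debug_print(void) {
    // device-side printf is allowed, but the output order is unpredictable
    printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    debug_print<<<2, 4>>>();
    cudaDeviceSynchronize();   // wait for the kernel and flush its printf output
    return 0;
}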

2.6.1 Speedup and Efficiency in MIMD Systems


Speedup: S = Tserial / Tparallel. Ideally the speedup equals the number of cores p (linear speedup).
Efficiency: E = S / p = Tserial / (p × Tparallel), which shows how well the cores are being used. In
practice, parallel overhead (mutex locks, communication delays) keeps S below p, and as more cores
are added the overhead grows, so efficiency drops. Plots of speedup and efficiency for different
problem sizes show that both usually improve as the problem size grows.
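
A worked example with made-up timings (the numbers are illustrative, not from the notes):

#include <stdio.h>

int main(void) {
    double t_serial   = 24.0;  // hypothetical serial run time in seconds
    double t_parallel =  4.0;  // hypothetical run time on p cores
    int    p          =  8;    // number of cores

    double S = t_serial / t_parallel;  // speedup = 6.0 (< p because of overhead)
    double E = S / p;                  // efficiency = 0.75

    printf("speedup = %.2f, efficiency = %.2f\n", S, E);
    return 0;
}
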
2.6.2 Amdahl’s Law
Amdahl’s Law states that the maximum speedup of a program is limited by its serial part. If a
fraction r of the run time is inherently serial, the speedup on p cores is at most 1 / (r + (1 - r)/p),
which approaches 1/r as p grows. For example, if 10% of the code is serial (r = 0.1), the maximum
speedup is at most 10, no matter how many cores are used. Even with perfect parallelization of the
rest, serial sections cap performance, which is why minimizing serial code is crucial for scalability.
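
A short sketch that evaluates the Amdahl bound for a 10% serial fraction, showing the speedup approaching but never exceeding 10:

#include <stdio.h>

int main(void) {
    double r = 0.1;                              // serial fraction (10%)
    for (int p = 1; p <= 1024; p *= 4) {
        double s = 1.0 / (r + (1.0 - r) / p);    // Amdahl's law bound
        printf("p = %4d   speedup <= %.2f\n", p, s);
    }
    return 0;
}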

2.6.3 Scalability in MIMD Systems


A program is scalable if its efficiency can be kept constant as the number of processes or threads
increases (possibly by also increasing the problem size). Strong scalability: efficiency stays
constant without increasing the problem size. Weak scalability: efficiency stays constant when the
problem size grows at the same rate as the number of processes. Example: if the number of processes
is multiplied by k, the problem size must also be multiplied by k for weak scalability.
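
A small numeric sketch under an assumed cost model (Tserial = n, Tparallel = n/p + 1, so E = n / (n + p); the model and numbers are illustrative): scaling n and p by the same factor k leaves the efficiency unchanged, which is exactly weak scalability.

#include <stdio.h>

int main(void) {
    double n = 1000.0;   // base problem size (illustrative)
    double p = 10.0;     // base number of processes (illustrative)

    for (int k = 1; k <= 16; k *= 2) {
        // assumed model: E = Tserial / (p * Tparallel) = n / (n + p)
        double eff = (k * n) / (k * n + k * p);
        printf("k = %2d   efficiency = %.3f\n", k, eff);
    }
    return 0;
}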

2.6.4 Taking Timings of MIMD Programs


Timings measure how fast a parallel program runs. Wall-clock time is preferred over CPU time
because it includes time spent waiting (for messages, locks, and so on). A barrier is used to
synchronize the processes or threads just before timing starts, and the reported time is usually the
maximum elapsed time over all processes. The program is run several times and the minimum of those
times is reported, since it is the least distorted by interference from the system.
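
A common MPI timing pattern matching this description (a sketch; Work() is a placeholder for the code being timed):

#include <mpi.h>
#include <stdio.h>

void Work(void) { /* ... code being timed ... */ }

int main(int argc, char *argv[]) {
    int rank;
    double start, elapsed, max_elapsed;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);      // start everyone together
    start = MPI_Wtime();              // wall-clock time, not CPU time
    Work();
    elapsed = MPI_Wtime() - start;

    // the slowest process determines the parallel run time
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("elapsed time = %e seconds\n", max_elapsed);

    MPI_Finalize();
    return 0;
}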

2.6.5 GPU Performance


GPU performance is usually reported as the speedup of a GPU program over a CPU program, but the
notions of linear speedup and per-core efficiency do not carry over directly. Scalability for GPUs is
defined informally: a scalable program runs faster when given a larger, more powerful GPU. Amdahl’s
Law still applies when part of the program runs serially on the CPU. Performance is measured with
timers from either the CPU or the GPU API, and even small speedups can be useful in real applications.
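
A sketch of GPU-side timing with CUDA events (dummy_kernel and the launch configuration are placeholders):

#include <cstdio>

__global__ void dummy_kernel(void) { }

int main(void) {
    cudaEvent_t start, stop;
    float ms = 0.0f;

    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    dummy_kernel<<<128, 256>>>();
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);       // wait until the kernel has finished

    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time = %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}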
