An Introduction to Parallel Programming
Peter Pacheco
Chapter 2
Parallel Hardware and Parallel Software
Copyright © 2010, Elsevier Inc. All rights Reserved 1
Roadmap
Some background
Modifications to the von Neumann model
Parallel hardware
Parallel software
Input and output
Performance
Parallel program design
Writing and running parallel programs
Assumptions
Serial hardware and software
Programs and input go in; output comes out.
The computer runs one program at a time.
The von Neumann Architecture
Figure 2.1
Main memory
This is a collection of locations, each of
which is capable of storing both
instructions and data.
Every location consists of an address,
which is used to access the location, and
the contents of the location.
Central processing unit (CPU)
Divided into two parts.
Control unit - responsible for deciding which instruction in a program should be executed. (the boss)
Arithmetic and logic unit (ALU) - responsible for executing the actual instructions, e.g., add 2+2. (the worker)
Key terms
Register – very fast storage, part of the
CPU.
Program counter – stores address of the
next instruction to be executed.
Bus – wires and hardware that connect the CPU and memory.
Fetch/read: instructions or data are transferred from memory to the CPU.
Write/store: data are transferred from the CPU to memory.
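von Neumann bottleneck
The CPU and memory are separate, so the interconnect between them limits the rate at which instructions and data can be accessed.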
An operating system “process”
An instance of a computer program that is
being executed.
Components of a process:
The executable machine language program.
A block of memory.
Descriptors of resources the OS has allocated
to the process.
Security information.
Information about the state of the process.
Multitasking
Gives the illusion that a single processor
system is running multiple programs
simultaneously.
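Each process takes turns running for a short interval (a time slice).
After its time slice is up, it waits until it has a turn again.
A process that must wait for a resource, such as data from disk, blocks, and the OS can run another process.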
Threading
Threads are contained within processes.
They allow programmers to divide their
programs into (more or less) independent
tasks.
The hope is that when one thread blocks
because it is waiting on a resource,
another will have work to do and can run.
A process and two threads
Starting a thread is called forking.
Terminating a thread is called joining.
Figure 2.2: the “master” thread forks two threads and later joins them.
MODIFICATIONS TO THE VON NEUMANN MODEL
Basics of caching
A collection of memory locations that can
be accessed in less time than some other
memory locations.
A CPU cache is typically located on the
same chip, or one that can be accessed
much faster than ordinary memory.
Principle of locality
An access of one memory location is often followed by an access of a nearby location.
Spatial locality – accessing a location near one accessed recently.
Temporal locality – accessing the same location again in the near future.
Principle of locality
float z[1000];
float sum;
int i;
/* … */
sum = 0.0;
for (i = 0; i < 1000; i++)
    sum += z[i];  /* z[i+1] is next to z[i] (spatial); sum is reused (temporal) */
Levels of Cache
L1 (smallest & fastest)
L2
L3 (largest & slowest)
Cache hit
Figure: the CPU fetches x, which is already in L1 (L1 holds x and sum; L2 holds y, z, and total; L3 holds A[ ], radius, r1, and center).
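Cache miss
Figure: the CPU fetches x, which is in none of the caches (L1 holds y and sum; L2 holds r1, z, and total; L3 holds A[ ], radius, and center), so x must be read from main memory.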