Chapter 17: Parallel Processing
William Stallings, Computer Organization and Architecture, 9th Edition
Objectives
You benefit from computers with multiple CPUs; you should know how they work.
After studying this chapter, you should be able to:
• Summarize the types of parallel processor organizations.
• Present an overview of the design features of symmetric multiprocessors.
• Understand the issue of cache coherence in a multiple-processor system.
• Explain the key features of the MESI protocol.
• Explain the difference between implicit and explicit multithreading.
• Summarize key design issues for clusters.
Contents
17.1 Multiple Processor Organizations
17.2 Symmetric Multiprocessors
17.3 Cache Coherence and the MESI Protocol
17.4 Multithreading and Chip Multiprocessors
17.1 Multiple Processor Organization
Single instruction, single data (SISD) stream
A single processor executes a single instruction stream to operate on data stored in a single memory. Uniprocessors fall into this category.
Single instruction, multiple data (SIMD) stream
A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Vector and array processors fall into this category.
Multiple instruction, single data (MISD) stream
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. Not commercially implemented.
Multiple instruction, multiple data (MIMD) stream
A set of processors simultaneously execute different instruction sequences on different data sets. SMPs, clusters, and NUMA systems fit this category.
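The SISD/SIMD distinction above can be illustrated with a small Python sketch. This is only a conceptual analogy, not hardware: the list comprehension stands in for a single vector instruction applied to all elements "in lockstep."

```python
# Conceptual sketch (not hardware): contrast a SISD-style loop, which
# applies one instruction to one data element per step, with a
# SIMD-style operation applying one instruction to many elements.

def sisd_scale(data, factor):
    """One instruction stream, one data element at a time."""
    result = []
    for x in data:                     # each iteration touches one element
        result.append(x * factor)
    return result

def simd_scale(data, factor):
    """Conceptually one instruction applied to all elements at once,
    as an array processor would on a lockstep basis."""
    return [x * factor for x in data]  # stands in for a vector instruction

print(sisd_scale([1, 2, 3], 10))  # [10, 20, 30]
print(simd_scale([1, 2, 3], 10))  # [10, 20, 30]
```

In real SIMD hardware the per-element multiplications happen simultaneously in separate processing elements; here they are merely expressed as a single whole-array operation.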
Parallel Organizations
17.2 Symmetric Multiprocessor (SMP)
A stand-alone computer with the following characteristics:
• There are two or more similar processors of comparable capacity.
• The processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
• All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same devices.
• All processors can perform the same functions (hence the term "symmetric").
• The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.
Multiprogramming and Multiprocessing
The operating system of an SMP schedules processes or threads across all of the
processors. SMP has a number of potential advantages over a uni-processor
organization, including the following: Performance, availability, incremental
growth (user can add processors), scaling (Vendors can offer a range of products
with different configures)
Organization: Tightly Coupled
• Each processor is self-contained (control unit, registers, one or more caches).
• Processors share main memory and I/O devices through some form of interconnection mechanism.
• Processors can communicate with each other through memory.
• Processors can also exchange signals directly with each other.
• The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible.
• In some configurations, each processor may also have its own private main memory and I/O channels in addition to the shared resources.
Organization: Symmetric Multiprocessor
• The most common organization for personal computers, workstations, and servers is the time-shared bus. The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The structure and interfaces are basically the same as for a single-processor system that uses a bus interconnection.
• As in a single-processor DMA design, the bus provides:
• Addressing: <source, destination>
• Arbitration: any I/O module can temporarily be bus "master"
• Time-sharing
The bus organization has several attractive features:
Simplicity
Simplest approach to multiprocessor organization
Flexibility
Generally easy to expand the system by attaching more
processors to the bus
Reliability
The bus is essentially a passive medium and the failure of
any attached device should not cause failure of the whole
system
Disadvantages of the bus organization:
Main drawback is performance
All memory references pass through the common bus
Performance is limited by bus cycle time
Each processor should have cache memory
Reduces the number of bus accesses
Leads to problems with cache coherence
If a word is altered in one cache it could conceivably
invalidate a word in another cache
To prevent this the other processors must be alerted that
an update has taken place
Typically addressed in hardware rather than the operating
system
Multiprocessor Operating System Design Considerations
Simultaneous concurrent processes
OS routines need to be reentrant to allow several
processors to execute the same OS code (OS service)
simultaneously
OS tables and management structures must be managed
properly to avoid deadlock or invalid operations
Scheduling
Any processor may perform scheduling so conflicts must be
avoided
Scheduler must assign ready processes to available
processors
Synchronization
With multiple active processes having potential access to shared
address spaces or I/O resources, care must be taken to provide
effective synchronization
Mutual exclusion (exclusive access to a shared resource) is
required, and is one potential cause of deadlock
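Mutual exclusion as described above can be sketched with Python's standard `threading` module: two threads increment a shared counter, and a lock guarantees that only one thread is inside the critical section at a time.

```python
# Minimal sketch of mutual exclusion: a lock serializes access to a
# shared counter so concurrent increments are not lost.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:            # mutual exclusion: exclusive access
            counter += 1      # critical section on the shared resource

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000 -- guaranteed with the lock; may be lower without it
```

Note the connection to deadlock: if two threads each hold one lock and wait for the other's, neither can proceed, which is why multiprocessor OS lock ordering must be designed carefully.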
Multiprocessor Operating System Design Considerations…
Memory management
In addition to dealing with all of the issues found on
uniprocessor machines, the OS needs to exploit the available
hardware parallelism to achieve the best performance
Paging mechanisms on different processors must be
coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement
Reliability and fault tolerance
OS should provide graceful degradation in the face of
processor failure
Scheduler and other portions of the operating system must
recognize the loss of a processor and restructure
accordingly
17.3 Cache Coherence and the MESI Protocol
Review:
Write back: Write operations are usually made only to the cache.
Main memory is only updated when the corresponding cache line
is flushed from the cache, which can result in inconsistency
Write through: All write operations are made to main memory as
well as to the cache, ensuring that main memory is always valid.
Even with the write-through policy, inconsistency can occur
unless other caches monitor the memory traffic or receive some
direct notification of the update
MESI (modified/exclusive/shared/invalid) protocol is
recommended here.
Coherent: sticking together, in agreement
Consistent: unambiguous; all copies agree
Protocol: a defined sequence of steps for communication
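The write-back inconsistency described above can be made concrete with a toy model. This is an illustrative assumption, not a real cache design: a dictionary stands in for main memory, and the `Cache` class applies either policy.

```python
# Toy model of why write-back can leave main memory stale while
# write-through keeps it always valid. Class and method names are
# illustrative only.

class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory             # shared "main memory" dict
        self.lines = {}                  # address -> cached value
        self.write_through = write_through

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value    # memory updated on every write

    def flush(self, addr):
        # Write-back: memory only updated when the line is evicted.
        self.memory[addr] = self.lines.pop(addr)

memory = {0x10: 1}
wb = Cache(memory, write_through=False)
wb.write(0x10, 99)
print(memory[0x10])   # 1  -> memory is stale until the line is flushed
wb.flush(0x10)
print(memory[0x10])   # 99

wt = Cache(memory, write_through=True)
wt.write(0x10, 7)
print(memory[0x10])   # 7  -> memory always valid
```

Even with write-through, a second processor's cache could still hold the old value 1, which is exactly why the other caches must snoop the memory traffic or be notified.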
Cache Coherence…
Software Solutions
Attempt to avoid the need for additional hardware
circuitry and logic by relying on the compiler and
operating system to deal with the problem
Attractive because the overhead of detecting
potential problems is transferred from run time to
compile time, and the design complexity is transferred
from hardware to software
However, compile-time software approaches generally must
make conservative decisions, leading to inefficient cache
utilization
Cache Coherence…
Hardware-Based Solutions
Generally referred to as cache coherence protocols
These solutions provide dynamic recognition at run time
of potential inconsistency conditions
Because the problem is only dealt with when it actually
arises there is more effective use of caches, leading to
improved performance over a software approach
Approaches are transparent to the programmer and the
compiler, reducing the software development burden
Can be divided into two categories:
Directory protocols
Snoopy protocols
Transparent: not visible to the programmer or compiler
Snoop: to observe or spy on
Directory Protocols
• There is a centralized controller that is part of the main memory controller
• It collects and maintains information about copies of data in caches
• The directory is stored in main memory
• Requests are checked against the directory, and the appropriate transfers are performed
• Drawback: creates a central bottleneck
• Effective in large-scale systems with complex interconnection schemes
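The directory idea above can be sketched in a few lines. This is a hedged simplification, not any real system's protocol: a table records which caches hold each line, so a write request can be checked against it and stale copies identified for invalidation.

```python
# Simplified sketch of a directory protocol's bookkeeping. The
# Directory class and its methods are illustrative names only.

class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of cache ids holding it

    def read(self, cache_id, addr):
        # Record that this cache now holds a copy of the line.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        # Check the request against the directory; every other cache
        # holding the line must invalidate its copy (the "appropriate
        # transfers" are then performed).
        others = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        return others       # caches whose copies are now invalid

d = Directory()
d.read("P0", 0x40)
d.read("P1", 0x40)
print(d.write("P0", 0x40))  # {'P1'} -- P1 must invalidate its copy
```

Because every read and write consults this one structure, the central-bottleneck drawback listed above is visible directly in the sketch.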
Snoopy Protocols
Distribute the responsibility for maintaining cache coherence
among all of the cache controllers in a multiprocessor
A cache must recognize when a line that it holds is shared with other
caches
When updates are performed on a shared cache line, it must be
announced to other caches by a broadcast mechanism
Each cache controller is able to “snoop” on the network to observe these
broadcast notifications and react accordingly
Suited to bus-based multiprocessor because the shared bus provides
a simple means for broadcasting and snooping
Care must be taken that the increased bus traffic required for broadcasting
and snooping does not cancel out the gains from the use of local caches
Two basic approaches have been explored:
Write invalidate
Write update (or write broadcast)
Write Invalidate
Multiple readers, but only one writer at a time
When a write is required, all other cached copies of the line
are invalidated (marked invalid)
The writing processor then has exclusive access until the line
is required by another processor
Most widely used in commercial multiprocessor systems
such as the Pentium 4 and PowerPC
State of every line is marked as modified, exclusive,
shared or invalid
For this reason the write-invalidate protocol is called MESI
Write Update
Can be multiple readers and writers
When a processor wishes to update a shared line
the word to be updated is distributed to all others
and caches containing that line can update it
Some systems use an adaptive mixture of both write-
invalidate and write-update mechanisms
MESI Protocol
To provide cache consistency on an SMP (symmetric
multi-processor) the data cache supports a protocol
known as MESI:
Modified
The line in the cache has been modified and is available
only in this cache
Exclusive
The line in the cache is the same as that in main memory
and is not present in any other cache
Shared
The line in the cache is the same as that in main memory
and may be present in another cache
Invalid
The line in the cache does not contain valid data
Table 17.1
MESI Cache Line States
Table 17.1 summarizes the meaning of the four states.
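The main transitions between the four states can be sketched as a small state function. This is a deliberate simplification under stated assumptions: it tracks one line in one cache, treats bus activity as abstract "remote" events, and omits write-backs and bus transactions that a full MESI implementation requires.

```python
# Simplified sketch of MESI state transitions for a single cache line.
# Event names are illustrative; a real protocol is driven by bus signals.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def next_state(state, event, others_have_copy=False):
    if event == "local_read":
        if state == I:
            # Miss: Exclusive if no other cache has the line, else Shared.
            return S if others_have_copy else E
        return state                            # M, E, S unchanged
    if event == "local_write":
        return M                                # other copies get invalidated
    if event == "remote_read":
        return S if state in (M, E) else state  # M/E demoted to Shared
    if event == "remote_write":
        return I                                # snooped write invalidates us
    raise ValueError(event)

state = I
state = next_state(state, "local_read")   # -> Exclusive (sole copy)
state = next_state(state, "local_write")  # -> Modified (dirty, only here)
state = next_state(state, "remote_read")  # -> Shared (another cache reads)
print(state)  # Shared
```

The `remote_write` branch is the write-invalidate behavior described earlier: a snooped write by another processor forces this cache's copy to Invalid.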
MESI State Transition Diagram
17.4 Multithreading and Chip Multiprocessors
Processor performance can be measured by the rate at
which it executes instructions
MIPS rate = f * IPC (MIPS = millions of instructions per second)
f = processor clock frequency, in MHz
IPC = average instructions per cycle
Increase performance by increasing clock frequency and
increasing instructions that complete during cycle
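The formula above is simple enough to express directly; the numbers used here are illustrative, not taken from any particular processor.

```python
# MIPS rate = f * IPC, with f the clock frequency in MHz and IPC the
# average number of instructions completed per clock cycle.

def mips_rate(f_mhz, ipc):
    return f_mhz * ipc

# e.g. a 2000 MHz (2 GHz) processor averaging 1.5 instructions per cycle:
print(mips_rate(2000, 1.5))  # 3000.0 MIPS
```

The formula shows the two levers named in the text: raise f (faster clock) or raise IPC (more instructions completed per cycle, e.g. via multithreading).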
Multithreading
Allows for a high degree of instruction-level parallelism without
increasing circuit complexity or power consumption, thereby
increasing IPC
Instruction stream is divided into several smaller streams,
known as threads, that can be executed in parallel
Definitions of Threads and Processes
Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and data area for stack
• Executes sequentially and is interruptible so that the processor can turn to another thread
Thread switch:
• The act of switching processor control between threads within the same process
• Typically less costly than a process switch
Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution
Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with those of the second
Note: a thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system. A thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership.
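The resource-ownership distinction above can be illustrated with software threads: threads within one process share the process's resources (here, a list), which is part of why a thread switch is cheaper than a process switch, since no address space needs to be swapped.

```python
# Sketch: threads belonging to one process all see the same data area,
# because the process, not the thread, owns the resources.
import threading

shared = []   # resource owned by the process, visible to all its threads

def thread_body(tag):
    shared.append(tag)   # every thread writes into the same shared list

threads = [threading.Thread(target=thread_body, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))  # [0, 1, 2] -- all three wrote to shared state
```

Separate processes, by contrast, would each get their own copy of `shared` and would need explicit inter-process communication to exchange results.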
Implicit and Explicit Multithreading
Explicit multithreading:
• Concurrently executes instructions from different explicit threads
• Interleaves instructions from different threads on shared pipelines, or executes them in parallel on parallel pipelines
• All commercial processors and most experimental ones use explicit multithreading
Implicit multithreading:
• Concurrent execution of multiple threads extracted from a single sequential program
• Implicit threads are defined statically by the compiler or dynamically by the hardware
Approaches to Explicit Multithreading
Interleaved (fine-grained):
• Processor deals with two or more thread contexts at a time
• Switches threads at each clock cycle
• If a thread is blocked, it is skipped
Blocked (coarse-grained):
• A thread is executed until an event causes a delay (e.g. I/O)
• Effective on an in-order processor
• Avoids pipeline stalls
Simultaneous multithreading (SMT):
• Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor
Chip multiprocessing:
• The processor is replicated on a single chip
• Each processor handles separate threads
• Advantage is that the available logic area on a chip is used effectively
Approaches to Executing Multiple Threads
Example Systems
Pentium 4:
• More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
• The approach is to use SMT with support for two threads
• Thus the single multithreaded processor is logically two processors
IBM Power5:
• Chip used in high-end PowerPC products
• Combines chip multiprocessing with SMT
• Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
• Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
Exercises
17.1 List and briefly define three types of computer system
organization.
17.2 What are the chief characteristics of an SMP (symmetric
multiprocessor)?
17.3 What are some of the potential advantages of an SMP
compared with a uniprocessor?
17.4 What are some of the key OS design issues for an SMP?
17.5 What is the difference between software and hardware
cache coherent schemes?
17.6 What is the meaning of each of the four states in the
MESI protocol?
Summary: Chapter 17, Parallel Processing
Multiple processor organizations
• Types of parallel processor systems
• Parallel organizations
Symmetric multiprocessors
• Organization
• Multiprocessor operating system design considerations
Cache coherence and the MESI protocol
• Software solutions
• Hardware solutions
• The MESI protocol
Multithreading and chip multiprocessors
• Implicit and explicit multithreading
• Approaches to explicit multithreading
• Example systems