Slot28 CH17 ParallelProcessing 32 Slides

This chapter discusses parallel processing, focusing on multiple processor organizations, symmetric multiprocessors, cache coherence, and multithreading. Key topics include types of parallel processor organizations (SISD, SIMD, MISD, MIMD), the MESI protocol for cache coherence, and the differences between implicit and explicit multithreading. The chapter also highlights the design considerations for multiprocessor operating systems and the advantages of symmetric multiprocessors over uniprocessor systems.


Chapter 17: Parallel Processing
William Stallings, Computer Organization and Architecture, 9th Edition
Objectives

You benefit every day from computers with multiple CPUs, so you should know how they work.

After studying this chapter, you should be able to:

 Summarize the types of parallel processor organizations.
 Present an overview of design features of symmetric multiprocessors.
 Understand the issue of cache coherence in a multiple processor system.
 Explain the key features of the MESI protocol.
 Explain the difference between implicit and explicit multithreading.
 Summarize key design issues for clusters.
Contents

 17.1 Multiple Processor Organizations
 17.2 Symmetric Multiprocessors
 17.3 Cache Coherence and the MESI Protocol
 17.4 Multithreading and Chip Multiprocessors
+
17.1- Multiple Processor
Organization
 Single instruction, single data  Multiple instruction, single
(SISD) stream data (MISD) stream
 Single processor executes a  A sequence of data is
single instruction stream to transmitted to a set of
operate on data stored in a processors, each of which
single memory executes a different instruction
 Uniprocessors fall into this sequence
category  Not commercially implemented

 Single instruction, multiple data  Multiple instruction, multiple


(SIMD) stream data (MIMD) stream
 A single machine instruction  A set of processors
controls the simultaneous simultaneously execute different
execution of a number of instruction sequences on different
processing elements on a data sets
lockstep basis  SMPs, clusters and NUMA systems
 Vector and array processors fall fit this category
into this category
Parallel Organizations
17.2- Symmetric Multiprocessor (SMP)

A stand-alone computer with the following characteristics:

 Two or more similar processors of comparable capacity
 Processors share the same memory and I/O facilities; processors are connected by a bus or other internal connection, and memory access time is approximately the same for each processor
 All processors share access to I/O devices, either through the same channels or through different channels giving paths to the same devices
 All processors can perform the same functions (hence "symmetric")
 The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels
Multiprogramming and Multiprocessing

The operating system of an SMP schedules processes or threads across all of the processors. An SMP has a number of potential advantages over a uniprocessor organization, including the following: performance, availability, incremental growth (the user can add processors), and scaling (vendors can offer a range of products with different configurations).
Organization: Tightly Coupled

• Each processor is self-contained (control unit, registers, one or more caches).
• Main memory and I/O devices are shared through some form of interconnection mechanism.
• Processors can communicate with each other through memory.
• Processors can also exchange signals with each other directly.
• The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible.
• In some configurations, each processor may also have its own private main memory and I/O channels in addition to the shared resources.
Organization: Symmetric Multiprocessor

• The most common organization for personal computers, workstations, and servers is the time-shared bus. The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The structure and interfaces are basically the same as for a single-processor system that uses a bus interconnection.
• As with DMA:
  • Addressing: <source, destination>
  • Arbitration: any I/O module can be "master"
  • Time-sharing
The bus organization has several attractive features:

 Simplicity
 The simplest approach to multiprocessor organization

 Flexibility
 Generally easy to expand the system by attaching more processors to the bus

 Reliability
 The bus is essentially a passive medium, and the failure of any attached device should not cause failure of the whole system
Disadvantages of the bus organization:

 The main drawback is performance
 All memory references pass through the common bus
 Performance is limited by bus cycle time

 Each processor should have cache memory
 Reduces the number of bus accesses

 Caching leads to problems with cache coherence
 If a word is altered in one cache, it could conceivably invalidate a word in another cache
 To prevent this, the other processors must be alerted that an update has taken place
 Typically addressed in hardware rather than by the operating system
Multiprocessor Operating System Design Considerations

 Simultaneous concurrent processes
 OS routines need to be reentrant to allow several processors to execute the same OS code (OS service) simultaneously
 OS tables and management structures must be managed properly to avoid deadlock or invalid operations

 Scheduling
 Any processor may perform scheduling, so conflicts must be avoided
 The scheduler must assign ready processes to available processors

 Synchronization
 With multiple active processes having potential access to shared address spaces or I/O resources, care must be taken to provide effective synchronization and mutual exclusion (exclusive control of a shared resource; also a potential cause of deadlock)
Multiprocessor Operating System Design Considerations…

 Memory management
 In addition to dealing with all of the issues found on uniprocessor machines, the OS needs to exploit the available hardware parallelism to achieve the best performance
 Paging mechanisms on different processors must be coordinated to enforce consistency when several processors share a page or segment and to decide on page replacement

 Reliability and fault tolerance
 The OS should provide graceful degradation in the face of processor failure
 The scheduler and other portions of the operating system must recognize the loss of a processor and restructure accordingly
17.3- Cache Coherence and the MESI Protocol

Review:

 Write back: Write operations are usually made only to the cache. Main memory is only updated when the corresponding cache line is flushed from the cache  can result in inconsistency

 Write through: All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Even with the write-through policy, inconsistency can occur unless other caches monitor the memory traffic or receive some direct notification of the update

 The MESI (modified/exclusive/shared/invalid) protocol is recommended here.

Vocabulary: coherent: sticking together; consistency: free of ambiguity; protocol: a defined sequence of steps for communication
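The write-back inconsistency described above can be sketched with a toy cache model (class and variable names are illustrative, not from the chapter):

```python
# Toy model contrasting write-through and write-back caching.
# Under write-back, main memory is stale until the line is flushed.

class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory           # shared "main memory" (a dict)
        self.lines = {}                # address -> value held locally
        self.dirty = set()             # addresses modified but not written back
        self.write_through = write_through

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value  # memory is always valid
        else:
            self.dirty.add(addr)       # memory is stale until flush

    def flush(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

memory = {0x10: 1}
wb = Cache(memory, write_through=False)
wb.write(0x10, 2)
print(memory[0x10])  # 1 -> memory is stale (inconsistent) until flushed
wb.flush(0x10)
print(memory[0x10])  # 2 -> consistent again after write-back
```

With `write_through=True`, the second line of output would already be current after the write, illustrating why write-through keeps main memory valid at the cost of extra memory traffic.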
Cache Coherence…

Software Solutions

 Attempt to avoid the need for additional hardware circuitry and logic by relying on the compiler and operating system to deal with the problem
 Attractive because the overhead of detecting potential problems is transferred from run time to compile time, and the design complexity is transferred from hardware to software
 However, compile-time software approaches generally must make conservative decisions, leading to inefficient cache utilization
Cache Coherence…

Hardware-Based Solutions

 Generally referred to as cache coherence protocols
 These solutions provide dynamic recognition at run time of potential inconsistency conditions
 Because the problem is only dealt with when it actually arises, there is more effective use of caches, leading to improved performance over a software approach
 Approaches are transparent to the programmer and the compiler, reducing the software development burden
 Can be divided into two categories:
 Directory protocols
 Snoopy protocols

Vocabulary: transparent: not visible (to the programmer); snoop: to spy or eavesdrop
Directory Protocols

 Collect and maintain information about copies of data in caches
 The directory is stored in main memory; a centralized controller is part of the main memory controller
 Requests are checked against the directory, and the appropriate transfers are performed
 Creates a central bottleneck
 Effective in large-scale systems with complex interconnection schemes
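The directory's bookkeeping can be sketched in a few lines (a simplification with illustrative names; a real directory also tracks line state and handles write-back of dirty data):

```python
# Minimal sketch of a directory protocol: a centralized directory
# tracks which caches hold a copy of each line, so a write can
# invalidate exactly the current sharers rather than broadcasting.

class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of cache ids holding it

    def read(self, cache_id, addr):
        # Record that this cache now holds a copy of the line.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        # Invalidate every other cached copy (in hardware, one
        # invalidate message per sharer), then record sole ownership.
        invalidated = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        return invalidated   # the caches whose copies were invalidated

d = Directory()
d.read("P0", 0x40)
d.read("P1", 0x40)
print(d.write("P0", 0x40))  # {'P1'}
```

Because every read and write consults this one structure, the directory itself becomes the central bottleneck mentioned above.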
Snoopy Protocols

 Distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor
 A cache must recognize when a line that it holds is shared with other caches
 When updates are performed on a shared cache line, they must be announced to other caches by a broadcast mechanism
 Each cache controller is able to "snoop" on the network to observe these broadcast notifications and react accordingly

 Suited to a bus-based multiprocessor because the shared bus provides a simple means for broadcasting and snooping
 Care must be taken that the increased bus traffic required for broadcasting and snooping does not cancel out the gains from the use of local caches

 Two basic approaches have been explored:
 Write invalidate
 Write update (or write broadcast)
Write Invalidate

 Multiple readers, but only one writer at a time
 When a write is required, all other caches' copies of the line are invalidated (marked invalid)
 The writing processor then has exclusive access until the line is required by another processor
 Most widely used in commercial multiprocessor systems such as the Pentium 4 and PowerPC
 The state of every line is marked as modified, exclusive, shared, or invalid
 For this reason, the write-invalidate protocol is called MESI
Write Update

 There can be multiple readers and writers
 When a processor wishes to update a shared line, the word to be updated is distributed to all others, and caches containing that line can update it
 Some systems use an adaptive mixture of both write-invalidate and write-update mechanisms
MESI Protocol

To provide cache consistency on an SMP (symmetric multiprocessor), the data cache supports a protocol known as MESI:

 Modified
 The line in the cache has been modified and is available only in this cache

 Exclusive
 The line in the cache is the same as that in main memory and is not present in any other cache

 Shared
 The line in the cache is the same as that in main memory and may be present in another cache

 Invalid
 The line in the cache does not contain valid data
Table 17.1: MESI Cache Line States

Table 17.1 summarizes the meaning of the four states.

MESI State Transition Diagram
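The main transitions of the diagram can be sketched as pure functions on a line's state (a simplification: bus timing and write-back of Modified data are not modeled; state names follow Table 17.1):

```python
# Sketch of per-line MESI state transitions for a write-invalidate,
# snoopy cache. Each function returns the line's next state.

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def local_read(state, others_have_line):
    if state == INVALID:            # read miss: fetch the line
        return SHARED if others_have_line else EXCLUSIVE
    return state                    # M/E/S read hits keep their state

def local_write(state):
    # Any local write leaves the line Modified; from S or I the cache
    # must first broadcast an invalidate to other caches (not modeled).
    return MODIFIED

def snoop_read(state):
    # Another cache read this line: M/E copies drop to Shared.
    return SHARED if state in (MODIFIED, EXCLUSIVE) else state

def snoop_write(state):
    # Another cache wrote this line: our copy becomes Invalid.
    return INVALID

state = local_read(INVALID, others_have_line=False)  # -> "E"
state = local_write(state)                           # -> "M"
state = snoop_read(state)                            # -> "S"
state = snoop_write(state)                           # -> "I"
print(state)  # I
```

The walk-through at the bottom traces one line through a typical life cycle: exclusive on first fill, modified by a local write, demoted to shared when another cache reads it, and invalidated when that cache writes it.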
17.4- Multithreading and Chip Multiprocessors

 Processor performance can be measured by the rate at which it executes instructions
 MIPS rate = f * IPC // Millions of Instructions Per Second
 f = processor clock frequency, in MHz
 IPC = average Instructions Per Cycle

 Performance can be increased by raising the clock frequency and by increasing the number of instructions that complete during a cycle

 Multithreading
 Allows for a high degree of instruction-level parallelism without increasing circuit complexity or power consumption  increases IPC
 The instruction stream is divided into several smaller streams, known as threads, that can be executed in parallel
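As a worked example of the formula above (the clock rate and IPC figures are hypothetical):

```python
# MIPS rate = f * IPC, with f in MHz.

def mips_rate(f_mhz, ipc):
    return f_mhz * ipc

# A hypothetical 2 GHz (2000 MHz) processor averaging 1.5
# instructions per cycle:
print(mips_rate(2000, 1.5))  # 3000.0
```

Multithreading attacks the second factor: by keeping execution units busy with instructions from other threads, it raises the average IPC without raising f.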
Definitions of Threads and Processes

Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and a data area for the stack
• Executes sequentially and is interruptible so that the processor can turn to another thread

Thread switch:
• The act of switching processor control between threads within the same process
• Typically less costly than a process switch

Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution

Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with those of the second

A thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system. A thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership.
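Why a thread switch is cheaper than a process switch can be illustrated with a toy model (all names here are illustrative): only the processor context is swapped, while the address space and other resources stay in place.

```python
# Toy illustration of a thread switch: save the running thread's
# processor context (program counter, stack pointer) and load the
# next thread's. No memory map or resource state is touched.

class ThreadContext:
    def __init__(self, pc, sp):
        self.pc = pc    # saved program counter
        self.sp = sp    # saved stack pointer

def thread_switch(cpu, old, new):
    # Save the outgoing thread's context, restore the incoming one's.
    old.pc, old.sp = cpu["pc"], cpu["sp"]
    cpu["pc"], cpu["sp"] = new.pc, new.sp

cpu = {"pc": 0x1000, "sp": 0x7FF0}   # t0 is currently running
t0 = ThreadContext(0, 0)
t1 = ThreadContext(0x2000, 0x6FF0)
thread_switch(cpu, t0, t1)
print(hex(cpu["pc"]))  # 0x2000 -> t1 now runs; t0's state is saved
```

A process switch would additionally have to save and replace all the process control data (memory map, open resources, full register set), which is why it is the costlier operation.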
Implicit and Explicit Multithreading

 All commercial processors and most experimental ones use explicit multithreading
 Concurrently execute instructions from different explicit threads
 Interleave instructions from different threads on shared pipelines, or execute them in parallel on parallel pipelines

 Implicit multithreading is the concurrent execution of multiple threads extracted from a single sequential program
 Implicit threads are defined statically by the compiler or dynamically by the hardware
Approaches to Explicit Multithreading

 Interleaved (fine-grained)
 Processor deals with two or more thread contexts at a time
 Switches threads at each clock cycle
 If a thread is blocked, it is skipped

 Blocked (coarse-grained)
 A thread is executed until an event causes a delay (e.g., I/O)
 Effective on an in-order processor
 Avoids pipeline stalls

 Simultaneous (SMT: Simultaneous Multithreading)
 Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor

 Chip multiprocessing
 The processor is replicated on a single chip
 Each processor handles separate threads
 The advantage is that the available logic area on a chip is used effectively
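The interleaved (fine-grained) approach above can be sketched as a cycle-by-cycle scheduler (thread names and the blocked set are illustrative; a real pipeline issues instructions, not thread labels):

```python
# Sketch of interleaved (fine-grained) multithreading: the processor
# switches to the next thread at each clock cycle, skipping any
# thread that is currently blocked. Assumes at least one thread
# is always ready.

def interleave(threads, blocked, cycles):
    """Return which thread issues on each of the given clock cycles."""
    schedule = []
    i = 0
    for _ in range(cycles):
        # Round-robin: advance until a ready (unblocked) thread is found.
        while threads[i % len(threads)] in blocked:
            i += 1
        schedule.append(threads[i % len(threads)])
        i += 1   # switch thread at each clock cycle
    return schedule

print(interleave(["T0", "T1", "T2"], blocked={"T1"}, cycles=4))
# ['T0', 'T2', 'T0', 'T2']
```

Blocked (coarse-grained) multithreading would instead keep issuing from one thread until it hits a long-latency event, and only then rotate to the next.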
Approaches to Executing Multiple Threads
Example Systems

 Pentium 4
 More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
 The approach is to use SMT with support for two threads
 Thus the single multithreaded processor is logically two processors

 IBM Power5
 Chip used in high-end PowerPC products
 Combines chip multiprocessing with SMT
 Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
 Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
Exercises

 17.1 List and briefly define three types of computer system organization.
 17.2 What are the chief characteristics of an SMP (symmetric multiprocessor)?
 17.3 What are some of the potential advantages of an SMP compared with a uniprocessor?
 17.4 What are some of the key OS design issues for an SMP?
 17.5 What is the difference between software and hardware cache coherence schemes?
 17.6 What is the meaning of each of the four states in the MESI protocol?
Summary: Parallel Processing (Chapter 17)

 Multiple processor organizations
 Types of parallel processor systems
 Parallel organizations

 Symmetric multiprocessors
 Organization
 Multiprocessor operating system design considerations

 Cache coherence and the MESI protocol
 Software solutions
 Hardware solutions
 The MESI protocol

 Multithreading and chip multiprocessors
 Implicit and explicit multithreading
 Approaches to explicit multithreading
 Example systems
