Chapter 17: Parallel Processing
William Stallings, Computer Organization and Architecture, 9th Edition
Objectives
You benefit from computers with multiple CPUs; you should know how they work.
After studying this chapter, you should be able to:
• Summarize the types of parallel processor organizations.
• Present an overview of the design features of symmetric multiprocessors.
• Understand the issue of cache coherence in a multiple-processor system.
• Explain the key features of the MESI protocol.
• Explain the difference between implicit and explicit multithreading.
• Summarize key design issues for clusters.
Contents
17.1 Multiple Processor Organizations
17.2 Symmetric Multiprocessors
17.3 Cache Coherence and the MESI Protocol
17.4 Multithreading and Chip Multiprocessors
17.1 Multiple Processor Organization
Single instruction, single data (SISD) stream
A single processor executes a single instruction stream to operate on data stored in a single memory. Uniprocessors fall into this category.
Single instruction, multiple data (SIMD) stream
A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Vector and array processors fall into this category.
Multiple instruction, single data (MISD) stream
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. Not commercially implemented.
Multiple instruction, multiple data (MIMD) stream
A set of processors simultaneously execute different instruction sequences on different data sets. SMPs, clusters, and NUMA systems fit this category.
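The SISD/SIMD distinction above can be illustrated with a small Python sketch. This is only a conceptual analogy, not hardware: the list comprehension stands in for a single vector instruction applied to all elements "in lockstep."

```python
# Conceptual sketch (not hardware): contrast a SISD-style loop, which
# applies one instruction to one data element per step, with a
# SIMD-style operation applying one instruction to many elements.

def sisd_scale(data, factor):
    """One instruction stream, one data element at a time."""
    result = []
    for x in data:                     # each iteration touches one element
        result.append(x * factor)
    return result

def simd_scale(data, factor):
    """Conceptually one instruction applied to all elements at once,
    as an array processor would on a lockstep basis."""
    return [x * factor for x in data]  # stands in for a vector instruction

print(sisd_scale([1, 2, 3], 10))  # [10, 20, 30]
print(simd_scale([1, 2, 3], 10))  # [10, 20, 30]
```

In real SIMD hardware the per-element multiplications happen simultaneously in separate processing elements; here they are merely expressed as a single whole-array operation.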
Parallel Organizations
17.2 Symmetric Multiprocessor (SMP)
A stand-alone computer with the following characteristics:
• There are two or more similar processors of comparable capacity.
• The processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
• All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same devices.
• All processors can perform the same functions (hence the term "symmetric").
• The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.
Multiprogramming and Multiprocessing
The operating system of an SMP schedules processes or threads across all of the
processors. SMP has a number of potential advantages over a uni-processor
organization, including the following: Performance, availability, incremental
growth (user can add processors), scaling (Vendors can offer a range of products
with different configures)
Organization: Tightly Coupled
• Each processor is self-contained (control unit, registers, one or more caches).
• Processors share main memory and I/O devices through some form of interconnection mechanism.
• Processors can communicate with each other through memory.
• Processors can also exchange signals directly with each other.
• The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible.
• In some configurations, each processor may also have its own private main memory and I/O channels in addition to the shared resources.
Organization: Symmetric Multiprocessor
• The most common organization for personal computers, workstations, and servers is the time-shared bus. The time-shared bus is the simplest mechanism for constructing a multiprocessor system.
• The structure and interfaces are basically the same as for a single-processor system that uses a bus interconnection.
• As in a single-processor DMA design, the bus provides:
• Addressing: <source, destination>
• Arbitration: any I/O module can temporarily be bus "master"
• Time-sharing
The bus organization has several attractive features:
Simplicity
Simplest approach to multiprocessor organization
Flexibility
Generally easy to expand the system by attaching more
processors to the bus
Reliability
The bus is essentially a passive medium and the failure of
any attached device should not cause failure of the whole
system
Disadvantages of the bus organization:
Main drawback is performance
All memory references pass through the common bus
Performance is limited by bus cycle time
Each processor should have cache memory
Reduces the number of bus accesses
Leads to problems with cache coherence
If a word is altered in one cache it could conceivably
invalidate a word in another cache
To prevent this the other processors must be alerted that
an update has taken place
Typically addressed in hardware rather than the operating
system
Multiprocessor Operating System Design Considerations
Simultaneous concurrent processes
OS routines need to be reentrant to allow several
processors to execute the same OS code (OS service)
simultaneously
OS tables and management structures must be managed
properly to avoid deadlock or invalid operations
Scheduling
Any processor may perform scheduling so conflicts must be
avoided
Scheduler must assign ready processes to available
processors
Synchronization
With multiple active processes having potential access to shared
address spaces or I/O resources, care must be taken to provide
effective synchronization
Mutual exclusion (exclusive access to a shared resource) is
required, and is one potential cause of deadlock
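Mutual exclusion as described above can be sketched with Python's standard `threading` module: two threads increment a shared counter, and a lock guarantees that only one thread is inside the critical section at a time.

```python
# Minimal sketch of mutual exclusion: a lock serializes access to a
# shared counter so concurrent increments are not lost.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:            # mutual exclusion: exclusive access
            counter += 1      # critical section on the shared resource

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000 -- guaranteed with the lock; may be lower without it
```

Note the connection to deadlock: if two threads each hold one lock and wait for the other's, neither can proceed, which is why multiprocessor OS lock ordering must be designed carefully.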
Multiprocessor Operating System Design Considerations…
Memory management
In addition to dealing with all of the issues found on
uniprocessor machines, the OS needs to exploit the available
hardware parallelism to achieve the best performance
Paging mechanisms on different processors must be
coordinated to enforce consistency when several processors
share a page or segment and to decide on page replacement
Reliability and fault tolerance
OS should provide graceful degradation in the face of
processor failure
Scheduler and other portions of the operating system must
recognize the loss of a processor and restructure
accordingly
17.3 Cache Coherence and the MESI Protocol
Review:
Write back: Write operations are usually made only to the cache.
Main memory is only updated when the corresponding cache line
is flushed from the cache, which can result in inconsistency
Write through: All write operations are made to main memory as
well as to the cache, ensuring that main memory is always valid.
Even with the write-through policy, inconsistency can occur
unless other caches monitor the memory traffic or receive some
direct notification of the update
MESI (modified/exclusive/shared/invalid) protocol is
recommended here.
Coherent: sticking together, in agreement
Consistent: unambiguous; all copies agree
Protocol: a defined sequence of steps for communication
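The write-back inconsistency described above can be made concrete with a toy model. This is an illustrative assumption, not a real cache design: a dictionary stands in for main memory, and the `Cache` class applies either policy.

```python
# Toy model of why write-back can leave main memory stale while
# write-through keeps it always valid. Class and method names are
# illustrative only.

class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory             # shared "main memory" dict
        self.lines = {}                  # address -> cached value
        self.write_through = write_through

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value    # memory updated on every write

    def flush(self, addr):
        # Write-back: memory only updated when the line is evicted.
        self.memory[addr] = self.lines.pop(addr)

memory = {0x10: 1}
wb = Cache(memory, write_through=False)
wb.write(0x10, 99)
print(memory[0x10])   # 1  -> memory is stale until the line is flushed
wb.flush(0x10)
print(memory[0x10])   # 99

wt = Cache(memory, write_through=True)
wt.write(0x10, 7)
print(memory[0x10])   # 7  -> memory always valid
```

Even with write-through, a second processor's cache could still hold the old value 1, which is exactly why the other caches must snoop the memory traffic or be notified.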
Cache Coherence…
Software Solutions
Attempt to avoid the need for additional hardware
circuitry and logic by relying on the compiler and
operating system to deal with the problem
Attractive because the overhead of detecting
potential problems is transferred from run time to
compile time, and the design complexity is transferred
from hardware to software
However, compile-time software approaches generally must
make conservative decisions, leading to inefficient cache
utilization
Cache Coherence…
Hardware-Based Solutions
Generally referred to as cache coherence protocols
These solutions provide dynamic recognition at run time
of potential inconsistency conditions
Because the problem is only dealt with when it actually
arises there is more effective use of caches, leading to
improved performance over a software approach
Approaches are transparent to the programmer and the
compiler, reducing the software development burden
Can be divided into two categories:
Directory protocols
Snoopy protocols
Transparent: not visible to the programmer or compiler
Snoop: to observe or spy on
Directory Protocols
• There is a centralized controller that is part of the main memory controller
• It collects and maintains information about copies of data in caches
• The directory is stored in main memory
• Requests are checked against the directory, and the appropriate transfers are performed
• Drawback: creates a central bottleneck
• Effective in large-scale systems with complex interconnection schemes
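The directory idea above can be sketched in a few lines. This is a hedged simplification, not any real system's protocol: a table records which caches hold each line, so a write request can be checked against it and stale copies identified for invalidation.

```python
# Simplified sketch of a directory protocol's bookkeeping. The
# Directory class and its methods are illustrative names only.

class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of cache ids holding it

    def read(self, cache_id, addr):
        # Record that this cache now holds a copy of the line.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        # Check the request against the directory; every other cache
        # holding the line must invalidate its copy (the "appropriate
        # transfers" are then performed).
        others = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        return others       # caches whose copies are now invalid

d = Directory()
d.read("P0", 0x40)
d.read("P1", 0x40)
print(d.write("P0", 0x40))  # {'P1'} -- P1 must invalidate its copy
```

Because every read and write consults this one structure, the central-bottleneck drawback listed above is visible directly in the sketch.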
Snoopy Protocols
Distribute the responsibility for maintaining cache coherence
among all of the cache controllers in a multiprocessor
A cache must recognize when a line that it holds is shared with other
caches
When updates are performed on a shared cache line, it must be
announced to other caches by a broadcast mechanism
Each cache controller is able to “snoop” on the network to observe these
broadcast notifications and react accordingly
Suited to bus-based multiprocessor because the shared bus provides
a simple means for broadcasting and snooping
Care must be taken that the increased bus traffic required for broadcasting
and snooping does not cancel out the gains from the use of local caches
Two basic approaches have been explored:
Write invalidate
Write update (or write broadcast)
Write Invalidate
Multiple readers, but only one writer at a time
When a write is required, all other cached copies of the line
are invalidated (marked invalid)
The writing processor then has exclusive access until the line
is required by another processor
Most widely used in commercial multiprocessor systems
such as the Pentium 4 and PowerPC
State of every line is marked as modified, exclusive,
shared or invalid
For this reason the write-invalidate protocol is called MESI
Write Update
Can be multiple readers and writers
When a processor wishes to update a shared line
the word to be updated is distributed to all others
and caches containing that line can update it
Some systems use an adaptive mixture of both write-
invalidate and write-update mechanisms
MESI Protocol
To provide cache consistency on an SMP (symmetric
multi-processor) the data cache supports a protocol
known as MESI:
Modified
The line in the cache has been modified and is available
only in this cache
Exclusive
The line in the cache is the same as that in main memory
and is not present in any other cache
Shared
The line in the cache is the same as that in main memory
and may be present in another cache
Invalid
The line in the cache does not contain valid data
Table 17.1
MESI Cache Line States
Table 17.1 summarizes the meaning of the four states.
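The main transitions between the four states can be sketched as a small state function. This is a deliberate simplification under stated assumptions: it tracks one line in one cache, treats bus activity as abstract "remote" events, and omits write-backs and bus transactions that a full MESI implementation requires.

```python
# Simplified sketch of MESI state transitions for a single cache line.
# Event names are illustrative; a real protocol is driven by bus signals.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def next_state(state, event, others_have_copy=False):
    if event == "local_read":
        if state == I:
            # Miss: Exclusive if no other cache has the line, else Shared.
            return S if others_have_copy else E
        return state                            # M, E, S unchanged
    if event == "local_write":
        return M                                # other copies get invalidated
    if event == "remote_read":
        return S if state in (M, E) else state  # M/E demoted to Shared
    if event == "remote_write":
        return I                                # snooped write invalidates us
    raise ValueError(event)

state = I
state = next_state(state, "local_read")   # -> Exclusive (sole copy)
state = next_state(state, "local_write")  # -> Modified (dirty, only here)
state = next_state(state, "remote_read")  # -> Shared (another cache reads)
print(state)  # Shared
```

The `remote_write` branch is the write-invalidate behavior described earlier: a snooped write by another processor forces this cache's copy to Invalid.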
MESI State Transition Diagram
17.4 Multithreading and Chip Multiprocessors
Processor performance can be measured by the rate at
which it executes instructions
MIPS rate = f * IPC (MIPS = millions of instructions per second)
f = processor clock frequency, in MHz
IPC = average instructions per cycle
Increase performance by increasing clock frequency and
increasing instructions that complete during cycle
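The formula above is simple enough to express directly; the numbers used here are illustrative, not taken from any particular processor.

```python
# MIPS rate = f * IPC, with f the clock frequency in MHz and IPC the
# average number of instructions completed per clock cycle.

def mips_rate(f_mhz, ipc):
    return f_mhz * ipc

# e.g. a 2000 MHz (2 GHz) processor averaging 1.5 instructions per cycle:
print(mips_rate(2000, 1.5))  # 3000.0 MIPS
```

The formula shows the two levers named in the text: raise f (faster clock) or raise IPC (more instructions completed per cycle, e.g. via multithreading).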
Multithreading
Allows for a high degree of instruction-level parallelism without
increasing circuit complexity or power consumption, thereby
increasing IPC
Instruction stream is divided into several smaller streams,
known as threads, that can be executed in parallel
Definitions of Threads and Processes
Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and data area for stack
• Executes sequentially and is interruptible so that the processor can turn to another thread
Thread switch:
• The act of switching processor control between threads within the same process
• Typically less costly than a process switch
Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution
Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with those of the second
Note: a thread in multithreaded processors may or may not be the same as the concept of software threads in a multiprogrammed operating system. A thread is concerned with scheduling and execution, whereas a process is concerned with both scheduling/execution and resource ownership.
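The resource-ownership distinction above can be illustrated with software threads: threads within one process share the process's resources (here, a list), which is part of why a thread switch is cheaper than a process switch, since no address space needs to be swapped.

```python
# Sketch: threads belonging to one process all see the same data area,
# because the process, not the thread, owns the resources.
import threading

shared = []   # resource owned by the process, visible to all its threads

def thread_body(tag):
    shared.append(tag)   # every thread writes into the same shared list

threads = [threading.Thread(target=thread_body, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))  # [0, 1, 2] -- all three wrote to shared state
```

Separate processes, by contrast, would each get their own copy of `shared` and would need explicit inter-process communication to exchange results.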
Implicit and Explicit Multithreading
Explicit multithreading:
• Concurrently executes instructions from different explicit threads
• Interleaves instructions from different threads on shared pipelines, or executes them in parallel on parallel pipelines
• All commercial processors and most experimental ones use explicit multithreading
Implicit multithreading:
• Concurrent execution of multiple threads extracted from a single sequential program
• Implicit threads are defined statically by the compiler or dynamically by the hardware
Approaches to Explicit Multithreading
Interleaved (fine-grained):
• Processor deals with two or more thread contexts at a time
• Switches threads at each clock cycle
• If a thread is blocked, it is skipped
Blocked (coarse-grained):
• A thread is executed until an event causes a delay (e.g. I/O)
• Effective on an in-order processor
• Avoids pipeline stalls
Simultaneous multithreading (SMT):
• Instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor
Chip multiprocessing:
• The processor is replicated on a single chip
• Each processor handles separate threads
• Advantage is that the available logic area on a chip is used effectively
Approaches to Executing Multiple Threads
Example Systems
Pentium 4:
• More recent models of the Pentium 4 use a multithreading technique that Intel refers to as hyperthreading
• The approach is to use SMT with support for two threads
• Thus the single multithreaded processor is logically two processors
IBM Power5:
• Chip used in high-end PowerPC products
• Combines chip multiprocessing with SMT
• Has two separate processors, each of which is a multithreaded processor capable of supporting two threads concurrently using SMT
• Designers found that having two two-way SMT processors on a single chip provided superior performance to a single four-way SMT processor
Exercises
17.1 List and briefly define three types of computer system
organization.
17.2 What are the chief characteristics of an SMP (symmetric
multiprocessor)?
17.3 What are some of the potential advantages of an SMP
compared with a uniprocessor?
17.4 What are some of the key OS design issues for an SMP?
17.5 What is the difference between software and hardware
cache coherent schemes?
17.6 What is the meaning of each of the four states in the
MESI protocol?
Summary: Chapter 17, Parallel Processing
Multiple processor organizations
• Types of parallel processor systems
• Parallel organizations
Symmetric multiprocessors
• Organization
• Multiprocessor operating system design considerations
Cache coherence and the MESI protocol
• Software solutions
• Hardware solutions
• The MESI protocol
Multithreading and chip multiprocessors
• Implicit and explicit multithreading
• Approaches to explicit multithreading
• Example systems