Outline
Introduction
Bus Interconnection Structures
Background
Evolution of Buses
Operation of the Bus
Classification of Bus Lines Based on Functions
Elements of Bus Design
PCI Bus
Point‐to‐point Interconnection Structures
Background
PCIe
Introduction
A computer consists of a set of components or modules of three basic types that communicate with each other
  Processor, memory, and I/O modules
All the units must be connected, and the collection of paths connecting the various units is called the
  Interconnection structure
In effect, interconnection structures are the glue that holds a computer system together
Introduction…
Connection requirements of the three basic components of a computer system (below)
Computer Architecture and Organization (CoEng4091) ‐ Tinbit A.
Introduction…
CPU
  Reads instructions and data
  Writes out data (after processing)
  Sends control signals to other units
  Receives (and acts on) interrupts
Introduction…
Memory
  Receives and sends data
  Receives addresses (of locations)
  Receives control signals
    Read
    Write
    Timing
Introduction…
Input/Output Module
  Similar to memory from the computer's viewpoint
  Output
    Receive data from the computer
    Send data to the peripheral
  Input
    Receive data from the peripheral
    Send data to the computer
Introduction…
The interconnection structure must support the following types of transfers
  Memory to processor: the processor reads an instruction or a unit of data from memory
  Processor to memory: the processor writes a unit of data to memory
  I/O to processor: the processor reads data from an I/O device via an I/O module
  Processor to I/O: the processor sends data to the I/O device
  I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access (DMA)
The most common interconnection structures are
  The bus and various multiple-bus structures
    E.g. PCI bus (many PCs), ISA bus (PC/AT), EISA (80386), SCSI bus (PCs and workstations), NuBus (Macintosh), IBM PC bus (PC/XT), Universal Serial Bus (modern PCs), and FireWire (consumer electronics)
  Point-to-point interconnection structures
    E.g. PCI Express (PCIe), QuickPath Interconnect (QPI)
Bus Interconnection Structure ‐ Background
What is a bus?
  A shared communication pathway connecting multiple (two or more) devices
  Consists of multiple lines, each capable of transmitting signals representing binary 1 or binary 0
    Together, the lines allow parallel movement of information
  Physically, buses are little more than bunches of wires
Key characteristic of a bus
  It is a shared transmission medium
  Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus
  Hence only one device at a time can transmit successfully; if two devices transmit at the same time, their signals overlap and become garbled
Evolution of Buses
Early personal computers had a single external bus, called the system bus
  Connecting the major components of the computer: CPU, memory, and I/O
As computer components (CPU, memory, I/O devices) got faster, a single bus could no longer handle the load; hence
  Various types of buses have been proposed, each having its own speed and performance characteristics
  A hierarchy of buses is employed, each bus connecting a subset of the computer components
    Multiple buses laid out in a hierarchy
Evolution of Buses…
Single Bus Problems
  If a large number of devices is connected to a single bus, system performance suffers, because
    The more devices, the longer the bus and the greater the propagation delay
    Coordinating bus use among many devices can adversely affect performance
    The bus becomes a bottleneck as the aggregate data transfer demand approaches the capacity of the bus; partial solutions are
      Increasing the data rate that the bus can carry
      Using a wider bus
  Most systems therefore use multiple buses, laid out in a hierarchy, to overcome these problems
Evolution of Buses…
A Traditional Bus Architecture
  A local bus
    Directly connects the CPU to the cache via the cache controller
    The cache controller also connects the cache to the system bus
  A system bus
    Connects main memory, the CPU, and some I/O
  An expansion bus
    Ties the system bus to I/O devices
Evolution of Buses…
An Enhanced Traditional Bus Architecture
  Incorporates a high-speed bus that brings bandwidth-intensive devices into closer integration with the processor via a bridge
Evolution of Buses…
Architecture of early Pentium buses (mid/late 1990s) (figure)
Evolution of Buses…
Architecture of Intel PC buses (mid 2000s)
  North bridge (memory controller hub)
    Contains high-bandwidth interfaces connecting the CPU, memory, and PCIe bus
    Provides a fast communication pathway to the CPU through the front-side bus
    Provides a fast connection to main memory
  South bridge (I/O controller hub)
    Connects to slower I/O buses, such as SATA and USB, that connect slower I/O devices to the computer system
    Contains legacy interfaces and devices: ISA bus (audio, LAN), interrupt controller, DMA controller, timer/counter
Evolution of Buses…
Buses of contemporary PCs (early 2010s)
  Manufacturers continue to move bus control hardware onto the same chip as the CPU
  Intel included the functionality of the memory controller hub on the same chip as the CPU in 2008, and AMD included it in 2011
Evolution of Buses…
Multicore configurations using QPI (recent) (figure)
Evolution of Buses…
Bus structure of a Core i7 system (figure)
Bus Terminology
Bus transaction
  A sequence of bus operations that includes a request and may include a response, either of which may carry data
  A transaction is initiated by a single request and may take many individual bus operations
  The complete activity of performing one of the following: memory read/write, I/O read/write
Bus cycle time
  The time between two consecutive ticks of the bus clock
Clock skew (bus skew)
  Difference in propagation time of signals sent on parallel paths: signals on different lines travel at slightly different speeds and drift apart
  The longer the bus and the faster the bus clock, the more significant the skew becomes
Operations of the Bus
If one module wishes to send data to another, it must
  Obtain the use of the bus
  Transfer data via the bus
If one module wishes to request data from another, it must
  Obtain the use of the bus
  Transfer a request to the other module over the appropriate control and address lines
  Wait for the second module to send the data
Operations of the Bus…
Devices attached to the bus are classified into master and slave categories
  Bus masters are active and initiate bus transfers
  Bus slaves are passive and wait for bus transfer requests
  Memory is always a bus slave
Possible bus master and slave configurations (figure)
Classification of Bus Lines Based on Functions
On any bus, the lines can be classified into three functional groups
  Data, address, and control lines
Data Bus
The data lines provide a path for moving data among system modules
  Remember that there is no difference between “data” and “instruction” at this level
These lines, collectively, are called the data bus
The data bus may consist of 32, 64, 128, or more separate lines
  The number of lines is referred to as the width of the data bus
  Width is a key determinant of performance
Address Bus
Identifies the source or destination of data on the data bus
  If the processor wishes to read a word (8, 16, or 32… bits) of data from memory, it puts the address of the desired word on the address lines
Address bus width determines the maximum memory capacity of the system
Example
  Intel Processor     Address Bus Width (bits)   Maximum Memory Capacity
  8080                16                         64 KB
  8086                20                         1 MB
  80286               24                         16 MB
  80386               32                         4 GB
  Pentium 4/Core 2    40                         1 TB
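The capacity column follows directly from the width: n address lines can name 2^n distinct locations. The minimal Python sketch below reproduces the table, assuming byte-addressable memory and binary prefixes (1 KB = 2^10 bytes); the helper names are illustrative only.

```python
def max_memory_bytes(address_lines: int) -> int:
    """n address lines can select 2**n distinct (byte) locations."""
    return 2 ** address_lines


def human(nbytes: int) -> str:
    """Format with binary prefixes: K = 2**10, M = 2**20, G = 2**30, T = 2**40."""
    for unit, shift in (("TB", 40), ("GB", 30), ("MB", 20), ("KB", 10)):
        if nbytes >= 1 << shift:
            return f"{nbytes >> shift} {unit}"
    return f"{nbytes} B"


# Address-bus widths taken from the table above
WIDTHS = [("8080", 16), ("8086", 20), ("80286", 24), ("80386", 32), ("Pentium 4/Core 2", 40)]

for cpu, width in WIDTHS:
    print(f"{cpu:<18}{width:>3} bits -> {human(max_memory_bytes(width))}")
```

Each extra address line doubles the reachable memory, which is why four more lines took the 8086's 1 MB to the 80286's 16 MB.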
Control Bus
Transmits both command and timing information
  Timing signals indicate the validity of data and address information
  Command signals specify operations to be performed
Controls the access to and the use of the data and address lines
  As the data and address lines are shared by all components, there must be a means of controlling their use
Control Bus…
Lines forming the control bus can be roughly grouped into the following major categories
  Bus control, interrupts, bus arbitration, coprocessor signaling, status, miscellaneous
Bus control
  Memory write
    Causes data on the bus to be written into the addressed location
  Memory read
    Causes data from the addressed location to be placed on the bus
  I/O write
    Causes data on the bus to be output to the addressed I/O port
  I/O read
    Causes data from the addressed I/O port to be placed on the bus
  Transfer ACK
    Indicates that data have been accepted from or placed on the bus
Control Bus…
Interrupts
  Interrupt request
    Indicates that an interrupt is pending
  Interrupt ACK
    Acknowledges that the pending interrupt has been recognized
Bus arbitration
  Bus request
    Indicates that a module needs to gain control of the bus
  Bus grant
    Indicates that a requesting module has been granted control of the bus
Coprocessor signaling
Status
Clock
  Used to synchronize operations
Miscellaneous
Elements of Bus Design
Basic design elements that serve to classify and differentiate buses
  Bus types
    Dedicated, multiplexed
  Method of arbitration
    Centralized, decentralized (distributed)
  Timing
    Synchronous, asynchronous
  Bus width
    Address, data
  Data transfer types
    Read, write, read-modify-write, read-after-write, block
Elements of Bus Design…
Bus Types
Dedicated bus
  Functionally dedicated bus
    A bus line that is permanently assigned to one function
    E.g. separate data and address lines
  Physically dedicated bus
    Refers to the use of multiple buses, each of which connects only a subset of modules
    E.g. use of an I/O bus to interconnect all I/O modules
  Advantage
    High throughput, because there is less contention
  Disadvantage
    Increases the size and cost of the system
Elements of Bus Design…
Bus Types...
Multiplexed bus
  A bus that uses the same lines for multiple purposes at different times, using time multiplexing
  E.g. address and data information may be transmitted over the same set of lines using an Address Valid control line (on the 8086 this signal is ALE, Address Latch Enable)
  Advantage
    Uses fewer lines, which saves space and usually cost
  Disadvantage
    More complex circuitry is needed within each module
    Reduction in performance: certain events that share the same lines cannot take place in parallel
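The multiplexed write cycle described above can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: one clock per phase, and an external latch that captures the shared lines while the Address Valid (ALE) signal is high. The class and function names are hypothetical, and the model ignores real 8086 timing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BusClock:
    ad: int      # value on the shared address/data (AD) lines during this clock
    ale: bool    # Address Valid (ALE): True while the AD lines carry an address


class AddressLatch:
    """External latch that captures the AD lines while ALE is asserted."""

    def __init__(self) -> None:
        self.address: Optional[int] = None

    def clock(self, cycle: BusClock) -> None:
        if cycle.ale:
            self.address = cycle.ad


def write_cycle(address: int, data: int) -> List[BusClock]:
    # Same lines, two phases: address first (ALE high), then data (ALE low)
    return [BusClock(address, ale=True), BusClock(data, ale=False)]


memory: Dict[int, int] = {}
latch = AddressLatch()
for clk in write_cycle(0x1234, 0xBEEF):
    latch.clock(clk)
    if not clk.ale and latch.address is not None:
        # Data phase: the latched address, not the AD lines, selects the location
        memory[latch.address] = clk.ad

print(hex(latch.address), hex(memory[0x1234]))  # 0x1234 0xbeef
```

The latch is the "more complex circuitry" cost of multiplexing, and the two-phase cycle shows why address and data cannot be presented in parallel.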
Elements of Bus Design…
Method of Arbitration
In systems with more than one potential bus master device
  What happens if two or more devices all want to become bus master at the same time?
  Solution: some bus arbitration mechanism is needed to prevent chaos
Potential bus master devices include
  CPU, I/O controllers, coprocessors
Bus arbitration schemes must
  Provide priority to certain master devices and, at the same time,
  Make sure that low-priority devices are not starved out
Elements of Bus Design…
Method of Arbitration...
Broadly classified into
  Centralized
  Decentralized (distributed)
Centralized
  A single hardware device (bus controller/arbiter) controls bus access, allocating time on the bus
  It may be part of the processor or a separate module
  Further divided into: daisy-chain arbitration, centralized parallel arbitration
Distributed
  There is no central controller
  Each module contains access control logic
  The modules act together to share the bus
  Further divided into: distributed arbitration using self-selection, distributed arbitration using token passing, distributed arbitration using collision detection
Elements of Bus Design…
Centralized Method of Arbitration
Daisy-Chain Arbitration
  The bus request line can be asserted by one or more devices at any time
  When the arbiter sees a bus request, it issues a grant by asserting the bus grant line; this line is wired through all of the I/O devices in series
  A device that receives the grant and has made a request takes over the bus and does not pass the grant on; otherwise it propagates the grant to the next device
  Devices are therefore effectively assigned priorities depending on how close to the arbiter they are: the closest requesting device wins
Elements of Bus Design…
Centralized Method of Arbitration...
Centralized Parallel Arbitration
  Uses multiple request/grant lines, one for each priority level
  Solves daisy-chained arbitration's implicit priorities, which are based on distance from the arbiter
  But the grant line is still daisy-chained among devices of the same priority level
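Both centralized schemes can be expressed as small Python functions. This is an illustrative sketch, not a hardware description: the function names are hypothetical, device 0 is assumed to be wired closest to the arbiter, and a larger level number is assumed to mean higher priority.

```python
from typing import Dict, List, Optional, Tuple


def daisy_chain_grant(requests: List[bool]) -> Optional[int]:
    """Daisy-chain arbitration. requests[i] is True if device i asserts the shared
    bus-request line; device 0 is wired closest to the arbiter."""
    if not any(requests):            # arbiter sees no request: no grant issued
        return None
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position          # takes the bus and does not pass the grant on
        # otherwise the device propagates the grant to the next device in series
    return None


def parallel_grant(requests_by_level: Dict[int, List[bool]]) -> Optional[Tuple[int, int]]:
    """Centralized parallel arbitration: one request/grant pair per priority level
    (a larger number means higher priority, an assumption of this sketch).
    Within one level the grant line is still daisy-chained."""
    for level in sorted(requests_by_level, reverse=True):
        winner = daisy_chain_grant(requests_by_level[level])
        if winner is not None:
            return level, winner
    return None


print(daisy_chain_grant([False, True, False, True]))         # 1: closest requester wins
print(parallel_grant({1: [True, False], 2: [False, True]}))  # (2, 1): level beats distance
```

The second call shows the point of the parallel scheme: a device far down its chain still wins if it sits on a higher-priority level.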
Elements of Bus Design…
Distributed Method of Arbitration
Distributed Arbitration using Self-Selection
  Each device has its own request line, and the request lines are prioritized
  All devices monitor all the request lines, so at the end of each bus cycle
    The devices themselves determine which one has the highest priority and is permitted to use the bus during the next cycle
  Requires more bus lines but avoids the potential cost of the arbiter
  Limits the number of devices to the number of request lines
Elements of Bus Design…
Distributed Method of Arbitration...
Distributed Arbitration using Token Passing
  Uses only three lines, no matter how many devices are present
    A wired-OR bus request line
    The BUSY line, asserted by the current bus master
    The arbitration line, daisy-chained through all the devices, which passes (grants) or withholds (denies) the token
  A device holding the token has been given exclusive access to the bus
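The two distributed schemes above can be sketched the same way. In this hypothetical Python model, each device evaluates `self_select` locally on the request lines it observes (line 0 is assumed to be the highest priority), and `three_line_arbitration` follows the arbitration signal down the chain, with the first requesting device negating the signal for every device behind it. Both function names are illustrative.

```python
from typing import List, Optional


def self_select(request_lines: List[bool], my_line: int) -> bool:
    """Run locally by each device at the end of a bus cycle. Line 0 is taken to be
    the highest priority (an assumption). Returns True if this device uses the bus next."""
    for line, asserted in enumerate(request_lines):
        if asserted:
            return line == my_line   # only the highest-priority asserted line wins
    return False


def three_line_arbitration(wants_bus: List[bool], busy: bool) -> Optional[int]:
    """Three-line scheme: wired-OR bus request, BUSY, and a daisy-chained arbitration line.
    Device 0 sits at the head of the chain."""
    if busy or not any(wants_bus):   # bus in use, or bus-request line not asserted
        return None
    arb_in = True                    # arbitration line asserted at the head of the chain
    winner = None
    for device, wants in enumerate(wants_bus):
        if arb_in and wants:
            winner = device          # this device asserts BUSY and holds the "token"
            arb_in = False           # it negates its OUT, so devices further down see IN negated
    return winner


lines = [False, True, True]
print([self_select(lines, d) for d in range(3)])   # [False, True, False]
print(three_line_arbitration(lines, busy=False))   # 1
```

In both cases no central arbiter exists: every device runs the same logic and they all agree on a single winner.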
Elements of Bus Design…
Distributed Method of Arbitration...
Distributed Arbitration using Collision Detection
  Each device is allowed to make a request for the bus
  If a collision (multiple simultaneous requests) is detected, each device involved must make another request later
  Much like the old Ethernet method of arbitration
Elements of Bus Design…
Timing
Refers to the way in which events are coordinated on the bus
  Buses use either synchronous timing or asynchronous timing
Synchronous Bus
  The occurrence of events on the bus is determined by a master bus clock
  All bus activities take an integral number of bus clock cycles (bus cycles)
  All devices on the bus can read the clock line
  All events start at the beginning of a clock cycle; usually an event takes a single bus cycle
  Drawbacks
    Everything works in multiples of the bus cycle
    When a heterogeneous collection of devices, some fast and some slow, is located on the bus, the bus has to be geared to the slowest one, and the fast ones cannot use their full potential
    Difficult to take advantage of future improvements in technology
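The collision-detection scheme described above leaves open how long a device waits before retrying. The sketch below fills that gap with Ethernet-style binary exponential backoff, which is an assumption rather than something the slide specifies; time is divided into slots, and the function name and parameters are hypothetical. The exact winning order depends on the random seed, but every device eventually gets the bus.

```python
import random
from typing import Dict, List


def collision_arbitration(devices: List[int], seed: int = 0, max_slots: int = 100_000) -> List[int]:
    """Slotted contention with collision detection and binary exponential backoff
    (the Ethernet-style retry rule is an assumption of this sketch).
    Returns the order in which the devices obtain the bus."""
    rng = random.Random(seed)
    wait: Dict[int, int] = {d: 0 for d in devices}        # slots each device must still wait
    collisions: Dict[int, int] = {d: 0 for d in devices}  # collisions suffered so far
    order: List[int] = []
    for _ in range(max_slots):
        if not wait:
            break
        ready = [d for d, w in wait.items() if w == 0]     # devices requesting in this slot
        for d in wait:
            if wait[d] > 0:
                wait[d] -= 1                               # others keep counting down
        if len(ready) == 1:                                # lone request: success
            order.append(ready[0])
            del wait[ready[0]]
        elif len(ready) > 1:                               # collision detected
            for d in ready:
                collisions[d] += 1
                k = min(collisions[d], 10)
                wait[d] = rng.randrange(2 ** k)            # retry after a random delay
    return order


print(collision_arbitration([0, 1, 2, 3], seed=42))
```

Randomizing the retry delay is what breaks the tie; if colliding devices retried after a fixed delay they would simply collide again.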
Elements of Bus Design…
Timing...
Synchronous Bus…
  A simplified timing diagram for synchronous read and write (figure)
Elements of Bus Design…
Timing...
Asynchronous Bus
  Control lines coordinate the bus operations/transactions, and a complex handshaking protocol is used to enforce timing
  It does not tie everything to the clock
    Each event is caused by a prior event, not by a clock pulse
    The occurrence of one event on the bus follows and depends on the occurrence of a previous event
  If a particular master/slave pair is slow, that in no way affects a subsequent master/slave pair that is much faster
  Scales better with technology and can support a wider variety of devices (as protocols, not the clock, coordinate transactions)
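The asynchronous handshake can be illustrated with a small Python timeline for one write. The signal names MSYN (master sync) and SSYN (slave sync) follow a common textbook convention and are assumptions here, as are the delay values and the function names; the point is that each step is triggered by the previous one, so a slow slave only delays its own transaction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Slave:
    latency_ns: float                                  # time this slave needs to store data
    memory: Dict[int, int] = field(default_factory=dict)


def async_write(slave: Slave, address: int, data: int,
                start_ns: float = 0.0, wire_ns: float = 1.0) -> List[Tuple[float, str]]:
    """Fully interlocked (four-step) handshake for one write. Every step is triggered
    by the previous one, never by a clock edge."""
    t = start_ns
    log = [(t, "master: drive address and data, assert MSYN")]
    t += wire_ns + slave.latency_ns                    # MSYN reaches slave; slave stores data
    slave.memory[address] = data
    log.append((t, "slave: data stored, assert SSYN"))
    t += wire_ns                                       # SSYN reaches master
    log.append((t, "master: negate MSYN"))
    t += wire_ns                                       # negated MSYN reaches slave
    log.append((t, "slave: negate SSYN, bus free"))
    return log


slow, fast = Slave(latency_ns=50.0), Slave(latency_ns=5.0)
end_slow = async_write(slow, 0x10, 1)[-1][0]                     # 53.0
end_fast = async_write(fast, 0x20, 2, start_ns=end_slow)[-1][0]  # 61.0
print(end_slow, end_fast)
```

The fast slave's transaction starts when the slow one finishes and then takes only 8 ns of its own; no clock period had to be stretched to accommodate the slow device.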
Elements of Bus Design…
Timing...
Asynchronous Bus…
  A simplified timing diagram for asynchronous write (figure)
Elements of Bus Design…
Bus Width
Has an impact on system performance
  The wider the data bus, the greater the number of bits transferred at one time
  Example
    Intel Processor     Data Bus Width (bits)
    8080                8
    8086                16
    80286               16
    80386               32
    Pentium 4/Core 2    64
  The wider the address bus, the greater the range of locations that can be referenced
Elements of Bus Design…
Data Transfer Types / Bus Operation Types
  Write
    Master to slave
  Read
    Slave to master
  Read-modify-write
    A read followed immediately by a write to the same address
    An indivisible operation, to prevent any access to the data element by other potential bus masters
  Read-after-write
    An indivisible operation consisting of a write followed immediately by a read from the same address
    The read operation may be performed for checking purposes
  Block
    One address cycle is followed by n data cycles
    The first data item is transferred to or from the specified address; the remaining data items are transferred to or from subsequent addresses
PCI ‐ Bus
Introduced by Intel in 1993 to succeed older buses, like ISA, EISA…
  An upgrade to the older ISA bus, with higher speeds and more bits transferred in parallel
Can be configured as a 32- or 64-bit bus
Successive generations operate at 33 MHz and 66 MHz
Superseded by PCI-X (PCI eXtended) in 2004
  PCI-X basically doubled the bandwidth of regular PCI
  Operates at 133 MHz
Every Intel-based computer since the Pentium has a PCI bus
The PCI bus can be used in many configurations (figure, top right)
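The data transfer types listed above can be modelled with a toy memory slave in Python. In this sketch a lock stands in for a master holding the bus, which is what makes read-modify-write and read-after-write indivisible; the class and method names are hypothetical.

```python
import threading
from typing import Callable, List


class BusMemory:
    """Toy memory slave. Holding `self.bus` models a master keeping the bus for an
    indivisible sequence (an assumption made for illustration)."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.bus = threading.Lock()

    def write(self, addr: int, value: int) -> None:          # master -> slave
        with self.bus:
            self.cells[addr] = value

    def read(self, addr: int) -> int:                         # slave -> master
        with self.bus:
            return self.cells[addr]

    def read_modify_write(self, addr: int, fn: Callable[[int], int]) -> int:
        with self.bus:                    # no other master can slip in between
            old = self.cells[addr]
            self.cells[addr] = fn(old)
            return old

    def read_after_write(self, addr: int, value: int) -> bool:
        with self.bus:                    # write, then read back the same address to check it
            self.cells[addr] = value
            return self.cells[addr] == value

    def block_write(self, start: int, items: List[int]) -> None:
        with self.bus:                    # one address, then n data items to consecutive addresses
            for offset, value in enumerate(items):
                self.cells[start + offset] = value


mem = BusMemory(16)
mem.block_write(4, [10, 20, 30])


def increment_many() -> None:
    for _ in range(1000):
        mem.read_modify_write(0, lambda v: v + 1)


masters = [threading.Thread(target=increment_many) for _ in range(8)]
for m in masters:
    m.start()
for m in masters:
    m.join()
print(mem.read(0), mem.cells[4:7])   # 8000 [10, 20, 30]
```

Eight threads act as competing bus masters. Because each increment is a single read-modify-write performed while the bus is held, no update is lost and the counter ends at exactly 8000.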
PCI ‐ Bus…
PCI Bus Operation
  A synchronous bus, using centralized arbitration
  Multiplexed address and data lines, to keep the pin count low
  A slave can insert “wait states” when it is not ready to supply the requested data, by activating the appropriate control line
  Different kinds of bus cycles are possible
    Block transfers, ….
PCI ‐ Bus…
PCI Bus Signals
Divided into mandatory (49 signals) and optional (51 signals)
The signals can be divided into functional groups
Mandatory (functional groups)
  System pins
    Include the clock and reset pins
  Address and data pins
    Include 32 lines, time-multiplexed for addresses and data
    Other lines in this group are used to interpret and validate the signal lines that carry the addresses and data
  Interface control pins
    Control the timing of transactions and provide coordination among initiators and targets
  Arbitration pins
    Each PCI master has its own pair of arbitration lines that connect it directly to the PCI bus arbiter
  Error reporting pins
    Used to report parity and other errors
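Bus width and clock rate together set the peak bandwidth of a parallel bus such as PCI. A quick Python calculation for the PCI and PCI-X configurations mentioned earlier, assuming one data transfer per clock and the nominal 33.33, 66.66, and 133.33 MHz clocks, gives the peak figures; the function name is illustrative.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate = bytes per transfer x transfers per second (one transfer per clock).
    Ignores address phases, wait states and arbitration, so sustained throughput is lower."""
    return width_bits / 8 * clock_mhz          # result in MB/s (10**6 bytes per second)


# Nominal clocks; the "33 MHz" PCI clock is 33.33 MHz
BUSES = {
    "PCI 32-bit @ 33 MHz": (32, 33.33),
    "PCI 64-bit @ 66 MHz": (64, 66.66),
    "PCI-X 64-bit @ 133 MHz": (64, 133.33),
}

for name, (width, mhz) in BUSES.items():
    print(f"{name}: {peak_bandwidth_mb_s(width, mhz):.0f} MB/s")
```

The PCI-X result (about 1067 MB/s) is roughly twice that of 64-bit/66 MHz PCI (about 533 MB/s), in line with the statement that PCI-X doubled the bandwidth of regular PCI. Multiplexed address phases and wait states mean real transfers fall short of these peaks.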
PCI ‐ Bus…
Mandatory (functional groups) (figure)
PCI ‐ Bus…
Optional (functional groups)
  Interrupt pins
    For PCI devices that must generate interrupt requests for service
    Each PCI device has its own interrupt line or lines to an interrupt controller
  Cache support pins
    Needed to support a memory on PCI that can be cached in the processor or another device
  64-bit bus extension pins
    Needed to support extensions for 64-bit data transfers
  JTAG/boundary scan pins
    To support testing procedures
PCI ‐ Bus…
Optional (functional groups)… (figure)
PCI ‐ Bus…
PCI Commands
When a bus master acquires control of the bus, it determines the type of transaction that will occur next
  The C/BE lines (4 bits wide) are used to signal the transaction type during the address phase of the transaction
The commands are as follows
  Memory Read / Memory Read Line / Memory Read Multiple
  Memory Write / Memory Write and Invalidate
  I/O Read / I/O Write
  Interrupt Acknowledge
  Special Cycle
  Configuration Read / Configuration Write
  Dual Address Cycle
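During the address phase, the target learns which of these commands is in progress by decoding the 4-bit C/BE value. The Python lookup table below is a sketch; the encodings are given as we understand the PCI Local Bus Specification command table and should be checked against the specification before being relied on. Codes not listed are treated as reserved, and `decode_command` is a hypothetical helper.

```python
# C/BE[3:0] command encodings during the address phase, as we understand the
# PCI Local Bus Specification command table (verify against the specification)
PCI_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}


def decode_command(cbe: int) -> str:
    """Decode the 4-bit logical value on C/BE[3:0] (ignoring active-low electrical levels)."""
    if not 0 <= cbe <= 0b1111:
        raise ValueError("C/BE carries only 4 bits")
    return PCI_COMMANDS.get(cbe, "Reserved")


for code in range(16):
    print(f"{code:04b}  {decode_command(code)}")
```

Because the same four lines later carry byte enables during the data phase, the command must be captured during the address phase, which is another consequence of PCI's multiplexed design.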
PCI ‐ Bus…
PCI Bus Transactions
  Example: timing diagram of 32-bit PCI bus transactions; the first three cycles are used for a read operation, and the next three cycles for a write operation (figure)
Point‐to‐point Interconnection Structures ‐ Background
The bus was the dominant means of computer system component interconnection for decades
  For general-purpose computers, it has gradually given way to various point-to-point interconnection structures
Why did the bus not rise to the challenge?
  Many I/O devices became too fast for the PCI bus
  Increasing the bus clock frequency further is not a solution, due to electrical constraints
    Problems with bus skew, crosstalk between the wires, and capacitance effects just get worse
    At higher and higher data rates, it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
  The advent of multicore chips, with multiple processors and significant memory on a single chip
    Use of a conventional shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors
Point‐to‐point Interconnection Structures ‐ Background
Point-to-point interconnection structures use high-speed serial connections
  As there is no clock skew, a serial connection running at a much higher speed more than offsets the loss of parallelism
Popular point-to-point interconnection structures include
  PCIe (PCI Express), QPI (Intel's QuickPath Interconnect)
Key characteristics of point-to-point interconnect schemes
  Multiple direct connections
    Multiple components within the system enjoy direct pairwise connections to other components
    This eliminates the need for the arbitration found in shared transmission systems
  Layered protocol architecture
    The processor-level interconnects use a layered protocol architecture, as in TCP/IP-based data networks, rather than the control signals found in shared bus arrangements
  Packetized data transfer
    Data are sent not as a raw bit stream but as a sequence of packets
    Each packet includes control headers and error control codes
Point‐to‐point Interconnection ‐ PCIe
Introduced in 2004, to replace PCI and its successor PCI-X
Progressively replaced the AGP (Accelerated Graphics Port) graphics interface, designed by Intel specifically for 3D graphics
Represents a radical change from the PCI bus
  In fact, it is not even a bus at all
  It is a point-to-point network using bit-serial lines and packet switching, more like the Internet than a traditional bus
PCIe generations (below)
  PCIe Generation   Year Introduced   Transfer Rate   Effective One-Way Data Rate (per lane)
  1.0a              2003              2.5 GT/s        250 MB/s
  2.0               2007              5 GT/s          500 MB/s
  3.0               2010              8 GT/s          985 MB/s
  4.0               2017              16 GT/s         1.969 GB/s
  5.0               2019              32 GT/s         3.938 GB/s
  6.0               2022              64 GT/s         7.563 GB/s
  7.0               2025 (planned)    128 GT/s        15.125 GB/s
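The effective rates in the generation table can be derived from the transfer rate and the line encoding. The Python sketch below assumes 8b/10b encoding for 1.0a and 2.0, 128b/130b for 3.0 to 5.0, and, for 6.0 and 7.0, a FLIT efficiency of 242/256, which is the ratio that reproduces the table's figures; it also treats each transfer as one bit per lane per direction. The names are illustrative.

```python
# (generation, transfer rate in GT/s, useful bits, transmitted bits)
GENERATIONS = [
    ("1.0a", 2.5, 8, 10),      # 8b/10b encoding
    ("2.0", 5.0, 8, 10),       # 8b/10b encoding
    ("3.0", 8.0, 128, 130),    # 128b/130b encoding
    ("4.0", 16.0, 128, 130),
    ("5.0", 32.0, 128, 130),
    ("6.0", 64.0, 242, 256),   # FLIT mode, 242 of 256 bytes counted as useful (assumption)
    ("7.0", 128.0, 242, 256),
]


def per_lane_mb_s(gt_per_s: float, useful: int, sent: int) -> float:
    """Treat one transfer as one bit per lane per direction (as assumed here), scale by
    encoding efficiency, and divide by 8 to get bytes. Result in MB/s (10**6 bytes/s)."""
    return gt_per_s * 1e9 * useful / sent / 8 / 1e6


for gen, rate, useful, sent in GENERATIONS:
    one_lane = per_lane_mb_s(rate, useful, sent)
    print(f"PCIe {gen}: x1 {one_lane:8.1f} MB/s   x16 {16 * one_lane / 1000:6.2f} GB/s")
```

The jump from 2.0 to 3.0 shows why the encoding change mattered: the transfer rate rose only from 5 to 8 GT/s, but cutting overhead from 20% to about 1.5% nearly doubled the usable rate. Multiplying by the link width gives wider links; e.g. a 3.0 x16 link provides about 15.75 GB/s in each direction.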
PCIe…
Sample architecture of a PCIe system with three PCIe ports (figure, left)
Typical configuration of a multi-computer using PCIe (figure, right)
As depicted in the two figures, a PC with PCIe is a miniature packet-switching network
PCIe Architecture
A root complex device (a chipset, or host bridge)
  Connects the processor and memory subsystem to the PCI Express switch fabric, comprising one or more PCIe and PCIe switch devices
  Acts as a buffering device
    To deal with differences in data rates between I/O controllers and memory and processor components
  Translates between PCIe transaction formats and the processor and memory signal and control requirements
  Typically supports multiple PCIe ports, each of which can be attached to one of the following devices that implement PCIe: a switch, a PCIe endpoint, a PCIe/PCI bridge, or a legacy endpoint
  Physically, it is on the motherboard
Switch
  Manages multiple PCIe streams coming from PCIe endpoints, legacy endpoints, and PCIe/PCI bridges
  Could be connected to the root complex, be part of it, or be integrated directly into the processor
PCIe endpoint
  An I/O device or controller that implements PCIe
  E.g. a Gigabit Ethernet switch, a graphics or video controller, a disk interface, or a communications controller
PCIe/PCI bridge
  Allows older PCI devices to be connected to PCIe-based systems
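The packet-switched nature of this architecture can be illustrated by routing a memory request through a tree made of a root complex, a switch, and endpoints. This Python sketch is hypothetical throughout: the topology, node names, and address windows are made up, and the model only loosely mirrors the base/limit address windows that PCI bridges and switch ports are configured with.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PcieNode:
    name: str
    base: int                      # start of the address window claimed by this node/subtree
    limit: int                     # end of the window (inclusive)
    children: List["PcieNode"] = field(default_factory=list)

    def claims(self, addr: int) -> bool:
        return self.base <= addr <= self.limit


def route(node: PcieNode, addr: int, path: Optional[List[str]] = None) -> Optional[List[str]]:
    """Follow a memory request down the tree, returning the nodes it passes through."""
    path = (path or []) + [node.name]
    if not node.children:                      # endpoint or bridge: accept if it claims the address
        return path if node.claims(addr) else None
    for child in node.children:                # root complex/switch: forward out of the matching port
        if child.claims(addr):
            return route(child, addr, path)
    return None                                # no port claims the address


# Hypothetical topology and address windows
gpu = PcieNode("graphics endpoint", 0x0000, 0x0FFF)
nic = PcieNode("Ethernet endpoint", 0x1000, 0x1FFF)
switch = PcieNode("switch", 0x0000, 0x1FFF, [gpu, nic])
bridge = PcieNode("PCIe/PCI bridge", 0x2000, 0x2FFF)
root = PcieNode("root complex", 0x0000, 0x2FFF, [switch, bridge])

print(route(root, 0x1800))   # ['root complex', 'switch', 'Ethernet endpoint']
```

Each hop forwards the packet out of exactly one port, so unrelated links stay idle; this is the "miniature packet-switching network" view rather than a broadcast bus.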
PCIe Communications
Each I/O chip has a dedicated point-to-point connection to the switch
  Each connection consists of a pair of unidirectional channels, one to the switch and one from it
  The pair is called a lane (a bidirectional lane)
  Each channel is a differential pair of two wires carrying complementary signals, which provides high noise immunity during high-speed transmission
Devices are not limited to a single bidirectional lane to communicate with the root complex or a switch
  A device can have up to 32 lanes; the lanes are not synchronous, so skew is not important
When the CPU wants to talk to a device, it sends a packet to the device and generally gets an answer later
  The packet goes through the root complex, which is on the motherboard, and then on to the device, possibly through a switch (or, if the device is a PCI device, through the PCIe/PCI bridge)
This evolution, from a system in which all devices listened to the same bus to one using point-to-point communications, parallels the development of Ethernet (a popular local area network)
PCIe Protocol Stack…
PCIe protocol stack (figure)
The data packets generated and consumed by the respective transaction layers and data link layers are
  DLLPs: Data Link Layer Packets
  TLPs: Transaction Layer Packets
PCIe Protocol Stack…
Physical Layer
  Deals with moving bits from a sender to a receiver over a point-to-point connection
  Consists of the actual wires carrying the signals, as well as the circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s
  Each PCIe port
    Consists of a number of bidirectional lanes
    Can provide, for example, 1, 2, 4, 8, 16, or 32 lanes
  PCIe relies on the receiver synchronizing with the transmitter based on the transmitted signal, as it does not use a separate clock to synchronize the bit stream (bottom figure: (a) transmitter block diagram and (b) receiver block diagram)
  PCIe 3.0 uses the following techniques to aid in synchronization
    Multilane distribution (top figure)
      Example: distributing a byte stream into a PCIe port of four lanes
    Scrambling
    128b/130b encoding
PCIe Protocol Stack…
Link Layer (Data Link Layer, DLL)
  Responsible for ensuring reliable transmission and flow control across the PCIe link
    Has a provision for verification and retransmission of data sent over the PCIe link
    Has a mechanism that prevents a fast sender from overwhelming a slow receiver with floods of packets
  Data packets generated and consumed by the DLL are called Data Link Layer Packets (DLLPs)
  There are three important groups of DLLPs used in managing a link
    Flow control packets
      Regulate the rate at which TLPs and DLLPs can be transmitted across a link
    Power management packets
      Used in managing power platform budgeting
    TLP ACK and NAK packets
      Used in TLP processing, for ensuring reliable transmission
  Also adds two fields to the core of the TLP created by the TL
    A 16-bit sequence number and an error-detecting code (32-bit link-layer CRC, LCRC)
    The two fields are processed at each intermediate node on the way from source to destination
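Two of the physical-layer techniques listed above, multilane distribution and scrambling, can be sketched in Python. Bytes are dealt round-robin across four lanes, and each lane is XORed with a pseudo-random LFSR stream; because XOR is its own inverse, the receiver descrambles with the same stream. The LFSR taps are illustrative and do not match the scrambler PCIe 3.0 specifies, 128b/130b framing is omitted, and all function names are hypothetical.

```python
from typing import Iterator, List


def stripe(data: bytes, lanes: int) -> List[bytes]:
    """Multilane distribution: byte i is sent on lane i % lanes (round-robin)."""
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(per_lane: List[bytes]) -> bytes:
    """Receiver side: take one byte from each lane in turn to rebuild the stream."""
    out = bytearray()
    for i in range(max(len(lane) for lane in per_lane)):
        for lane in per_lane:
            if i < len(lane):
                out.append(lane[i])
    return bytes(out)


def lfsr_bytes(seed: int = 0xFFFF) -> Iterator[int]:
    """Pseudo-random bytes from a 16-bit Fibonacci LFSR (taps chosen for illustration,
    not the PCIe 3.0 scrambler polynomial)."""
    state = seed
    while True:
        value = 0
        for bit in range(8):
            value |= ((state >> 15) & 1) << bit
            feedback = ((state >> 15) ^ (state >> 4) ^ (state >> 3) ^ (state >> 2)) & 1
            state = ((state << 1) | feedback) & 0xFFFF
        yield value


def scramble(lane_data: bytes, seed: int = 0xFFFF) -> bytes:
    """Additive scrambling: XOR with the LFSR stream. Applying it twice restores the data,
    so the same function also descrambles."""
    return bytes(b ^ k for b, k in zip(lane_data, lfsr_bytes(seed)))


payload = bytes(range(10))
lanes = stripe(payload, 4)                    # 4-lane port
on_wire = [scramble(lane) for lane in lanes]  # each lane scrambled independently
received = [scramble(lane) for lane in on_wire]
assert unstripe(received) == payload
print([list(lane) for lane in lanes])         # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Scrambling breaks up long runs of identical bits (a run of zero bytes becomes a varied pattern), which gives the receiver enough transitions to stay synchronized without a separate clock.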
PCIe Protocol Stack…
Transaction Layer
  Receives read and write requests from the software layer and creates request packets for transmission to a destination via the link layer
  Handles bus actions
  Data packets generated and consumed by the TL are called Transaction Layer Packets (TLPs)
    PCIe transactions are conveyed using TLPs
  PCIe transactions can be
    Split transactions: a request packet, followed at a later time by a completion packet; i.e. a response is expected
    Posted transactions: no response is expected
  TLPs originate at sending devices and terminate at receiving devices
  TLPs consist of the following fields
    Header: describes the type of packet and the information needed by the receiver to process the packet, including any needed routing information
    Data: up to 4096 bytes; some TLPs contain no data field
    ECRC: end-to-end CRC field, for the destination TL
PCIe Protocol Stack…
Software Layer
  Generates read/write requests that are transported by the TL to the I/O devices using a packet-based transaction protocol
  Sends to the TL the information needed to create the core of the TLP
    Header, data, and ECRC
  Interfaces the PCIe system to the OS, emulating the PCI bus (so that existing operating systems can run unmodified on PCIe)
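The TLP fields and the split/posted distinction can be captured in a small Python model. The header layout here is hypothetical (it is not the real PCIe header format), `zlib.crc32` merely stands in for the ECRC, and the type names "MemWr"/"MemRd" are illustrative labels; the 4096-byte payload limit comes from the slide above.

```python
import zlib
from dataclasses import dataclass

MAX_PAYLOAD = 4096          # bytes, the upper limit quoted above
POSTED = {"MemWr"}          # memory writes are posted: no completion comes back


@dataclass
class TLP:
    header: bytes           # packet type and routing information (simplified layout)
    data: bytes             # may be empty: some TLPs carry no data
    ecrc: int               # end-to-end CRC, checked only by the destination TL


def make_tlp(kind: str, address: int, data: bytes = b"") -> TLP:
    if len(data) > MAX_PAYLOAD:
        raise ValueError("TLP payload limited to 4096 bytes")
    # Hypothetical header: 8-byte type name + 8-byte address (NOT the real PCIe header format)
    header = kind.encode().ljust(8, b"\0") + address.to_bytes(8, "big")
    # zlib.crc32 stands in for the ECRC; real ECRC coverage and bit ordering differ
    return TLP(header, data, zlib.crc32(header + data))


def destination_accepts(tlp: TLP) -> bool:
    return zlib.crc32(tlp.header + tlp.data) == tlp.ecrc


def expects_completion(kind: str) -> bool:
    return kind not in POSTED   # e.g. "MemRd" is a split transaction


write = make_tlp("MemWr", 0x1000, b"\x01\x02")
read_req = make_tlp("MemRd", 0x1000)           # no data field in a read request
print(destination_accepts(write), expects_completion("MemWr"), expects_completion("MemRd"))  # True False True
```

A read request carries no data and waits for a completion TLP, while a posted write lets the sender move on immediately.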
PCIe Protocol Stack…
Transaction Layer Packet Processing
When a TLP arrives at a device, the DLL
  Strips off the sequence number and LCRC fields
  Checks the LCRC field
If NO ERROR is detected
  The DLL sends an ACK packet back to the transmitter at the other end of the link
  The core portion of the TLP is handed up to the local TL
    If the intermediate node is the intended final destination, the local TL processes the TLP
    If not, the TL determines a route for the TLP and passes the packet back down to the DLL for transmission over the next link on the way to the destination
  The DLL retains a copy of the TLP, which is discarded from its buffer upon reception of an ACK DLLP from the subsequent node
If an ERROR is detected
  The DLL schedules a NAK DLLP to return to the remote transmitter, and the TLP is discarded
  When the remote transmitter receives a NAK DLLP with the right sequence number for its TLP, it retransmits the TLP
Note:
  The core fields created at the TL are only used at the destination TL
  But the two fields added by the DLL to the TLP are processed at each intermediate node on the way from source to destination
End of Chapter 3
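The TLP processing procedure described above can be simulated in Python for a single link. This sketch uses `zlib.crc32` as a stand-in for the LCRC, ignores sequence-number wrap-around and replay timers, and uses hypothetical class names; it shows the transmitter keeping copies until they are ACKed and replaying them after a NAK.

```python
import zlib
from collections import deque
from typing import Deque, List, Tuple

Frame = Tuple[int, bytes, int]          # (sequence number, TLP core, LCRC)


def lcrc(seq: int, core: bytes) -> int:
    # zlib.crc32 stands in for the 32-bit LCRC; real PCIe bit ordering differs
    return zlib.crc32(seq.to_bytes(2, "big") + core)


class LinkTransmitter:
    def __init__(self) -> None:
        self.next_seq = 0
        self.replay_buffer: Deque[Frame] = deque()   # copies kept until ACKed

    def send(self, core: bytes) -> Frame:
        frame = (self.next_seq, core, lcrc(self.next_seq, core))
        self.replay_buffer.append(frame)
        self.next_seq += 1                           # wrap-around ignored in this sketch
        return frame

    def on_ack(self, seq: int) -> None:
        # An ACK for seq covers it and everything before it: discard those copies
        while self.replay_buffer and self.replay_buffer[0][0] <= seq:
            self.replay_buffer.popleft()

    def on_nak(self, seq: int) -> List[Frame]:
        # The NAK carries the last good sequence number; replay everything after it
        self.on_ack(seq)
        return list(self.replay_buffer)


class LinkReceiver:
    def __init__(self) -> None:
        self.expected_seq = 0
        self.delivered: List[bytes] = []             # TLP cores handed up to the local TL

    def receive(self, frame: Frame) -> Tuple[str, int]:
        seq, core, crc = frame
        if crc != lcrc(seq, core):
            return ("NAK", self.expected_seq - 1)    # corrupted: drop the TLP
        if seq < self.expected_seq:
            return ("ACK", self.expected_seq - 1)    # duplicate of a TLP already delivered
        if seq > self.expected_seq:
            return ("NAK", self.expected_seq - 1)    # a TLP was lost in between
        self.delivered.append(core)                  # strip seq no and LCRC, pass core up
        self.expected_seq += 1
        return ("ACK", seq)


tx, rx = LinkTransmitter(), LinkReceiver()
first, second = tx.send(b"TLP-A"), tx.send(b"TLP-B")
tx.on_ack(rx.receive(first)[1])                      # ACK 0: copy of TLP-A discarded
reply, last_good = rx.receive((second[0], b"garbled", second[2]))
if reply == "NAK":
    for frame in tx.on_nak(last_good):               # retransmit TLP-B
        tx.on_ack(rx.receive(frame)[1])
print(rx.delivered, len(tx.replay_buffer))           # [b'TLP-A', b'TLP-B'] 0
```

Because each link keeps its own replay buffer and sequence numbers, an error is repaired on the hop where it occurred, which is why the sequence number and LCRC are processed at every intermediate node.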
Supplementary figure slides (images not reproduced)
  Buses – Physical Appearance
  Single Bus System
  Expansion Bus – ISA Bus
  A System with PCI and ISA Bus
  PCI Bus Based System (three slides)
  Core i7 Chip with Bus Control Hardware
  Latest Intel Core i7 Based Systems (three slides)
  Motherboard Layout (three slides)
  Motherboard Expansion Slots
  Various Buses Bandwidth Comparison