
Pentium 4

The document provides a detailed comparison of various Intel processors, including the 8086, 80386, Pentium-I, Pentium-II, and Pentium-III, highlighting their features such as processor size, clock frequency, and memory capacity. It also discusses the architecture of the Pentium-4 processor, including its Net Burst Microarchitecture, front-end system, memory subsystem, execution units, and branch prediction logic. Additionally, it covers hyper-threading technology in the Pentium-4, explaining how it allows for simultaneous execution of multiple threads within a single physical processor.

Uploaded by

aryavinodk83

Pentium-4 Processor

Comparison of 8086, 80386, Pentium-I, Pentium-II & Pentium-III

| Feature | 8086 | 80386 | Pentium-I | Pentium-II | Pentium-III |
|---|---|---|---|---|---|
| Processor size | 16 bit | 32 bit | 32 bit | 32 bit | 32 bit |
| Clock (CK) frequency | 5/8/10 MHz | 16/20/25/33 MHz | 60 to 133 MHz | 233 to 450 MHz | 450 to 1400 MHz |
| External data bus | 16 bit | 32 bit | 64 bit | 64 bit | 64 bit |
| Address bus | 20 bit | 32 bit | 32 bit | 36 bit | 36 bit |
| Memory address capacity | 2^20 bytes = 1 MB | 2^32 bytes = 4 GB | 2^32 bytes = 4 GB | 2^36 bytes = 64 GB | 2^36 bytes = 64 GB |
| Number of data paths | 2 (each of 8 bit) | 4 (each of 8 bit) | 8 (each of 8 bit) | 8 (each of 8 bit) | 8 (each of 8 bit) |
| Number of memory banks | 2 banks, each of 512 KB | 4 banks, each of 1 GB | 8 banks, each of 512 MB | 8 or 9 banks, each of 8 GB (the ninth bank, if present, is used for the error-checking code) | 8 or 9 banks, each of 8 GB (the ninth bank, if present, is used for the error-checking code) |
| Processor generation | First-generation pipelined (P1) microarchitecture | Third-generation pipelined (P3) microarchitecture | Fifth-generation pipelined (P5) microarchitecture | Sixth-generation pipelined (P6) microarchitecture | Sixth-generation pipelined (P6) microarchitecture |

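The memory address capacity figures in the comparison follow from the address bus width: an n-bit address bus reaches 2^n byte locations. The sketch below recomputes them and cross-checks them against the bank counts and bank sizes from the same comparison (the ninth, error-checking bank is left out because it holds no addressable data). The function and dictionary names are illustrative, not taken from the slides.

```python
KB, MB, GB = 2**10, 2**20, 2**30

def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-addressed bus of the given width."""
    return 2 ** address_bits

def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} PB"

# Address bus widths as listed in the comparison.
ADDRESS_BUS_BITS = {
    "8086": 20, "80386": 32, "Pentium-I": 32,
    "Pentium-II": 36, "Pentium-III": 36,
}

# (number of data banks, size of each bank) as listed in the comparison.
BANKS = {
    "8086": (2, 512 * KB), "80386": (4, 1 * GB), "Pentium-I": (8, 512 * MB),
    "Pentium-II": (8, 8 * GB), "Pentium-III": (8, 8 * GB),
}

CAPACITY = {cpu: human(addressable_bytes(bits))
            for cpu, bits in ADDRESS_BUS_BITS.items()}

# banks x bank size should equal the total address space.
BANKS_CONSISTENT = all(
    count * size == addressable_bytes(ADDRESS_BUS_BITS[cpu])
    for cpu, (count, size) in BANKS.items()
)

for cpu, capacity in CAPACITY.items():
    print(f"{cpu}: {capacity}")
```

In every column the number of banks also equals the external data bus width divided by 8, since each bank supplies one 8-bit data path.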

02/08/2025 2
Comparison of Intel's Processors (contd.)

| Feature | 8086 | 80386 | Pentium-I | Pentium-II | Pentium-III |
|---|---|---|---|---|---|
| Number of pipeline stages | 2 stages | 6 stages | Dual pipeline with 5 integer & 8 FP stages | 14 (17 stages with Load & Store/Retire) | 12 (15 stages with Load & Store/Retire) |
| Operating modes | Minimum & Maximum mode | Real, Protected & V-86 mode | Real, Protected, V-86 & System Management Mode (SMM) | Real, Protected, V-86 & System Management Mode (SMM) | Real, Protected, V-86 & System Management Mode (SMM) |
| NDP for floating-point (FP) operations | Intel 8087 NDP to be connected externally | Intel 80287/80387 to be connected externally | Built-in FPU (no need to connect an external NDP) | Built-in FPU; also has special multimedia instruction support & registers | Built-in FPU; also has more special multimedia instruction support & registers |
| L1 & L2 cache memory | No cache memory of any type supported | No cache memory of any type supported | 16 KB split L1 cache & 64-512 KB L2 cache support | 32 KB split L1 cache & on-the-cartridge L2 cache of up to 2 MB supported | L2 cache of 512 KB non-blocking cache or 256 KB of advanced transfer cache |
| Processor versions | Frequency versions are available | 386DX, 386SX & 386EX | Pentium processor | Celeron & Xeon | Coppermine, Tanner & Katmai |


Pentium-4: NetBurst Microarchitecture
The Pentium-4 architecture contains 4 major sections:
1. Front end module
2. Memory subsystem
3. Integer & FP execution units
4. Out-of-order (OOO) execution engine

Note: It is referred to as the NetBurst microarchitecture because its design is optimized for efficient use of the internet for audio as well as video streaming.

Front End system
It contains the following sub-sections:
1. Fetch/Decode unit
2. Execution Trace cache
3. Microcode ROM
4. Branch Prediction logic
Major functions of the Front End system are:
• Initially, the instructions are fetched from the memory subsystem
• The decode unit decodes only simple instructions, translating them into micro-operations (μops)
• A simple instruction generates up to 4 μops
• Complex instructions (more than 4 μops) read their μops directly from the Microcode ROM instead of being decoded in the decode unit (as decoding takes more time for complex instructions)
• All such μops (obtained either by decoding simple instructions or by reading the Microcode ROM for complex instructions) are subsequently copied into the Execution Trace cache
• The Execution Trace cache is like an L1 instruction cache (but it stores decoded μops instead of instructions fetched from the L2 cache)
• The Execution Trace cache can store up to 12 K μops & functions in a FIFO manner
• These ordered sequences of μops, called "traces", are supplied to the execution units for execution
• The Branch Prediction logic predicts the branches encountered using "Dynamic Branch Prediction" logic similar to that of the Pentium-I studied earlier
• It also uses "Static Branch Prediction"

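The front-end flow described above (simple instructions go through the decoder, complex ones read the Microcode ROM, and the resulting μops are queued in the trace cache) can be sketched as a toy model. The instruction names and μop lists are made up for illustration, and the trace cache is modelled as a plain bounded FIFO as the slide states; the real structure is considerably more involved.

```python
from collections import deque

MAX_DECODER_UOPS = 4               # from the slide: a simple instruction yields up to 4 uops
TRACE_CACHE_CAPACITY = 12 * 1024   # from the slide: about 12 K uops

# Hypothetical instruction -> uop tables, for illustration only.
SIMPLE_DECODE = {
    "ADD": ["add"],
    "PUSH": ["sub_sp", "store"],
}
MICROCODE_ROM = {
    "REP_MOVS": ["load", "store", "inc_si", "inc_di", "dec_cx", "loop_check"],
}

def to_uops(instr):
    """Return (uops, source) for one instruction."""
    if instr in SIMPLE_DECODE:
        uops = SIMPLE_DECODE[instr]
        assert len(uops) <= MAX_DECODER_UOPS
        return uops, "decoder"
    # Complex instruction: uops come straight from the Microcode ROM.
    return MICROCODE_ROM[instr], "microcode_rom"

class TraceCache:
    """Bounded FIFO of decoded uops (a simplification of the real cache)."""

    def __init__(self, capacity=TRACE_CACHE_CAPACITY):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.uops = deque(maxlen=capacity)

    def fill(self, uops):
        self.uops.extend(uops)

    def next_trace(self, n):
        """Hand the n oldest uops to the execution engine, in order."""
        return [self.uops.popleft() for _ in range(min(n, len(self.uops)))]

trace_cache = TraceCache()
for instr in ["ADD", "REP_MOVS", "PUSH"]:
    uops, source = to_uops(instr)
    trace_cache.fill(uops)
    print(f"{instr}: {len(uops)} uops from {source}")
print(trace_cache.next_trace(3))
```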
Memory subsystem, Integer & FP units
Memory subsystem:
It contains:
• Bus unit: used to connect the microarchitecture to the external world
• External system bus: 64 bit, 100 MHz quad-pumped system bus giving a bandwidth of 3.2 GB/sec
3-level cache memory of Pentium-4:
• L1: 8 KB, 4-way set-associative, write-through cache with 64 bytes/line
• L2: 256 KB, 8-way set-associative, write-back cache with 128 bytes/line
• Execution Trace cache: approx. 12 K μops; the other details are not disclosed by Intel
Integer & FP units:
• μops are issued to the integer or FP execution units through the 4 issue ports (Port-0 through Port-3)
• Some of the ports can issue more than one μop per cycle (a combination of integer or FP)
• Port-2 & Port-3 are dedicated to memory load & memory store only
• A maximum of 6 μops can be issued per cycle
• Each pipeline has several execution stages (some are shared amongst the integer & FP units)
• The NetBurst architecture has a total of 20 stages in the pipeline (Willamette pipeline)

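The 3.2 GB/sec bus bandwidth and the cache geometries quoted for the memory subsystem can be checked with a little arithmetic. The sketch below assumes that "quad-pumped" means four transfers per bus clock and that GB here means 10^9 bytes; the number of sets is derived as size / (ways x line size). The function names are illustrative.

```python
def bus_bandwidth_bytes_per_sec(width_bits, clock_hz, transfers_per_clock):
    """Peak bandwidth = bytes per transfer x clock rate x transfers per clock."""
    return (width_bits // 8) * clock_hz * transfers_per_clock

def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

# 64-bit bus, 100 MHz, quad-pumped (assumed: 4 transfers per clock).
FSB_BANDWIDTH = bus_bandwidth_bytes_per_sec(64, 100_000_000, 4)

# L1: 8 KB, 4-way, 64 bytes/line.  L2: 256 KB, 8-way, 128 bytes/line.
L1_SETS = cache_sets(8 * 1024, 4, 64)
L2_SETS = cache_sets(256 * 1024, 8, 128)

print(f"System bus: {FSB_BANDWIDTH / 1e9:.1f} GB/sec")
print(f"L1 sets: {L1_SETS}, L2 sets: {L2_SETS}")
```

With these figures the L1 cache has 32 sets and the L2 cache has 256 sets.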
"OOO" Execution engine
• Prepares the decoded μops, stored either in the Execution Trace cache or in the Microcode ROM, for execution
• Supplies these μops to the integer or FP execution units for execution
• μops may be sequenced out of order (if required)
• If a μop is waiting for data, later μops can be executed ahead of it
• The processor uses several buffers to smooth the flow of μops
• The OOO engine can dispatch up to 6 μops in one clock through the issue ports to the execution units
• Once a μop completes its operation & writes its result to the destination, it is retired by the "Retirement" unit
• Up to 3 μops can be retired per cycle
• The "Reorder buffer", which buffers the completed μops, rearranges them into proper program sequence & then manages the exceptions

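The interplay between out-of-order dispatch and in-order retirement can be illustrated with a toy cycle-by-cycle simulation. The widths (6 dispatched, 3 retired per cycle) come from the slide; everything else is a simplifying assumption: each μop has a known cycle at which its operands become available, executes in a single cycle, and may retire in the cycle it completes.

```python
DISPATCH_WIDTH = 6   # from the slide: up to 6 uops dispatched per clock
RETIRE_WIDTH = 3     # from the slide: up to 3 uops retired per cycle

def simulate(ready_cycle):
    """ready_cycle[i] is the first cycle in which uop i has its operands.

    Returns (dispatch_order, retire_order) as lists of uop indices,
    where index order is program order.
    """
    n = len(ready_cycle)
    done = [False] * n
    dispatch_order, retire_order = [], []
    next_retire = 0
    cycle = 0
    while next_retire < n:
        # Dispatch: any uop whose data is ready may go, even past a stalled one.
        ready = [i for i in range(n) if not done[i] and ready_cycle[i] <= cycle]
        for i in ready[:DISPATCH_WIDTH]:
            done[i] = True            # assumed single-cycle execution
            dispatch_order.append(i)
        # Retire: strictly in program order, limited by the retire width.
        retired = 0
        while next_retire < n and done[next_retire] and retired < RETIRE_WIDTH:
            retire_order.append(next_retire)
            next_retire += 1
            retired += 1
        cycle += 1
    return dispatch_order, retire_order

# uop 0 waits for data until cycle 2; uops 1 to 3 overtake it.
dispatched, retired = simulate([2, 0, 0, 1])
print("dispatch order:", dispatched)
print("retire order:  ", retired)
```

In this run the later μops execute first, but the retire order is still program order, which is the reordering job the slide assigns to the reorder buffer.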
Branch Prediction Logic in Pentium-4 processor
• Uses both Static & Dynamic Prediction
Static Prediction:
• Based on the statistical assumption that most backward jumps occur in the context of repetitive loops
• In such loops, the jump to the backward target address is taken most of the time
• Hence, Static Branch Prediction always assumes that:
1. The backward branches are "taken"
2. The forward branches are "not taken"
• Static Prediction is fast & simple
• Performs well on backward branches
• But performs poorly on forward branches, degrading the performance of the system
Dynamic Prediction:
• Based on "speculative execution"
• The processor prefetches instructions from the predicted address (either the sequential or the target address), as chosen by the dynamic branch prediction logic (BPL), before the branching condition is actually evaluated
• Branch History Table (BHT): records the history of all taken branches
  History can be: ST, SNT, WT, WNT (strongly/weakly taken/not taken), similar to the Pentium processor studied earlier
• Branch Target Buffer (BTB): records only the branch target address of taken branches
• Both BHT & BTB have 4 K entries each
• If no entry is found in the BHT, Static Prediction is used; otherwise Dynamic Prediction is used

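The combination of the two schemes (static fallback when the BHT has no entry, a four-state history otherwise) can be sketched as follows. The ST/WT/WNT/SNT states are modelled as a 2-bit saturating counter, which is the usual reading of those four states. The initial state given to a newly recorded branch (WT) and the unbounded dictionaries standing in for the 4 K-entry tables are assumptions made for this sketch.

```python
SNT, WNT, WT, ST = 0, 1, 2, 3   # strongly/weakly not taken, weakly/strongly taken

def static_predict(branch_addr, target_addr):
    """Backward branches are predicted taken, forward branches not taken."""
    return target_addr < branch_addr

class BranchPredictor:
    def __init__(self):
        self.bht = {}   # branch address -> 2-bit history state
        self.btb = {}   # branch address -> last taken target address

    def predict(self, branch_addr, target_addr):
        state = self.bht.get(branch_addr)
        if state is None:                     # no BHT entry: static prediction
            return static_predict(branch_addr, target_addr)
        return state >= WT                    # dynamic prediction

    def update(self, branch_addr, target_addr, taken):
        state = self.bht.get(branch_addr)
        if state is None:
            if not taken:
                return                        # only taken branches get an entry
            state = WT                        # assumed initial state
        elif taken:
            state = min(state + 1, ST)        # saturate at strongly taken
        else:
            state = max(state - 1, SNT)       # saturate at strongly not taken
        self.bht[branch_addr] = state
        if taken:
            self.btb[branch_addr] = target_addr

bp = BranchPredictor()
print(bp.predict(0x100, 0x080))   # backward branch, no history yet
print(bp.predict(0x100, 0x200))   # forward branch, no history yet
bp.update(0x100, 0x200, taken=True)
print(bp.predict(0x100, 0x200))   # history now overrides the static rule
```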
Instruction Translation Lookaside Buffer (I-TLB)
• The I-TLB is used only when virtual memory is used by the processor
• A TLB is used to speed up the translation of a virtual address to a physical address
• A TLB is normally an associative memory (Content Addressable Memory, CAM)
• It is located between the CPU & main memory, just like cache memory
• The TLB holds the virtual-to-physical address mappings recently referred to by the processor, in the form of Page Table entries
• On an instruction fetch, if there is an I-TLB hit, the CPU takes the virtual-to-physical address translation directly from the I-TLB entry instead of performing the full translation for that memory reference (quick address translation)
• In case of an I-TLB miss, the CPU performs the complete virtual-to-physical translation using the Page Directory Table & Page Table to locate the demanded page (requires more time for the address translation)

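The hit and miss paths described above can be sketched with a small model. It assumes 4 KB pages and the classic IA-32 two-level layout (10-bit page directory index, 10-bit page table index, 12-bit offset). The class name, the nested dictionaries standing in for the page tables, and the unbounded TLB dictionary are illustrative simplifications.

```python
PAGE_SIZE = 4096            # assumed 4 KB pages (12-bit offset)
ENTRIES_PER_TABLE = 1024    # assumed 10-bit directory and table indices

class ITLB:
    def __init__(self, page_directory):
        # page_directory[dir_index][table_index] -> physical frame number
        self.page_directory = page_directory
        self.entries = {}   # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1                     # hit: quick translation
        else:
            self.misses += 1                   # miss: full two-level walk
            dir_index, table_index = divmod(vpn, ENTRIES_PER_TABLE)
            frame = self.page_directory[dir_index][table_index]
            self.entries[vpn] = frame          # cache the mapping for next time
        return self.entries[vpn] * PAGE_SIZE + offset

# Hypothetical mapping: virtual page 1 lives in physical frame 7.
itlb = ITLB({0: {1: 7}})
print(hex(itlb.translate(0x1234)))   # miss, walks the tables
print(hex(itlb.translate(0x1ABC)))   # hit, same page
print(itlb.hits, itlb.misses)
```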
Hyper-threading technology & its use in Pentium-4
• Thread: a light-weight process executed by the processor
• Hyper-threading technology (HTT): executing multiple threads simultaneously inside the CPU
HTT in Pentium-4 processor:
• Though the Pentium-4 is a single physical processor, it appears as two logical processors
• All execution resources are shared between these 2 logical processors
• Each logical processor maintains its own complete architecture state (GPRs, CRs, APIC registers & machine state registers)
• Due to the separate architecture state, it gives the illusion of 2 processors running the application
• Instructions are executed by both logical processors simultaneously using the shared resources
• Pentium-4 employs HTT at a CK frequency of 3.06 GHz or higher, with less than 5% added silicon chip area but a performance improvement of about 25%
Design issues in HTT of Pentium-4 processor:
• By duplicating only a few resources, we get 2 logical processors in slightly more chip area (5%)
• Sometimes there can be resource usage conflicts between the logical processors, hence the performance can't be two-fold (about 25% improvement)
• If one logical processor stalls, the other logical processor should continue the execution, so the performance degrades only partially (a logical processor may stall due to cache misses, handling of branch mispredictions, data dependencies, etc.)
Operating System (OS) role in HTT:
• Optimize the management of both logical processors
• Allocate the resources to the active logical processor only (resources are to be deallocated from idle/stalled processors)
