The Intel 8086 is a 16-bit microprocessor introduced by Intel in 1978.
Features
1. Word Length
The 8086 is a 16-bit processor: its internal registers, ALU, and data bus
are all 16 bits wide.
It can process 16-bit data in one operation.
2. Address Bus
It has a 20-bit address bus, allowing access to 2²⁰ = 1 MB (1,048,576 bytes) of
physical memory.
Memory addressing is done using a segment:offset approach.
3. Data Bus
The processor has a 16-bit data bus, so it can read or write 16 bits (2 bytes) of data
in a single operation.
4. Registers
The 8086 has a rich set of registers, categorized as:
a. General Purpose Registers (GPRs) (16-bit, also accessible as 8-bit halves)
AX (AH:AL) – Accumulator (used for arithmetic and logic operations)
BX (BH:BL) – Base register (used in addressing)
CX (CH:CL) – Count register (used in loops and shifts)
DX (DH:DL) – Data register (used in I/O operations and multiplication/division)
Each of AX, BX, CX, and DX is 16 bits wide and is divided into an 8-bit high (H) byte
and an 8-bit low (L) byte (e.g., AH/AL).
b. Segment Registers
CS (Code Segment) – Points to the segment containing the current program code
DS (Data Segment) – Used for data variables
SS (Stack Segment) – Used for stack operations
ES (Extra Segment) – Used for string operations
c. Pointer and Index Registers
SP (Stack Pointer) – Points to the top of the stack
BP (Base Pointer) – Used to access parameters on the stack
SI (Source Index) – Used for string and array operations
DI (Destination Index) – Used with SI
d. Instruction Pointer
IP (Instruction Pointer) – Holds the offset address of the next instruction to be
executed.
e. Flag Register
The 8086 has a 16-bit Flag Register, of which 9 bits are active flags:
o Status Flags: CF (Carry), PF (Parity), AF (Auxiliary Carry), ZF (Zero), SF
(Sign), OF (Overflow)
o Control Flags: IF (Interrupt Enable), DF (Direction), TF (Trap)
5. Memory Segmentation
The 8086 uses a segmented memory architecture: the 1 MB address space is accessed
through segments of up to 64 KB each.
Memory is addressed as Segment:Offset, with physical address = (Segment × 16)
+ Offset.
6. Instruction Set
The 8086 supports a powerful set of instructions, including:
o Data transfer
o Arithmetic
o Logical
o Control transfer
o String manipulation
o Bit manipulation
o Processor control
7. Operating Modes
The 8086 operates in minimum mode (single-processor environment) or maximum mode
(multiprocessor environment).
o In minimum mode, the 8086 generates the bus control signals itself.
o Maximum mode requires an external bus controller (Intel 8288).
8. Pipelining
It has a simple 2-stage pipeline:
o Fetch stage: Pre-fetches up to 6 bytes of instruction from memory into a
queue.
o Execute stage: Decodes and executes instructions.
This improves instruction throughput.
9. Clock Speed
Typically operates at clock speeds of 5 MHz, 8 MHz, or 10 MHz depending on the
variant.
10. Addressing Modes
The 8086 supports several addressing modes:
Immediate
Register
Direct
Register indirect
Based
Indexed
Based-indexed
Relative
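The memory addressing modes listed above differ only in which components contribute to the 16-bit offset (effective address). The following is a minimal sketch, assuming the usual base + index + displacement form; the function name and the register values are illustrative and not taken from the text.

```python
def effective_address(base=0, index=0, displacement=0):
    """Return the 16-bit effective address (offset) of a memory operand."""
    # Offsets are 16 bits wide, so the sum wraps around at 64 KB.
    return (base + index + displacement) & 0xFFFF


# Register contents chosen only for illustration.
BX, SI = 0x1000, 0x0020

direct = effective_address(displacement=0x1234)              # MOV AX, [1234H]
register_indirect = effective_address(base=BX)               # MOV AX, [BX]
based = effective_address(base=BX, displacement=0x0004)      # MOV AX, [BX+4]
indexed = effective_address(index=SI, displacement=0x0004)   # MOV AX, [SI+4]
based_indexed = effective_address(BX, SI, 0x0004)            # MOV AX, [BX+SI+4]
```

Immediate and register operands need no effective address, since the operand is in the instruction or in a register rather than in memory.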
11. Input/Output Operations
The 8086 uses the IN and OUT instructions for I/O.
It supports 64K (65,536) I/O ports in an I/O address space that is separate from memory.
12. Interrupt System
Supports 256 vectored interrupts (type 0 to type 255).
Interrupt types:
o Hardware interrupts
o Software interrupts (INT instruction)
o Internal interrupts (exceptions)
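As a sketch of how a vectored interrupt type is turned into a handler address: on the 8086 the 256 vectors occupy the first 1 KB of memory, four bytes per type (a 16-bit offset followed by a 16-bit segment). This layout is not spelled out in the text above, and the helper names and the sample vector are illustrative.

```python
VECTOR_SIZE = 4  # 2-byte offset (new IP) followed by 2-byte segment (new CS)


def vector_table_address(interrupt_type):
    """Return the address of the vector for the given interrupt type (0-255)."""
    if not 0 <= interrupt_type <= 255:
        raise ValueError("interrupt type must be in the range 0..255")
    return interrupt_type * VECTOR_SIZE


def read_vector(memory, interrupt_type):
    """Read the little-endian (CS, IP) pair stored for an interrupt type."""
    addr = vector_table_address(interrupt_type)
    ip = memory[addr] | (memory[addr + 1] << 8)
    cs = memory[addr + 2] | (memory[addr + 3] << 8)
    return cs, ip


# Illustrative memory image: type 21H points to handler 2000H:0100H.
memory = bytearray(1024)
memory[0x84:0x88] = bytes([0x00, 0x01, 0x00, 0x20])
handler_cs, handler_ip = read_vector(memory, 0x21)
```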
13. DMA Support
Can work with a Direct Memory Access (DMA) controller, which transfers data directly
between memory and peripherals while the CPU releases the bus instead of moving the data itself.
14. Co-processor Support
Can work with a math co-processor (8087) for floating-point arithmetic operations.
15. Bus Interface Unit (BIU) and Execution Unit (EU)
BIU: Handles fetching, reading/writing to memory and I/O, instruction queue
EU: Executes instructions, performs arithmetic/logical operations
8086 CPU
Basic Features
Feature | Description
Word Size | 16-bit
Address Bus | 20-bit (1 MB addressable memory)
Data Bus | 16-bit
Instruction Set | x86 (original ISA)
Clock Speed | Typically 5 MHz to 10 MHz
General Purpose Registers | 8 (16-bit: AX, BX, CX, DX, SP, BP, SI, DI; the first four are also accessible as 8-bit halves)
Segmented Memory | Yes (CS, DS, SS, ES)
Instruction Queue | 6 bytes (for pipelining)
Transistor Count | ~29,000
Fabrication Technology | 3 μm NMOS
Architecture / Block Diagram of 8086 CPU
The 8086 CPU is divided into two main blocks: BIU and EU
1. Execution Unit (EU)
This unit executes instructions and performs the actual data operations using its ALU and registers.
Components:
Arithmetic Logic Unit (ALU): Performs arithmetic and logic operations.
General Purpose Registers:
o AX (Accumulator)
o BX (Base Register)
o CX (Counter Register)
o DX (Data Register)
Segment Registers (physically part of the BIU; listed here because EU instructions load and read them):
o CS (Code Segment)
o DS (Data Segment)
o SS (Stack Segment)
o ES (Extra Segment)
Pointer and Index Registers:
o SP (Stack Pointer)
o BP (Base Pointer)
o SI (Source Index)
o DI (Destination Index)
Flag Register (Status Flags): Indicates status of results (ZF, CF, OF, etc.).
Instruction Decoder: Decodes the instructions fetched by the BIU.
2. Bus Interface Unit (BIU)
This unit handles all data transfers between memory and the processor.
Components:
Segment Registers (shared with EU): Used to calculate physical addresses.
Instruction Queue (6 bytes): For prefetching instructions (basic pipelining).
Address Generation Circuit: Combines segment:offset to form 20-bit address.
Bus Control Logic: Manages read/write operations and I/O interactions.
Key Concept: Segmented Memory Architecture
The 8086 uses segmentation to access 1 MB of memory using 16-bit registers:
Physical Address = Segment × 16 + Offset
For example, if:
CS = 2000H
IP = 0100H
Then the physical address = 2000H × 10H + 0100H = 20000H + 0100H = 20100H
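The calculation above can be sketched in a few lines. The function name is illustrative; the mask to 20 bits reflects the 20-bit address bus, so a sum that exceeds FFFFFH wraps around to the bottom of memory.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Multiplying by 16 (10H) is the same as shifting left by 4 bits.
    return ((segment << 4) + offset) & 0xFFFFF


# The example from the text: CS = 2000H, IP = 0100H.
address = physical_address(0x2000, 0x0100)
print(f"{address:05X}H")
```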
Instruction Queue (Pipelining)
The BIU prefetches up to 6 bytes of instructions and stores them in a FIFO queue.
This pipelining improves speed by overlapping instruction fetch and execution.
While EU executes one instruction, BIU fetches the next.
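The FIFO behavior of the queue can be modeled with a small sketch. This is a simplification under stated assumptions: the class and method names are hypothetical, bytes are fetched one at a time, and bus timing is ignored. It also shows why a jump costs time, since the prefetched bytes become useless and the queue must be refilled.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of the BIU's 6-byte instruction queue."""

    SIZE = 6

    def __init__(self, program, start=0):
        self.program = program      # instruction bytes in "memory"
        self.fetch_ptr = start      # next address the BIU will fetch from
        self.queue = deque()

    def prefetch(self):
        """BIU side: fill free slots while instruction bytes remain."""
        while len(self.queue) < self.SIZE and self.fetch_ptr < len(self.program):
            self.queue.append(self.program[self.fetch_ptr])
            self.fetch_ptr += 1

    def next_byte(self):
        """EU side: take the oldest byte (first in, first out)."""
        return self.queue.popleft()

    def jump(self, target):
        """A taken jump discards the queue and restarts fetching at target."""
        self.queue.clear()
        self.fetch_ptr = target


q = PrefetchQueue(bytes(range(10)))
q.prefetch()                 # queue now holds bytes 0..5
first = q.next_byte()        # EU consumes byte 0
q.prefetch()                 # BIU refills the freed slot with byte 6
```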
Register Organization in 8086
General Purpose Registers:
Register | Description
AX | Accumulator
BX | Base Register
CX | Count Register (loops)
DX | Data Register (I/O, multiplication)
Each 16-bit register can be accessed as two 8-bit registers:
AX = AH (high byte) + AL (low byte)
Similarly for BX, CX, DX
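The relationship between a 16-bit register and its two 8-bit halves can be illustrated with plain bit operations. The helper names are illustrative only.

```python
def split_register(value16):
    """Return the (high, low) bytes of a 16-bit value, e.g. AX -> (AH, AL)."""
    return (value16 >> 8) & 0xFF, value16 & 0xFF


def join_register(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split_register(0x1234)   # AX = 1234H
ax = join_register(ah, al)
```

Writing to AL therefore changes only the low byte of AX, and writing to AH only the high byte.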
Segment Registers:
Register | Description
CS | Code Segment
DS | Data Segment
SS | Stack Segment
ES | Extra Segment
Used to calculate physical addresses with an offset.
Pointer & Index Registers:
Register | Use Case
SP | Stack Pointer
BP | Base Pointer
SI | Source Index (string operations)
DI | Destination Index (string operations)
Flags Register:
Flag | Meaning
CF | Carry Flag
ZF | Zero Flag
SF | Sign Flag
OF | Overflow Flag
PF | Parity Flag
AF | Auxiliary Carry Flag
DF | Direction Flag
IF | Interrupt Enable Flag
TF | Trap Flag (for debugging)
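To make the status flags concrete, here is a minimal sketch of how the six status flags could be derived for an 8-bit addition. The function name is hypothetical, and the model covers only ADD on byte operands; other instructions affect the flags differently.

```python
def add8(a, b):
    """Add two 8-bit values and return (result, status flags)."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                               # carry out of bit 7
        "ZF": result == 0,                                # result is zero
        "SF": bool(result & 0x80),                        # bit 7 of the result
        # Signed overflow: operands share a sign that the result does not.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
        "PF": bin(result).count("1") % 2 == 0,            # even number of 1 bits
        "AF": ((a & 0x0F) + (b & 0x0F)) > 0x0F,           # carry out of bit 3
    }
    return result, flags


result, flags = add8(0x7F, 0x01)   # 127 + 1 overflows the signed 8-bit range
```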
Modes of Operation
Minimum Mode: For single-processor systems.
Maximum Mode: For multiprocessor systems (with external 8087 or 8089).
Summary
The 8086 introduced the segmented memory model, 1 MB addressable space, and
instruction prefetching via pipelining.
It’s the first processor in Intel’s x86 family and influenced many subsequent
designs.
Although it is 16-bit internally, its 20-bit address bus allows access to 1 MB of
memory, a large address space for a microprocessor at the time.
Features of the 32-bit Pentium processor:
The 32-bit Pentium processor is part of Intel’s legacy x86 family of microprocessors and
was a significant step forward in performance and architecture when it was introduced in
1993.
Key Features of the 32-bit Pentium Processor
1. 32-bit Architecture
The Pentium has 32-bit internal registers and a 32-bit address bus (its external data
bus is 64 bits wide; see feature 5).
This means it can operate on 32 bits of data in a single instruction, allowing faster and
more powerful computing compared to earlier 16-bit processors.
2. Superscalar Architecture
It features two instruction pipelines (U and V pipe), which allow the processor to
execute up to two instructions per clock cycle.
This boosts performance by exploiting instruction-level parallelism.
3. Dual Integer Execution Units
Supports parallel execution of up to two integer instructions per clock cycle.
The U-pipe handles all instructions, and the V-pipe handles only simpler ones, but
together they improve throughput significantly.
4. Floating-Point Unit (FPU)
On-chip FPU (Math Co-Processor) for performing arithmetic operations on
floating-point numbers.
The FPU is pipelined and supports fast execution of complex math functions like
addition, multiplication, division, and square root.
5. 64-bit Data Bus
Although it's a 32-bit processor, it has a 64-bit wide external data bus, which allows
for faster memory access and increased bandwidth.
6. Separate Code and Data Caches (Split L1 Cache)
Features two separate 8 KB L1 caches (an 8 KB instruction cache and an 8 KB data cache).
This helps reduce the bottleneck caused by fetching instructions and data from the
same cache.
7. Pipelined Architecture
Supports five-stage instruction pipeline: Fetch, Decode 1, Decode 2, Execute, Write
Back.
Allows multiple instructions to be processed at different stages simultaneously,
enhancing performance.
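The benefit of overlapping stages can be estimated with a simple count. This is an idealized sketch that assumes no stalls, no branches, and perfect pairing when two pipes are used; the function name and parameters are illustrative.

```python
import math


def ideal_cycles(num_instructions, stages=5, issue_width=1):
    """Cycles needed to run instructions through an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # Instructions enter the pipeline in groups of `issue_width` per cycle.
    groups = math.ceil(num_instructions / issue_width)
    # The first group needs `stages` cycles; each later group adds one cycle.
    return stages + groups - 1


unpipelined = 10 * 5                           # 5 cycles each, no overlap
single_pipe = ideal_cycles(10)                 # one five-stage pipeline
dual_pipe = ideal_cycles(10, issue_width=2)    # U and V pipes, ideal pairing
```

Real code falls short of these numbers because of dependencies, cache misses, and mispredicted branches.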
8. Branch Prediction
Includes dynamic branch prediction to reduce delays caused by control instructions
like loops and conditional statements.
Improves the flow of instructions through the pipeline.
9. Instruction Set Architecture (ISA)
Based on the IA-32 architecture (x86), supporting a large variety of instructions
including:
o Arithmetic and logical instructions
o Control instructions
o String operations
o System-level instructions
10. Virtual Memory Support
Implements paging and segmentation, allowing programs to use more memory than
physically available.
Supports 4 GB addressable memory due to its 32-bit address space.
11. Multiprocessing Support
Pentium processors support Symmetric Multiprocessing (SMP) with appropriate
motherboards and OS support.
Allows use in multi-processor configurations for increased computing power.
12. Bus Interface Unit (BIU)
Interfaces with system memory, cache, and I/O using efficient control signals and
protocols.
Helps maintain synchronization between the processor and the rest of the system.
13. Clock Speed and Performance
The original Pentium (P5) ran at 60 and 66 MHz; later members of the Pentium family reached clock speeds of up to 300 MHz.
Supported high-speed computing for desktop, server, and embedded applications of
the time.
14. Fabrication Technology
Built using CMOS technology, with transistor counts ranging from 3.1 to 4.5 million.
Helped improve power efficiency and reduce heat generation.
Summary Table
Feature | Description
Word Size | 32-bit
Data Bus Width | 64-bit
Address Bus Width | 32-bit (4 GB memory access)
Clock Speed | 60 MHz – 300 MHz
Instruction Pipelines | Dual (U and V)
FPU | Integrated, pipelined
Cache | 8 KB instruction + 8 KB data
Branch Prediction | Dynamic branch prediction
ISA | IA-32 (x86)
Virtual Memory | Supports segmentation and paging
SMP Support | Yes (with appropriate motherboard/OS)
Fabrication Technology | CMOS, 0.8 μm – 0.35 μm
Transistor Count | 3.1 million to 4.5 million
The Pentium Superscalar Architecture is a significant development in Intel's x86
microprocessor family. Introduced with the original Pentium processor (P5) in 1993, it
marked a major shift by introducing superscalar execution, which means the CPU can
execute more than one instruction per clock cycle.
What is Superscalar Architecture?
A superscalar architecture allows multiple instructions to be fetched, decoded, and
executed simultaneously using multiple execution units. This contrasts with scalar
processors, which execute at most one instruction per clock cycle.
🔹 Pentium Superscalar Architecture: Key Features
1. Dual Integer Pipelines (U and V pipelines)
The original Pentium processor has two integer pipelines:
o U-pipe (Primary): Can execute most integer instructions.
o V-pipe (Secondary): Can execute simpler instructions in parallel with the
U-pipe.
This enables the parallel execution of two instructions per clock cycle, under certain
conditions.
2. Pipelined Floating Point Unit (FPU)
The Pentium includes a fully pipelined FPU, allowing overlapping execution of
floating-point instructions.
It improves performance for applications involving a lot of math computations.
3. Instruction Fetch and Decode
32-byte instruction prefetch buffer: Enhances instruction throughput.
Dual instruction decoders: One full decoder for complex instructions and one
simpler decoder for basic ones.
If both instructions are simple, they can be dispatched to U and V pipelines
simultaneously.
4. Branch Prediction
Implements dynamic branch prediction using a Branch Target Buffer (BTB).
Reduces the number of pipeline stalls due to control hazards.
5. Separate Code and Data Caches
Harvard-style cache architecture:
o 8 KB code cache
o 8 KB data cache
Speeds up memory access by allowing simultaneous code and data access.
6. 64-bit Data Bus and 32-bit Address Bus
Wider data bus improves memory transfer speed.
Maintains compatibility with 32-bit address space.
7. Instruction-Level Parallelism (ILP)
The superscalar architecture extracts ILP by scheduling multiple independent
instructions to run in parallel.
🔹 Internal Block Diagram Overview
[Figure: internal block diagram of the Pentium; not reproduced in this text]
🔹 Pipeline Stages (Simplified)
Each pipeline (U and V) passes instructions through five stages:
1. Fetch (Prefetch) – Get the instruction from the code cache or memory
2. Decode 1 – Decode the instruction and check whether two instructions can be paired
3. Decode 2 – Compute operand addresses
4. Execute – Perform the actual computation or data cache access
5. Writeback – Store the result back
🔹 Limitations
Limited parallelism: Only two pipelines.
Out-of-order execution not supported: Pentium executes instructions in-order,
unlike later processors like the Pentium Pro.
Only simple instructions can be paired in U and V pipelines.
Dependency hazards may prevent dual issue.
🔹 Performance Enhancement Techniques Used
Instruction pairing rules: The compiler or processor must ensure instructions in U
and V pipes don't conflict.
Cache prefetching and branch prediction help minimize stalls.
Efficient pipeline design reduces instruction latency.
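The idea behind the pairing rules can be sketched as a dependency check between two adjacent instructions. This is a deliberately simplified model: the Instr type, its fields, and can_pair are hypothetical, and the real Pentium pairing rules include further restrictions that are not modeled here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    writes: Tuple[str, ...]   # registers written
    reads: Tuple[str, ...]    # registers read
    simple: bool = True       # only "simple" instructions may pair


def can_pair(u, v):
    """Return True if u (U-pipe) and v (V-pipe) could issue together."""
    if not (u.simple and v.simple):
        return False
    read_after_write = set(u.writes) & set(v.reads)
    write_after_write = set(u.writes) & set(v.writes)
    return not read_after_write and not write_after_write


mov = Instr("mov eax, ebx", writes=("eax",), reads=("ebx",))
add_independent = Instr("add ecx, edx", writes=("ecx",), reads=("ecx", "edx"))
add_dependent = Instr("add ecx, eax", writes=("ecx",), reads=("ecx", "eax"))

pairs = can_pair(mov, add_independent)     # no shared registers
blocked = can_pair(mov, add_dependent)     # second instruction reads eax
```

When the check fails, the second instruction simply waits and issues in the U-pipe on the next cycle, so the result is still correct, only slower.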
🔹 Applications and Impact
Popular in desktop computers and early multimedia systems.
Set the stage for more advanced CPUs like Pentium Pro, Pentium II, etc., which
introduced out-of-order execution and speculative execution.
The 8086 microprocessor is a 16-bit processor developed by Intel, and its Programmer’s
Model represents the internal architecture visible to a programmer. This model includes
registers, segments, flags, and pointers that can be manipulated directly through
instructions.
8086 Programmer’s Model: Overview
The 8086 Programmer’s Model consists of:
1. General Purpose Registers (GPRs)
2. Segment Registers
3. Pointer and Index Registers
4. Instruction Pointer
5. Flag Register
6. Data and Address Buses (not directly part of the programmer's model but useful for
understanding)
1. General Purpose Registers (16-bit)
These are used for arithmetic, logic, and data transfer operations. They can be accessed as full
16-bit registers or split into 8-bit high/low parts.
Register | Purpose | 8-bit Parts
AX | Accumulator (mainly for I/O and arithmetic) | AH (high), AL (low)
BX | Base register (used in addressing) | BH, BL
CX | Count register (used in loops and shifts) | CH, CL
DX | Data register (used in multiplication, I/O) | DH, DL
2. Segment Registers (16-bit)
Segment registers are used to point to memory segments. Each segment can be 64KB.
Register | Purpose
CS | Code Segment (contains the program code)
DS | Data Segment (holds data used by the program)
SS | Stack Segment (used for the stack)
ES | Extra Segment (used for string operations)
🔹 These segment registers help in forming the 20-bit physical address using:
Physical Address = Segment × 10h + Offset
3. Pointer and Index Registers
Register | Purpose
SP (Stack Pointer) | Points to top of the stack in the stack segment
BP (Base Pointer) | Used to access parameters passed via stack
SI (Source Index) | Used in string operations as source
DI (Destination Index) | Used in string operations as destination
IP (Instruction Pointer) | Points to the next instruction to be executed in CS
4. Flags Register (Status and Control Flags)
The FLAGS register is a 16-bit register where individual bits represent the status of the
processor or control features.
Status Flags:
Flag | Bit | Purpose
CF | 0 | Carry Flag: Set if there's a carry or borrow
PF | 2 | Parity Flag: Set if the low byte of the result has an even number of 1 bits
AF | 4 | Auxiliary Carry Flag: Carry out of bit 3, used in BCD operations
ZF | 6 | Zero Flag: Set if result is zero
SF | 7 | Sign Flag: Set if result is negative
OF | 11 | Overflow Flag: Set if result overflows a signed value
Control Flags:
Flag | Bit | Purpose
TF | 8 | Trap Flag: Used for single-step debugging
IF | 9 | Interrupt Enable Flag: Enables or disables interrupts
DF | 10 | Direction Flag: Controls string processing direction
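Using the bit positions from the two tables above, a 16-bit FLAGS value can be decoded as in the following sketch. The names FLAG_BITS and decode_flags are illustrative.

```python
# Bit positions as listed in the status and control flag tables.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_flags(value):
    """Return a dict mapping each flag name to True (set) or False (clear)."""
    return {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}


flags = decode_flags(0x0041)                       # bits 0 and 6 set
set_names = [name for name, is_set in flags.items() if is_set]
```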
5. Logical vs Physical Address
The 8086 uses segmented memory. A 16-bit segment register combined with a 16-bit offset
forms a 20-bit physical address:
Physical Address = (Segment Register × 16) + Offset
Example:
CS = 2000H
IP = 0010H
Physical Address = 20000H + 0010H = 20010H
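One consequence of the formula is that many different Segment:Offset pairs name the same physical location. The sketch below enumerates them for the address in the example; the function name is illustrative, and wraparound past the 1 MB limit is deliberately ignored.

```python
def aliases(physical):
    """List all (segment, offset) pairs that map to the given address."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:       # offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs


pairs = aliases(0x20010)
count = len(pairs)   # segments overlap every 16 bytes, so there are many aliases
```

Both 2000H:0010H and 2001H:0000H appear in the list, which is why logical addresses should not be compared directly when testing whether two pointers refer to the same byte.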
Summary Table
Category | Registers
General Purpose | AX, BX, CX, DX
Segment Registers | CS, DS, SS, ES
Pointer/Index | SP, BP, SI, DI, IP
Flags Register | CF, PF, AF, ZF, SF, OF, TF, IF, DF
Comparison between various registers of 8086 microprocessor
In the 8086 microprocessor, registers are categorized into several types based on their
functionality. A key distinction exists between segment registers and general-purpose or
other registers.
Segment Registers vs. Other Registers in 8086
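Purpose
o Segment Registers: Used for memory segmentation
o General-Purpose Registers: Used for arithmetic, logic, and data storage operations
o Index/Pointer Registers: Used for memory addressing and stack operations
o Flag Register: Holds status/control flags
Registers Included
o Segment Registers: CS (Code Segment), DS (Data Segment), SS (Stack Segment), ES (Extra Segment)
o General-Purpose Registers: AX, BX, CX, DX (can be split into 8-bit registers AH/AL, BH/BL, ...)
o Index/Pointer Registers: SI (Source Index), DI (Destination Index), SP (Stack Pointer), BP (Base Pointer), IP (Instruction Pointer)
o Flag Register: Flags like Zero, Carry, Sign, etc.
Size
o Segment Registers: 16-bit
o General-Purpose Registers: 16-bit (can be used as 8-bit parts too)
o Index/Pointer Registers: 16-bit
o Flag Register: 16-bit (only 9 of the 16 bits are used)
Accessed By
o Segment Registers: Accessed implicitly by the CPU during memory access
o General-Purpose Registers: Primarily accessed explicitly in instructions
o Index/Pointer Registers: Used for string and memory access operations
o Flag Register: Accessed via instructions like PUSHF, POPF
Modifiability
o Segment Registers: Loaded using MOV or POP (CS changes only through far jumps, calls, returns, and interrupts); segment override prefixes select which segment register an instruction uses but do not modify it
o General-Purpose Registers: Freely modified
o Index/Pointer Registers: Freely modified (IP changes only through control transfer instructions)
o Flag Register: Modified automatically or using instructions
Role in Addressing
o Segment Registers: Define the starting base of a 64 KB memory segment
o General-Purpose Registers: Hold data/operands for operations
o Index/Pointer Registers: Used in calculating effective addresses
o Flag Register: Used to determine outcomes of operations
Typical Use
o Segment Registers: Segmentation: CS for code, DS for data, SS for stack, ES for extra data
o General-Purpose Registers: Arithmetic operations, data storage and movement
o Index/Pointer Registers: Indirect addressing, stack operations, memory indexing
o Flag Register: Conditional branching, interrupts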
Segment Registers define the memory segment boundaries.
General-purpose registers are used for computation and data handling.
Index/pointer registers are essential for advanced memory access and stack
control.
The flag register maintains processor status.
Pentium Branch Prediction Logic
The Pentium processor (introduced by Intel in 1993) improved instruction-level parallelism
through superscalar architecture and included dynamic branch prediction to minimize
pipeline stalls due to branches.
Why Branch Prediction?
In pipelined processors like the Pentium, encountering a branch (e.g., an if, loop, or goto)
introduces uncertainty. If the CPU guesses wrong, it must flush the pipeline, which is costly
in terms of performance.
Branch Prediction in Pentium
The Pentium processor uses a dynamic branch prediction mechanism that includes:
1. Branch Target Buffer (BTB)
A cache-like structure that stores the addresses of previously encountered branches
and their target addresses.
When a branch instruction is fetched, the BTB predicts:
o Whether the branch will be taken, and
o Where it will go (target address).
This allows the Pentium to speculatively fetch the next instruction from the predicted
path.
2. 2-bit Saturating Counters (History-Based)
Each branch entry in the BTB is associated with a 2-bit counter.
This counter helps in making a more stable prediction by tracking the branch’s
recent behavior.
Counter States:
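State | Prediction | Transition on Outcome
00 | Strongly Not Taken | Stay at 00 on "not taken"; move to 01 on "taken"
01 | Weakly Not Taken | Move to 00 on "not taken"; move to 10 on "taken"
10 | Weakly Taken | Move to 01 on "not taken"; move to 11 on "taken"
11 | Strongly Taken | Move to 10 on "not taken"; stay at 11 on "taken"
Starting from a strong state, the predictor needs two wrong outcomes in a row to switch
its prediction, making it more stable than a 1-bit scheme.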
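The state table can be sketched as a small class. This models a textbook 2-bit saturating counter as described above, not the exact circuit of any particular Pentium; the class name and the loop workload are illustrative.

```python
class TwoBitCounter:
    """Textbook 2-bit saturating counter: states 0 (00) to 3 (11)."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        """Predict taken in the two upper states (10 and 11)."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


# Illustrative workload: a loop branch taken 9 times, then not taken, 10 times over.
outcomes = ([True] * 9 + [False]) * 10
counter = TwoBitCounter()
correct = 0
for taken in outcomes:
    if counter.predict() == taken:
        correct += 1
    counter.update(taken)
```

After the warm-up, the single "not taken" at each loop exit only moves the counter from 11 to 10, so the next iteration of the loop is still predicted correctly.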
Steps in Prediction
1. Fetch stage queries BTB.
2. If BTB hit → Prediction is made (taken/not taken) + target address.
3. If prediction is wrong, CPU flushes pipeline and fetches from the correct path.
4. The predictor updates its BTB and counter based on the actual branch outcome.
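The four steps can be put together in a toy model of a BTB. This is a sketch under stated assumptions: a dictionary stands in for the fixed-size hardware table, a miss is treated as "not taken", and a newly seen taken branch is entered in the weakly taken state. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch address to [target address, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr, fallthrough_addr):
        """Steps 1-2: return the address to fetch from next."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return fallthrough_addr            # BTB miss: assume not taken
        target, counter = entry
        return target if counter >= 2 else fallthrough_addr

    def update(self, branch_addr, taken, target):
        """Step 4: record the actual outcome once the branch resolves."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            if taken:
                # Assumption: allocate new entries as "weakly taken" (state 10).
                self.entries[branch_addr] = [target, 2]
            return
        if taken:
            entry[0] = target
            entry[1] = min(3, entry[1] + 1)
        else:
            entry[1] = max(0, entry[1] - 1)


btb = BranchTargetBuffer()
first_guess = btb.predict(0x100, fallthrough_addr=0x102)   # miss
btb.update(0x100, taken=True, target=0x080)                # branch was taken
second_guess = btb.predict(0x100, fallthrough_addr=0x102)  # hit, predicted taken
```

Step 3 (flushing the pipeline) happens whenever the predicted address differs from the address the branch actually resolves to.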
Accuracy & Impact
Pentium’s branch prediction was significantly more accurate than earlier static
approaches.
This improved pipeline efficiency, especially in loops and conditional code.
Helped the Pentium sustain multiple instructions per cycle by reducing how often the
pipeline has to be flushed after a mispredicted branch.
Summary
Pentium uses dynamic branch prediction with a BTB and 2-bit saturating
counters.
This mechanism improves performance by predicting branches and minimizing stalls.
Accurate branch prediction is essential for exploiting instruction-level
parallelism.
Pentium: Virtual Memory (Segmented & Demand Page)
Pentium virtual memory combines segmentation and demand paging. This combination is a
legacy of the x86 architecture and is central to its memory management system.
Pentium Virtual Memory Overview
The Pentium processor supports a virtual memory system that allows programs to use more
memory than physically available and provides protection, isolation, and multitasking.
It does this through two memory management mechanisms:
1. Segmentation (inherited from 8086)
2. Paging (introduced with 80386, used in all 32-bit Pentium systems)
In protected mode, both are combined, though modern OSs typically minimize use of
segmentation.
1. Segmentation in Pentium
Key Concepts:
Memory is divided into segments, each defined by a segment descriptor.
Logical addresses are of the form:
Logical Address = Selector:Offset
Address Translation:
1. Selector (in a segment register like CS, DS, SS) points to a descriptor in the
GDT/LDT.
2. Descriptor gives:
o Base address of the segment
o Segment limit (size)
o Access rights & privilege levels
3. Offset is added to base address to form a Linear Address.
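The three translation steps can be sketched as follows. The selector layout used here (table index in bits 15–3, table indicator in bit 2, requested privilege level in bits 1–0) is the standard x86 format; the Descriptor class, the exception type, and the sample table are illustrative, and privilege checks are omitted.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int    # linear address where the segment starts
    limit: int   # highest valid offset within the segment
    dpl: int     # descriptor privilege level (not checked in this sketch)


class ProtectionFault(Exception):
    pass


def linear_address(selector, offset, gdt, ldt):
    """Translate selector:offset into a 32-bit linear address."""
    index = selector >> 3                       # bits 15-3: descriptor index
    table = ldt if selector & 0b100 else gdt    # bit 2: 0 = GDT, 1 = LDT
    descriptor = table[index]
    if offset > descriptor.limit:
        raise ProtectionFault("offset exceeds segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF


# Illustrative GDT: entry 0 is the unused null descriptor.
gdt = [None, Descriptor(base=0x00400000, limit=0xFFFF, dpl=0)]
address = linear_address(0x0008, 0x1234, gdt, ldt=[])
```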
Segment Registers:
CS: Code Segment
DS: Data Segment
SS: Stack Segment
ES, FS, GS: Extra segments
Descriptor Tables:
GDT (Global Descriptor Table) – System-wide
LDT (Local Descriptor Table) – Per-process
🧠 Though segmentation is powerful, modern OSes typically set each segment's base to 0 and
its limit to the maximum, which simplifies memory management and gives a flat memory
model (effectively disabling segmentation).
2. Paging in Pentium (Demand Paging)
Paging translates linear addresses (from segmentation) into physical addresses using a
2-level page table system.
Key Features:
Page size: 4 KB
Page directory and page tables used
On-demand loading (demand paging) – pages loaded into RAM only when needed
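Address Translation (Linear → Physical):
The 32-bit linear address is split into three fields:
o Bits 31–22: index into the page directory (selects a page table)
o Bits 21–12: index into the page table (selects a 4 KB page frame)
o Bits 11–0: byte offset within the page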
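The linear-to-physical translation can be sketched as a two-level lookup, assuming the standard layout for 4 KB pages (10-bit directory index, 10-bit table index, 12-bit offset). Nested dictionaries stand in for the page directory and page tables, and the names and sample mapping are illustrative.

```python
PAGE_SIZE = 4096


def split_linear(linear):
    """Return (directory index, table index, offset) of a linear address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear, page_directory):
    """Walk the two-level tables and return the physical address."""
    dir_index, table_index, offset = split_linear(linear)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        raise LookupError("page fault: page table not present")
    frame = page_table.get(table_index)
    if frame is None:
        raise LookupError("page fault: page not present")
    return frame * PAGE_SIZE + offset


# Illustrative mapping: directory entry 1, table entry 3 -> frame 1A2H.
page_directory = {1: {3: 0x1A2}}
physical = translate(0x00403025, page_directory)
```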
Demand Paging Workflow:
1. Process tries to access a page not in RAM → Page Fault
2. OS:
o Pauses process
o Locates the page on disk (swap/file)
o Loads it into RAM
o Updates page table
o Restarts the process
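The workflow can be illustrated with a toy pager that loads a page only on first access. This is a sketch, not an OS implementation: the class name is hypothetical, a dictionary plays the role of the disk, and page replacement (evicting pages when RAM is full) is left out.

```python
class DemandPager:
    """Toy demand pager: pages enter RAM only when first touched."""

    def __init__(self, backing_store):
        self.backing_store = backing_store   # page number -> bytes ("disk")
        self.resident = {}                   # pages currently in RAM
        self.page_faults = 0

    def read(self, page, offset):
        """Return one byte, loading the page first if it is not resident."""
        if page not in self.resident:
            # Page fault: locate the page on disk, load it, mark it present.
            self.page_faults += 1
            self.resident[page] = self.backing_store[page]
        # The access is then carried out as if nothing had happened.
        return self.resident[page][offset]


pager = DemandPager({0: b"abc", 1: b"xyz"})
first = pager.read(1, 0)    # fault: page 1 is loaded
second = pager.read(1, 2)   # no fault: page 1 is already resident
```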
Page Table Entry (PTE) Fields:
Present bit (0 = page not in RAM)
Read/Write, User/Supervisor bits
Accessed, Dirty bits
Physical Page Frame Address
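The fields listed above can be extracted from a 32-bit entry with bit masks. The bit positions used here follow the standard IA-32 layout for 4 KB pages (Present = bit 0, Read/Write = bit 1, User/Supervisor = bit 2, Accessed = bit 5, Dirty = bit 6, frame address in bits 31–12); they are not given in the text above, and the function name is illustrative.

```python
def decode_pte(pte):
    """Decode the main fields of a 32-bit page table entry."""
    return {
        "present": bool(pte & 0x001),        # bit 0: page is in RAM
        "writable": bool(pte & 0x002),       # bit 1: writes allowed
        "user": bool(pte & 0x004),           # bit 2: accessible from user mode
        "accessed": bool(pte & 0x020),       # bit 5: page has been referenced
        "dirty": bool(pte & 0x040),          # bit 6: page has been written
        "frame_address": pte & 0xFFFFF000,   # bits 31-12: physical frame base
    }


entry = decode_pte(0x001A2067)
```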
Protection & Privilege
Pentium supports 4 privilege levels (rings):
Ring 0: Kernel
Ring 3: User
Segments and pages both can be used to enforce protection.
Paging and segmentation together allow fine-grained control over memory access and
protection.
Feature | Segmentation | Paging
Purpose | Logical memory separation & protection | Efficient memory allocation, VM support
Address Translation | Selector + Offset → Linear Address | Linear Address → Physical Address
Address Type | Logical | Linear & Physical
Hardware Used | Segment registers, GDT/LDT | Page Directory, Page Tables
Size Granularity | Variable segment size | Fixed-size 4 KB pages
Flexibility | High (base, limit, privilege per segment) | Simpler for OS memory management
Typical Use in OS | Often minimized (flat model) | Actively used (especially with demand paging)