Pipelined MIPS 32-Bit Processor Implementation
Faculty of Engineering
Electronics & Electrical
Communications Department
Project of:
Team Members :-
• Abdelrahman Abdelnasser Abdelrahman
• Yusuf Mohamed Saleh Boriek
• Raghad Soliman Mohamed
Electrical Communications.
Table of Contents
Part 2: Pipeline Processor
Introduction
3. Pipeline Architecture
3.1. Instruction Fetching (IF)
3.2. Instruction Decoding (ID)
3.3. Execution Stage (EX)
3.4. MEM/WB Stage
3.5. Forward & Stall
3.6. PC Control Unit
Phase 1: Single-Cycle Processor
Introduction
• Overview
In the field of computer architecture, the design and implementation of a
MIPS (Microprocessor without Interlocked Pipeline Stages) processor serves
as a foundational milestone for understanding modern processor behavior.
This project focuses on the development of a Single-Cycle MIPS Processor,
designed and simulated using Logisim.
The primary objective of the project was to construct a fully functional 32-bit
single-cycle RISC processor capable of executing a predefined instruction set
in a single clock cycle. The processor includes a complete datapath and
control unit implementation that supports arithmetic, logical, memory access,
and control flow instructions.
The project not only enhanced our understanding of CPU internals and digital
design principles but also fostered collaboration, problem-solving, and critical
thinking—skills essential for future endeavors in computer engineering and
embedded systems.
The ISA includes a total of 31 general-purpose 32-bit registers (R1 to R31), while
R0 is hardwired to zero and cannot be modified. The processor also supports a
20-bit Program Counter (PC), enabling it to address up to 2^20 instructions.
Each instruction is 32 bits wide and word-aligned in memory. Immediate values are
either sign-extended or zero-extended based on the instruction type. This design
ensures simplicity, consistency, and efficient execution of instructions within a
single clock cycle.
• I-Type: Designed for operations involving immediate values, memory access
(e.g., LW, SW), and branch instructions. This format includes a source
register, a destination register, and a 16-bit immediate value.
• SB-Type: Used specifically for branch instructions like BEQ, BNE, BLT, and
BGE, which rely on two registers and a split immediate offset for calculating
the branch target.
R-type format
6-bit opcode (Op), 5-bit destination register number d, two 5-bit source
register numbers S1 and S2, and an 11-bit function field F.
F (11 bits) | S2 (5 bits) | S1 (5 bits) | d (5 bits) | Op (6 bits)
I-type format
6-bit opcode (Op), 5-bit destination register number d, 5-bit source
register number S1, and a 16-bit immediate (Imm16).
Imm16 (16 bits) | S1 (5 bits) | d (5 bits) | Op (6 bits)
SB-type format
6-bit opcode (Op), two 5-bit register numbers (S1 and S2), and a 16-bit
immediate split into ImmU (11 bits) and ImmL (5 bits).
ImmU (11 bits) | S2 (5 bits) | S1 (5 bits) | ImmL (5 bits) | Op (6 bits)
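As a minimal sketch of how the three formats share one 32-bit word, the Python below extracts every field by shifting and masking. The bit positions (Op in bits 5..0, d or ImmL in bits 10..6, S1 in bits 15..11, and S2 with F/ImmU, or Imm16, in the upper half) are an assumption inferred from the field order above and from the encodings in the test program listing; the function name `decode_fields` is illustrative only.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into every field a format may use."""
    word &= 0xFFFFFFFF
    return {
        "op": word & 0x3F,               # bits 5..0: 6-bit opcode
        "d_imml": (word >> 6) & 0x1F,    # bits 10..6: d (R/I-type) or ImmL (SB-type)
        "s1": (word >> 11) & 0x1F,       # bits 15..11: first source register
        "s2": (word >> 16) & 0x1F,       # bits 20..16: second source (R/SB-type)
        "f_immu": (word >> 21) & 0x7FF,  # bits 31..21: F (R-type) or ImmU (SB-type)
        "imm16": (word >> 16) & 0xFFFF,  # bits 31..16: Imm16 (I-type)
    }


add = decode_fields(0x00834100)  # ADD R4, R8, R3 from the test program
lw = decode_fields(0x00010090)   # LW R2, 1(R0) from the test program
print(add["d_imml"], add["s1"], add["s2"], add["f_immu"])
print(lw["d_imml"], lw["s1"], lw["imm16"])
```

All fields are produced in parallel regardless of format; which ones are meaningful depends on the opcode.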
Register Use
The processor features 32 general-purpose 32-bit registers, labeled from R0 to
R31, providing fast-access storage for instruction execution. These registers
serve as operands and result holders for various instructions executed by the
processor.
General-Purpose Registers
• R1 to R31: Usable for all instruction operations including arithmetic, logic,
memory access, and control flow.
• R0: Hardwired to zero. It always reads as 0, and any write operation to this
register is ignored. This behavior simplifies instruction design and enables
common operations like MOV or CLEAR.
Special Register Use Cases
• R30 (Stack Pointer - optional): In programs involving procedures or nested
function calls, register R30 can be used as a stack pointer, pointing to the top
of the stack in memory.
• R31 (Return Address - optional): For control flow instructions like JALR
(Jump and Link Register), R31 can be used to temporarily store the return
address (i.e., PC + 1).
Usage Conventions
Although all registers (except R0) are general-purpose, specific conventions can
be followed to organize code and debugging:
• Temporary registers (e.g., R1–R10)
• Saved registers (e.g., R11–R20)
• Argument passing (e.g., R21–R24)
• Return values (e.g., R25–R26)
These conventions are not enforced by hardware but are helpful for code
structure, clarity, and maintainability in complex programs.
Instruction Description
The instruction set for the Single-Cycle MIPS Processor is designed to cover a wide
range of operations using a simplified and uniform encoding. Each instruction is 32
bits wide and categorized under three main formats: R-type, I-type, and SB-type.
Below is an overview of the instructions and their functionality:
R-Type Instructions
R-Type instructions are used for arithmetic and logical operations that involve
two source registers and produce a result in a destination register. These
instructions do not involve immediate values or memory addresses.
Format:
• Opcode (6 bits): Specifies the instruction category (typically 000000 for
R-type).
• RD (5 bits): Destination register.
• RS1 (5 bits): First source register.
• RS2 (5 bits): Second source register.
• Function (11 bits): Determines the specific operation (e.g., ADD, SUB,
AND).
Example Instructions:
• ADD R1, R2, R3 → Adds contents of R2 and R3, stores the result in R1.
• SLT R5, R6, R7 → Sets R5 to 1 if R6 is less than R7, else sets R5 to 0.
• ANDI R6, R6, 0xFF → Performs bitwise AND between R6 and the constant
0xFF.
Immediate Handling:
• Sign-extended for arithmetic/branching.
The two immediate parts (ImmU, ImmL) are concatenated and sign-extended to
calculate the offset for branch instructions or the effective memory address for
SW.
Example Instructions:
• BEQ R1, R2, Label → Branches to Label if the contents of R1 and R2 are
equal.
• SW R3, offset(R4) → Stores the value in R3 to memory at the address
calculated by R4 + offset.
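The concatenation and sign-extension step can be sketched as follows. This is a minimal illustration, assuming ImmU forms the upper 11 bits and ImmL the lower 5 bits of the 16-bit offset, and that a branch target is the branch's own address plus that offset (both inferred from the encodings in the test program listing). The helper names are hypothetical.

```python
def sb_offset(imm_u, imm_l):
    """Concatenate ImmU (11 bits) and ImmL (5 bits), then sign-extend."""
    raw = ((imm_u & 0x7FF) << 5) | (imm_l & 0x1F)  # 16-bit {ImmU, ImmL}
    return raw - 0x10000 if raw & 0x8000 else raw  # two's-complement value


def branch_target(pc, imm_u, imm_l):
    """PC-relative target, wrapped to the 20-bit program counter."""
    return (pc + sb_offset(imm_u, imm_l)) & 0xFFFFF


# BEQ R0, R0, Loop1 is encoded as FFE00712: ImmU = 0x7FF, ImmL = 0x1C.
print(sb_offset(0x7FF, 0x1C))          # offset back to Loop1
print(branch_target(25, 0x7FF, 0x1C))  # branch at address 25
```

For SW the same offset is added to the base register instead of the PC.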
Processor Architecture
The architecture of the single-cycle MIPS processor is designed to execute
every instruction within a single clock cycle. This design emphasizes simplicity
and clarity, making it highly suitable for educational and foundational
processor development. The processor is composed of several interconnected
components, each responsible for a distinct part of instruction execution.
Core Components
The processor is built around the following fundamental components:
• Register File: A set of 32 general-purpose 32-bit registers, including a
hardwired zero register (R0). It provides two read ports and one write
port, enabling simultaneous access to operands and storing results
efficiently.
• Arithmetic Logic Unit (ALU): A 32-bit combinational unit that performs
all arithmetic and logical operations required by the instruction set, such
as addition, subtraction, AND, OR, comparisons, and shifts.
• Instruction Memory: A word-addressable, read-only memory that stores
the binary instructions to be executed. Instructions are fetched based on
the current value of the Program Counter (PC).
• Data Memory: A word-addressable memory block used to read from and
write to data values during program execution. Access is performed using
the LW and SW instructions.
• Program Counter (PC): A 20-bit register that holds the address of the
current instruction. It increments by 1 on each cycle unless modified by a
branch or jump instruction.
• Control Unit: Decodes the instruction and generates control signals that
coordinate the behavior of all other components. It includes sub-units:
o Main Control Unit – generates signals for register selection,
memory access, and ALU input control.
o ALU Control Unit – determines the specific ALU operation based on
the instruction opcode and function field.
o PC Control Unit – manages branch and JALR decisions using
comparison outputs from the ALU (such as Zero or Set flags) to set
the next PC.
Datapath Design
All components are interconnected through a carefully constructed
datapath that includes:
• Multiplexers: Used for selecting between multiple sources of data based
on control signals.
• Sign and Zero Extenders: Extend 16-bit immediate values to 32 bits for
ALU operations or address calculation.
• Splitters: Extract specific fields (like opcode, register addresses,
immediate values) from the 32-bit instruction.
• Pipeline-unaware sequential flow: Since this is a single-cycle
processor, each instruction completes all stages of execution (fetch,
decode, execute, memory, and write-back) in one clock cycle without
overlapping other instructions.
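The two extenders listed above can be sketched in a few lines. This is an illustrative model only: the function names are not taken from the Logisim design, and values are represented as unsigned 32-bit patterns.

```python
def zero_extend16(imm16):
    """Pad a 16-bit immediate with zeros in the upper 16 bits."""
    return imm16 & 0xFFFF


def sign_extend16(imm16):
    """Copy bit 15 of a 16-bit immediate into the upper 16 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


print(hex(zero_extend16(0xFFFF)))  # logical immediates such as ANDI
print(hex(sign_extend16(0xFFFC)))  # arithmetic or branch immediates
```

Zero extension keeps a mask such as 0xFFFF positive, while sign extension preserves negative offsets.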
Instruction Flow
1. Fetch: The instruction is fetched from the instruction memory using the
address in the PC.
2. Decode: The instruction is decoded to determine the required operation,
source registers, and control signals.
3. Execute: The ALU performs the arithmetic or logical operation, or
computes the effective address.
4. Memory Access: For load or store instructions, the data memory is
accessed.
5. Write-Back: The result of the operation or memory read is written back
into the register file.
Single-Cycle Architecture
• Register File
Overview
The Register File is a central component in the processor architecture,
responsible for providing fast, temporary storage and retrieval of data
required during instruction execution. It consists of 32 general-purpose
32-bit registers labeled from R0 to R31, with R0 being hardwired to zero.
This Register File is designed to support:
• Two read operations
• One write operation per clock cycle, all managed through dedicated
input and control signals.
Inputs
• S1: Selects the first source register (RS1).
• S2: Selects the second source register (RS2).
• RD: Selects the destination register to be written to.
• BusD: Carries the 32-bit data to be written into the register specified by
RD.
• Select_RD_SSET: A control signal used when writing to the destination
register in special instructions like SSET.
• Select_RS2_Shift: A control signal used for shift operations to extract
specific bits from RS2.
• RegWrite: Enables the write operation. When high and triggered by the
clock, the data on BusD is written into the selected destination register.
• CLK: The system clock signal that synchronizes the write operation.
Outputs
• R0 to R31: These represent the values of all 32 registers, typically used
for display, debugging, or testing purposes in the Logisim simulation.
• RS2: The full 32-bit value read from the register selected by S2.
• RS2(0–4): The lower 5 bits of RS2, often used in shift instructions where
only a portion of the register value is needed.
• RD(0–15): The lower 16 bits of the register selected by RD, which can be
used in immediate-based operations or reduced-width ALU operations.
Design Considerations
• The use of multiplexers enables dynamic selection of source registers.
• The write mechanism is controlled by the RegWrite signal and
synchronized with the rising edge of the clock.
• The presence of control signals like Select_RD_SSET and Select_RS2_Shift
allows the register file to support extended instruction functionalities
such as loading partial values or controlling shift operations.
• R0 is fixed to zero and cannot be written to, simplifying instruction logic
such as clearing a register or implementing certain arithmetic
comparisons.
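A behavioral sketch of the register file's core contract (two combinational reads, one clocked write, R0 fixed at zero) might look like this. It deliberately omits the Select_RD_SSET and Select_RS2_Shift paths, and the class and method names are assumptions for illustration.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, s1, s2):
        """Combinational read of both source registers."""
        return self.regs[s1 & 0x1F], self.regs[s2 & 0x1F]

    def clock(self, rd, bus_d, reg_write):
        """Model one rising clock edge: write BusD to RD if RegWrite is high."""
        rd &= 0x1F
        if reg_write and rd != 0:  # writes to R0 are ignored
            self.regs[rd] = bus_d & 0xFFFFFFFF


rf = RegisterFile()
rf.clock(rd=0, bus_d=5, reg_write=True)   # ignored: R0 stays zero
rf.clock(rd=3, bus_d=7, reg_write=True)   # R3 = 7
rf.clock(rd=3, bus_d=9, reg_write=False)  # RegWrite low: no change
print(rf.read(0, 3))
```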
Register File Structure
• Arithmetic and Logic Unit (ALU)
Overview
The Arithmetic Logic Unit (ALU) is a critical component of a computer's
central processing unit (CPU), serving as the computational core responsible
for executing a wide range of operations. These operations include arithmetic
calculations (e.g., addition and subtraction), logical operations (e.g., AND,
OR), shift operations (e.g., left shift, right shift), and comparisons (e.g.,
equality, greater than). The ALU is a combinational logic circuit that
processes data inputs based on control signals, producing results that are
either stored in registers or used to guide the processor's control flow,
such as in conditional branching.
ALU Structure
1. Arithmetic Unit:
• Purpose: Performs arithmetic operations like addition and
subtraction.
• Inputs:
-Data inputs: RS0 and RS1 (register sources).
-Control signals: Signal of ADD, Signal of SUB, and Overflow.
• Outputs:
-Value: The result of the arithmetic operation.
-signal_select_arith_op: A control signal for the MUX.
• Functionality: The Arithmetic Unit executes operations based on
control signals. For example, an active Signal of ADD triggers the
addition of RS0 and RS1, while Signal of SUB results in subtracting RS1
from RS0. The Overflow signal indicates if the result exceeds the
representable range.
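MUL: We used a 31-bit × 31-bit multiplier block and generated the sign bit
with an XOR gate.
Overflow:
-ADD: occurs when both inputs have the same sign and the output has the
opposite sign.
-SUB: occurs when the output sign is not logically possible, i.e., the inputs
have different signs and the output sign differs from that of the first
operand.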
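The overflow conditions can be checked with a short sketch that works on 32-bit patterns. It uses the standard two's-complement sign rules; the function names are illustrative and do not correspond to named blocks in the Logisim circuit.

```python
MASK32 = 0xFFFFFFFF


def sign(x):
    """Return bit 31 of a 32-bit pattern."""
    return (x >> 31) & 1


def add_with_overflow(a, b):
    """Add two 32-bit patterns; overflow if equal input signs flip."""
    a, b = a & MASK32, b & MASK32
    result = (a + b) & MASK32
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return result, overflow


def sub_with_overflow(a, b):
    """Compute a - b; overflow if input signs differ and the result's sign
    differs from that of the first operand."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    overflow = sign(a) != sign(b) and sign(result) != sign(a)
    return result, overflow


print(add_with_overflow(0x7FFFFFFF, 1))  # largest positive + 1
print(sub_with_overflow(0x80000000, 1))  # most negative - 1
```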
2. Logical Unit:
• Purpose: Executes logical operations such as AND, OR, and XOR.
• Inputs:
-Data inputs: RS0 and RS1.
-Control signals: Implied signals like Signal of AND (e.g., via
Lo_op).
• Outputs:
-Value: The result of the logical operation.
-signal_select_logic_op: A control signal for the MUX.
• Functionality: This unit performs bitwise logical operations on RS0
and RS1, with the specific operation determined by control signals.
3. Shifter:
• Purpose: Handles shift operations, such as left shift or right shift.
• Inputs:
-Data inputs: RS0 and RS1.
-Shift amount: SAR0 and SAR1 (shift amount registers).
• Outputs:
-Value: The result of the shift operation.
-signal_select_shift_op: A control signal for the MUX.
• Functionality: The Shifter adjusts the bit positions of the input data by
an amount specified by SAR0 and SAR1, with the type of shift (e.g.,
logical or arithmetic) dictated by signals like sh-op.
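The shift variants can be sketched as below. The sketch assumes the shift amount is taken from the lower 5 bits of the second operand, as the register file's RS2(0–4) output suggests; the function names follow the SRL, SRA, and RORI mnemonics in the test program but are otherwise illustrative.

```python
MASK32 = 0xFFFFFFFF


def srl(value, amount):
    """Logical right shift: zeros enter from the left."""
    return (value & MASK32) >> (amount & 0x1F)


def sra(value, amount):
    """Arithmetic right shift: the sign bit is replicated."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> (amount & 0x1F)) & MASK32


def ror(value, amount):
    """Rotate right: bits shifted out re-enter from the left."""
    value &= MASK32
    n = amount & 0x1F
    return ((value >> n) | (value << (32 - n))) & MASK32


print(hex(srl(0x64, 0x37)))       # only the low 5 bits of 0x37 are used
print(hex(sra(0x80000000, 4)))
print(hex(ror(0x12345678, 8)))
```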
4. Comparator:
• Purpose: Performs comparison operations to evaluate relationships
between inputs.
• Inputs:
-Data inputs: RS0 and RS1.
• Outputs:
-Comparison flags: Smaller, Smaller_Unsigned,
Equal_NotEqual, Greater_Equal, Greater_Equal_Unsigned, and
RD0_or_1.
-signal_select_comp_op: A control signal for the MUX.
• Functionality: The Comparator generates flags indicating whether RS0 is
less than, equal to, or greater than RS1, for both signed and unsigned
interpretations. These flags, such as BLT (Branch if Less Than) or
BEQ_BNE (Branch if Equal or Not Equal), are vital for conditional
branching.
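The difference between the signed and unsigned flags is easiest to see in code. The following is a minimal sketch; the dictionary keys mirror the flag names above in lowercase, and the helper names are assumptions.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def compare(rs0, rs1):
    """Produce signed and unsigned comparison flags for two 32-bit inputs."""
    u0, u1 = rs0 & 0xFFFFFFFF, rs1 & 0xFFFFFFFF
    s0, s1 = to_signed(u0), to_signed(u1)
    return {
        "smaller": s0 < s1,
        "smaller_unsigned": u0 < u1,
        "equal": u0 == u1,
        "greater_equal": s0 >= s1,
        "greater_equal_unsigned": u0 >= u1,
    }


flags = compare(0xFFFFFFFF, 1)  # -1 versus 1 when signed
print(flags["smaller"], flags["smaller_unsigned"])
```

The same bit pattern 0xFFFFFFFF is smaller than 1 as a signed value but larger as an unsigned one, which is why separate flags are needed.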
Multiplexer (MUX):
• Purpose: Selects the final ALU output from the various units.
• Inputs: Value outputs from the Arithmetic Unit, Logical Unit, Shifter, and
Comparator.
• Control Signal: select_operation, derived from unit-specific control
signals (e.g., signal_select_arith_op).
• Output: Alu_res, the final ALU result.
The MUX, labeled ALU_result, ensures that the appropriate unit’s output is
selected as the ALU’s result based on the operation being performed.
Data Flow:
The ALU’s data flow is primarily directional:
• Inputs: Enter from the left as RS0 and RS1 (data) and various control
signals (e.g., signal_ADD, ALU_op).
• Processing: Each unit processes the inputs according to the active
control signals.
• Output: The MUX combines the results, outputting Alu_res on the right,
while comparison flags are generated separately for control flow
decisions.
Control Signals:
The ALU’s operation is governed by control signals from the CPU’s
control unit, including:
• Operation Selectors: Ar_op (arithmetic), Lo_op (logical), sh-op (shift),
com-op (comparison).
• Specific Signals: signal_ADD, signal_sub, etc.
• Primary Control: ALU_op, which orchestrates the overall operation.
These signals configure the ALU for the required task, ensuring precise
execution of instructions.
• Instruction Memory
• Overview
The Instruction Memory is a vital component of the single-cycle processor
that stores the program instructions and provides the necessary fields to the
processor during execution. It is addressed using the Program Counter (PC)
and outputs the decoded fields needed to control the datapath and execute
operations properly.
Instruction Memory is structured to deliver all essential parts of an instruction
simultaneously, ensuring that each cycle can execute one instruction
efficiently.
Inputs
• PC (Program Counter):
A 32-bit address input that specifies which instruction to fetch from
memory. The PC is updated every clock cycle to point to the next
instruction unless modified by jump, branch, or other control signals.
• pcc:
Additional input (likely representing a pipelined or next cycle PC value)
feeding the memory module.
Outputs
• Opcode:
A 6-bit field representing the type of instruction to execute (e.g.,
arithmetic, load/store, branch).
• RD:
A 5-bit destination register field, specifying the register that will be
written after instruction execution.
• S1:
A 5-bit source register field for the first operand.
• S2:
A 5-bit source register field for the second operand.
• d_ImmL:
Immediate data loaded from the instruction, representing lower bits used
in immediate-based operations.
• F_ImmU:
An unsigned immediate field for specific types of operations needing
zero-extension.
• Imm16:
A 16-bit immediate value, often used in load/store operations, branches,
and some arithmetic instructions.
Instruction Format
Depending on the type of instruction (R-type, I-type, or SB-type), different
combinations of these fields are used.
For example:
• R-Type instructions use Opcode, S1, S2, and RD.
• I-Type instructions use Opcode, S1, RD, and Immediate fields like Imm16.
• Specialized fields like F_ImmU and d_ImmL are used for fine-grained
immediate control or extended operations.
Design Considerations
• Parallel Output:
All necessary fields are decoded and made available in parallel to
minimize latency and streamline datapath control.
• Flexibility:
Fields like Imm16, d_ImmL, and F_ImmU provide flexibility for
supporting multiple instruction types without complicating control logic.
• Alignment:
The PC input is word-aligned, ensuring correct instruction fetching and
avoiding misaligned memory accesses.
• Data Memory
Overview
The Data Memory module is responsible for storing and retrieving data
during program execution. It interacts with load (LW) and store (SW)
instructions, enabling the processor to work with external data beyond the
register file.
Inputs
• Address:
A 32-bit address input that specifies the memory location for the read or
write operation. The address is typically calculated by the ALU based on
the base register and offset.
• RS2 (Data_in):
The 32-bit data input that carries the value to be written into memory
during a store (SW) operation.
• MemRead:
A control signal that enables a memory read operation. When MemRead
is asserted, the data at the specified Address is read and output to the
processor.
• MemWrite:
A control signal that enables a memory write operation. When MemWrite
is asserted, the data on RS2 (Data_in) is written into the memory at the
specified Address.
• CLK:
The system clock that synchronizes memory write operations. Writes
occur on the rising edge of the clock to ensure timing correctness.
Outputs
• Data_out:
A 32-bit output that provides the data read from the memory when a load
(LW) instruction is executed.
Functional Behavior
Design Considerations
• Single-Ported Memory:
This design uses a single port for both read and write operations,
controlled via MemRead and MemWrite signals.
• Synchronization:
• Word Alignment:
Addresses are expected to be word-aligned to ensure correct access to
full 32-bit words.
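The read and write behavior of the data memory can be modeled with a small class. This is a sketch under stated assumptions: unwritten words read as zero, Data_out is zero while MemRead is low, and the class and method names are hypothetical.

```python
class DataMemory:
    """Word-addressable memory with MemRead/MemWrite control."""

    def __init__(self):
        self.words = {}  # address -> 32-bit word

    def read(self, address, mem_read):
        """Combinational read; returns 0 when MemRead is not asserted."""
        return self.words.get(address, 0) if mem_read else 0

    def clock(self, address, data_in, mem_write):
        """Model one rising clock edge: store Data_in if MemWrite is high."""
        if mem_write:
            self.words[address] = data_in & 0xFFFFFFFF


mem = DataMemory()
mem.clock(address=60, data_in=0x128945AC, mem_write=True)  # SW R3, 60(R0)
mem.clock(address=61, data_in=5, mem_write=False)          # no write
print(hex(mem.read(60, mem_read=True)))
```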
• Program Counter
PC Overview
• Conditional branching,
Inputs
• ImmU_F:
Upper part of the immediate value used in branch address calculations.
• ImmU:
Full unsigned immediate value used for calculating branch targets.
• RD:
Destination register address, primarily used during jump and link
operations (JALR).
• ImmL:
Lower part of the immediate value for building complete branch offsets.
• CLK:
Clock signal controlling PC updates. The PC changes at the rising edge of
the clock.
• Branch:
A control signal that, when asserted, enables branching to a new address
calculated using immediate values.
• JALR:
A control signal for performing a jump and link to a register address plus
an immediate offset.
• RS1 Signal:
Carries the value of the RS1 register, necessary for dynamic branch or
jump address calculations.
• RS1_value:
The 32-bit value read from the RS1 register, used specifically for JALR
jumps.
• Imm16:
A 16-bit immediate value that is used in address calculations during
branching or jumping.
Outputs
• Address:
The 32-bit next instruction address output, used to fetch the next
instruction.
• PC+1:
The incremented program counter value, typically used in normal
sequential execution when no branch or jump is taken.
• pcc:
Represents the current program counter value (before incrementing or
jumping), often used for monitoring or pipeline forwarding.
• pc-1:
Represents the decremented program counter value if needed (used
mainly in error checking or specialized addressing).
The PC Datapath integrates seamlessly into the overall processor datapath,
providing the address for instruction fetch and supporting conditional control
flow.
Normal Instruction Flow
1. The PC holds the address of the current instruction.
• Data Path
2. Instruction Decode: Interpreting the instruction and preparing
operands.
3. Execution: Performing the required computation or operation.
4. Memory Access: Reading from or writing to memory if needed.
5. Write-Back: Storing the result back into the register file.
These stages are executed in a synchronized manner, typically governed by a
clock signal (CLK), with control signals directing the flow of data through the
data path.
• Inputs:
• Operand A: From RS1.
• Operand B: From RS2 or an immediate value (via MUX).
• Outputs:
• "ALU Result": The computed result.
• Flags: "Overflow," "Zero," etc., for control flow decisions.
• Control Signals: "ALUop" from the ALU Control unit dictates the
operation type.
5. ALU Control Unit:
• Function: Decodes the instruction’s opcode and function codes to
generate ALU control signals.
• Inputs: Opcode and function fields from the instruction.
• Outputs: Signals like "Signal of ADD," "Signal of SUB," and
"signal_select_arith_op" to control the ALU.
• Role: Ensures the ALU performs the correct operation for the
instruction.
6. Data Memory:
• Function: Manages load and store operations for data access.
• Inputs:
• Address: From the ALU result (e.g., for address calculations).
• Write Data: From the Register File (for stores).
• Outputs: Read Data to the Register File (for loads).
• Control Signals:
-"MemRead": Enables reading from memory.
-"MemWrite": Enables writing to memory.
• Role: Facilitates data storage and retrieval during memory-related
instructions.
7. Multiplexers (MUX):
• Function: Direct data flow between multiple sources based on control
signals.
• Key Instances:
-ALU Input MUX: Selects between RS2 and an
immediate value for the ALU’s second operand
("ALUSrc").
-Write-Back MUX: Chooses between the ALU result
and Data Memory output for writing to the Register
File.
-PC MUX: Determines the next PC value (sequential,
branch target, or jump target).
• Role: Provides flexibility in routing data through the datapath.
8. Control Unit:
• Function: Decodes the instruction and generates control signals to
orchestrate the data path.
• Inputs: Opcode from the Instruction Memory.
• Outputs: "RegWrite," "MemRead," "MemWrite," "ALUSrc," "Branch,"
"ALUop," etc.
• Role: Acts as the conductor, ensuring each component performs its task
at the right time.
9. Immediate Generation:
• Function: Extracts and sign-extends immediate values from the
instruction (e.g., for "ADDI" or load/store offsets).
• Outputs: Immediate value ("Imm") sent to the ALU via a MUX.
• Role: Supports instructions requiring constant values.
2. Instruction Decode:
• The Control Unit decodes the instruction and generates control signals.
• The Register File reads operands from RS1 and RS2 based on the
instruction’s register fields.
3. Execution:
• The ALU performs the operation (e.g., addition, comparison) using
operands from the Register File or an immediate value.
• For branch instructions, the comparator evaluates conditions (e.g.,
"BLT," "BEQ") and updates the PC if necessary.
4. Memory Access:
• For load/store instructions, the ALU result serves as the memory
address.
• Data Memory reads ("MemRead") or writes ("MemWrite") data as
directed.
5. Write-Back:
• The result (from the ALU or Data Memory) is written back to the
Register File via a MUX, controlled by "RegWrite."
• Control Units
3- PC Control Unit
• Main Control Unit
Overview
The main control unit is a fundamental component of a processor or
microcontroller architecture, often considered the "brain" of the system. It is
responsible for decoding instructions and generating the control signals
necessary to execute those instructions by coordinating the processor's data
path components, such as the register file, memory, and Arithmetic Logic Unit
(ALU). Below is a detailed overview of its structure, functionality, and role.
Datapath Interaction
The main control unit orchestrates the processor's datapath by sending control
signals to:
Register File: RegWrite enables writes, while Mux_RD1 and Mux_RD2 select
source registers.
Memory: MEM_Read and MEM_Write control data access.
ALU: ALU_Ctrl dictates the operation, ensuring correct computation or
comparison.
For instance, in a load instruction (LW):
MEM_Read fetches data from memory.
RegWrite stores it in the register file.
Mux_S2_Imm16 selects the immediate offset for the memory address.
Structure
Key Components and Functionality
1. Input: OpCode
• The main control unit receives the OpCode (operation code) as its
primary input, typically extracted from the instruction register. The
OpCode specifies the operation to be performed, such as arithmetic
(e.g., ADD, SUB), memory access (e.g., LW, SW), or branching (e.g., BEQ,
BNE).
• This input drives the control unit's logic to determine the appropriate
control signals for the current instruction.
3. Internal Logic
• The control unit employs combinational logic, such as AND gates, to
decode the OpCode into specific control signal patterns. For example:
• A unique OpCode bit pattern triggers specific AND gate outputs,
activating signals like RegWrite or MEM_Read.
• Multiplexers (Mux) are integrated to route data or control signals
dynamically, enhancing flexibility for different instruction types (e.g., R-
type, I-type).
Truth Table
Instruction-Specific Behavior
The control unit tailors its outputs based on the instruction being executed.
Here are examples from the provided table:
2. Memory Instructions:
• LW (Load Word): MEM_Read = 1 to fetch data from memory, RegWrite
= 1 to store it in a register, Mux_S2_Imm16 = 1 (immediate offset used).
• SW (Store Word): MEM_Write = 1 to write to memory, Mux_S2_Imm16
= 1, no register write (RegWrite = 0).
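The two memory-instruction rows can be expressed as a lookup table, which is one way to think about what the AND-gate decoder computes. This sketch covers only LW and SW as described above. The opcode values 0x10 and 0x11 are inferred from the encodings in the test program listing, and the signals not mentioned in the text (MEM_Write for LW, MEM_Read for SW) are assumed to be 0.

```python
from typing import NamedTuple


class Control(NamedTuple):
    reg_write: int
    mem_read: int
    mem_write: int
    mux_s2_imm16: int


OP_LW, OP_SW = 0x10, 0x11  # assumed: inferred from the program encodings

CONTROL_TABLE = {
    OP_LW: Control(reg_write=1, mem_read=1, mem_write=0, mux_s2_imm16=1),
    OP_SW: Control(reg_write=0, mem_read=0, mem_write=1, mux_s2_imm16=1),
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode (LW and SW only)."""
    return CONTROL_TABLE[opcode & 0x3F]


print(main_control(0x00000050 & 0x3F))  # LW R1, 0(R0)
print(main_control(0x00040011 & 0x3F))  # SW R4, 0(R0)
```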
• ALU Control Unit
Overview
The ALU control unit serves as the "conductor" of the ALU, determining what
operation it should perform based on the instruction being executed. It
receives inputs from the instruction—typically the opcode and, in some cases,
additional function codes—and generates a set of control signals that configure
the ALU accordingly. These operations can include:
The control unit ensures that the ALU executes the correct operation by
producing signals like "Signal_of_ADD" or "Signal_of_SUB" for arithmetic tasks
or broader signals like "ALU_Ctrl" to specify the operation type. This process is
essential for enabling the processor to handle a wide variety of instructions
efficiently.
Structure
The ALU control unit is typically implemented as a combinational logic circuit,
meaning its outputs depend solely on its current inputs, with no memory or
clock dependency. Its structure can be broken down into the following key
elements:
1. Inputs:
• Opcode: Derived from the instruction, this field identifies the type of
operation (e.g., arithmetic, logical, memory access).
• Function Code: For certain instructions (e.g., R-type), this provides
additional specificity about the operation (e.g., ADD vs. SUB within
arithmetic instructions).
• These inputs are multi-bit fields: a 6-bit opcode and an 11-bit function
field in this design.
2. Logic Circuitry:
• The unit uses a network of logic gates—AND, OR, NOT, etc.—to decode
the inputs and produce the appropriate control signals.
• For example, an AND gate might combine specific opcode bits to
activate the "Signal_of_ADD" output, while an OR gate could enable
multiple logical operations based on a broader instruction type.
3. Outputs:
• ALU Operation Code: A multi-bit signal (e.g., 2-bit or 3-bit, such as
"ALU operation [1:0]") that specifies the exact operation for the ALU
(e.g., 00 for ADD, 01 for SUB, 10 for AND).
• Specific Control Signals: Outputs like "Signal_of_ADD,"
"Signal_of_SUB," "Logical_op," "Shifter_op," and "Comparator_op"
directly instruct the ALU’s functional units.
• Multiplexer Control Signals: Signals to control multiplexers in the
datapath (e.g., "Mux_S2_Imm16," "Mux_RD1") that select operands or
data sources.
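As a rough illustration of the decoding step, the sketch below maps an R-type function code to an operation and to the ALU unit that would be selected. The function-code values are inferred from the encodings in the test program listing (for example, ADD R4, R8, R3 = 00834100 carries F = 4). The grouping of operations into units and all names are assumptions, not the actual truth table.

```python
# Function codes inferred from the upper 11 bits of the listed encodings.
FUNCT_TO_OP = {
    1: "SRL", 2: "SRA", 4: "ADD", 5: "SUB",
    6: "SLT", 10: "OR", 11: "AND", 13: "MUL",
}

# Assumed grouping of operations by ALU functional unit.
UNIT_OF = {
    "ADD": "arith", "SUB": "arith", "MUL": "arith",
    "AND": "logic", "OR": "logic",
    "SRL": "shift", "SRA": "shift",
    "SLT": "comp",
}


def alu_control(opcode, funct):
    """Return (operation, unit) for an R-type instruction."""
    if opcode != 0:
        raise ValueError("this sketch only decodes R-type (opcode 0)")
    op = FUNCT_TO_OP[funct]
    return op, UNIT_OF[op]


print(alu_control(0, 0x00834100 >> 21))  # ADD R4, R8, R3
print(alu_control(0, 0x01631100 >> 21))  # AND R4, R2, R3
```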
Truth Table
Operation-Specific Behavior
The ALU control unit’s behavior varies depending on the instruction type:
• PC Control Unit
Overview
The PC Control Unit is a specialized component within the processor's control
unit, dedicated to managing the Program Counter (PC). Its primary
responsibility is to determine the next PC value, ensuring the correct
sequence of instruction execution. This involves handling different execution
flows:
• Sequential Execution: The PC is incremented by 1 (instruction memory is
word-addressable, so each 32-bit instruction occupies one address) to fetch
the next instruction in sequence.
• Conditional Branches: The PC is updated to a target address if a
condition (e.g., equality or comparison) is met, such as in instructions like
BEQ (Branch if Equal) or BLT (Branch if Less Than).
• Unconditional Jumps: The PC is set to a specific target address, as in
JALR (Jump and Link Register), which also saves a return address for
subroutine calls.
• Exceptions: Though less detailed here, the PC may be redirected to an
exception handler address in some cases.
Structure
• Function: Evaluates branch conditions such as equality (BEQ), inequality
(BNE), or comparisons (BLT, BGE). For instance, BEQ checks if two
registers are equal, often via an ALU subtraction yielding zero.
• Evidence: Outputs like BEQ, BNE, BLT, and BGEU are generated based
on instruction bits and ALU results, as seen in branch condition logic
diagrams.
3. Multiplexer Control:
• Components: Controls a multiplexer that selects the next PC value from
multiple sources:
- PC + 1: Default for sequential execution.
- Branch Target Address: Calculated as PC + offset for taken branches.
- Jump Target Address: A register value plus offset (e.g., for JALR) or a
direct address.
• Function: Generates select signals for the multiplexer based on the
instruction type and condition outcomes.
• Evidence: Multiplexers (e.g., Mux Extender, Mux RD) are shown
selecting data paths, with control signals like "Branch" and "JALR"
directing the choice.
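The next-PC selection can be sketched as a priority choice among the three sources. This sketch assumes the word-addressed, 20-bit PC described in the Program Counter component, so the sequential step is 1, and it assumes JALR takes priority over a branch. The function and parameter names are illustrative.

```python
PC_MASK = 0xFFFFF  # 20-bit program counter


def next_pc(pc, branch, condition_met, jalr, rs1_value, offset):
    """Select the next PC from the sequential, branch, or jump source."""
    if jalr:
        return (rs1_value + offset) & PC_MASK  # register value plus offset
    if branch and condition_met:
        return (pc + offset) & PC_MASK         # PC-relative branch target
    return (pc + 1) & PC_MASK                  # sequential execution


print(next_pc(10, False, False, False, 0, 0))  # sequential
print(next_pc(23, True, True, False, 0, 3))    # taken branch
print(next_pc(31, False, False, True, 0, 37))  # JALR with RS1 = R0
```

A branch whose condition is not met falls through to the sequential case.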
Truth Table
Example: Processing a Branch
Instruction (BEQ):
No.  Instruction              Encoding  Description
17   ADD R4, R8, R3           00834100  R4 = R8 + R3 = 0x12345694
18   LW R1, 0(R0)             00000050  R1 = Mem[0] = 0x00000001
19   LW R2, 1(R0)             00010090  R2 = Mem[1] = 0x00000001
20   LW R3, 2(R0)             000200D0  R3 = Mem[2] = 0x0000000A
21   SUB R4, R4, R4           00A42100  R4 = 0
22   Loop1: ADD R4, R2, R4    00841100  R4 += R2
23   SLT R6, R2, R3           00C31180  R6 = (R2 < R3) ? 1 : 0
24   BEQ R6, R0, done         000030D2  if R6 == 0 → done
25   ADD R2, R1, R2           00820880  R2 += R1
26   BEQ R0, R0, Loop1        FFE00712  unconditional jump to Loop1
27   done: SW R4, 0(R0)       00040011  Mem[0] = R4 = 0x37
28   MUL R10, R2, R3          01A31280  R10 = R2 * R3 = 0x64
29   SRL R14, R10, R4         00245380  R14 = R10 >> R4 (logical) = 0
30   SRA R15, R10, R4         004453C0  R15 = R10 >> R4 (arith) = 0
31   RORI R26, R14, 5         00057684  R26 = ROR(R14, 5) = 0
32   JALR R7, R0, func        002501CF  Jump to func, save PC+1 in R7
33   SET R9, 0x4545           4545024D  R9 = 0x00004545
34   SET R10, 0x4545          4545028D  R10 = 0x00004545
35   BGE R10, R9, L1          00095095  Branch if R10 >= R9 (taken)
36   ANDI R23, R1, 0xFFFF     FFFF0DCB  R23 = R1 & 0x0000FFFF (skipped)
37   L1: BEQ R0, R0, L1       00000012  Infinite loop (halt)
38   func: OR R5, R2, R3      01431140  R5 = R2 | R3 = 0xa
39   LW R1, 0(R0)             00000050  R1 = Mem[0] = 0x37
40   LW R2, 5(R1)             00050890  R2 = 0x128945AC
41   LW R3, 6(R1)             000608D0  R3 = Mem[R1 + 6] = 0x05007342
42   AND R4, R2, R3           01631100  R4 = R2 & R3 = 0x4100
43   SW R4, 0(R0)             00040011  Mem[0] = R4
44   JALR R0, R7, 0           0000380F  Return to caller (JR R7)
We encode each instruction of the program according to the format of its
instruction type (R-Type, I-Type, or SB-Type) and then convert it to its
binary machine code.
SW R2, 2(R0)
Stores the value in R2 (0x0000000A) into memory at address 2 + R0 (memory
address 2).
SET R3, 0x1289
Sets R3 to 0x00001289 (4745 in decimal).
SSET R3, 0x45AC
Modifies R3 by shifting its current lower 16 bits (0x1289) to the upper 16 bits
and setting the lower 16 bits to 0x45AC. Result: R3 = 0x128945AC.
SW R3, 60(R0)
Stores the value in R3 (0x128945AC) into memory at address 60 + R0
(memory address 60).
SET R4, 0x0500
Sets R4 to 0x00000500 (1280 in decimal).
SSET R4, 0x7342
Shifts R4’s lower 16 bits (0x0500) to the upper 16 bits and sets the lower 16
bits to 0x7342. Result: R4 = 0x05007342.
SW R4, 61(R0)
Stores the value in R4 (0x05007342) into memory at address 61 + R0
(memory address 61).
SET R1, 0x0384
Sets R1 to 0x00000384 (900 in decimal).
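The way SET and SSET combine to build a 32-bit constant can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that SET zero-extends its 16-bit immediate, which matches the values listed above but is not stated as a general rule.

```python
MASK32 = 0xFFFFFFFF

def set_imm(imm16):
    """SET: load a 16-bit immediate into the lower half (assumed zero-extended)."""
    return imm16 & 0xFFFF

def sset_imm(reg, imm16):
    """SSET: move the lower 16 bits of reg to the upper half, then fill the lower half."""
    return (((reg & 0xFFFF) << 16) | (imm16 & 0xFFFF)) & MASK32

r3 = sset_imm(set_imm(0x1289), 0x45AC)   # SET R3, 0x1289 ; SSET R3, 0x45AC
r4 = sset_imm(set_imm(0x0500), 0x7342)   # SET R4, 0x0500 ; SSET R4, 0x7342
print(f"{r3:#010x} {r4:#010x}")          # prints 0x128945ac 0x05007342
```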
LW R2, 1(R0)
Loads the value from memory address 1 + R0 (memory[1], which is
0x00000001) into R2. R2 = 0x00000001.
LW R3, 2(R0)
Loads the value from memory address 2 + R0 (memory[2], which is
0x0000000A) into R3. R3 = 0x0000000A.
MUL R10, R2, R3
Multiplies R2 (0x0000000A) and R3 (0x0000000A). Result: R10 = 0x00000064
(100 in decimal). The loop exits once R2 is no longer less than R3, so
R2 = 0x0A at this point.
LW R1, 0(R0)
Loads the value from memory[0] (0x00000037) into R1. R1 = 0x00000037 (55
in decimal).
LW R2, 5(R1)
Loads the value from memory[R1 + 5] = memory[0x37 + 5] = memory[60]
(0x128945AC) into R2. R2 = 0x128945AC.
LW R3, 6(R1)
Loads the value from memory[R1 + 6] = memory[0x37 + 6] = memory[61]
(0x05007342) into R3. R3 = 0x05007342.
Execution Flow Summary:
1. Initialization (1–20): Sets up registers and memory with initial
values.
2. Loop (22–26): Sums numbers 1 to 10 into R4 (result:
0x00000037), stored in memory[0] at "done".
3. Post-Loop Operations (28–31): Performs arithmetic and shift
operations, mostly resulting in zeros due to large shift amounts.
4. Function Call (32, 38–44): Jumps to "func", performs logical
operations, updates memory[0] to 0x00004100, and returns to the caller.
5. Final Branch (33–37): Compares equal values, branches to L1,
and enters an infinite loop.
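The loop in steps 22 to 26 can be replayed in Python to confirm the values in the listing. This is a hand-written model of those five instructions, not output from the simulator; the function name and variable names are chosen for the example.

```python
def run_sum_loop(r1=1, r2=1, r3=10):
    """Replay instructions 21-26 with the register values loaded in 18-20."""
    r4 = 0                          # SUB R4, R4, R4
    while True:
        r4 += r2                    # Loop1: ADD R4, R2, R4
        r6 = 1 if r2 < r3 else 0    # SLT R6, R2, R3
        if r6 == 0:                 # BEQ R6, R0, done
            break
        r2 += r1                    # ADD R2, R1, R2
    return r2, r4                   # BEQ R0, R0, Loop1 closes the loop

r2, r4 = run_sum_loop()
r10 = r2 * 10                       # MUL R10, R2, R3 with R3 = 10
print(hex(r4), hex(r10), r4 + 5, r4 + 6)   # prints 0x37 0x64 60 61
```

The last two printed values are the addresses used by LW R2, 5(R1) and LW R3, 6(R1) after R1 is reloaded with 0x37.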
• Array Test Code Description:
14 SW R4, 3(R20) 0004A0D1 memory[0x23] = 20
27 ADDI R23, R23, 1 0001BDC5 R23 = R23 + 1 (i++)
• 0x21: 10
• 0x22: 15
• 0x23: 20
• 0x24: 25
• 0x25: 30
• 0x26: 35
• 0x27: 40
Loop Operation
• Initial State: R22 (sum) = 0, R23 (i) = 0, R20 = 0x20
• Each Iteration:
5. Increment i (R23).
6. Jump back to PC = 16.
• Iterations:
Part 2: Pipelined Processor
Introduction
In the initial phase of our project, we developed a Single-Cycle
Processor and analyzed its functionality and structure in detail.
A significant drawback of that design is its long clock cycle: every
instruction must complete within one cycle, so the clock period is set
by the slowest instruction. To address this limitation and improve
performance, we implemented a pipelined processor. Pipelining is an
implementation technique in which multiple instructions are overlapped
in execution.
ensuring the timely identification and resolution of any potential conflicts
or dependencies among instructions.
Pipeline Stages
• Instruction Fetching (IF):
Key Components of the IF/ID Stage
1. Program Counter (PC)
o A register that stores the memory address of the instruction to be
fetched.
o It is incremented for sequential execution (e.g., moving to the next
instruction) or updated for branches and jumps based on control
logic.
2. Instruction Memory
o Retrieves the 32-bit instruction from memory using the address
provided by the PC.
3. IF/ID Pipeline Register
o Stores the fetched instruction and the incremented PC value (e.g.,
PC+1).
o Acts as a latch, holding these values between clock cycles to ensure a
smooth handoff to the Instruction Decode (ID) stage.
4. Control Signals
o Stall: Pauses the pipeline to resolve hazards, such as when an
instruction depends on data not yet available.
o Kill: Invalidates instructions in the pipeline, often due to branch
mispredictions or jumps.
o These signals manage instruction flow and prevent errors caused by
pipeline hazards.
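A minimal Python sketch of how Stall and Kill could act on the PC and the IF/ID register at each clock edge is given below. It is a behavioral model under stated assumptions: the NOP encoding, the names IFID and if_stage_tick, the redirect argument, and the choice that Stall takes priority over Kill are all hypothetical and not taken from the circuit.

```python
from dataclasses import dataclass

NOP = 0x00000000  # assumed encoding of a bubble; the real encoding may differ

@dataclass(frozen=True)
class IFID:
    instruction: int = NOP
    pc_plus_1: int = 0

def if_stage_tick(pc, imem, if_id, stall=False, kill=False, redirect=None):
    """Model one clock edge of the IF stage; returns (new_pc, new_if_id)."""
    if stall:
        # Stall: hold both the PC and the IF/ID register unchanged.
        return pc, if_id
    # Kill: replace the fetched instruction with a bubble.
    fetched = NOP if kill else imem[pc]
    new_if_id = IFID(fetched, pc + 1)
    # A taken branch or jump supplies a redirect address; otherwise use PC+1.
    new_pc = redirect if redirect is not None else pc + 1
    return new_pc, new_if_id

imem = [0x11, 0x22, 0x33, 0x44]
pc, if_id = if_stage_tick(0, imem, IFID())
```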
Key Components
The ID/EX stage involves several critical components that work together to
decode instructions and prepare for execution:
• Instruction Decoding Logic: Analyzes the instruction's opcode to
determine the operation (e.g., add, load, branch) and sets up control
signals.
• Register File: A storage unit containing the processor's registers,
accessed to retrieve source operand values.
• Immediate Value Extraction: Extracts and processes immediate values
embedded in the instruction (e.g., offsets or constants).
• Sign Extension Unit: Extends immediate values to match the processor's
data width (e.g., from 16 bits to 32 bits).
• Control Unit: Generates control signals based on the instruction type to
guide the ALU, memory, and register operations in subsequent stages.
• ID/EX Pipeline Register: A temporary storage buffer that holds decoded
data and control signals, ensuring they are available for the EX stage.
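The sign extension step listed above can be illustrated with a few lines of Python. The function name and the default width of 16 bits are assumptions for the example; the result is returned as a 32-bit pattern.

```python
def sign_extend(value, bits=16):
    """Extend a `bits`-wide two's-complement value to a 32-bit pattern."""
    value &= (1 << bits) - 1          # keep only the immediate field
    sign = 1 << (bits - 1)            # position of the sign bit
    return ((value ^ sign) - sign) & 0xFFFFFFFF

print(hex(sign_extend(0x0005)))       # prints 0x5
print(hex(sign_extend(0xFFFE)))       # prints 0xfffffffe
```

The XOR-and-subtract form copies the sign bit into the upper bits without a conditional, which is the same effect as wiring bit 15 to bits 16 through 31 in hardware.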
• Execution Stage (EX):
Key Components
The EX/MEM stage integrates several essential components that collaborate to
execute instructions efficiently:
1. Arithmetic Logic Unit (ALU):
o The ALU performs arithmetic operations (e.g., addition, subtraction)
and logical operations (e.g., AND, OR, XOR) based on the instruction
type.
o It also calculates memory addresses for load/store instructions by
adding an offset to a base register.
o Inputs typically include register operands (e.g., RS1, RS2) or
immediate values, while the output (ALUres) is the result of the
computation.
2. Control Unit:
o Generates and propagates control signals that dictate subsequent
pipeline behavior.
o Examples include:
▪ MemRead: Signals a memory read operation in the MEM stage.
▪ MemWrite: Signals a memory write operation in the MEM
stage.
▪ RegWrite: Indicates whether the result should be written back
to a register in the WB stage.
▪ Branch: Controls branch instruction evaluation.
3. Forwarding Unit:
o Detects and resolves data hazards by forwarding the latest data
from later stages (e.g., MEM or WB) to the EX stage.
o Ensures that the ALU uses the most current operand values when an
instruction depends on a prior result.
4. Branch Logic:
o Evaluates branch conditions (e.g., BEQ for branch if equal, BLT for
branch if less than) using ALU outputs (e.g., zero flag or comparison
results).
o Calculates branch target addresses and updates the Program
Counter (PC) if the branch is taken.
5. EX/MEM Pipeline Register:
o A storage unit that latches the results of the EX stage (e.g., ALUres,
branch decisions) and control signals.
o Transfers this data to the MEM stage on the next clock cycle,
ensuring synchronized pipeline operation.
Operations Performed
The EX/MEM stage executes several critical tasks:
1. ALU Computations:
o For arithmetic/logical instructions, the ALU processes operands to
produce a result (ALUres).
o For memory instructions, it computes the effective address (e.g.,
base register + offset).
2. Branch Resolution:
o Evaluates branch conditions using comparators or ALU flags (e.g.,
zero or sign flags).
o If a branch is taken, the PC is updated with the target address;
otherwise, it proceeds sequentially.
o May generate a "Kill" signal to flush earlier pipeline stages if the
branch prediction was incorrect.
3. Data Hazard Management:
o The forwarding unit identifies dependencies and routes the correct
data (e.g., from MEM or WB stages) to the ALU via multiplexers
(MUX).
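o This avoids stalls in most cases by ensuring the latest operand values
are available; hazards that forwarding cannot resolve (e.g., a
load-use hazard) still require a stall.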
4. Control Signal Propagation:
o Control signals from the ID stage (e.g., MemRead, MemWrite,
RegWrite) are passed through the EX/MEM register to guide MEM
and WB stage operations.
5. Memory Access Preparation:
o For load/store instructions, the ALU result (address) and data (for
stores) are prepared and stored in the EX/MEM register for the
MEM stage.
Data Flow
The data flow through the EX/MEM stage follows this sequence:
1. Inputs from ID/EX Register:
o Decoded instruction fields (e.g., opcode, register identifiers).
o Operands (e.g., RS1, RS2, immediate values).
o Control signals (e.g., ALU operation type, memory access flags).
2. Execution by ALU:
o The ALU processes operands based on the instruction, producing
ALUres.
o For branches, it generates flags (e.g., zero) for condition evaluation.
3. Branch Decision:
o Branch logic assesses conditions and computes target addresses if
applicable.
4. Hazard Resolution:
o Forwarding logic selects the correct operand sources via MUXes
(e.g., ForwardA, ForwardB signals).
5. Output to EX/MEM Register:
o ALUres (computation result or address).
o Data for stores (e.g., RS2).
o Control signals (e.g., MemRead, MemWrite, RegWrite).
o PC-related values (e.g., for branches).
These outputs are latched in the EX/MEM register, ready for the MEM stage.
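The data flow above can be summarized as a small behavioral model in Python. It is a sketch under assumptions rather than a description of the actual circuit: the dictionary keys, the `use_imm` flag, the reduced set of ALU operations, and the forwarding select codes (0 = ID/EX value, 1 = EX/MEM value, 2 = MEM/WB value) are all chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

def forward_mux(select, id_ex_value, ex_mem_value, mem_wb_value):
    """Pick an operand source using the assumed ForwardA/ForwardB codes."""
    return (id_ex_value, ex_mem_value, mem_wb_value)[select]

def alu(op, a, b):
    """A reduced ALU covering a few of the operations named in the text."""
    if op == "ADD":
        return (a + b) & MASK32
    if op == "SUB":
        return (a - b) & MASK32
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "XOR":
        return a ^ b
    raise ValueError(f"unsupported ALU op: {op}")

def ex_stage(id_ex, ex_mem_alures, mem_wb_busd, forward_a=0, forward_b=0):
    """Compute the values that would be latched into the EX/MEM register."""
    a = forward_mux(forward_a, id_ex["rs1_val"], ex_mem_alures, mem_wb_busd)
    rs2 = forward_mux(forward_b, id_ex["rs2_val"], ex_mem_alures, mem_wb_busd)
    b = id_ex["imm"] if id_ex["use_imm"] else rs2
    alures = alu(id_ex["alu_op"], a, b)
    return {
        "alures": alures,            # result or effective address
        "zero": alures == 0,         # flag used for BEQ-style conditions
        "store_data": rs2,           # data for a store instruction
        "mem_read": id_ex["mem_read"],
        "mem_write": id_ex["mem_write"],
        "reg_write": id_ex["reg_write"],
    }

# LW R2, 5(R1) with R1 = 0x37: the ALU computes the effective address.
lw = {"rs1_val": 0x37, "rs2_val": 0, "imm": 5, "use_imm": True,
      "alu_op": "ADD", "mem_read": True, "mem_write": False, "reg_write": True}
print(ex_stage(lw, 0, 0)["alures"])   # prints 60
```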
• MEM/WB Stage:
Key Components of the MEM/WB Stage
The MEM/WB stage relies on several components to manage data and control
signals:
1. MEM/WB Pipeline Register:
o This is a storage unit that temporarily holds data and control signals
transitioning from the MEM stage to the WB stage.
o It acts as a buffer, latching values on a clock cycle to ensure they are
available for the next stage.
o Schematically, it is often represented as a large block containing
internal registers, synchronized by a clock signal (CLK).
2. Data Lines (BusD):
o BusD carries the data to be written back to the register file. This
data could be:
▪ The result of an ALU operation (e.g., from an arithmetic
instruction like add).
▪ Data retrieved from memory (e.g., from a load instruction like
lw).
o It enters the MEM/WB register from the MEM stage and exits to the
WB stage, sometimes labeled as "BusD PL" (Pipeline) in diagrams.
3. Register Destination (RD):
o RD specifies the address of the register in the register file where the
result will be stored.
o It ensures that the data on BusD is written to the correct location
during the WB stage.
4. Control Signals:
o RegWrite: A critical control signal that determines whether the data
on BusD should be written to the register file.
▪ If RegWrite = 1, the write operation occurs; if RegWrite = 0, it
does not.
o CLK (Clock): Synchronizes the latching of data and control signals
within the pipeline register, ensuring operations occur at the right
time.
5. Flip-Flops:
o These are basic storage elements within the pipeline register, often
implemented as D flip-flops.
o They hold values like BusD, RD, and RegWrite, updating them on the
clock edge based on inputs from the MEM stage.
where the actual write to the register file occurs.
4. Support for Hazard Management:
o By providing access to RD and BusD, the stage aids in detecting and
resolving data hazards (e.g., through forwarding to earlier stages).
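The role of BusD, RD, and RegWrite in this stage can be sketched in Python as follows. The names MEMWB, latch_mem_wb, and mem_to_reg are hypothetical, and treating R0 as a register that ignores writes is an assumption based on how the test programs use R0 as a constant zero.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MEMWB:
    bus_d: int = 0           # data to write back
    rd: int = 0              # destination register number
    reg_write: bool = False  # write enable

def latch_mem_wb(alures, mem_data, mem_to_reg, rd, reg_write):
    """Model the clock edge: choose the BusD source and latch it with RD and RegWrite."""
    bus_d = mem_data if mem_to_reg else alures
    return MEMWB(bus_d & 0xFFFFFFFF, rd, reg_write)

def write_back(regs, mem_wb):
    """Write BusD to the register file only when RegWrite is asserted."""
    if mem_wb.reg_write and mem_wb.rd != 0:   # assumption: R0 stays zero
        regs[mem_wb.rd] = mem_wb.bus_d
    return regs

regs = [0] * 32
write_back(regs, latch_mem_wb(0x10, 0x37, True, 1, True))    # load: R1 = 0x37
write_back(regs, latch_mem_wb(0x10, 0x37, False, 2, False))  # RegWrite = 0: no change
```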
• Forward & Stall:
1. Inputs:
• RS1, RS2, RD: These are register identifiers from the pipeline
stages (likely IF/ID, ID/EX, EX/MEM, MEM/WB).
• RegWrite: Signals from EX/MEM and MEM/WB stages indicating
whether a register write will occur.
• EX/MEM.Read, MEM/WB.Read: Indicate if the EX/MEM or
MEM/WB stages are reading from registers.
2. Logic:
• The Forward and Stall unit contains comparators and logic gates
(AND, OR) to detect data dependencies and decide forwarding or
stalling.
• Comparators check if the source registers (RS1, RS2) in the ID stage
match the destination registers (RD) in later stages (EX, MEM, WB).
• Forwarding signals (ForwardA, ForwardB) are generated to select
the correct data source for the ALU inputs.
• A Stall signal is generated if a hazard cannot be resolved by
forwarding (e.g., a load-use hazard).
3. Outputs:
• ForwardA, ForwardB: Control signals for multiplexers in the EX
stage to forward data from EX/MEM or MEM/WB to the ALU inputs.
• Stall: A signal to stall the pipeline (e.g., by inserting a bubble) if a
hazard requires it.
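The comparator logic described above can be written as a short Python sketch. It follows the common textbook scheme (forward into the EX stage, stall one cycle on a load-use hazard) and is not necessarily how the unit in this design is wired; the function names, the select codes (0 = no forwarding, 1 = from EX/MEM, 2 = from MEM/WB), and the exclusion of R0 are assumptions.

```python
def forward_select(rs, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Return the assumed ForwardA/ForwardB code for one source register."""
    if rs != 0 and ex_mem_reg_write and ex_mem_rd == rs:
        return 1    # newest value is in EX/MEM, so it takes priority
    if rs != 0 and mem_wb_reg_write and mem_wb_rd == rs:
        return 2    # otherwise take the value from MEM/WB
    return 0        # no hazard: use the value read from the register file

def load_use_stall(id_rs1, id_rs2, id_ex_rd, id_ex_mem_read):
    """Stall when the instruction in EX is a load whose result ID needs next."""
    return bool(id_ex_mem_read and id_ex_rd != 0
                and id_ex_rd in (id_rs1, id_rs2))

# ADD R4, R2, R4 directly after an instruction that writes R2:
print(forward_select(2, ex_mem_rd=2, ex_mem_reg_write=True,
                     mem_wb_rd=0, mem_wb_reg_write=False))   # prints 1
```

Checking EX/MEM before MEM/WB matters when both stages hold a write to the same register, because the EX/MEM value is the more recent one.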
Datapath Functionality:
• Forwarding:
• Stalling:
• PC Control Unit:
1. Inputs:
• PC+1: The incremented value of the current PC, representing the next
sequential instruction address.
• Branch: The target address for a branch instruction, derived from the
current PC and an immediate value (ImmU or ImmL).
• JALR: The target address for a Jump and Link Register instruction,
calculated using RS1 (a register value) and an immediate value
(Imm16).
• ImmU, ImmL, Imm16: Immediate values from the instruction, sign-
extended to form branch or jump offsets.
• RS1: A register value used in JALR calculation.
• CLK: Clock signal to synchronize the PC register.
• Stall: Control signal to pause PC updates when hazards occur.
• Components:
2. Outputs:
• Address: The current PC value sent to the instruction memory for
fetching the next instruction.
Datapath Functionality:
• Sequential Execution: Normally, the PC is incremented by 1 (PC+1) to
fetch the next instruction in sequence.
• Branch Handling: When a branch instruction is detected, the mux selects
the Branch address (PC + sign-extended immediate) as the next PC value.
• JALR Handling: For JALR instructions, the mux selects the JALR address
(RS1 + sign-extended Imm16) as the next PC value.
• Stalling: The Stall signal can halt PC updates, inserting a bubble in the
pipeline to resolve hazards (e.g., data or control hazards).
• Clock Synchronization: The PC register updates its value on the rising
edge of the CLK signal, ensuring synchronized operation with the
pipeline.
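The behavior listed above can be modeled in Python as a next-value function plus a register that updates on the clock edge. This is a sketch under assumptions: the names are hypothetical, the immediates are taken to be already sign-extended Python integers, and giving JALR priority over a taken branch is a choice made for the example.

```python
MASK32 = 0xFFFFFFFF

def next_pc_value(pc, branch_taken, is_jalr, branch_imm, rs1, imm16):
    """Model the mux in front of the PC register."""
    if is_jalr:
        return (rs1 + imm16) & MASK32      # JALR: RS1 + sign-extended Imm16
    if branch_taken:
        return (pc + branch_imm) & MASK32  # Branch: PC + sign-extended immediate
    return (pc + 1) & MASK32               # Sequential: PC+1

class PCRegister:
    """PC register that latches a new value on each rising clock edge."""

    def __init__(self, value=0):
        self.value = value

    def rising_edge(self, next_value, stall=False):
        if not stall:                      # Stall holds the current address
            self.value = next_value & MASK32
        return self.value

pc = PCRegister()
pc.rising_edge(next_pc_value(pc.value, False, False, 0, 0, 0))              # PC = 1
pc.rising_edge(next_pc_value(pc.value, False, False, 0, 0, 0), stall=True)  # PC stays 1
```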
Team Work
Abdelrahman Abdelnasser Abdelrahman:
Design program counter, Data memory, Instruction memory, PC control
unit pipeline
Test the processor
Design datapath with Yusuf and Raghad
Write documentation with Yusuf
Update the PC and fix all problems