Implementation of Multi Stage Processor

The document discusses the design and implementation of a 32-bit multi-stage RISC-V processor, highlighting the advantages of multi-stage processing over single-cycle processing. It details the five stages of the pipeline, potential hazards (data, control, and structural), and provides a Verilog implementation of the processor. The document emphasizes the efficiency and speed improvements achieved through pipelining in instruction execution.

Uploaded by

ceralap881

Design and Implementation of 32-bit Multi-Stage RISC-V Processor

1. Why Choose a Multi-Stage Processor Over a Single-Cycle Processor?

i. Single Cycle processor

▪ Before looking at the multi-stage pipelined processor, it helps to understand the single-cycle processor and see why pipelining matters.

▪ A single-cycle RISC-V processor is a basic CPU design in which every instruction is executed in exactly one clock cycle.
▪ That one cycle covers all five steps of instruction execution: instruction fetch, decode, execute, memory access, and write-back.

▪ To make this concrete, the following program is traced step by step through a single-cycle RISC-V processor, showing what each instruction does in each step.

Program:
main:
addi x1, x0, 5
addi x2, x0, 10
add x3, x2, x1

link: https://drive.google.com/file/d/19tvVzC2Peg3M1gEwgkp9zjb7W3Y67ugn/view?usp=drive_link
Cycle  Instruction      IF               ID                            EX          MEM        WB
1      addi x1, x0, 5   Fetch from PC=0  Decode, read x0 (0)           Add 0 + 5   No memory  Write 5 to x1
2      addi x2, x0, 10  Fetch from PC=4  Decode, read x0 (0)           Add 0 + 10  No memory  Write 10 to x2
3      add x3, x2, x1   Fetch from PC=8  Decode, read x2 (10), x1 (5)  Add 10 + 5  No memory  Write 15 to x3

▪ There is no need to handle data, control, or structural hazards, since instructions never overlap.
▪ The clock cycle has to be long enough for the slowest instruction to finish, so faster instructions waste part of every cycle.
▪ Only one instruction runs at a time, so overall throughput is low.
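
As a rough illustration of the clock-period point, assume hypothetical step latencies of 200 ps (IF), 100 ps (ID), 200 ps (EX), 200 ps (MEM) and 100 ps (WB). These figures are made up for the example and are not measurements of this design. The clock must cover an instruction that uses all five steps (such as a load), while an addi skips the memory access:

```latex
\begin{align*}
T_{\text{clk}} &= t_{IF} + t_{ID} + t_{EX} + t_{MEM} + t_{WB}
               = 200 + 100 + 200 + 200 + 100 = 800\ \text{ps} \\
t_{\text{addi}} &= t_{IF} + t_{ID} + t_{EX} + t_{WB}
               = 200 + 100 + 200 + 100 = 600\ \text{ps} \\
T_{\text{clk}} - t_{\text{addi}} &= 200\ \text{ps idle in every addi cycle} \\
T_{\text{program}} &= 3 \times T_{\text{clk}} = 2400\ \text{ps for the three-instruction program}
\end{align*}
```

Under these assumed numbers, a quarter of each addi cycle is spent waiting for a memory step the instruction never uses.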
2. Pipeline Stages in a Multi-Stage RISC-V Processor
➢ That is why we use a multi-stage (pipelined) processor: by overlapping instructions it achieves higher throughput than a single-cycle processor.
➢ Stages in a multi-stage pipelined processor
5-stage pipeline:
IF → ID → EX → MEM → WB

➢ IF – Instruction Fetch
▪ The instruction is fetched from memory.
▪ The PC register already holds the address of the next instruction, so the instruction is simply read from the memory location the PC points to.
➢ ID – Instruction Decode
▪ The opcode is decoded to find out what kind of instruction it is.
▪ While decoding is going on, the stage also fetches the source operands from the register file.
▪ Assuming a 16-bit immediate field (as in the simplified encoding used by the Verilog implementation in Section 5; standard RV32I I-type instructions carry a 12-bit immediate), the low 16 bits of the instruction are sign-extended to 32 bits.
➢ EX – Execute
▪ The ALU executes the operation; for load and store instructions it computes the effective address.
▪ The effective address is the actual memory address from which data will be loaded (LW) or to which data will be stored (SW).
➢ MEM – Memory Access
▪ The actual memory access, a read or a write, is performed in this stage.
▪ For a branch instruction, this is where it is decided whether or not to branch.
➢ WB – Write Back
▪ The result of the instruction is written back to the register file.
▪ Once the result is available (an ALU result or a value loaded from memory), it is stored in the destination register.
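
The sign extension done in ID and the effective-address addition done in EX can be sketched in a few lines of Verilog. This is a minimal illustration, not part of the design in Section 5: the module name `imm_effective_addr` and its port names are hypothetical, and it assumes the 16-bit immediate format described above.

```verilog
// Hypothetical helper (not part of the original design): sign-extends a
// 16-bit immediate to 32 bits and adds it to a base register value.
module imm_effective_addr (
    input  wire [15:0] imm16,   // low 16 bits of the instruction
    input  wire [31:0] base,    // value read from the base register
    output wire [31:0] imm32,   // sign-extended immediate (ID stage)
    output wire [31:0] addr     // effective address = base + imm32 (EX stage)
);
    // Replicate bit 15 (the sign bit) into the upper 16 bits.
    assign imm32 = {{16{imm16[15]}}, imm16};
    assign addr  = base + imm32;
endmodule

// Small self-checking-by-eye testbench for the sketch above.
module imm_effective_addr_tb;
    reg  [15:0] imm16;
    reg  [31:0] base;
    wire [31:0] imm32, addr;

    imm_effective_addr dut (
        .imm16(imm16), .base(base), .imm32(imm32), .addr(addr)
    );

    initial begin
        // Positive offset: 100 from base 0 -> imm32 = 00000064, addr = 100
        imm16 = 16'd100;  base = 32'd0;  #1;
        $display("imm32=%h addr=%0d", imm32, addr);

        // Negative offset: 16'hFFFC is -4 -> imm32 = fffffffc, addr = 16
        imm16 = 16'hFFFC; base = 32'd20; #1;
        $display("imm32=%h addr=%0d", imm32, addr);
        $finish;
    end
endmodule
```

The second case shows why the sign bit has to be replicated: zero-extending 16'hFFFC would give an offset of 65532 instead of -4.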

▪ Let us understand the pipeline stages in a multi-stage processor through an example.

Program:
.text
main:
addi x1, x0, 5
addi x2, x0, 10
nop
nop
add x3, x2, x1
link: https://drive.google.com/file/d/1kfnzHK05PBeWIAu1yKHBBrIyfgy1o7B-/view?usp=drive_link

Cycle  IF               ID               EX               MEM              WB
1      addi x1, x0, 5   -                -                -                -
2      addi x2, x0, 10  addi x1, x0, 5   -                -                -
3      nop              addi x2, x0, 10  addi x1, x0, 5   -                -
4      nop              nop              addi x2, x0, 10  addi x1, x0, 5   -
5      add x3, x2, x1   nop              nop              addi x2, x0, 10  addi x1, x0, 5
6      -                add x3, x2, x1   nop              nop              addi x2, x0, 10
7      -                -                add x3, x2, x1   nop              nop
8      -                -                -                add x3, x2, x1   nop
9      -                -                -                -                add x3, x2, x1
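
The nine cycles in this trace match the usual cycle-count relation for an ideal pipeline with no stalls: the first instruction needs one cycle per stage, and every later instruction completes one cycle after its predecessor. The symbols k (number of stages) and N (number of instructions, counting the two nops) are introduced here only for this worked example.

```latex
\begin{align*}
\text{cycles}_{\text{pipelined}} &= k + (N - 1) \\
                                 &= 5 + (5 - 1) = 9
\end{align*}
```

A single-cycle processor would need N = 5 cycles for the same program, but each of those cycles must be long enough for all five steps, whereas a pipelined cycle only has to cover the slowest single stage.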
3. Micro-Operations in Each Pipeline Stage

IF/ID Stage:
The instruction is fetched from memory using the program counter, and the PC is
incremented by 4 to point to the next instruction.

ID/EX Stage:
The instruction is decoded, source registers are read, and control signals are generated
for the next stage.

EX/MEM Stage:
The ALU performs the required operation such as arithmetic or address calculation, and
the result is passed to the memory stage along with updated control signals.

MEM/WB Stage:
If it’s a load instruction, data is read from memory; otherwise, the ALU result is
prepared to be written back to the register file.
4. Types of Hazards in a Multi-Stage Pipeline

➢ Data Hazards:
A data hazard occurs when an instruction depends on the result of a previous instruction that has not yet completed.

Example:
addi x1, x0, 5 # x1 = 5
addi x2, x0, 10 # x2 = 10
add x3, x1, x2 # x3 = x1 + x2 → data hazard here

link: https://drive.google.com/file/d/1hl8igFd6qln0DeALkVckzusGewKk0ejF/view?usp=drive_link

Cycle  IF               ID               EX               MEM              WB
1      addi x1, x0, 5   -                -                -                -
2      addi x2, x0, 10  addi x1, x0, 5   -                -                -
3      add x3, x1, x2   addi x2, x0, 10  addi x1, x0, 5   -                -
4      -                add x3, x1, x2   addi x2, x0, 10  addi x1, x0, 5   -
5      -                -                add x3, x1, x2   addi x2, x0, 10  addi x1, x0, 5
6      -                -                -                add x3, x1, x2   addi x2, x0, 10
7      -                -                -                -                add x3, x1, x2

▪ add x3, x1, x2 tries to read x1 and x2 in its ID stage (cycle 4).
▪ At that point addi x1 is still in MEM and addi x2 is still in EX. Neither has reached WB, so the register file does not yet hold their results.
▪ This is a Read After Write (RAW) data hazard.
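
One common way to catch this situation in hardware is a small comparator block in the ID stage that raises a stall when a source register matches the destination of an instruction still in EX or MEM. The sketch below is illustrative only: the module `raw_hazard_detect` and all its port names are hypothetical, the design has no forwarding, and it assumes a register file in which a value written in WB can be read by ID in the same cycle (so WB is not checked).

```verilog
// Hypothetical RAW hazard detector (not part of the original design).
// Assumes no forwarding and a register file whose WB write is visible
// to an ID read in the same cycle.
module raw_hazard_detect (
    input  wire [4:0] id_rs1, id_rs2,   // source registers of the instruction in ID
    input  wire [4:0] ex_rd,            // destination register of the instruction in EX
    input  wire       ex_reg_write,     // 1 if the EX instruction writes a register
    input  wire [4:0] mem_rd,           // destination register of the instruction in MEM
    input  wire       mem_reg_write,    // 1 if the MEM instruction writes a register
    output wire       stall             // 1 = hold the instruction in ID
);
    // x0 is hard-wired to zero, so a write to it never creates a dependency.
    wire ex_hit  = ex_reg_write  && (ex_rd  != 5'd0) &&
                   ((ex_rd  == id_rs1) || (ex_rd  == id_rs2));
    wire mem_hit = mem_reg_write && (mem_rd != 5'd0) &&
                   ((mem_rd == id_rs1) || (mem_rd == id_rs2));

    assign stall = ex_hit || mem_hit;
endmodule

module raw_hazard_detect_tb;
    reg  [4:0] id_rs1, id_rs2, ex_rd, mem_rd;
    reg        ex_reg_write, mem_reg_write;
    wire       stall;

    raw_hazard_detect dut (
        .id_rs1(id_rs1), .id_rs2(id_rs2),
        .ex_rd(ex_rd),   .ex_reg_write(ex_reg_write),
        .mem_rd(mem_rd), .mem_reg_write(mem_reg_write),
        .stall(stall)
    );

    initial begin
        // Cycle 4 of the trace: add x3, x1, x2 in ID,
        // addi x2 in EX, addi x1 in MEM.
        id_rs1 = 5'd1; id_rs2 = 5'd2;
        ex_rd  = 5'd2; ex_reg_write  = 1'b1;
        mem_rd = 5'd1; mem_reg_write = 1'b1;
        #1 $display("stall = %b", stall);   // expected: 1

        // Both addi instructions have left EX and MEM: no hazard remains.
        ex_reg_write = 1'b0; mem_reg_write = 1'b0;
        #1 $display("stall = %b", stall);   // expected: 0
        $finish;
    end
endmodule
```

Stalling is the simplest remedy; forwarding ALU results directly to the EX stage is the usual alternative and avoids most of the lost cycles, at the cost of extra multiplexers.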
➢ Control Hazards:
A control hazard is caused by a branch or jump instruction that changes the program counter (PC).

Example:
What is the hazard in this program?

addi x1, x0, 5 # x1 = 5
addi x2, x0, 5 # x2 = 5
beq x1, x2, target # If equal, jump to target
addi x3, x0, 10 # This should be skipped if branch is taken
addi x4, x0, 20 # This should also be skipped if branch is taken

target:
addi x5, x0, 30 # This is where we land if beq taken

link: https://drive.google.com/file/d/1IAJcRL9DWJ0aPErHSCn9yp1pkTqZmGHF/view?usp=drive_link

Cycle  IF                  ID                  EX                  MEM             WB
1      addi x1, x0, 5      -                   -                   -               -
2      addi x2, x0, 5      addi x1, x0, 5      -                   -               -
3      beq x1, x2, target  addi x2, x0, 5      addi x1, x0, 5      -               -
4      addi x3, x0, 10     beq x1, x2, target  addi x2, x0, 5      addi x1, x0, 5  -
5      addi x4, x0, 20     addi x3, x0, 10     beq x1, x2, target  addi x2, x0, 5  addi x1, x0, 5

▪ In a 5-stage pipeline (like the one in Ripes), a branch instruction such as beq is only resolved in the Execute (EX) stage, which is 2 cycles after it was fetched.
▪ By that time (cycle 5) the next two instructions have already entered the pipeline: addi x3 is in decode and addi x4 is being fetched.
▪ This creates a control hazard: until the branch resolves, the CPU does not know whether to continue with addi x3 and addi x4 or to jump to target. If the branch is taken, those two instructions must be discarded.
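
A typical response to a taken branch resolved in EX is to redirect the PC and flush the two younger instructions. The sketch below illustrates that idea under stated assumptions: the module `branch_redirect`, its ports, and the byte addresses used in the testbench (beq at 8, target at 20, four bytes per instruction) are hypothetical and are not taken from the design in Section 5.

```verilog
// Hypothetical PC redirect and flush logic (not part of the original design).
// Assumes the branch outcome is known while the branch is in EX.
module branch_redirect (
    input  wire        clk,
    input  wire        reset,
    input  wire        ex_branch_taken,    // branch in EX resolved as taken
    input  wire [31:0] ex_branch_target,   // target address computed in EX
    output reg  [31:0] pc,
    output wire        flush_if_id,        // squash the instruction being fetched
    output wire        flush_id_ex         // squash the instruction being decoded
);
    // Both younger instructions are on the wrong path when the branch is taken.
    assign flush_if_id = ex_branch_taken;
    assign flush_id_ex = ex_branch_taken;

    always @(posedge clk) begin
        if (reset)                pc <= 32'd0;
        else if (ex_branch_taken) pc <= ex_branch_target;
        else                      pc <= pc + 32'd4;
    end
endmodule

module branch_redirect_tb;
    reg         clk   = 1'b0;
    reg         reset = 1'b1;
    reg         taken = 1'b0;
    wire [31:0] pc;
    wire        flush_if_id, flush_id_ex;

    branch_redirect dut (
        .clk(clk), .reset(reset),
        .ex_branch_taken(taken), .ex_branch_target(32'd20),
        .pc(pc), .flush_if_id(flush_if_id), .flush_id_ex(flush_id_ex)
    );

    always #5 clk = ~clk;

    initial begin
        @(posedge clk); #1 reset = 1'b0;   // pc = 0: fetch addi x1
        repeat (4) @(posedge clk);         // pc steps through 4, 8, 12, 16
        #1 taken = 1'b1;                   // beq (fetched at 8) is now in EX
        #1 $display("pc=%0d flush_if_id=%b flush_id_ex=%b",
                    pc, flush_if_id, flush_id_ex);   // pc=16, both flushes 1
        @(posedge clk);
        #1 taken = 1'b0;
        $display("pc=%0d", pc);            // pc=20: fetch resumes at target
        $finish;
    end
endmodule
```

With this scheme every taken branch costs two wasted cycles; resolving branches earlier or predicting them reduces that penalty.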

➢ Structural Hazard:
A structural hazard occurs when hardware resources are not sufficient to support
multiple instructions executing in parallel in the pipeline.

Example:
lw x1, 0(x2) # Instruction 1 — Load word from memory into x1
addi x3, x0, 5 # Instruction 2 — Set x3 = 5 (uses ALU, no memory access)
sw x4, 0(x5) # Instruction 3 — Store word from x4 into memory at address in x5

link: https://drive.google.com/file/d/1V3A_KQz1b-CuY_dOkxBQSieeDTePHF8T/view?usp=drive_link

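▪ lw x1, 0(x2) and sw x4, 0(x5) both access memory in their MEM stage, while the IF stage also reads memory every cycle to fetch the next instruction.
▪ If instructions and data share a single-ported memory, the lw in MEM (cycle 4) and the instruction fetch in that same cycle compete for the one port. This is the structural hazard: one of the two accesses has to wait. Separate instruction and data memories avoid it.
▪ addi x3, x0, 5 is a pure ALU instruction (it does not access data memory), so it does not add to the contention and always updates x3.
▪ Separately, if memory is not properly initialized or x2/x5 do not point to valid memory, lw and sw do not read or write meaningful values. That is a program setup problem rather than a hazard.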
5. Implementation of the Multi-Stage RISC-V Processor in Verilog

Design:

// NOTE: the instruction encoding used here is a simplified MIPS-style
// format (6-bit opcode in bits [31:26], register fields in [25:21],
// [20:16] and [15:11], 16-bit immediate in [15:0]). It is not the standard
// RV32I encoding. Memory is word-addressed, so the PC advances by 1.
module pipe_riscv32(clk1, clk2);
    input clk1, clk2;   // two-phase clock: IF, EX, WB on clk1; ID, MEM on clk2

    // Pipeline registers, named after the two stages they sit between
    reg [31:0] pc, IF_ID_IR, IF_ID_NPC;
    reg [31:0] ID_EX_IR, ID_EX_NPC, ID_EX_A, ID_EX_B, ID_EX_Imm;
    reg [2:0]  ID_EX_type, EX_MEM_type, MEM_WB_type;
    reg [31:0] EX_MEM_IR, EX_MEM_ALUOut, EX_MEM_B;
    reg        EX_MEM_cond;
    reg [31:0] MEM_WB_IR, MEM_WB_ALUOut, MEM_WB_LMD;

    reg [31:0] Reg [0:31];   // Register Bank 32x32
    reg [31:0] MEM [0:1023]; // Memory 1024x32 (instructions and data)

    reg HALTED;        // set once HLT reaches WB; freezes every stage
    reg TAKEN_BRANCH;  // suppresses writes by instructions after a taken branch

    // Opcodes
    parameter ADD   = 6'b000000,
              SUB   = 6'b000001,
              AND   = 6'b000010,
              OR    = 6'b000011,
              SLT   = 6'b000100,
              MUL   = 6'b000101,
              HLT   = 6'b111111,
              LW    = 6'b001000,
              SW    = 6'b001001,
              ADDI  = 6'b001010,
              SUBI  = 6'b001011,
              SLTI  = 6'b001100,
              BNEQZ = 6'b001101,
              BEQZ  = 6'b001110;

    // Instruction classes
    parameter RR_ALU = 3'b000,
              RM_ALU = 3'b001,
              LOAD   = 3'b010,
              STORE  = 3'b011,
              BRANCH = 3'b100,
              HALT   = 3'b101;

    // IF Stage: fetch from pc, or from the branch target if the branch
    // held in EX/MEM was taken
    always @(posedge clk1)
        if (HALTED == 0) begin
            if (((EX_MEM_IR[31:26] == BEQZ) && (EX_MEM_cond == 1)) ||
                ((EX_MEM_IR[31:26] == BNEQZ) && (EX_MEM_cond == 0))) begin
                IF_ID_IR     <= #2 MEM[EX_MEM_ALUOut];
                TAKEN_BRANCH <= #2 1'b1;
                IF_ID_NPC    <= #2 EX_MEM_ALUOut + 1;
                pc           <= #2 EX_MEM_ALUOut + 1;
            end else begin
                IF_ID_IR  <= #2 MEM[pc];
                IF_ID_NPC <= #2 pc + 1;
                pc        <= #2 pc + 1;
            end
        end

    // ID Stage: read source registers, sign-extend the immediate,
    // classify the instruction
    always @(posedge clk2)
        if (HALTED == 0) begin
            ID_EX_A <= #2 Reg[IF_ID_IR[25:21]];
            if (IF_ID_IR[20:16] == 5'b00000)
                ID_EX_B <= #2 0;
            else
                ID_EX_B <= #2 Reg[IF_ID_IR[20:16]];

            ID_EX_NPC <= #2 IF_ID_NPC;
            ID_EX_IR  <= #2 IF_ID_IR;
            ID_EX_Imm <= #2 {{16{IF_ID_IR[15]}}, IF_ID_IR[15:0]}; // sign-extend imm

            case (IF_ID_IR[31:26])
                ADD, SUB, AND, OR, SLT, MUL: ID_EX_type <= #2 RR_ALU;
                ADDI, SUBI, SLTI:            ID_EX_type <= #2 RM_ALU;
                LW:                          ID_EX_type <= #2 LOAD;
                SW:                          ID_EX_type <= #2 STORE;
                BNEQZ, BEQZ:                 ID_EX_type <= #2 BRANCH;
                HLT:                         ID_EX_type <= #2 HALT;
                default:                     ID_EX_type <= #2 HALT;
            endcase
        end

    // EX Stage: ALU operation, effective address, or branch target/condition
    always @(posedge clk1)
        if (HALTED == 0) begin
            EX_MEM_type  <= #2 ID_EX_type;
            EX_MEM_IR    <= #2 ID_EX_IR;
            // TAKEN_BRANCH is also assigned in the IF block on the same clock
            // edge; which assignment wins depends on simulator scheduling order.
            TAKEN_BRANCH <= #2 0;

            case (ID_EX_type)
                RR_ALU: begin
                    case (ID_EX_IR[31:26])
                        ADD: EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_B;
                        SUB: EX_MEM_ALUOut <= #2 ID_EX_A - ID_EX_B;
                        AND: EX_MEM_ALUOut <= #2 ID_EX_A & ID_EX_B;
                        OR:  EX_MEM_ALUOut <= #2 ID_EX_A | ID_EX_B;
                        MUL: EX_MEM_ALUOut <= #2 ID_EX_A * ID_EX_B;
                        SLT: EX_MEM_ALUOut <= #2 (ID_EX_A < ID_EX_B); // unsigned compare
                        default: EX_MEM_ALUOut <= #2 32'hxxxxxxxx;
                    endcase
                end
                RM_ALU: begin
                    case (ID_EX_IR[31:26])
                        ADDI: EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_Imm;
                        SUBI: EX_MEM_ALUOut <= #2 ID_EX_A - ID_EX_Imm;
                        SLTI: EX_MEM_ALUOut <= #2 (ID_EX_A < ID_EX_Imm); // unsigned compare
                        default: EX_MEM_ALUOut <= #2 32'hxxxxxxxx;
                    endcase
                end
                LOAD, STORE: begin
                    EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_Imm; // effective address
                    EX_MEM_B      <= #2 ID_EX_B;             // store data
                end
                BRANCH: begin
                    EX_MEM_ALUOut <= #2 ID_EX_NPC + ID_EX_Imm; // branch target
                    EX_MEM_cond   <= #2 (ID_EX_A == 0); // assuming zero check for branch
                end
            endcase
        end

    // MEM Stage: data memory read or write
    always @(posedge clk2)
        if (HALTED == 0) begin
            MEM_WB_type <= #2 EX_MEM_type;
            MEM_WB_IR   <= #2 EX_MEM_IR;
            case (EX_MEM_type)
                RR_ALU, RM_ALU: MEM_WB_ALUOut <= #2 EX_MEM_ALUOut;
                LOAD:           MEM_WB_LMD    <= #2 MEM[EX_MEM_ALUOut];
                STORE:
                    if (TAKEN_BRANCH == 0)  // do not store on the wrong path
                        MEM[EX_MEM_ALUOut] <= #2 EX_MEM_B;
            endcase
        end

    // WB Stage: write the result to the register file, or halt
    always @(posedge clk1)
        if (HALTED == 0) begin
            if (TAKEN_BRANCH == 0)  // do not write back on the wrong path
                case (MEM_WB_type)
                    RR_ALU: Reg[MEM_WB_IR[15:11]] <= #2 MEM_WB_ALUOut;
                    RM_ALU, LOAD: Reg[MEM_WB_IR[20:16]] <= #2 (MEM_WB_type == LOAD
                                      ? MEM_WB_LMD : MEM_WB_ALUOut);
                    HALT: HALTED <= #2 1'b1;
                endcase
        end

endmodule
Testbench:

module pipe_riscv32_tb;
    reg clk1, clk2;
    integer i;

    // Instantiate the processor
    pipe_riscv32 DUT(clk1, clk2);

    // Clock generation: each clock toggles every 10 time units
    // (period 20), with clk2 lagging clk1 by 5 time units
    initial begin
        clk1 = 0; clk2 = 0;
        forever begin
            #5 clk1 = ~clk1;
            #5 clk2 = ~clk2;
        end
    end

    // Initialize memory and registers
    initial begin
        // Clear register file and memory
        for (i = 0; i < 32; i = i + 1)
            DUT.Reg[i] = 0;
        for (i = 0; i < 1024; i = i + 1)
            DUT.MEM[i] = 32'h00000000;

        // -----------------------------
        // Program: Simple instruction flow
        // -----------------------------
        // ADDI R1, R0, #5   => R1 = 5
        // ADDI R2, R0, #10  => R2 = 10
        // ADD  R3, R1, R2   => intended: R3 = R1 + R2 = 15
        // HLT
        // The SW R3, 100(R0) and LW R4, 100(R0) steps of the intended
        // program are not loaded below, so MEM[100] and R4 stay 0.
        DUT.MEM[0] = {6'b001010, 5'd0, 5'd1, 16'd5};        // ADDI R1, R0, 5
        DUT.MEM[1] = {6'b001010, 5'd0, 5'd2, 16'd10};       // ADDI R2, R0, 10
        DUT.MEM[2] = {6'b000000, 5'd1, 5'd2, 5'd3, 11'd0};  // ADD R3, R1, R2
        DUT.MEM[3] = {6'b111111, 26'd0};                    // HLT

        // Reset control flags
        DUT.HALTED = 0;
        DUT.TAKEN_BRANCH = 0;
        DUT.pc = 0;

        // Simulation time
        #200;

        // Output Register Contents
        $display("\nFinal Register Values:");
        for (i = 0; i < 8; i = i + 1)
            $display("R[%0d] = %0d", i, DUT.Reg[i]);
        $display("\nMemory[100] = %0d", DUT.MEM[100]);
        $finish;
    end
endmodule
6. Result

Final Register Values:


R[0] = 0
R[1] = 5
R[2] = 10
R[3] = 5
R[4] = 0
R[5] = 0
R[6] = 0
R[7] = 0

Memory[100] = 0
testbench.sv:56: $finish called at 200 (1s)

Note that R[3] is 5 rather than the intended 15 (5 + 10). The design has no forwarding or stalling, and the test program places ADD R3, R1, R2 directly after the two ADDI instructions. ADD therefore reads R2 in its ID stage before ADDI R2 has written it back, which is exactly the RAW data hazard described in Section 4, and it computes 5 + 0. Memory[100] is 0 because the program loaded into memory contains no SW instruction.

7. Simulation
8. References:
[1] I. Sen Gupta, Computer Organization and Architecture, National Programme on
Technology Enhanced Learning (NPTEL), IIT Kharagpur

[2] The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA, Document Version
20191213, RISC-V Foundation.
