This document details the design and implementation of a 32-bit single-cycle RISC-V processor using open-source tools, specifically Verilog HDL, GTKWave, and the Qflow toolchain. The processor supports the RV32I instruction set and is designed for educational purposes, emphasizing simplicity and clarity while demonstrating a complete digital VLSI design flow from RTL to GDSII. The project showcases the feasibility of using open-source EDA tools in processor design, making it suitable for students and researchers in VLSI and digital systems.

Enabling Cost-Effective ASIC Prototyping: Design and Simulation of a 32-bit Single-Cycle RISC-V Processor with Qflow and GTKWave
Gampala Surekha, Arpith Siddojigari, Chappidi Ruthvik Reddy, Gopu Chetan, Alavala Renish
ECE Dept., GRIET, Hyderabad, India
surekha537@[Link], arpith22241A0404@[Link], ruthvik22241A0410@[Link], chetan22241A0415@[Link], renish22241A0402@[Link]

Abstract—This paper presents the design and physical implementation of a 32-bit single-cycle RISC-V processor using open-source tools, demonstrating an end-to-end digital VLSI design flow from RTL to GDSII. The processor is implemented in Verilog HDL and supports the RV32I instruction set. Functional verification is carried out with GTKWave for waveform analysis and simulation validation. After RTL validation, the design is synthesized and mapped with Yosys, placed with GrayWolf, and routed with Qrouter, all under the Qflow toolchain. The flow produces a DRC-clean GDSII layout, a complete physical chip layout ready for manufacturing. The single-cycle design executes each instruction in one clock cycle, providing a balance of performance and simplicity that suits educational and prototyping applications. The project demonstrates the viability of using purely open-source EDA tools for processor design, verification, and layout, and thus serves as a good exercise for students and researchers in VLSI and digital systems.

Index Terms—RISC-V, GTKWave, Qflow, Verilog HDL, Open-source EDA, Processor Design, RTL-to-GDSII

I. INTRODUCTION

In recent years, the open-source hardware movement has gained great momentum, with RISC-V leading the way as an open and extensible instruction set architecture (ISA). This project covers the design, verification, and physical implementation of a 32-bit single-cycle RISC-V processor with the aid of open-source tools.

Prior work [1] notes that the RISC-V ISA is becoming one of the leading instruction sets for Internet-of-Things and System-on-Chip applications and that, owing to its strong security features and open-source nature, it is becoming a competitor to the popular ARM architecture. That work describes a lightweight, open-source RISC-V processor built with modern hardware design techniques, its implementation on a Field Programmable Gate Array (FPGA), and its testing, with the goal of a processor that is easy for beginners to learn from and lightweight enough to fit on even small FPGAs.

As described in [2], the design and implementation of a 32-bit single-cycle RISC-V processor in Verilog is an elaborate process that aims to create a functioning processor architecture that adheres to the RISC-V instruction set. To execute instructions in a single clock cycle, such a design requires components such as the program counter, register file, arithmetic logic unit (ALU), and memory modules. The Verilog-based implementation covers RISC-V arithmetic, memory access, and control flow instructions. The design prioritizes simplicity and clarity, laying the groundwork for educational study and the eventual development of more advanced processing functionality.

According to [3], RISC-V is a relatively new instruction-set architecture (ISA) with features such as low power consumption, low cost, and scalability. Internet of Things (IoT) devices are expected to be developed in large numbers, and these characteristics of RISC-V are exactly what IoT devices need.

A single-cycle processor completes each instruction in one clock cycle, providing simplicity in control logic and predictability in timing. The processor is designed using Verilog HDL, simulated and debugged with GTKWave, and taken from synthesis to GDSII layout using the Qflow toolchain. This end-to-end flow provides hands-on experience in digital design, bridging the gap between theoretical computer architecture and practical VLSI design.

The project intends to present a contemporary, cost-effective way of understanding processor design by implementing a RISC-V processor with open-source tools. This helps improve students' performance and addresses the shortcomings of conventional methods, namely outdated documentation-based methodologies and the lack of hands-on access to proprietary tools.

II. METHODOLOGY
A. RISC-V ISA Design and Instruction Set Choice
The processor implements the RV32I base integer ISA, a 32-bit
instruction set that is simple yet sufficient for general-purpose
computation. This design uses four of its instruction formats:
R-type, I-type, S-type, and SB-type (called B-type in current
versions of the specification), each serving a different function
within the processor.
R-type instructions are used for register-to-register opera-
tions like ADD, SUB, AND, and OR. These instructions have
a structure with fields named funct7, rs2, rs1, funct3, rd, and
opcode. This setup allows for precise ALU operations that
take two source registers and produce a result in a destination
register.
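
As a minimal sketch of how these six fields sit inside the 32-bit instruction word, the following Python model slices an R-type instruction using the standard RV32I bit positions. It is an illustration only, not the authors' Verilog decoder, and the function name `decode_r_type` is hypothetical.

```python
def decode_r_type(instr: int) -> dict:
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": instr & 0x7F,          # bits [6:0]
        "rd": (instr >> 7) & 0x1F,       # bits [11:7], destination register
        "funct3": (instr >> 12) & 0x7,   # bits [14:12]
        "rs1": (instr >> 15) & 0x1F,     # bits [19:15], first source register
        "rs2": (instr >> 20) & 0x1F,     # bits [24:20], second source register
        "funct7": (instr >> 25) & 0x7F,  # bits [31:25]
    }


# add x3, x1, x2 encodes as 0x002081B3; sub x3, x1, x2 as 0x402081B3.
add_fields = decode_r_type(0x002081B3)
sub_fields = decode_r_type(0x402081B3)
```

ADD and SUB share the same opcode and funct3 and differ only in funct7, which is why the ALU control logic needs that field in addition to the opcode.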
I-type instructions, such as ADDI and LW, use an immediate value instead of a second source register. This allows constant arithmetic and memory loads. The fields in this type are a 12-bit immediate, rs1, funct3, rd, and opcode.

S-type instructions are used for SW (store word) operations, where data is written to memory. The 12-bit immediate is split over two fields (imm[11:5] and imm[4:0]) for encoding.

SB-type instructions, like BEQ (branch if equal), are for conditional branching. They feature a split immediate field that encodes a PC-relative offset used to change control flow.

The regular and straightforward format structure keeps the decoder hardware simple, which saves area in the Verilog implementation. By using only these basic formats, the processor achieves high flexibility without needing very complex hardware. Additionally, the fixed 32-bit instruction length allows for consistent fetch and decode phases, ideal for a single-cycle design. The use of RV32I ensures that even a minimal processor core can perform a wide range of computations, memory accesses, and branches, making it a strong platform for education, prototyping, and open-source VLSI design.

Fig. 1. Instruction types

B. Architecture of the 32-bit Single-cycle RISC-V processor

Fig. 2. Single cycle RISC-V Architecture

1) Program Counter (PC): The Program Counter (PC) holds the memory address of the next instruction to be executed. In every clock cycle, it either increments by 4 to point to the next sequential instruction or is loaded with a branch target address when a branch is taken. The PC thus ensures correct sequencing of instruction flow inside the processor.

2) Instruction Memory: The Instruction Memory is a read-only memory (ROM) module that stores the RISC-V machine instructions. It is addressed by the PC and outputs a 32-bit instruction, which is then decoded to determine the operation type, register operands, and immediate values.

3) Control Unit: The Control Unit decodes the opcode field of the instruction and generates the control signals that direct the datapath units. These signals determine which operation the ALU performs, whether the register file or data memory is accessed, and which data paths the multiplexers select.

4) Register File: The Register File holds the 32 general-purpose 32-bit registers defined by RISC-V. It supports reading two registers simultaneously, selected by the source register addresses in the instruction. If the instruction produces a result (as arithmetic and load instructions do), the result is written into the destination register named in the instruction.

5) ALU (Arithmetic Logic Unit): The ALU performs all arithmetic and logical operations, including addition, subtraction, AND, OR, and comparisons. It takes two input operands (either two register values, or one register value and the output of the immediate generator) and performs the operation selected by the ALUControl signal derived from the instruction. It also provides a "Zero" output used for branch decisions.

6) Immediate Generator: This block extracts the immediate field of the instruction and sign-extends it to 32 bits. This is required for instructions such as addi, lw, sw, and branches, in which an immediate value is used for either addressing or arithmetic.
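
The sign extension performed by the Immediate Generator can be sketched in Python as follows. This is an illustrative model based on the standard RV32I encodings, not the authors' Verilog module; the helper names `sign_extend`, `imm_i`, `imm_s`, and `imm_b` are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imm_i(instr: int) -> int:
    """I-type (e.g. addi, lw): imm[11:0] is instruction bits [31:20]."""
    return sign_extend(instr >> 20, 12)


def imm_s(instr: int) -> int:
    """S-type (e.g. sw): imm[11:5] is bits [31:25], imm[4:0] is bits [11:7]."""
    raw = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
    return sign_extend(raw, 12)


def imm_b(instr: int) -> int:
    """B-type (e.g. beq): a 13-bit offset whose bit 0 is implicitly zero."""
    raw = (
        (((instr >> 31) & 0x1) << 12)    # imm[12]
        | (((instr >> 7) & 0x1) << 11)   # imm[11]
        | (((instr >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((instr >> 8) & 0xF) << 1)    # imm[4:1]
    )
    return sign_extend(raw, 13)


addi_imm = imm_i(0xFFF00093)  # addi x1, x0, -1
sw_imm = imm_s(0x0020A423)    # sw x2, 8(x1)
beq_imm = imm_b(0xFE208EE3)   # beq x1, x2, -4
```

In hardware the same result is obtained by wiring the instruction bits into place and replicating bit 31 into the upper bits, so the block needs no arithmetic of its own.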
7) Multiplexers (MUXes): Multiplexers are used throughout the datapath to select among data sources. For instance, one MUX selects between a register value and an immediate value as the second ALU input. Another MUX selects between the ALU result and the data memory output for write-back into the register file, depending on the MemToReg control signal.

8) Data Memory: The Data Memory unit serves load and store instructions. For an lw (load word), it reads data from memory at the calculated address. For an sw (store word), it writes data to that address. The address comes from the ALU output, and the control signals MemRead and MemWrite select a read or a write.

9) Branching and Jumping Logic: The processor supports conditional branching. The ALU's Zero flag is combined with the Branch control signal to determine whether to update the PC with a branch target address. In this design, the target is calculated by shifting the immediate left by 2 bits and adding the result to the PC + 4 value with an adder. Note that the RV32I specification instead defines the branch target as the address of the branch instruction itself plus the sign-extended offset, whose lowest bit is implicitly zero.

III. RESULTS AND DISCUSSIONS

A. Functional Simulation Using GTKWave

The operation of the 32-bit single-cycle RISC-V processor was verified by simulation with Icarus Verilog and waveform visualization in GTKWave. A dedicated Verilog testbench was created to run a sequence of RV32I instructions covering arithmetic instructions (ADD, SUB), logical instructions (AND, OR), memory access instructions (LW, SW), and branch instructions (BEQ). The aim was to exercise the entire processor datapath across all five phases: instruction fetch, decode, execute, memory access, and write-back.

Fig. 3. Simulated output using GTKWave

GTKWave offered a clear picture of internal signal transitions during simulation. Important signals such as the Program Counter, Instruction, register values, ALU result, memory data, and control lines (RegWrite, MemRead, Branch, etc.) were observed. For an ADD instruction, the waveforms confirmed that the proper source registers were read, the ALU produced the correct result, and the destination register was updated accordingly. For LW and SW instructions, correct memory addresses were computed, and data were read from or written to the proper location in data memory.

B. Physical Design Using Qflow

Fig. 4. Qflow

The entire physical design of the RISC-V processor was carried out using the Qflow open-source ASIC toolchain, which combines tools for synthesis, placement, routing, and layout verification. Qflow supports technology libraries such as OSU035 (0.35 µm) and OSU018 (0.18 µm), and both libraries were used to evaluate layout efficiency and area for this project.

The steps in Qflow are:

1) Preparation: Preparation in Qflow consists of setting up the top-level project directory structure, the RTL (Verilog) source files, and the standard cell library. This phase checks that all required files, such as technology files (.lib, .lef, .gds) and constraints (e.g., clock period), are configured correctly. It also involves writing the top-level design and testbench files and checking basic syntax. Tools such as iverilog and gtkwave are commonly used at this stage to functionally simulate the design prior to synthesis.

2) Synthesis: During this stage, the RTL code is converted to a gate-level netlist by a logic synthesis tool, typically Yosys in Qflow. Yosys reads the Verilog code, optimizes the logic, and maps it to the gates in the standard cell library. This produces a technology-mapped netlist and a timing report. The gate-level netlist is the input to the downstream backend tools, and successful synthesis also confirms that the RTL maps onto the target library.

3) Placement: Placement determines the physical location on the silicon die of every standard cell in the synthesized netlist. Qflow performs this step with tools such as GrayWolf, which positions the cells to minimize overall wire length and timing delays. Placement must be done well to avoid routing congestion and failure to reach timing closure. The output is a layout (.def) file with cell orientations and coordinates.

Fig. 5. Placement

4) Static Timing Analysis (Pre-route): Prior to routing, Qflow runs an initial Static Timing Analysis (STA) with utilities such as vesta or OpenSTA. This verifies whether the placed design meets its timing constraints before detailed wire delays are taken into account. STA helps identify critical paths and slack values at an early stage, leaving room to rerun placement or synthesis if necessary.

5) Routing: Routing creates the metal interconnections among the placed cells to complete the circuit. Qflow uses Qrouter to perform detailed routing from the placement DEF file. The objective is to reduce delay and prevent design rule violations. Routing also accounts for routing layers, wire width, and spacing according to the technology rules. The output is a completely connected design, ready for timing and physical verification.

Fig. 6. Routing

6) Post-Route STA: Following routing, Qflow runs another iteration of STA with the wire delays obtained from the routed layout. This checks that the design meets its timing requirements under actual parasitic effects. Violations found here may require design modifications such as re-synthesis with different constraints, re-placement, or re-routing.

7) Migration: Migration prepares the design for final layout generation and manufacturing. It translates the routed layout information into tool-compatible formats, such as those used by Magic to view and edit layouts. This step confirms that the DEF, LEF, and technology information are ready for the physical verification tools.

Fig. 7. Final Layout

8) LVS (Layout Versus Schematic): LVS verifies that the layout (from Magic and routing) agrees with the schematic (the gate-level netlist from synthesis). Qflow uses tools such as Netgen to run LVS. It compares the connectivity, cell instances, and nets in the layout against the netlist. A successful LVS run ensures that the physical design implements the intended function.

9) DRC (Design Rule Check): DRC checks that the layout adheres to the foundry's manufacturing rules, for example minimum spacing, width, and enclosure rules. Magic is generally used in Qflow to run DRC. DRC violations can cause fabrication defects, so this phase is essential for ensuring that the layout is physically manufacturable.

10) GDS (Final Output): The last step in Qflow is creating the GDSII file, the industry standard format for mask generation and fabrication. Magic is used to save the verified layout to a .gds file. This GDS file contains all the geometry and layer data required to fabricate the chip and is the final product delivered to the foundry.
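
The pre-route and post-route STA steps in this flow both come down to comparing each path delay against the clock period. The following Python sketch shows that arithmetic under a simple setup-time model. The function names and every number below are hypothetical illustrations, not results from this design or output from vesta or OpenSTA.

```python
def setup_slack_ns(clock_period_ns: float, path_delay_ns: float,
                   setup_time_ns: float = 0.0) -> float:
    """Slack of one path: positive means timing is met, negative is a violation."""
    return clock_period_ns - (path_delay_ns + setup_time_ns)


def worst_slack_ns(clock_period_ns: float, path_delays_ns: list,
                   setup_time_ns: float = 0.0) -> float:
    """The critical path is the one with the smallest slack."""
    return min(setup_slack_ns(clock_period_ns, delay, setup_time_ns)
               for delay in path_delays_ns)


# Hypothetical delays (ns) for three paths under a 10 ns clock constraint.
pre_route = worst_slack_ns(10.0, [6.5, 8.5, 7.25])    # before wire delays are known
post_route = worst_slack_ns(10.0, [7.0, 10.75, 8.0])  # same paths with wire delay added

timing_met_pre = pre_route >= 0
timing_met_post = post_route >= 0
```

In this made-up case the design passes before routing and fails after it, which is the situation where the flow calls for re-synthesis, re-placement, or re-routing. In a single-cycle processor the clock period has to cover the longest instruction path, so the worst slack effectively sets the achievable clock rate.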
C. Discussion

The correct execution of the implemented RV32I base instructions in a single clock cycle supports the correctness of the RTL (Register Transfer Level) design, especially the control logic and datapath of the processor. Because RV32I is the base instruction set of the RISC-V architecture, covering arithmetic, logical, load/store, branch, and control instructions, this indicates that the processor works as expected for a wide variety of operations. Single-cycle execution means every instruction, whether data or control, completes fetch, decode, execute, memory access, and write-back in a single clock cycle, which requires accurate control signal generation and tightly coordinated submodules.

The modular Verilog design methodology significantly improved design readability and eased debugging. With the processor divided into clearly defined, self-contained submodules such as the Arithmetic Logic Unit (ALU), Control Unit, Register File, and Data/Instruction Memory, each module's behavior could be validated independently. This modularity simplified signal tracing and problem identification during functional simulation, so bugs and performance hotspots could be addressed systematically.

Simulation was instrumental in verifying the processor. GTKWave was used with waveform dump files (e.g., .vcd) to view signal changes over time. It permitted examination of internal signals such as ALU inputs/outputs, control signals, register contents, program counter updates, and memory access activity. This confirmed that the processor handled each instruction properly at every step and showed the effect of control signals on the data paths during execution.

The application of Qflow, an open-source digital ASIC design flow, demonstrated that it can be used to synthesize, place, route, and generate a layout from the RTL Verilog description of a RISC-V processor. Tools such as Yosys (synthesis), GrayWolf (placement), Qrouter (routing), Netgen (LVS), Magic (DRC and layout visualization), and OpenSTA (timing analysis) were used together to implement a complete ASIC backend flow. This showed that high-quality digital designs can be built entirely with free and open-source EDA tools, which makes this approach very useful for academic and prototyping purposes.

Overall, the project successfully applied a full open-source RTL-to-GDSII flow to a single-cycle 32-bit RISC-V processor. This provides a good basis for future extensions. Possible upgrades include instruction pipelining, in which the execution of several instructions overlaps for better throughput. Other future directions include adding RISC-V extensions such as RV32M for multiplication and division, or interfacing the processor to external memory modules or peripherals over standard buses, enabling real-world embedded applications and system integration.

D. Figures and Tables

a) Positioning Figures and Tables: Figure and table placement complies with IEEE requirements. Example waveforms and layouts were included to showcase processor functionality.

Fig. 8. Modules used in RISC-V processor

The architecture of the processor is partitioned into modules, each with a distinct role in the instruction cycle. The following is an outline of the principal modules of the single-cycle RISC-V processor:

Program Counter (PC): The Program Counter stores the address of the instruction being processed. After every instruction, it is incremented (or conditionally modified) to point to the next instruction. It plays an important role in instruction sequencing.

Instruction Memory: This is a read-only memory that holds the RISC-V program as machine code. When given an address by the Program Counter, it delivers the 32-bit instruction to be decoded and executed.

Register File: The Register File contains 32 general-purpose registers that are 32 bits wide. Two registers can be read and one written in each cycle. The Register File provides fast data access and is central to instruction execution.

Arithmetic Logic Unit (ALU): The ALU executes all arithmetic and logical operations, such as addition, subtraction, AND, and OR, depending on the control signals derived from the instruction. It is the heart of the processor's computational engine.

Control Unit: The Control Unit interprets the opcode of the fetched instruction and produces the control signals that coordinate the other modules. These signals include ALU operation selection, memory read/write, and register write enable.

Data Memory: Used by LW (load word) and SW (store word) instructions, the data memory gives the processor access to temporary data storage. It supports both read and write operations on 32-bit data.

Immediate Generator: This module takes immediate values from instruction fields and sign-extends them to 32 bits. It supports RISC-V formats such as I-type, S-type, and B-type, enabling correct address and operand calculations.

Multiplexers (MUXes): MUXes select between multiple data sources in the datapath. They are controlled by signals from the Control Unit and play a critical role in routing operands to the ALU and selecting the data written to destination registers.

IV. ACKNOWLEDGMENT
The authors gratefully acknowledge the support provided by the Department of Electronics and Communication Engineering, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India. This work could not have been achieved without the technical resources, lab facilities, and academic advice from the department. We also sincerely thank the faculty members for their fruitful suggestions during the design, simulation, and verification process. Their support and comments contributed greatly to the successful implementation of the RISC-V processor using open-source tools.

V. REFERENCES
[1] L. Poli, S. Saha, X. Zhai, and K. D. McDonald-Maier, “Design and Implementation of a RISC V Processor on FPGA,” 2021 17th International Conference on Mobility, Sensing and Networking (MSN), Exeter, United Kingdom, 2021.

[2] M. R. P, P. Niranjan, and D. K. M. J, “Design and Implementation of 32-bit RISC-V Processor Using Verilog,” 2024 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), Mangalore, India, 2024.

[3] J.-Y. Lai, C.-A. Chen, S.-L. Chen, and C.-Y. Su, “Implement 32-bit RISC-V Architecture Processor using Verilog HDL,” 2021 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Hualien City, Taiwan, 2021.

[4] A. Waterman and K. Asanović, The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Version 2.2, RISC-V Foundation, 2017.

[5] D. Patterson and K. Asanović, “Instruction sets should be free: The case for RISC-V,” EECS Dept., University of California, Berkeley, Tech. Rep. UCB/EECS-2014-146, 2014.

[6] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed., Morgan Kaufmann, 2013.

[7] M. J. Flynn and W. Luk, Computer System Design: System-on-Chip, Wiley, 2011.

[8] GTKWave, “GTKWave - Analyzing simulation waveforms,” [Online]. Available: [Link]

[9] T. Becker, “Qflow: A VLSI Flow Based on Open-Source Tools,” [Online]. Available: [Link]
