RISC-V SoC System Integration
Master’s Thesis
written by: Thomas Grubelnik, BSc
at the master’s degree programme System Test Engineering
of the FH JOANNEUM - University of Applied Sciences, Austria
supervised by: Patrick Lampl, BSc MSc
externally supervised by: DI Heimo Hartlieb
Graz, January 17, 2022
Obligatory declaration
I hereby declare that the present Master’s Thesis was composed by myself and that the work
contained herein is my own. I also confirm that I have only used the specified resources. All
formulations and concepts taken verbatim or in substance from printed or unprinted material or
from the Internet have been cited according to the rules of good scientific practice and indicated
by footnotes or other exact references to the original source.
The present thesis has not been submitted to another university for the award of an academic
degree in this form. This thesis has been submitted in printed and electronic form.
I hereby confirm that the content of the digital version is the same as in the printed version.
I understand that the provision of incorrect information may have legal consequences.
The complexity of modern, purely hardwired application-specific integrated circuits (ASICs)
keeps increasing, which makes behavioral changes costly. A System-on-Chip (SoC) combines a
processor core with ASIC components and is a suitable way to keep the behavior flexible.
The target of this work is to implement an SoC using Infineon’s code generation framework
MetaGen, which is based on metamodeling, in combination with common handwritten components.
MetaRTL adds a metamodel to the MetaGen framework that provides components and commands to
generate register-transfer level (RTL) code. Among the generated components is a processor core
implementing the RISC-V instruction set architecture. To enable communication between generated
and handwritten components, multiple bus interfaces were used. All components were integrated
using the integration flow of MetaGen.
The final output of this work is a simulatable SoC including a 5-stage RISC-V RV32IMC CPU.
Simulation results are provided to show that MetaGen with MetaRTL is capable of generating
more complex RTL modules and of integrating handwritten components.
Kurzfassung
Thanks to the whole MetaGen and MetaRTL development team at Infineon for providing the code
generation framework. I would especially like to thank Keerthikumara Devarajegowda, who
supported me whenever questions arose while using MetaRTL. I also want to thank Heimo Hartlieb,
who provided constructive feedback and answered all my questions regarding MetaGen and the
DBBL library blocks. Last but not least, I want to thank Joachim Kahr for reading through this
thesis and giving constructive feedback.
Short Title of your Thesis
Contents
1 Introduction
2 Research
2.1 RISC-V
2.1.1 Instruction Set Overview
2.1.2 Integer Instruction Set
2.1.3 Integer Multiplication and Division Instruction Extension
2.1.4 Compressed Instruction Extension
2.2 Code Generation
2.2.1 Metamodeling
2.2.2 Metamodel-based Code Automation
2.2.3 Model Driven Architecture for Code Automation
2.2.4 MetaGen and MetaRTL
4 Components
4.1 MetaRTL
4.1.1 RISC-V CPU Core
4.1.2 AHB Matrix
4.1.3 AHB-to-APB Bridge
4.1.4 AHB-to-CSC Bridge
5 Integration
5.1 MetaGen
5.2 Integration Notes
6 Simulation
6.1 Test Bench
6.1.1 Setup
6.1.2 Test Case
6.2 Results
6.2.1 CPU Boot
6.2.2 Addition
6.2.3 Hardware Multiplication
6.2.4 Software Multiplication
A Registers
A.1 Interrupt Registers
A.2 Timer Registers
B Simulation Waveforms
List of Figures
2.1 RV32I base instruction formats with formats showing immediate variants. [1, Page 16]
2.2 Compressed 16-bit RVC instruction formats [1, p. 100, Table 16.1]
2.3 UML class diagram showing an example of metamodel and a model instance [2]
2.4 Illustration of a simplified utilization of metamodeling at Infineon [3]
2.5 MDA as Y-Chart [4]
2.6 MDA applied to hardware generation [2]
5.1 Block diagram of the System-on-Chip showing internal bus connections and interfaces provided to the outside.
6.1 Common testbench top with instantiated DUT connected to two IVCs.
6.2 Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction from the boot ROM.
6.3 Simulator waveform showing the SPI transaction setting the startup location.
6.4 Simulator waveform showing the selection of startup location and the triggered jump to the instruction ROM.
6.5 Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.
6.6 Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V "M" extension.
B.1 Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction from the boot ROM.
B.2 Simulator waveform showing the selection of start up location and the triggered jump to the instruction ROM.
B.3 Simulator waveform showing the SPI transaction setting the startup location.
B.4 Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.
B.5 Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V "M" extension.
B.6 Simulator waveform showing a multiplication using a software implementation of the C math library.
List of Tables
2.1 Integer register-immediate instructions, where all instructions treat the values as signed numbers except where explicitly mentioned. The LUI and AUIPC instructions are used for immediate handling of the U-type instruction format.
2.2 Integer register-register instructions, where all instructions treat the values as signed numbers except where explicitly mentioned.
2.3 Control transfer instructions with unconditional jumps and conditional branches. For all conditional branches, the address range is limited to ±4 KiB.
2.4 Load and store instructions, where word is defined for 32-bit, half-word for 16-bit and byte for 8-bit values.
2.5 Integer multiplication and division instructions excluding the 64-bit specific instructions MULW, DIV[U]W and REM[U]W.
2.6 Common 8 registers accessible by rs1´, rs2´ and rd´ fields in the compressed instruction formats CIW, CL, CS, CA and CB [1, p. 100, Table 16.2].
2.7 Compressed load and store instructions with the used instruction format.
2.8 Compressed unconditional jump and conditional branch instructions with the used instruction format.
2.9 Integer constant generation instructions with the used instruction format.
2.10 Integer register-immediate instructions with the used instruction format.
2.11 Integer register-register instructions with the used instruction format.
4.1 Base address and address range of the AHB matrix slaves.
4.2 Settings of the MetaRTL specification for the SRAM instance used to store data.
4.3 Settings of the MetaRTL specification for the SRAM instance used to store instructions.
4.4 Settings of the MetaRTL specification for the ROM instance.
4.5 Signal list for interface tim implemented in the wrapper. All signals have a width of 1 bit.
4.6 Signal list of interface SPI_FRAME
4.7 MOSI/MISO frame structure for read and write operations.
4.8 Structure of the MISO error code field. CEC contains predefined codes which are described in table 4.9. The error code field contains the transmitted error code from the internal bus.
4.9 List of predefined communication error codes.
4.10 Signal list of interface tm
4.11 Signal list of interface tm_en_ctrl
4.12 Signal list for interface SOCCTRL_IF with a short description providing the signal source.
4.13 Generics of SYSCTRL and its associated signals.
Listings
1 Introduction
The complexity of modern, purely hardwired ASICs means that behavioral changes require more and
more effort. This increase in effort can be countered by including a processor core in the ASIC
and implementing the behavior in software. Depending on the change, only the software must be
modified. Using a memory that is writable at least once avoids changes to the chip masks
entirely. If a Read-Only Memory (ROM) is used, only the ROM mask must be changed. Nevertheless,
the choice of memory depends on multiple factors such as available space, cost and availability
in the chosen production technology. In addition to the above-mentioned behavioral flexibility,
bugs in the software can be fixed easily. This decreases development costs because fewer full
re-designs are needed; such a re-design can cost several million euros¹ depending on the
technology used. If the chip has a rewritable memory, such as flash, such changes can be made
in the field with software updates, reducing the risk of product recalls.
At Infineon, a code generation framework based on metamodeling is available. It makes it
possible to generate complex register-transfer level (RTL) code. One of the components
implemented with this framework is a central processing unit (CPU) that implements the RISC-V
instruction set architecture. As the name RISC-V indicates, the processor core follows the
concept of Reduced Instruction Set Computing (RISC). The second common processor core concept,
Complex Instruction Set Computing (CISC), should also be mentioned [8].
A processor using the RISC design approach only supports a limited instruction set where
more complex instructions must be described with multiple simple instructions. The simple
instructions can be implemented directly without any overhead. That increases the execution
speed of each instruction. As a downside, the code size for more complex instructions is increased.
CISC processors, on the other hand, implement a larger instruction set. This processor type
supports more instructions, including more complex ones. All instructions are implemented using
microcode, which is stored in the memory of the CISC processor. Before execution, each
instruction must be loaded and split into smaller instructions, causing an overhead that slows
down execution. Additionally, most of the implemented instructions are rarely used, which makes
a CISC processor inefficient.
¹ [Link]
The focus of this work is on the license-free RISC-V architecture. Thus, a comparison to other
available RISC architectures like ARM is not conducted.
The overall target of this work is to provide a simulatable System-on-Chip (SoC) combining
generated and manually written components. To increase the development speed, the generator
framework mentioned above is used. In addition, handwritten modules from the digital building
block library (DBBL) are re-used and adapted to suit the application. After the integration of
all components, the focus shifts to debugging and testing. For this purpose, a Universal
Verification Methodology (UVM) test bench is set up. The simulation shall show that the
generated CPU using the RISC-V ISA including the extensions "M" and "C" works.
2 Research
2.1 RISC-V
The RISC-V instruction set is an open-source instruction set architecture (ISA) developed by
Krste Asanovic, Yunsup Lee and Andrew Waterman at UC Berkeley [9]. The architecture is
independent of the implementation technology and is separated into a small base integer
instruction set with optional extensions. Multiple extensions are defined by the ISA, but
additional ones outside of the ISA can be added. The base integer instruction set is currently
ratified for 32-bit and 64-bit.
A RISC-V hardware platform [1, Page 2] may combine any number of RISC-V-compatible processing
cores with non-RISC-V-compatible cores or other components. The RISC-V Instruction Set Manual
[1, Page 2] defines a component as a core when it has its own instruction fetch unit;
accelerators and co-processors can be attached to each core. An accelerator is defined as a
unit that is specialized for a specific task. In the manual [1], a co-processor is defined as
a unit that is mostly sequenced by a RISC-V instruction stream and that contains additional
instruction-set extensions or architectural state.
The base RISC-V integer ISA defined in the RISC-V Instruction Set Manual [1] must be
implemented in any RISC-V core. Four variants of the base integer ISA are currently described
in the manual: the previously mentioned 32-bit and 64-bit base integer ISAs RV32I and RV64I,
the 32-bit subset RV32E, and a 128-bit wide base integer ISA named RV128I. The RV32E subset is
derived from the RV32I base integer instruction set and its purpose is to support smaller
microcontrollers. This is achieved by halving the number of integer registers from 32 to 16.
RV128I is a full base integer ISA with the address space increased to 128 bits. In its current
state, the RISC-V Instruction Set Manual only provides a sketch of a possible variant of this
base integer ISA.
The implementation of the RISC-V core described in this thesis uses the RV32I base integer
instruction set with the extensions "M" and "C". Therefore, only this variant is briefly
described in the following sections.
The RV32I base integer instruction set [1, Page 13] provides 32 registers with a width of 32
bits. The registers x1-x31 can be used to store values, which can be interpreted as unsigned
binary integers, two’s complement binary integers or Boolean values. Register x0 is always
zero. Therefore, it can be used as the destination when the result of an instruction is to be
discarded.
An additional register, the program counter pc, holds the address of the current instruction.
Besides that, the RV32I base integer instruction set [1, Page 13] provides no other dedicated
register, such as a dedicated stack pointer.
Four core instruction formats (R/I/S/U) and two additional variants for further handling of
immediates (B/J) are provided. All instructions have a fixed size of 32 bits and must be
aligned on a four-byte boundary in memory. A violation of the four-byte alignment triggers an
instruction-address-misaligned exception.
To simplify decoding, the register fields rs1, rs2 and rd are kept at the same position in all
formats. To speed up the sign extension, the sign bit of all immediates is located in bit 31.
All immediates are signed except for the 5-bit immediates of CSR instructions. These
instructions are described in Chapter 9 "Zicsr" of the RISC-V Instruction Set Manual
[1, Page 55].
Figure 2.1: RV32I base instruction formats with formats showing immediate variants. [1, Page
16]
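The fixed field positions described above can be illustrated with a minimal Python sketch. The function names `sign_extend` and `decode_fields` are chosen for illustration only and are not part of the generated CPU; the bit positions are those of the RV32I base formats, and the example word was assembled by hand.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of `value` as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_fields(instr: int) -> dict:
    """Extract the fields that sit at fixed positions in the RV32I formats."""
    return {
        "opcode": instr & 0x7F,            # bits 6..0
        "rd": (instr >> 7) & 0x1F,         # bits 11..7
        "funct3": (instr >> 12) & 0x7,     # bits 14..12
        "rs1": (instr >> 15) & 0x1F,       # bits 19..15
        "rs2": (instr >> 20) & 0x1F,       # bits 24..20 (not a register in I-type)
        # I-type immediate: bits 31..20, its sign bit is instruction bit 31
        "imm_i": sign_extend(instr >> 20, 12),
    }


# ADDI x5, x6, -1 (hand-assembled example word)
fields = decode_fields(0xFFF30293)
```

Because the register fields never move, a decoder can read rd, rs1 and rs2 before it knows the instruction format; only the immediate assembly depends on the format.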
The integer computational instructions either use the I-type or R-type format. For register-
immediate operations the I-type format and for register-register operations the R-type format
is used. In both cases, the result is stored in register rd.
For integer arithmetic instructions there is no special support for overflow checks. This
check must be implemented manually using branches. Examples for overflow checking and a
more detailed instruction description can be found in the RISC-V Instruction Set Manual [1,
Page 18].
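As a hedged illustration of such a branch-based check, the following Python sketch models 32-bit wrap-around addition together with the comparisons a program would perform after the ADD. The helper names are hypothetical; the comments name the RISC-V instructions that each comparison stands for.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add_unsigned_overflows(a: int, b: int):
    """Return (sum, overflow) for an unsigned 32-bit addition."""
    result = (a + b) & MASK32                # ADD wraps around silently
    return result, result < (a & MASK32)     # BLTU sum, a, overflow


def add_signed_overflows(a: int, b: int):
    """Return (sum, overflow) for a signed 32-bit addition."""
    result = (a + b) & MASK32                        # ADD
    b_negative = to_signed(b) < 0                    # SLTI t3, b, 0
    sum_less = to_signed(result) < to_signed(a)      # SLT  t4, sum, a
    return result, b_negative != sum_less            # BNE  t3, t4, overflow
```

The signed check relies on the observation that the sum must be smaller than one operand exactly when the other operand is negative; any other combination indicates an overflow.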
Table 2.1: Integer register-immediate instructions, where all instructions treat the values as
signed numbers except where explicitly mentioned. The LUI and AUIPC instructions are used for
immediate handling of the U-type instruction format.
Function Short Description
For program flow control, RV32I provides unconditional jumps and conditional branches.
Unconditional jumps are implemented using the I-type and J-type formats. Conditional branches
only use the B-type format. The target address for jumps and branches must be aligned to a
four-byte boundary. An instruction-address-misaligned exception is triggered when this
alignment is violated.
Table 2.2: Integer register-register instructions, where all instructions treat the values as
signed numbers except where explicitly mentioned.
Function Short Description
Table 2.3: Control transfer instructions with unconditional jumps and conditional branches. For
all conditional branches, the address range is limited to ±4 KiB.
Function Short Description
JAL "Jump and link" implements a jump relative to the current instruction address
in a range of ±1 MiB. The signed offset is encoded in the immediate field. The
rd register stores the address of the instruction following the jump.
JALR "Jump and link register" implements an indirect jump to the address obtained by
adding the immediate to rs1 and clearing the least-significant bit of the result.
The rd register stores the address of the instruction following the jump.
BEQ A conditional branch that branches when rs1 equals rs2.
BNE A conditional branch that branches when rs1 is not equal to rs2.
BLT A conditional branch that branches when rs1 is less than rs2.
BLTU Same as BLT but interpreting the values as unsigned.
BGE A conditional branch that branches when rs1 is greater than or equal to rs2.
BGEU Same as BGE but interpreting the values as unsigned.
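The target calculation and the signed versus unsigned comparisons of Table 2.3 can be sketched in Python as follows. This is an illustrative model under the assumption of uncompressed 32-bit instructions (link address pc + 4); the function names are not taken from the thesis.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def jal(pc: int, offset: int):
    """Return (target, link): pc-relative jump, rd receives pc + 4."""
    return (pc + offset) & MASK32, (pc + 4) & MASK32


def jalr(pc: int, rs1: int, imm: int):
    """Return (target, link): rs1 + imm with the least-significant bit cleared."""
    return (rs1 + imm) & MASK32 & ~1, (pc + 4) & MASK32


def branch_taken(op: str, rs1: int, rs2: int) -> bool:
    """Evaluate the branch condition for the six conditional branches."""
    s1, s2 = to_signed(rs1), to_signed(rs2)      # signed view
    u1, u2 = rs1 & MASK32, rs2 & MASK32          # unsigned view
    conditions = {
        "BEQ": u1 == u2,
        "BNE": u1 != u2,
        "BLT": s1 < s2,
        "BLTU": u1 < u2,
        "BGE": s1 >= s2,
        "BGEU": u1 >= u2,
    }
    return conditions[op]
```

The register value 0xFFFFFFFF shows the difference between the two views: it is -1 for BLT and BGE but the largest possible value for BLTU and BGEU.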
For memory operations, load and store instructions are provided. Load instructions are imple-
mented using the I-type, whereas store instructions are using the S-type format. For loading or
storing data from or to memory, the memory address is calculated by adding rs1 to the signed
12-bit offset stored in the immediate field. In case of a load, the read data from the memory
is stored in register rd. For storing, register rs2 holds the data that will be transferred to the
memory.
Table 2.4: Load and store instructions, where word is defined for 32-bit, half-word for 16-bit
and byte for 8-bit values.
Function Short Description
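The address calculation for loads and stores can be sketched as follows. The memory is modeled as a plain little-endian byte array, and the helpers `effective_address`, `load` and `store` are hypothetical names used only for this illustration; the `signed` flag stands for the choice between sign-extending and zero-extending loads.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of `value` as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def effective_address(rs1_value: int, imm12: int) -> int:
    """Add the sign-extended 12-bit immediate to the base register rs1."""
    return (rs1_value + sign_extend(imm12, 12)) & MASK32


def load(memory: bytearray, address: int, size: int, signed: bool = True) -> int:
    """Read `size` bytes (1, 2 or 4) and extend the value to 32 bits for rd."""
    raw = int.from_bytes(memory[address:address + size], "little")
    value = sign_extend(raw, 8 * size) if signed else raw
    return value & MASK32


def store(memory: bytearray, address: int, size: int, rs2_value: int) -> None:
    """Write the lowest `size` bytes of rs2 to memory."""
    data = rs2_value & ((1 << (8 * size)) - 1)
    memory[address:address + size] = data.to_bytes(size, "little")


memory = bytearray(16)
addr = effective_address(8, 0xFFC)      # base 8 plus immediate -4
store(memory, addr, 4, 0x12345680)
```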
Memory ordering instructions, implemented using the I-type format, are used to order the memory
and I/O accesses as seen by other co-processors and RISC-V harts (hardware threads). Any
combination of memory reads/writes (R/W) and device input/output (I/O) can be ordered. This is
needed because the RISC-V ISA uses a relaxed memory model. Additional information about the
memory model can be found in the RISC-V Instruction Set Manual [1, p. 83].
For ordering, the FENCE instruction is implemented, where the immediate field has three
sections. The lowest bits, from bit 0 to bit 3, are used to encode the successor accesses. The next
four bits, from bit 4 to bit 7, encode the predecessor accesses. The highest bits encode the fence
mode field, which can be used to implement different semantics of the FENCE instruction.
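The layout of the FENCE immediate can be made concrete with a short Python sketch. The three sections follow the description above; the assignment of the I, O, R and W bits inside each four-bit set and the MISC-MEM opcode value are taken from the RISC-V manual rather than from this text, and the function names are illustrative.

```python
# Bit weights inside the 4-bit predecessor and successor sets (per the RISC-V manual)
ACCESS_BITS = {"i": 8, "o": 4, "r": 2, "w": 1}
MISC_MEM_OPCODE = 0x0F


def access_set(spec: str) -> int:
    """Convert a string such as 'rw' or 'iorw' into the 4-bit access set."""
    bits = 0
    for letter in spec:
        bits |= ACCESS_BITS[letter]
    return bits


def fence_immediate(pred: str, succ: str, fm: int = 0) -> int:
    """Build the 12-bit immediate: fm in bits 11..8, pred in 7..4, succ in 3..0."""
    return ((fm & 0xF) << 8) | (access_set(pred) << 4) | access_set(succ)


def encode_fence(pred: str, succ: str, fm: int = 0) -> int:
    """Assemble the full instruction word with rs1 = rd = x0 and funct3 = 0."""
    return (fence_immediate(pred, succ, fm) << 20) | MISC_MEM_OPCODE
```

For example, `encode_fence("rw", "rw")` orders all earlier memory reads and writes before all later ones without constraining device I/O.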
For environment calls and breakpoints, the SYSTEM instructions, encoded using the I-type
format, are provided. The functions ECALL and EBREAK are specified: ECALL triggers a service
request to the execution environment, whereas EBREAK returns control to the debugging
environment.
HINT Instructions
HINT instructions advance the pc without changing any other architecturally visible state.
They are encoded as computational instructions from section 2.1.2 with the constraint rd = x0.
The no-operation instruction (NOP), encoded as ADDI x0, x0, 0, uses the same pattern.
To accelerate integer multiplication and division the "M" extension is specified in the RISC-V
Instruction Set Manual [1, p. 43]. The instructions perform a multiplication or division on two
register values. Therefore, the instructions are encoded using the R-type format.
Table 2.5: Integer multiplication and division instructions excluding the 64-bit specific
instructions MULW, DIV[U]W and REM[U]W.
Function Short Description
MUL Performs a 32-bit × 32-bit multiplication and stores the lower 32 bits of the result
in rd.
MULH Performs the multiplication of two signed values and stores the upper 32 bits of
the result, including the sign bit, in rd.
MULHU Performs the multiplication of two unsigned values and stores the upper 32 bits
of the result in rd.
MULHSU Performs the multiplication of a signed and an unsigned value and stores the
upper 32 bits of the result, including the sign bit, in rd.
DIV Performs a 32-bit by 32-bit signed division and stores the result in rd. The result
is rounded towards zero.
REM Provides the remainder of the signed division and stores it in rd.
DIVU Performs a 32-bit by 32-bit unsigned division and stores the result in rd. The
result is rounded towards zero.
REMU Provides the remainder of the unsigned division and stores it in rd.
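The semantics of Table 2.5 can be checked against a small reference model. The following Python sketch is an assumption-laden illustration, not the multiplier of the generated CPU: it relies on Python's arbitrary-precision integers, and the handling of division by zero follows the RISC-V manual, which is not described in the table. The unsigned variants DIVU and REMU work analogously on the unsigned view.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(a: int, b: int) -> int:
    return (a * b) & MASK32                                  # lower 32 bits


def mulh(a: int, b: int) -> int:
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32    # signed x signed


def mulhu(a: int, b: int) -> int:
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32    # unsigned x unsigned


def mulhsu(a: int, b: int) -> int:
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32    # signed x unsigned


def div(a: int, b: int) -> int:
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK32                    # per the RISC-V manual: all bits set
    quotient = abs(sa) // abs(sb)        # magnitude first, so rounding is toward zero
    return (-quotient if (sa < 0) != (sb < 0) else quotient) & MASK32


def rem(a: int, b: int) -> int:
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return sa & MASK32               # per the RISC-V manual: the dividend
    remainder = abs(sa) % abs(sb)
    return (-remainder if sa < 0 else remainder) & MASK32    # sign follows dividend
```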
The compressed instruction extension "C" adds the possibility to use 16-bit instructions with
any base instruction set. Therefore, the four-byte alignment requirement is relaxed to a
two-byte boundary, so that no instruction can raise an instruction-address-misaligned exception
anymore. The manual [1, p. 97] uses the generic term "RVC" for the compressed instruction
extension, which is also used in this thesis. Additionally, the manual [1, p. 97] estimates
that typically 50%-60% of the RISC-V instructions in a program can be replaced with RVC
instructions, resulting in a code-size reduction of 25%-30%.
To derive the compressed 16-bit version of a common 32-bit instruction, RVC follows a simple
compression scheme. First, the size of the immediate field can be reduced, resulting in a
smaller possible address offset or immediate value. Second, one register can be fixed to x0
(zero register), x1 (ABI link register) or x2 (ABI stack pointer). Third, source and
destination registers can be identical. Finally, the number of selectable registers can be
limited to 8. Additionally, many RVC instructions require the constraints imm ≠ 0 or
rd, rs1, rs2 ≠ x0 to free up encoding space for other instructions with fewer operand bits
[1, p. 100]. As an example, the compressed no operation ([Link]) has the constraint nzimm ≠ 0.
Figure 2.2 shows an overview of the instruction formats used by the RVC instructions, whereas
table 2.6 lists the 8 commonly used registers selectable by rs1´, rs2´ and rd´. As for the
base instructions, the register fields are kept at the same positions to simplify decoding.
Figure 2.2: Compressed 16-bit RVC instruction formats[1, p. 100, Table 16.1]
Table 2.6: The 8 common registers accessible via the rs1´, rs2´ and rd´ fields in the
compressed instruction formats CIW, CL, CS, CA and CB [1, p. 100, Table 16.2].
RVC Register Number 000 001 010 011 100 101 110 111
Integer Register Number x8 x9 x10 x11 x12 x13 x14 x15
Integer Register ABI Name s0 s1 a0 a1 a2 a3 a4 a5
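The mapping of Table 2.6 is a constant offset, which the following minimal Python sketch makes explicit. The function names are chosen for illustration only.

```python
# ABI names of x8..x15 in the order of the 3-bit RVC register number
ABI_NAMES = ["s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"]


def expand_rvc_register(field: int) -> int:
    """Map a 3-bit rs1', rs2' or rd' field to the integer register number."""
    if not 0 <= field <= 7:
        raise ValueError("RVC register fields are 3 bits wide")
    return 8 + field


def abi_name(field: int) -> str:
    """Return the ABI name of the register selected by a 3-bit RVC field."""
    return ABI_NAMES[expand_rvc_register(field) - 8]
```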
For loading and storing data from and to memory, four RVC instructions (see table 2.7) are
available in combination with the RV32I instruction set. All of them use zero-extended
immediates that are scaled by the data size to increase the reachable address range. For RV32I
the data size is 32 bits, which results in a scaling factor of 4.
Table 2.7: Compressed load and store instructions with the used instruction format.
Function Format Short Description
[Link] CI Loads a 32-bit value into rd. The memory address is calculated by
adding the offset stored in the immediate field to the stack pointer
x2.
[Link] CSS Stores content of register rs2 into memory. The memory address is
calculated by adding the offset stored in the immediate field to the
stack pointer x2.
[Link] CL Loads a 32-bit value into rd´. The memory address is calculated by
adding the offset stored in the immediate field to rs1´.
[Link] CS Stores the content of register rs2´ into memory. The memory address is
calculated by adding the offset stored in the immediate field to rs1´.
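The effect of the zero-extended, scaled immediate can be sketched as follows. The field widths used in the example (5 bits for the CL and CS formats, 6 bits for the stack-pointer-relative CI and CSS formats) are taken from the RISC-V manual and are an assumption with respect to this text; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF
SCALE_SHIFT = 2   # scaling factor 4 for 32-bit data


def compressed_address(base: int, imm_field: int, field_bits: int) -> int:
    """Add the zero-extended immediate, scaled by 4, to the base register."""
    if not 0 <= imm_field < (1 << field_bits):
        raise ValueError("immediate does not fit into the instruction field")
    return (base + (imm_field << SCALE_SHIFT)) & MASK32


# Largest reachable offsets: 124 bytes with 5 bits, 252 bytes with 6 bits
max_register_relative = compressed_address(0x1000, 0b11111, 5)
max_stack_relative = compressed_address(0x2000, 0b111111, 6)
```

Since the offset is never negative and always a multiple of 4, all encoding bits are spent on word-aligned addresses above the base register.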
To control the program flow RVC provides unconditional jump and conditional branch instruc-
tions, where for both instruction types the offset stored in the immediate field is multiplied by
2.
Table 2.8: Compressed unconditional jump and conditional branch instructions with the used
instruction format.
Function Format Short Description
For constant generation and integer arithmetic operations, RVC provides multiple instructions.
Tables 2.9, 2.10 and 2.11 provide a short overview of the available instructions. For a more
comprehensive description including all constraints, a look into section 16.5 of the manual [1, p.
106] is recommended.
Table 2.9: Integer constant generation instructions with the used instruction format.
Function Format Short Description
Table 2.10: Integer register-immediate instructions with the used instruction format.
Function Format Short Description
Table 2.11: Integer register-register instructions with the used instruction format.
Function Format Short Description
HINT Instructions
The behavior of RVC HINT instructions is the same as for RV32I described in section 2.1.2.
As a short reminder, HINT instructions do not modify any architecturally visible state. As for
RV32I, HINT instructions are implemented as computational instructions where rd = x0 or where
rd is overwritten with its own value.
2.2 Code Generation
To increase productivity in research and development, it is common practice to use code
generation. In this thesis, Infineon’s internal code generation framework "MetaGen" with
"MetaRTL" is used. The framework is based on metamodeling and concepts of the Model Driven
Architecture (MDA) [4, 10]. Therefore, the following sections provide a brief overview of the
modeling techniques used and of MDA.
2.2.1 Metamodeling
The term "metamodel" consists of two parts. "Meta" is Greek and translates to after or
beyond, whereas a "model" represents a certain level of abstraction of, for example, a system.
As a result, a "metamodel" can be seen as a model that goes beyond a model: a model of a model.
This approach is similar to the class abstraction of object-oriented programming languages.
Figure 2.3 shows the metamodel of an ISA as a Unified Modeling Language (UML) class diagram.
The model instance on the right is an instance of the metamodel definition on the left. In
this example, the metamodel defines a component with four attributes. Each attribute has its
own type and multiplicity. The component must have at least one relation to the "Instruction"
class, which defines its own attributes. As seen in figure 2.3, the model instance has a single
root node with four instructions related to it. All attributes are filled with valid values;
therefore, this instance meets all constraints set by the metamodel.
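The analogy to classes and objects can be sketched in Python, where the class definitions play the role of the metamodel and an object plays the role of the model instance. The attribute names and the four example instructions below are hypothetical and do not reproduce the content of figure 2.3; only the structure (a root component with four attributes and a one-or-more relation to Instruction) follows the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instruction:
    """Metamodel class for a single instruction (attributes are illustrative)."""
    mnemonic: str
    opcode: int


@dataclass
class ISA:
    """Metamodel root component with four attributes and a 1..* relation."""
    name: str
    xlen: int
    endianness: str
    description: str
    instructions: List[Instruction] = field(default_factory=list)

    def validate(self) -> bool:
        """Check the multiplicity constraint on the Instruction relation."""
        if len(self.instructions) < 1:
            raise ValueError("an ISA model needs at least one instruction")
        if not all(isinstance(i, Instruction) for i in self.instructions):
            raise TypeError("all related objects must be Instruction instances")
        return True


# Model instance: one root node with four related instructions
rv32_subset = ISA(
    name="RV32I subset",
    xlen=32,
    endianness="little",
    description="illustrative model instance",
    instructions=[
        Instruction("ADD", 0x33),
        Instruction("ADDI", 0x13),
        Instruction("LW", 0x03),
        Instruction("SW", 0x23),
    ],
)
```

Calling `rv32_subset.validate()` corresponds to checking that the model instance meets the constraints set by the metamodel.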
Figure 2.3: UML class diagram showing an example of metamodel and a model instance[2]
As previously mentioned, Infineon uses metamodel-based code generation for repetitive code to
reduce engineering costs. Figure 2.4 illustrates the workflow and the metamodel’s role in it.
To enable metamodeling, a metamodel needs to be defined. For this purpose, a metamodel
description is created with a UML modeling tool based on the given requirements and
specifications. Afterwards, a Python framework is generated from the previously defined
metamodel.
For modifying the input specification, the framework provides a graphical user interface
(GUI) in which the user can fill the metamodel with data. Afterwards, the specification is
passed to a reader that reads the input specification into the metamodel framework. As a
result, the model is accessible through a Python Application Programming Interface (API). For
generating target code, the model is passed to writers or a template engine. The template
engine is the most important and most commonly used output mechanism. Because different
template files can be provided, there are hardly any limitations on the generated target views.
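The step from model to target code can be illustrated with a minimal sketch. It uses `string.Template` from the Python standard library as a stand-in, since the template engine used by MetaGen is not named here; the model dictionary, the template text and the function `render_module` are hypothetical and only show the principle of filling a target view with model data.

```python
from string import Template

# Target view: a Verilog-style module skeleton with two placeholders
MODULE_TEMPLATE = Template(
    "module $name (\n"
    "$ports\n"
    ");\n"
    "endmodule\n"
)


def render_module(model: dict) -> str:
    """Fill the template with the data stored in the model."""
    ports = ",\n".join(
        f"  {port['direction']} wire [{port['width'] - 1}:0] {port['name']}"
        for port in model["ports"]
    )
    return MODULE_TEMPLATE.substitute(name=model["name"], ports=ports)


# Model data as it might be provided by a reader (illustrative values)
model = {
    "name": "timer",
    "ports": [
        {"name": "clk", "direction": "input", "width": 1},
        {"name": "count", "direction": "output", "width": 32},
    ],
}
code = render_module(model)
```

Exchanging `MODULE_TEMPLATE` for a different template produces a different target view from the same model, which is the reason why the template engine places few restrictions on the generated output.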
MDA, proposed by the Object Management Group (OMG), is an approach to reduce the growing
productivity gap by using models in software design. To this end, MDA adds additional steps
before the code is generated. Figure 2.5 illustrates the three main models of MDA together
with the additional model of the target platform.
• Platform Independent Model (PIM) avoids platform details and is the result of transform-
ing of the CIM adding more details accordingly to the architecture
• Platform Specific Model (PSM) combines PIM and PM. From this view, the final code is
generated.
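The step from PIM and PM to PSM and finally to code can be illustrated with a small Python sketch. The dictionaries, the type mapping and the generated C struct are invented for this example and are not taken from the MDA specification or from MetaGen.

```python
# Platform Independent Model: what is needed, without platform details.
pim = {"entity": "counter", "fields": [("value", "integer"), ("enabled", "boolean")]}

# Platform Model: how abstract types map onto one concrete target (here: C).
pm_c = {"integer": "int32_t", "boolean": "bool", "suffix": "_t"}


def to_psm(pim, pm):
    """Combine PIM and PM into a Platform Specific Model."""
    return {
        "type_name": pim["entity"] + pm["suffix"],
        "members": [(name, pm[abstract]) for name, abstract in pim["fields"]],
    }


def generate(psm):
    """Generate the final code from the PSM."""
    lines = ["typedef struct {"]
    lines += [f"    {ctype} {name};" for name, ctype in psm["members"]]
    lines.append(f"}} {psm['type_name']};")
    return "\n".join(lines)


print(generate(to_psm(pim, pm_c)))
```

Only the platform model would have to be exchanged to target a different platform; the PIM stays untouched.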
For RTL generation, adaptations to the MDA were necessary. Figure 2.6 sketches the enhanced
and adapted MDA for hardware generation, which introduces new terms that describe the involved
hardware-related models [11].
• Model of Design (MoD) corresponds to PIM and is the transformed MoT using templates
of design (ToD). A memory subsystem could be a possible example of a MoD.
• Model of View (MoV) corresponds to PSM where platform-specific details are added using
templates of view (ToV). As a result, HDL code can be generated depending on the used
target view models.
3 Bus Interfaces and Protocols
3.1 AHB
AHB Lite [5] is a pipelined high-performance bus interface. It is a reduced implementation of
AHB with the main difference that it supports only a single master. Therefore, no arbitration is
needed. In addition, some signals, slave responses and signal behaviors are missing or differ.
A comprehensive list of all differences can be found in the technical reference manual [13, p.
A-3]. An explanation of the bus signals can be found in the AHB protocol specification [5,
p. 2-1].
To achieve high performance, features like burst transfers, single-clock-edge operation,
non-tristate implementation and wide data bus configurations are implemented. AHB Lite is
commonly used for memory and high-bandwidth peripherals. However, low-performance peripherals
can be added by using an AHB-to-APB bridge.
Figure 3.1 sketches a basic AHB read transfer. In the first clock cycle, the source address is
applied on signal HADDR and signal HWRITE is set to zero, indicating a read access. During
the following clock cycle, the read data is provided on the signal HRDATA. A write transfer,
sketched in figure 3.2, follows the same timing as a read transfer. The only differences are that
HWRITE is driven high, indicating a write access, and that the data to be written is applied on
HWDATA during the following clock cycle. Both figures sketch the transfers without wait states.
Additional timing diagrams, including diagrams with wait states, can be found in chapter
"Transfers" of the AHB protocol specification [5].
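The pipelined timing described above, with the address phase in one cycle and the data phase in the next, can be made explicit with a short Python sketch. It only covers transfers without wait states, and the function name and data layout are chosen for this illustration.

```python
def ahb_lite_schedule(transfers):
    """List, per clock cycle, which signals are driven for back-to-back
    transfers without wait states.

    transfers: list of (address, is_write, write_data_or_None).
    The data phase of transfer i overlaps the address phase of transfer i + 1.
    """
    cycles = []
    for n in range(len(transfers) + 1):
        cycle = {"cycle": n}
        if n < len(transfers):                 # address phase of transfer n
            addr, is_write, _ = transfers[n]
            cycle["HADDR"] = addr
            cycle["HWRITE"] = int(is_write)
        if n > 0:                              # data phase of transfer n - 1
            _, was_write, wdata = transfers[n - 1]
            if was_write:
                cycle["HWDATA"] = wdata
            else:
                cycle["HRDATA"] = "driven by slave"
        cycles.append(cycle)
    return cycles


# One read followed by one write.
for entry in ahb_lite_schedule([(0x10, False, None), (0x14, True, 0xAB)]):
    print(entry)
```

The second cycle shows the overlap that makes the bus pipelined: the read data of the first transfer is returned while the address of the second transfer is already on the bus.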
3.2 APB
APB [6] is an unpipelined low-bandwidth bus interface with a focus on reduced complexity. It
supports only a single master and every transfer needs at least two clock cycles. In combination
with a bridge, communication with high-performance bus interfaces like AHB is possible. An
explanation of the bus signals can be found in the APB protocol specification [6, p. 4-2].
An APB read transfer is sketched in figure 3.3. A high level on PSEL indicates the start of
a transfer. At time T1, the source address is driven on PADDR with PWRITE set to zero,
indicating a read access. One clock cycle later at the earliest, the read data is driven on
PRDATA with PENABLE and PREADY asserted. The transfer finishes by releasing PENABLE. If
PSEL is kept asserted, another transfer is initiated. A write transfer, sketched in figure
3.4, follows the same timing as a read transfer. The only differences are that PWRITE is driven
high, indicating a write access, and that the data to be written is applied on PWDATA. Both
figures sketch the transfers without wait states. Additional timing diagrams, including
diagrams with wait states, can be found in chapter "Transfers" of the APB protocol
specification [6, p. 2-1].
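The two-cycle minimum of an APB transfer follows from the setup and access phases of the protocol. A minimal Python sketch of the master-side state machine is given below; the state names follow the usual IDLE, SETUP and ACCESS terminology, while the function names are chosen for this example.

```python
def apb_next_state(state, transfer_pending, pready):
    """Next state of an APB master: IDLE -> SETUP -> ACCESS."""
    if state == "IDLE":
        return "SETUP" if transfer_pending else "IDLE"
    if state == "SETUP":
        return "ACCESS"                 # always exactly one setup cycle
    if state == "ACCESS":
        if not pready:
            return "ACCESS"             # wait state inserted by the slave
        return "SETUP" if transfer_pending else "IDLE"
    raise ValueError(state)


def apb_outputs(state):
    """PSEL and PENABLE as a function of the state."""
    return {"PSEL": int(state != "IDLE"), "PENABLE": int(state == "ACCESS")}


state = "IDLE"
for pending, pready in [(True, 0), (False, 0), (False, 1)]:
    state = apb_next_state(state, pending, pready)
    print(state, apb_outputs(state))
```

A slave that holds PREADY low keeps the master in the ACCESS state, which is how wait states extend a transfer beyond two cycles.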
3.3 CSC
The CSC bus interface is a simple single-cycle data bus protocol defining the signals from table
3.1, where the signals addr, rdata and wdata are commonly 32 bits wide. There is no size
dependency between the address signal addr and the data signals rdata and wdata. However,
rdata and wdata must have the same size. CSC works as a single-cycle bus; therefore, a data
transfer takes place on every rising clock edge at which cs and either wen or ren are set. No
dedicated clock signal is specified for the bus. Thus, the bus is clocked using the module clock.
[Figure 3.5: timing diagram with the signals clk, rst, cs, ren and wen over ticks 0 to 20]
Figure 3.5 shows a timing diagram for multiple read and write transfers. The diagram also
includes the module signals clk and rst to illustrate the relation between them and the bus
signals. At tick 2 of the diagram, the reset is released, which immediately triggers the first
read access because signals cs and ren are set. For the next two clock cycles, address and read
data are updated, and a read access is performed in every cycle. At tick 8 of the timing
diagram, ren is released and wen is set. Therefore, the value of the data signal rdata is kept,
but a change on wdata can be observed. On each rising edge at which cs and wen are set, the
slave takes over the data from wdata. At tick 14, cs and wen are released, and the signals addr
and wdata keep their last values until another transmission on the bus modifies them.
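The single-cycle behavior described above can be captured in a small behavioral model. The following Python class is a sketch under assumptions: the backing store is a dictionary, the data width is fixed to 32 bits, and the priority of wen over ren when both are set is an arbitrary choice that the text does not specify.

```python
class CscSlave:
    """Behavioral model of a CSC slave backed by a small register file."""

    def __init__(self):
        self.mem = {}
        self.rdata = 0

    def rising_edge(self, cs, ren, wen, addr, wdata=0):
        """Evaluate one rising clock edge of the module clock."""
        if cs and wen:
            self.mem[addr] = wdata & 0xFFFFFFFF   # write completes in this cycle
        elif cs and ren:
            self.rdata = self.mem.get(addr, 0)    # read completes in this cycle
        # otherwise rdata simply keeps its last value
        return self.rdata


slave = CscSlave()
slave.rising_edge(cs=1, ren=0, wen=1, addr=4, wdata=0x1234)
print(hex(slave.rising_edge(cs=1, ren=1, wen=0, addr=4)))
```

When cs is released, the model leaves rdata untouched, matching the observation in the timing diagram that signal values are kept until the next transmission.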
3.4 HNDSHK
The HNDSHK bus interface is a simple data bus protocol intended to be used across power and
clock domain boundaries. Therefore, the bus interface is split into control and non-control
signals. To enable cross-domain usage, the control signals must be synchronized to the
destination clock domain, whereas the address, data and error signals shall be connected
directly. The HNDSHK specification names the source producer and the destination consumer
instead of master and slave. Figure 3.6 sketches a typical setup with the signals described in
table 3.2, including the synchronization flip-flops. These flip-flops can be omitted when
producer and consumer are in the same clock domain.
Figure 3.7 shows the timing diagram for two transfers with minimum timing. First, a write
transfer is sketched, where valid, addr, wdata and write are asserted by the producer. On the
next clock edge, the consumer takes over the data and asserts ack. After another clock cycle, the
producer releases at least valid and write. The transfer is finished when the consumer de-asserts
ack. Before another transfer can be started, the signal valid must be held at zero for at least
one producer clock cycle.
The second transfer follows the same timing but keeps the write signal de-asserted. However,
the consumer had an error taking over the data and therefore asserts nack and err. In that case,
the signal rdata contains an error code. The transfer is finished when the consumer de-asserts
nack and err.
Figure 3.6: Cross-domain structure diagram between producer and consumer. The flip-flops
FF1 and FF2 synchronize ack and nack to the producer clock domain, whereas flip-flop FF0
synchronizes valid to the consumer clock domain. The signals addr, wdata and rdata are each
16 bits wide.
Figure 3.7: HNDSHK timing diagram for valid write and invalid read transfer. The diagram covers
ticks 0 to 10 and shows clk and rst_n, the control signals valid, ack and nack, and the
non-control signals addr, wdata, rdata, write and err.
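The handshake sequence of a single transfer can be written down as a list of control-signal phases. The sketch below assumes producer and consumer in the same clock domain, so no synchronizer delay is modeled, and it reduces the transfer to the three control signals; the function name is chosen for this example.

```python
def hndshk_transfer(success):
    """Return the control-signal phases (valid, ack, nack) of one transfer."""
    response = (1, 0) if success else (0, 1)      # (ack, nack)
    return [
        (1, 0, 0),              # producer asserts valid together with addr/write/wdata
        (1, *response),         # consumer answers with ack or nack
        (0, *response),         # producer releases valid
        (0, 0, 0),              # consumer releases ack/nack: transfer finished
    ]


print(hndshk_transfer(True))    # write transfer acknowledged with ack
print(hndshk_transfer(False))   # failed transfer answered with nack
```

The last phase leaves valid at zero, which satisfies the requirement that valid stays low for at least one producer clock cycle before the next transfer starts.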
Signal  Width  Description
valid   -      Signal generated by the producer indicating the start of a transfer. Between two
               transfers, this signal must be set to zero for at least one producer clock cycle.
               This signal must be synchronized from producer to consumer in case of a clock
               domain crossing.
ack     -      Acknowledge signal set by the consumer after successful access. This signal must
               be synchronized from consumer to producer in case of a clock domain crossing.
nack    -      Not-Acknowledge signal set by the consumer after failed access. This signal must
               be synchronized from consumer to producer in case of a clock domain crossing.
addr    16     Address for read and write operation set by the producer.
rdata   16     Data read by the producer from the consumer.
wdata   16     Data written to the consumer.
write   -      Direction indication signal set by the producer.
err     -      Error signal set by the consumer to indicate that rdata contains an error code.
4 Components
For building the digital part of the SoC, multiple components taken from two libraries were used.
Both libraries are developed in-house by two independent development teams.
The first library uses MetaRTL and provides multiple components which use metamodeling
for code generation. For all components, the generated code can be configured using a GUI.
Afterward, ready-to-use VHDL and/or Verilog code should be generated, which is wrapped for
further usage with MetaGen. Nevertheless, errors occurred in some components during generation
and simulation. Where that happened, a short error description including the current
workaround is given.
The second library is the digital building blocks library (DBBL). This library contains
handwritten components with partly generated code using MetaGen. MetaGen is used to generate
component entities, register components and the interconnections between multiple components,
combining them into a module. All DBBL components and modules are written in SystemVerilog.
4.1 MetaRTL
Overview
MetaRTL provides an implementation of the RISC-V ISA briefly described in chapter 2. After
generation of the CPU, a wrapper for further integration into the SoC is added. For this
purpose, the MetaGen flow is used to generate the module entities. Figure 4.1 shows the block
diagram of the wrapped CPU core with all signals and interfaces accessible outside of the
wrapper. For accessing data and instructions, the core provides two AHB Lite master interfaces.
Both will be connected to the AHB matrix during the integration. The signals mei_ext_i and
mei_ended_o are used to connect the interrupt controller described in section 4.1.9 with the
CPU.
[Figure 4.1: block diagram of the wrapped RISC-V CPU core with the signals clk_i, reset_n_i,
mei_ext_i, mei_ended_o and illegal_state_o and the interface AHB Master Data]
Generation
For generation, the metamodel shown in figure 4.2 with the hardware extensions shown in
figure 4.3 is used. The metamodel is configured for generation with the GUI provided by
MetaGen. The RISC-V core is configured to implement RV32IMC without the custom instructions
Custom32 and CustomMAC. The attributes of the metamodel class "Config" are set to implement
a 5-stage CPU with nested non-vectorized exception handling and a startup address of 32768.
For the hardware extensions, only the "ExceptionUnit" is set to be active, where the attribute
CSR is left untouched. The Exception attribute is configured with the exceptions machine
software interrupt (MSI) and machine timer interrupt (MTI) deactivated.
However, the initially described configuration could not be used due to failures during the
code generation. The workaround is to activate the custom instructions Custom32 and CustomMAC.
Additionally, the exceptions MSI and MTI are enabled again.
Verilog was selected as the target view. While setting up the simulation, a compile error
inside the CPU was reported. In the generated code, two ports of the control and status
register instance inside the exception unit instance were missing. For this problem, a fix was
issued. As an intermediate solution, the missing ports are hardwired to zero.
Pipeline
As previously mentioned, the CPU implements a 5-stage pipeline, sketched in figure 4.4, with the
following stages:
• Instruction Bus (IB) stage is commonly known as instruction fetch. In this stage, the next
instruction is read from memory.
• Instruction Decode (ID) stage decodes the instruction. The logic checks if the pipeline is
ready to be executed.
• Execute (EX) stage is where the ALU operates. Computation units of extensions, like a
multiplier, are also operating in this stage.
• Memory (MEM) stage is where load and store instructions access the data memory.
• Writeback (WB) stage is where the instructions write their results to the register files.
Figure 4.4: Five-stage pipeline of the RISC-V CPU [7]. Five consecutive instructions are drawn
over time, each passing through the stages IB, ID, EX, MEM and WB. The orange column shows the
first instruction reaching the WB stage.
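The staggered layout of the pipeline diagram can be reproduced with a few lines of Python. The sketch assumes an ideal pipeline without stalls in which one instruction enters the IB stage per clock cycle; cycles and instructions are counted from zero.

```python
STAGES = ["IB", "ID", "EX", "MEM", "WB"]


def stage_at(instruction, cycle):
    """Stage occupied by an instruction in a given cycle, or None.

    Assumes an ideal pipeline: instruction i enters IB in cycle i.
    """
    index = cycle - instruction
    return STAGES[index] if 0 <= index < len(STAGES) else None


# Print the diagram: one row per instruction, one column per cycle.
for instr in range(5):
    row = [stage_at(instr, cycle) or "" for cycle in range(9)]
    print(" ".join(f"{cell:<3}" for cell in row).rstrip())
```

In cycle 4, the first instruction reaches WB while the four following instructions occupy MEM, EX, ID and IB, which corresponds to the highlighted column of the figure.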
In addition to the generated Verilog code, a schematic of the RISC-V CPU is generated. Figure
4.5 shows the ALU instance inside the CPU, where the supported arithmetic functions can
be seen graphically. Furthermore, MetaRTL offers an easy way to compare schematics
with code in different languages. To provide an example, the VHDL code of the RISC-V CPU is
generated as well; listing 4.1 shows the Verilog implementation and listing 4.2 the
VHDL implementation. Due to the similar naming of the signals in the code, it is easy to
relate the code to the schematic. Therefore, it is traceable how the arithmetic functions are
represented and implemented in the different languages.
Listing 4.1: Verilog code of RISC-V ALU without the result multiplexer.
// Shift amount: lower five bits of the second operand
assign SLICE_Outp_s = alu_param_2[4:0];
// Signed views of both operands
assign SIGNEDCAST00_Outp_s = $signed(alu_param_1);
assign SIGNEDCAST01_Outp_s = $signed(alu_param_2);
// Shift left, logical shift right, arithmetic shift right
assign LS_Outp_s = alu_param_1 << SLICE_Outp_s;
assign RSL_Outp_s = alu_param_1 >> SLICE_Outp_s;
assign RSA_Outp_s = $signed(alu_param_1) >>> SLICE_Outp_s;
// Bitwise operations
assign BAND_Outp_s = alu_param_1 & alu_param_2;
assign BOR_Outp_s = alu_param_1 | alu_param_2;
assign BXOR_Outp_s = alu_param_1 ^ alu_param_2;
// Addition and subtraction
assign HWPLUS_Outp_s = SIGNEDCAST00_Outp_s + SIGNEDCAST01_Outp_s;
assign HWMINUS_Outp_s = alu_param_1 - alu_param_2;
// Unsigned and signed less-than comparison
assign LT00_Outp_s = alu_param_1 < alu_param_2 ? 32'b1 : 32'b0;
assign LT01_Outp_s = SIGNEDCAST00_Outp_s < SIGNEDCAST01_Outp_s ? 32'b1 : 32'b0;
Listing 4.2: VHDL code of RISC-V ALU without the result multiplexer.
SLICE_Outp_s <= alu_param_2(4 downto 0);
SIGNEDCAST00_Outp_s <= alu_param_1;
SIGNEDCAST01_Outp_s <= alu_param_2;
LS_Outp_s <= std_logic_vector(shift_left(unsigned(alu_param_1), to_integer(
    unsigned(std_logic_vector(std_logic_vector'("" & SLICE_Outp_s))))));
RSL_Outp_s <= std_logic_vector(shift_right(unsigned(alu_param_1), to_integer
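To trace how the arithmetic functions of listing 4.1 behave, the assignments can be mirrored in a small Python reference model. This is a sketch written for illustration and not part of MetaRTL; the dictionary keys reuse the signal prefixes of the listing, and 32-bit operands are assumed.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit value as a two's complement number."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def alu(a, b):
    """Compute all ALU results for two 32-bit operands, before the result mux."""
    a &= MASK
    b &= MASK
    shamt = b & 0x1F                                  # SLICE: bits 4..0 of operand 2
    return {
        "LS": (a << shamt) & MASK,                    # shift left
        "RSL": a >> shamt,                            # logical shift right
        "RSA": (to_signed(a) >> shamt) & MASK,        # arithmetic shift right
        "BAND": a & b,
        "BOR": a | b,
        "BXOR": a ^ b,
        "HWPLUS": (a + b) & MASK,
        "HWMINUS": (a - b) & MASK,
        "LT00": int(a < b),                           # unsigned comparison
        "LT01": int(to_signed(a) < to_signed(b)),     # signed comparison
    }


result = alu(0x80000000, 1)
print({name: hex(value) for name, value in result.items()})
```

The operand 0x80000000 shows the difference between the two comparisons: it is larger than 1 when read as unsigned, but smaller when read as signed.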
Overview
The AHB Matrix is an M-to-N AHB Lite connection matrix, where M masters with fixed priority
can access N slaves. The matrix allows parallel access. This means multiple masters can operate
at the same time as long as they access different slaves. If two masters want to access the
same slave, the lower-priority master is stalled.
Figure 4.6 shows the wrapped matrix with all masters and slaves of the current configuration,
where the master for the serial peripheral interface (SPI) has the highest priority, followed by
the master for the CPU instruction bus.
[Figure 4.6: block diagram of the wrapped AHB matrix with the signals clk_i and reset_n_i]
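The fixed-priority arbitration with parallel access can be sketched in Python. The master names and their order follow the configuration described in this section; the slave names used in the example call are placeholders, and the function only models a single arbitration round.

```python
# Masters ordered from highest to lowest priority.
PRIORITY = ["ahb_master_spi", "ahb_master_ib", "ahb_master_db"]


def arbitrate(requests):
    """Decide which master may access which slave in one arbitration round.

    requests: dict mapping a master name to the slave it wants to access.
    Returns (grants, stalled), where grants maps slave -> master.
    """
    grants, stalled = {}, []
    for master in PRIORITY:
        slave = requests.get(master)
        if slave is None:
            continue
        if slave in grants:
            stalled.append(master)      # a higher-priority master owns the slave
        else:
            grants[slave] = master
    return grants, stalled


# "sram0" and "rom" are placeholder slave names.
print(arbitrate({"ahb_master_spi": "sram0",
                 "ahb_master_ib": "sram0",
                 "ahb_master_db": "rom"}))
```

In the example, the SPI master and the data bus master proceed in parallel because they address different slaves, while the instruction bus master is stalled.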
Generation
The AHB Matrix is generated using the metamodel sketched in figure 4.7. The metamodel is
configured using the GUI provided by MetaGen. The attributes of the root node are kept at their
default values. The three master interfaces ahb_master_spi, ahb_master_ib and ahb_master_db
are added. Their position in the list is important because it defines the priority. In the
last step, the list of slave interfaces is specified, where table 4.1 provides the list of
slaves with their base address and address range. Based on that information, the matrix
multiplexes the master interfaces to the slaves.
Table 4.1: Base address and address range of the AHB matrix slaves.
Slave Base Address Address Range
Overview
For connecting modules from DBBL with the RISC-V and MetaRTL components, an AHB-to-APB
bridge is used. The bridge converts an AHB transaction to an APB transaction. The
generation uses the metamodel shown in figure 4.9. As can be seen in the metamodel, only the
instance name can be modified using the GUI. After generation, a wrapper using MetaGen is
added around the generated code, resulting in the block diagram shown in figure 4.8.
However, while setting up the simulation, a compilation error inside the generated code
of the AHB-to-APB bridge was reported. It is caused by a faulty implementation of the AHB
signal HREADYOUT: the simulator reported that the signal has multiple drivers. Therefore,
as a quick solution, the additional driver was manually removed.
[Figure 4.8: block diagram of the wrapped AHB-to-APB bridge with the AHB slave interface, the
APB master interface and the signals clk_i and reset_n_i]
The AHB-to-CSC bridge metamodel is the same as the metamodel for the AHB-to-APB bridge.
After setting the instance name using the GUI the code can be generated. As for the AHB-to-
APB bridge, a wrapper is added using MetaGen. The AHB-to-CSC bridge is used to connect
CSC register instances which are commonly used by MetaRTL components when they contain
registers or memory.
[Figure: block diagram of the wrapped AHB-to-CSC bridge with the AHB slave interface, the
CSC master interface and the signals clk_i and reset_n_i]
The HNDSHK-to-AHB bridge metamodel is the same as the metamodel for the AHB-to-APB
bridge. After setting the instance name using the GUI, the code can be generated. As for the
AHB-to-APB bridge, a wrapper is added using MetaGen. This bridge is used to connect the
SPI module provided by DBBL as an AHB master to the AHB matrix.
During testing, a deviation from the AHB protocol specification was found. In the
simulation waveform, it could be observed that the AHB signal HTRANS is set incorrectly in
the state machine. The signal is kept high for too long, which caused the signal HREADY to
toggle all the time. Therefore, the data written to a certain register was lost. The solution
was to remove the assignment of HTRANS from the states in which it should not be set to 1.
[Figure: block diagram of the wrapped HNDSHK-to-AHB bridge with the HNDSHK slave interface,
the AHB master interface and the signals clk_i and reset_n_i]
Overview
One type of memory used inside the SoC is static random-access memory (SRAM) for storing
data and instructions. The SoC has multiple instances of SRAM implemented. The generated
code implements the AHB slave interface and connects it to an SRAM IP provided by the used
production technology, where each SRAM IP instance provides space for 2048 words. Providing
more details about the SRAM IP is not possible due to company restrictions.
To use the generated code in the integration flow, another wrapper is added on top. This
is necessary because the code generator implements interface signals without using SV
interfaces. The wrapper generated by MetaGen defines the SV interface, and inside the wrapper
all separate AHB signals are properly connected. Figure 4.12 shows the block diagram of an
SRAM module, where the signals and interfaces of the wrapper are shown.
Figure 4.12: Block diagram of the wrapped SRAM for data and instruction storage, with the
signals clk_i and reset_n_i.
Code Generation
The metamodel for code generation in figure 4.13 is kept simple and only provides a single class.
Tables 4.2 and 4.3 provide the settings used to generate the instances for data and instruction
SRAM. This results in three generated instances of SRAM.
Table 4.2: Settings of the MetaRTL specification for the SRAM instance used to store data.
Field Value Short Description
Table 4.3: Settings of the MetaRTL specification for the SRAM instances used to store
instructions.
Field        Value               Short Description
Name         AhbIRam0, AhbIRam1  Name of the generated component.
BaseAddress  16384, 24576        Base address of the generated component used to calculate
                                 the relative address.
SizeInBytes  8192                Size of the SRAM in bytes.
FileName     [Link], [Link]      File loaded at simulation start up.
4.1.7 ROM
Overview
The second type of memory used inside the SoC is a ROM for storing data and/or instructions
using a metal mask. The ROM component is generated using MetaRTL and implements a
CSC slave connected to a ROM IP provided by the used production technology. Compared to
the SRAM IP, the ROM IP provides twice the capacity per IP instance,
which is 4096 instead of 2048 words. As for the SRAM IP, providing more details about the
ROM IP is not possible due to company restrictions.
To simplify the integration of the module into the SoC, the
wrapper instantiates the generated ROM and an AHB-to-CSC bridge. As a result, the module
already provides the AHB slave interface needed for integration. Figure 4.14 shows the wrapper
with the two components, including the connection inside the wrapper and the provided output
signals.
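[Figure 4.14: block diagram of the ROM wrapper containing the ROM (CSC slave with ROM IP) and
the AHB-to-CSC bridge, with the AHB slave interface and the signals clk_i and reset_n_i]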
The metamodel for code generation in figure 4.15 is kept simple and only provides a single class.
Table 4.4 provides the settings used to generate the ROM instance.
The currently available implementation of the MetaRTL library needs manual modifications.
First, the generator Python script must be updated. During setup, the newest version of the
library is provided by the package manager. However, the function addInterface() of the MetaRTL
library metartl_core has changed: it previously had a parameter for setting a prefix, which has
now been removed. Therefore, the function call inside the generation script is updated to omit
this prefix parameter as well.
The second modification needed is the replacement of the used IP inside of the generated
code. Used directly after setup including the Python modifications and configuration the gen-
Table 4.4: Settings of the MetaRTL specification for the ROM instance.
Field Value Short Description
Overview
The third type of memory inside the SoC is the initial boot ROM. This component is generated
using MetaRTL with the same approach as for the normal ROM. However, the ROM IP is
removed and instead a simple memory array is implemented. These changes are not visible
outside of the module and the boot ROM behaves like the normal ROM. To enable further
usage in the MetaGen integration flow a wrapper as shown in figure 4.16 is added. As for the
ROM, the wrapper contains the AHB-to-CSC bridge.
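[Figure 4.16: block diagram of the boot ROM wrapper containing the boot ROM (CSC slave with
ROM_ARRAY) and the AHB-to-CSC bridge, with the AHB slave interface and the signals clk_i and
reset_n_i]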
Generation
The boot ROM uses the same metamodel as the ROM shown in figure 4.15. Therefore, similar
settings for generation can be used, and only the base address and the size must be adapted.
The base address is set to 32768, which is the startup address set during the generation of the
RISC-V CPU. For the boot ROM, the size in bytes is limited to 4096.
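How a base address and a size select a memory and yield the relative address can be illustrated with the values stated in this chapter. In the following Python sketch, the instruction SRAM entries come from table 4.3 and the boot ROM entry from the paragraph above; the region name "BootRom" and the function are placeholders chosen for this example.

```python
# (name, base address, size in bytes); values as stated in the text.
REGIONS = [
    ("AhbIRam0", 16384, 8192),
    ("AhbIRam1", 24576, 8192),
    ("BootRom", 32768, 4096),   # the name "BootRom" is a placeholder
]


def decode(address):
    """Return (region name, relative address), or None if the address is unmapped."""
    for name, base, size in REGIONS:
        if base <= address < base + size:
            return name, address - base
    return None


print(decode(32768))   # startup address of the CPU
print(decode(24580))
```

The startup address 32768 decodes to relative address 0 of the boot ROM, and the two instruction SRAMs form a contiguous range because 16384 + 8192 equals 24576.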
Code
As previously mentioned, the ROM IP was replaced by a memory array. This array must be
filled with initial code. Therefore, the C code from listing 4.3 is compiled using the toolchain
for RISC-V. Hex is chosen as the output representation of the compilation. Listing 4.4 shows
the final hex file with manual modifications before it is converted to binary. The conversion is
done by a Python script that converts hex to binary and slices it into 8-bit blocks. The results
are afterward copied into the array.
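The conversion step can be sketched as follows. This is not the script used in the project; it is a minimal example that assumes one 32-bit word per line and emits the most significant byte first, which may differ from the byte order required by the memory array.

```python
def hex_words_to_byte_blocks(lines):
    """Convert 32-bit hex words to binary strings sliced into 8-bit blocks."""
    blocks = []
    for line in lines:
        word = line.strip()
        if not word:
            continue                                   # skip empty lines
        bits = format(int(word, 16), "032b")           # 32-character bit string
        blocks.extend(bits[i:i + 8] for i in range(0, 32, 8))
    return blocks


for block in hex_words_to_byte_blocks(["00000013", "0040006f"]):
    print(block)
```

Each input word yields four 8-bit blocks that can be copied into consecutive entries of a byte-wide array.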
During the simulation setup, while bringing the SoC up far enough to at least loop the initial
boot code, some issues with the natively compiled code were discovered. First, the initially
provided boot code tried to use the data RAM, which should be avoided. The C code shown in
listing 4.3 already contains the fix, which replaces the variables used in lines 12 and 16 by
values.
The second issue is due to a delay in the CPU. When a jump instruction is executed and followed
by another jump instruction, the second instruction is executed with a delay of 4 clock cycles.
In certain cases, this means that the memory array is too small, which leads to a read
failure. The MetaRTL development team is already working on an improved and faster jump
and branch implementation of the RISC-V CPU. The current workaround is to add NOP
instructions to the code. In listing 4.4, line 5 is the relative jump back to line 25. To
prevent the program counter from trying to read from an unavailable memory address, lines 1 to
4 are added.
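The padding workaround can be expressed as a small helper. The sketch rests on a simplifying assumption that is not confirmed by the text: during the 4 delay cycles, the CPU fetches one sequential word per cycle after the jump. The value 0x00000013 is the standard RV32I encoding of NOP (addi x0, x0, 0); the function name is hypothetical.

```python
NOP = 0x00000013          # addi x0, x0, 0, the canonical RV32I NOP
JUMP_DELAY_CYCLES = 4     # delay reported in the text


def pad_after_jump(words, jump_index, delay=JUMP_DELAY_CYCLES):
    """Append NOPs so that `delay` sequential fetches after the jump at
    `jump_index` still fall inside the memory array (assumed fetch model)."""
    last_index = len(words) - 1
    missing = max(0, jump_index + delay - last_index)
    return list(words) + [NOP] * missing


padded = pad_after_jump([0x00400513, 0x0040006F], jump_index=1)
print([f"{word:08x}" for word in padded])
```

If the jump is already followed by enough words, the helper returns the program unchanged.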
Listing 4.4: Hard coded boot code in hexadecimal representation stored in Boot ROM
1 00000013
2 00000013
3 00000013
4 00000013
5 fefff06f
6 00000013
7 000047b7
8 fca79ae3
9 00078067
10 00000013
11 000097b7
12 00b79863
13 0107e7b3
14 00779793
15 00e04803
16 01004783
17 fec71ce3
18 fed70ce3
19 00170713
20 00000713
21 00400513
22 00200593
23 00100613
24 06400693
25 0040006f
26 00000013
27 00000013
28 00000013
29 00000013
30 00000013
Overview
The interrupt controller is a programmable interrupt controller (PIC) that connects multiple
interrupts to a single host device. The generated code provides interrupt control registers ac-
cessible over CSC. Therefore, for further usage in the MetaGen integration flow a wrapper to
instantiate the AHB-to-CSC bridge is implemented. Figure 4.17 shows the block diagram of
the wrapper including all visible signals. To handle the interrupt signals from the components
GPIO, SAR ADC and Timer multiple input signals are implemented. A non-vectorized signal
intr_cpu_o is provided to forward the interrupt requests to the RISC-V CPU. To clear the
pending interrupt request, the signal intr_ended_i is provided by the CPU.
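[Figure 4.17: block diagram of the interrupt controller wrapper containing the interrupt
controller (CSC slave with CSC register) and the AHB-to-CSC bridge, with the AHB slave
interface, the inputs clk_i, reset_n_i, gpio0_i, gpio1_i, gpio2_i, gpio3_i, sar_adc0_i,
timer0_i, timer1_i and intr_ended_i and the output intr_cpu_o]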
Generation
The interrupt controller is generated using the metamodel in figure 4.18. The metamodel is
configured using the GUI provided by MetaGen. After setting the name attribute
of the root node, the attribute IRQDisable of "Generalconfiguration" is changed to 1. Afterwards,
the interrupts for the timer, SAR ADC and GPIO are specified. The two timer interrupts have
the highest priority, followed by the SAR ADC interrupt. The GPIO interrupts have the lowest
priority. All generated interrupts are maskable by the bits in register INTERRUPT_CTRL.
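The combination of fixed priorities and mask bits can be sketched in Python. The priority groups (timer, then SAR ADC, then GPIO) follow the text; the order inside a group, the assignment of mask bits to sources and the function names are assumptions made for this example.

```python
# Sources ordered from highest to lowest priority. The order within the timer
# and GPIO groups is an assumption.
SOURCES = ["timer0", "timer1", "sar_adc0", "gpio0", "gpio1", "gpio2", "gpio3"]


def highest_pending(pending, enable_mask):
    """Return the highest-priority pending and enabled source, or None.

    Bit i of both arguments belongs to SOURCES[i] (hypothetical bit layout).
    """
    active = pending & enable_mask
    for i, name in enumerate(SOURCES):
        if active & (1 << i):
            return name
    return None


def intr_cpu(pending, enable_mask):
    """Single non-vectorized request line towards the CPU."""
    return int(highest_pending(pending, enable_mask) is not None)


print(highest_pending(0b0001010, 0b1111111))   # timer1 and gpio0 pending
print(highest_pending(0b0001010, 0b1111101))   # timer1 masked
```

With all sources enabled, the timer interrupt wins over the GPIO interrupt; once the timer source is masked, the GPIO interrupt is reported instead.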
4.1.10 Timer
Overview
The timer module provides two timer instances with optional capture compare units (CCU). As
is typical for MetaRTL components, the register access is implemented using the CSC interface.
Therefore, to connect the timer to the AHB matrix, the AHB-to-CSC bridge is instantiated
inside the wrapper. Additionally, the wrapper is used to combine the timer output signals
from table 4.5 into an interface called tim. To forward timer interrupts to the interrupt
controller, the signals timer0_intr_o and timer1_intr_o are provided. A list of all registers
can be found in Appendix A.2.
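[Figure: block diagram of the timer wrapper containing the timer (CSC slave with CSC register)
and the AHB-to-CSC bridge, with the AHB slave interface, the interface tim and the signals
clk_i, reset_n_i, timer0_intr_o and timer1_intr_o]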
Generation
The timer is generated using the metamodel in figure 4.20. The metamodel is configured with
the GUI provided by MetaGen. First, a single timer with two channels is specified.
Both channels provide a counter width of 32 bits. One capture compare unit is specified for
channel 0, whereas two are specified for channel 1. All attributes for software and hardware
access are set to 1.
Table 4.5: Signal list for interface tim implemented in the wrapper. All signals have a width of
1-bit.
Signal Direction Short Description
4.2 Digital Building Block Library
For external communication, a serial peripheral interface (SPI) is used. Figure 4.21 sketches the
block diagram of the SPI module. However, the SPI is not used directly as a module; instead,
the SPI bridge and the SPI frame encoder/decoder are instantiated separately inside the SoC.
The SPI bridge converts the serial data stream from the signal Master-Out-Slave-In (MOSI)
into parallel data. For transmitting data to the master via the Master-In-Slave-Out (MISO)
signal, this process is inverted. Additionally, the SPI bridge checks the number of shifted bits
and triggers a frame error when the number of shifted bits differs from 32. If the received
frame matches either of the two defined test mode entry keys, a key-received trigger is sent
over the corresponding signal in interface tm_en_ctrl. In that case, the frame is discarded
without being transmitted to the frame encoder/decoder.
The SPI frame encoder/decoder checks the parallel data provided by the SPI bridge
according to the protocol described in section SPI Protocol. If the provided data is correct,
a HNDSHK bus transmission is triggered.
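[Figure 4.21: block diagram of the SPI module containing the SPI bridge and the SPI frame
encoder/decoder, with the SPI slave interface, the HNDSHK master interface, the interfaces
tm_en_ctrl and SPI_FRAME and the signals clk_i, reset_n_i, scan_mode_i, crcmd_i and sys_stat_i]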
SPI Protocol
Each SPI transmission consists of a MOSI and a MISO frame, both with a fixed size of 32 bits.
Table 4.7 shows the common frame structure for MOSI and MISO. In the current version of the
module, the two cyclic redundancy check (CRC) modes CRC24 and IGN_CRC are implemented.
The mode CRC24 implements a CRC according to AUTOSAR standard SAE J1850 [14], which
is calculated over the 24 bits of the MOSI frame excluding the CRC field. In contrast, IGN_CRC
skips the CRC calculation. In that case, an extended write address space is available, where the
CRC field represents the upper 8 bits of the address. For both implementations, a read operation
is indicated by 0xFF in the address field.
Frame  Fields from bit 31 (left) to bit 0 (right)
MOSI   CRC | ADDRESS / READ INDICATOR | WRITE DATA / READ ADDRESS
MISO   CRC | OFC | LFOK | SYSTEM STATUS | READ DATA / ERROR CODE
Table 4.7: MOSI/MISO frame structure for read and write operations.
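Building and checking a MOSI frame in mode CRC24 can be sketched in Python. Several details are assumptions: the CRC is taken to be the 8-bit SAE J1850 CRC with polynomial 0x1D, initial value 0xFF and final XOR 0xFF, and the fields are placed at bits 31 to 24 (CRC), 23 to 16 (address) and 15 to 0 (data), which is inferred from the field order and the stated field widths. The function names are hypothetical.

```python
def crc8_j1850(data, poly=0x1D, init=0xFF, xor_out=0xFF):
    """Bitwise CRC-8 with the SAE J1850 parameters (MSB first)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ xor_out


def build_mosi(address, data):
    """Assemble a 32-bit MOSI frame: CRC | address | data (assumed layout)."""
    payload = ((address & 0xFF) << 16) | (data & 0xFFFF)
    crc = crc8_j1850(payload.to_bytes(3, "big"))   # CRC over the lower 24 bits
    return (crc << 24) | payload


def check_mosi(frame):
    """Recompute the CRC over the lower 24 bits and compare with the CRC field."""
    payload = frame & 0xFFFFFF
    return crc8_j1850(payload.to_bytes(3, "big")) == (frame >> 24) & 0xFF


frame = build_mosi(0x12, 0xBEEF)
print(f"{frame:08x}", check_mosi(frame), check_mosi(frame ^ 1))
```

A frame with a single flipped bit fails the check, which is the condition under which the frame decoder would report a faulty frame instead of triggering a bus transmission.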
The MISO field OFC (One Frame Corrupted) indicates that at least one frame since the
last system reset or OFC bit readout was faulty. Last Frame OK (LFOK) indicates whether the
last transmitted MOSI frame was valid. If it was not valid, the data field of MISO contains an
error code. As seen in table 4.8, the upper 3 bits contain a predefined communication error
code (CEC). The lower bits contain error codes from the internal bus. The usage of the lower
bits is indicated by a CEC value of 0x7.
Bits 15 to 13  Bits 12 to 0
CEC            ERROR CODE
Table 4.8: Structure of the MISO error code field. CEC contains predefined codes which are
described in table 4.9. The error code field contains the transmitted error code from the internal
bus.
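Splitting the 16-bit error field into its two parts is a matter of shifting and masking. The following Python sketch uses the bit positions given above; the function name and the choice to return None when the lower bits are not in use are made for this illustration.

```python
def split_error_field(field):
    """Split the 16-bit MISO error field into (CEC, internal bus error code).

    The bus error code is only reported when CEC is 0x7, which marks the
    lower bits as being in use.
    """
    cec = (field >> 13) & 0x7          # upper 3 bits
    code = field & 0x1FFF              # lower 13 bits
    return (cec, code if cec == 0x7 else None)


print(split_error_field(0xE005))   # CEC 0x7 with bus error code 5
print(split_error_field(0x2000))   # CEC 0x1, lower bits unused
```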
The test mode control (TMCTRL) provides design for test (DFT) features like registers to
control test mode behavior and to limit access to the chip. One key feature of the module is
that it provides three test mode levels. To enter test mode level 1, two consecutive predefined
keys must be received over SPI. The entry keys are defined in the SPI module, and only
the information that a test mode key was received is transmitted to TMCTRL by two signals
of the interface tm_en_ctrl. TMCTRL only enters the test mode when test mode entry key 1
is received before key 2 and both keys are received consecutively. As soon as the test
mode is entered, the test mode control increases the available address space for read and write
operations over the SPI module by setting the corresponding signals in interface tm_en_ctrl to
their maximum value.
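The entry condition, key 1 directly followed by key 2, can be modeled as a two-state sequence detector. The Python sketch below abstracts each received SPI frame into one of three events; the assumption that any other frame between the two keys cancels the sequence is an interpretation of "consecutively", and the function name is hypothetical.

```python
def enters_test_mode(events):
    """Return True if key 1 is directly followed by key 2.

    events: sequence of "key1", "key2" or "other" (any other received frame).
    """
    armed = False                      # True after key 1 was the last frame
    for event in events:
        if event == "key2" and armed:
            return True
        armed = event == "key1"
    return False


print(enters_test_mode(["key1", "key2"]))            # correct order
print(enters_test_mode(["key2", "key1"]))            # wrong order
print(enters_test_mode(["key1", "other", "key2"]))   # not consecutive
```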
The second defined interface tm, with the signals from table 4.10, is used to distribute the
active test mode level and the enabled test mode features to other modules. The most commonly
used test mode feature is indicated by the signal scan_mode. It indicates that the device is set
into a state for testing [15].
[Figure 4.22: block diagram of TMCTRL with its registers, the interfaces tm_en_ctrl, APB, tm,
OTP_CTRL and OTP_STAT and the signals clk_i and reset_n_i]
Figure 4.22 sketches the TMCTRL with its signals and interfaces, including the interfaces
OTP_STAT and OTP_CTRL. Both are defined by the One-Time-Programmable (OTP) memory module.
Since the current implementation of the SoC does not include the OTP, these interfaces are
left unconnected or tied to fixed values.
The General-Purpose Input/Output (GPIO) module provides the control logic for the
general-purpose inputs and outputs of the SoC. The SoC has 32 pins, of which pins 0 to 3
provide additional features through a multiplexer. A register block accessible via APB provides
the registers for controlling direction and multiplexing and for reading and setting the pin
values.
As mentioned above, pins 0 to 3 can be multiplexed to interrupt inputs and timer outputs. The
register GPO_OUT_MUX0 selects the multiplexed signal. For interrupt inputs, the direction of
the pin must be configured as input.
The registers GP_CTRL0 and GP_CTRL1 control the direction of all pins. Depending on these
settings, the current input of a pin can be read in register GP_IN0 or GP_IN1. The output value
of a pin is set using registers GP_OUT0 and GP_OUT1.
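The split of the 32 pins across the register pairs can be sketched as follows. The sketch assumes that bit i of a 16-bit register corresponds to pin i of its group (pins 0 to 15 in the registers ending in 0, pins 16 to 31 in those ending in 1) and that a set bit in GP_CTRLx selects output, as described in appendix A.5. The function names are hypothetical.

```python
PINS_PER_REGISTER = 16


def gpio_register_and_bit(pin):
    """Return (register index, bit position) for a pin number 0..31."""
    if not 0 <= pin <= 31:
        raise ValueError("the SoC has pins 0 to 31")
    return divmod(pin, PINS_PER_REGISTER)


def set_direction(gp_ctrl, pin, output):
    """Return updated [GP_CTRL0, GP_CTRL1] values with the pin set to output or input."""
    reg, bit = gpio_register_and_bit(pin)
    updated = list(gp_ctrl)
    if output:
        updated[reg] |= 1 << bit             # assumed: 1 = output
    else:
        updated[reg] &= ~(1 << bit) & 0xFFFF  # assumed: 0 = input
    return updated
```

For example, pin 17 maps to bit 1 of the second register of each pair, so configuring it as output sets bit 1 of GP_CTRL1.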
[Figure: block diagram of the GPIO module. Labels: clk_i, reset_n_i, tm, tim, gpio, intr0_o]
The System-On-Chip control (SOCCTRL) module provides SoC-specific register fields for control
and software frame handling. Four 16-bit registers store the data for software frame handling:
two registers hold the transmission data from the SoC to the SPI master, and the other two
hold the raw 32 bits of the received SPI frame. The interpretation and consistency check of
the data must be done in software. A status register provides information about the current
data status. It includes flags for new data available and frame error, both provided by the
SPI module described in section 4.2.1. The bitfield STARTUP_LOC in register IMEM_CTRL0 selects
the boot memory. Additional trimming registers can be used for trimming the PLL clock, bandgap
and current reference. A complete list of available registers can be found in appendix A.6.
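Since the 32-bit frame is split over two 16-bit registers, software has to join and split the halves itself. A minimal sketch, assuming that the HI register holds bits 31 to 16 and the LO register bits 15 to 0 (as the names FRAME_RECEIVED_HI and FRAME_RECEIVED_LO in appendix A.6 suggest). The helper names are hypothetical.

```python
def join_frame(hi, lo):
    """Combine the upper and lower 16-bit register values into one 32-bit frame."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def split_frame(frame):
    """Split a 32-bit frame into (upper, lower) 16-bit register values."""
    return (frame >> 16) & 0xFFFF, frame & 0xFFFF
```

The interpretation and the consistency check of the joined frame remain the task of the software, as stated above.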
The block diagram in figure 4.25 shows all interfaces and sideband signals. The module
registers are connected as APB slaves via the APB bus to the AHB-to-APB bridge. All control
signals listed in table 4.12 are bundled into the interface SOCCTRL_IF. Software frame handling
and the data status require a connection to the SPI module, which is implemented using the
SPI_FRAME interface provided by the SPI module.
[Figure 4.25: block diagram of SOCCTRL. Labels: clk_i, reset_n_i, SPI_FRAME]
Table 4.12: Signal list for interface SOCCTRL_IF with a short description providing the signal
source.
Signal Direction Short Description
The system control module (SYSCTRL) handles the reset and clock signals provided by the analog
part and by scan. Figure 4.26 shows the block diagram with all inputs and outputs. The power-on
reset (por_n_i) is an input vector whose size depends on a generic. The same applies to the
output signals en_clk_o, clk_delta_early_o and clk_div_o. Table 4.13 provides an overview
and description of all generics, including the signals whose sizes depend on generics.
[Figure 4.26: block diagram of SYSCTRL. Inputs: por_n_i, reset_n_i, clk_i, scan_clk_i, scan_reset_n_i, tm. Outputs: en_clk_o, clk_delta_early_o, clk_div_o, sys_reset_n_o, sys_clk_o, rst_o]
The generation of the internal resets sys_reset_n_o and rst_o is shown in figure 4.27,
where the external asynchronous resets reset_n_i and por_n_i are synchronized using a two-
stage synchronizer resulting in the synchronized signals reset_n_s and por_n_s. Both signals
can be masked when the chip is in test mode (see register TM_CTRL).
[Figure 4.27: generation of the internal resets. Labels: por_n_s, tm.ignore_por, reset_n_s, tm.ignore_reset, scan_reset_n_i, tm.scan_mode (multiplexer with inputs 0 and 1), sys_reset_n_o, rst_o]
SYSCTRL provides multiple clock sources to the system. The signal sys_clk_o provides
the fastest clock because it is a pass-through of the input signal clk_i. The signal
clk_div_o, in combination with en_clk_o, provides clock speeds that depend on the value of
the generic clk_div_cnt_g. The output clock frequency can be calculated as
\[ \mathrm{clk\_div\_o}[\mathit{index}] = \frac{\mathrm{clk\_i}}{2^{\mathit{index}+1}} \tag{4.1} \]
where index is a number between 0 and clk_div_cnt_g-1.
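Equation 4.1 can be evaluated with a short Python sketch. The function name clk_div_frequency is hypothetical, and the example input frequency of 16 MHz is an arbitrary illustration value, not a value from the design.

```python
def clk_div_frequency(clk_i_hz, index, clk_div_cnt_g):
    """Frequency of clk_div_o[index] according to equation 4.1."""
    if not 0 <= index <= clk_div_cnt_g - 1:
        raise ValueError("index must be between 0 and clk_div_cnt_g-1")
    return clk_i_hz / (2 ** (index + 1))


# Illustration with an assumed 16 MHz input clock and clk_div_cnt_g = 4
frequencies = [clk_div_frequency(16e6, i, 4) for i in range(4)]
```

Each additional index halves the frequency, so index 0 already delivers half of the input clock rate.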
5 Integration
5.1 MetaGen
The integration of all components described in chapter 4 is done using MetaGen. The
connections between MetaGen components and modules are specified in a plain-text file. Listing
5.1 provides an example with two MetaGen component instances and interface connections. This
file is the input for the code generation, which produces SystemVerilog code. Listing 5.2
contains a snippet of the code generated from this specification.
To use this integration flow, all generated components had to be wrapped to match the MetaGen
integration flow. Manually written modules, like SOCCTRL, need no wrapping because they
already follow the MetaGen flow.
Listing 5.1: Code snippet of the MetaGen specification file for the SoC digital top.
component ahb_matrix IFX:ATVPTS:AHB_MATRIX_RISCV_TC:100;
component ahb2apb0 IFX:ATVPTS:AHB2APB:110;

//AHBMATRIX SLAVES
connection ahbmatrix_s_apb0_con = ahb_matrix.ahb_s_apb0, ahb2apb0.ahb_s;
//AHBMATRIX MASTER
connection ahbmatrix_m_spi_con = ahb_matrix.ahb_m_spi, hndshk2ahb.ahb_m;
Listing 5.2: Code snippet of the SoC digital top SystemVerilog file generated using MetaGen
// Interface Instances
ahb_slave_ifd ahbmatrix_s_apb0_con();

d_ahb_matrix_riscv_tc
inst_ahb_matrix
(
    .reset_n_i      (sys_reset_n_con_s),
    .clk_i          (sys_clk_con_s),
    .ahb_s_apb0     (ahbmatrix_s_apb0_con.MirroredSlave),
    .ahb_s_timer    (ahbmatrix_s_timer_con.MirroredSlave),
    .ahb_s_intr     (ahbmatrix_s_interrupt_con.MirroredSlave),
    .ahb_s_apb1     (ahbmatrix_s_apb1_con.MirroredSlave),
    .ahb_s_data_ram (ahbmatrix_s_dram_con.MirroredSlave),
    .ahb_s_instr_ram(ahbmatrix_s_iram_con.MirroredSlave),
    .ahb_s_brom     (ahbmatrix_s_brom_con.MirroredSlave),
    .ahb_s_rom      (ahbmatrix_s_rom_con.MirroredSlave),
    .ahb_s_otp      (ahbmatrix_s_otp_con.MirroredSlave),
    .ahb_s_ecc      (ahbmatrix_s_ecc_con.MirroredSlave),
    .ahb_m_spi      (ahbmatrix_m_spi_con.MirroredMaster),
    .ahb_m_imem     (ahbmatrix_m_instr_con.MirroredMaster),
    .ahb_m_dram     (ahbmatrix_m_data_con.MirroredMaster)
);

d_ahb2apb
inst_ahb2apb0
(
    .reset_n_i (sys_reset_n_con_s),
    .clk_i     (sys_clk_con_s),
    .ahb_s     (ahbmatrix_s_apb0_con.Slave),
    .apb_m     (apb0_bus_con.Master)
);
5.2 Integration Notes
The AHB matrix provides more AHB slaves than depicted in figure 5.1. However, unconnected
AHB slaves cause compilation errors during simulation. Therefore, all unused AHB slaves are
connected to dummies that tie the signals to defined values. The same applies to unused
input signals, which are tied to constant signals inside the MetaGen specification.
[Figure 5.1 block labels: SYSCTRL, RISC-V Core, AHB Matrix, DRAM, IRAM0, IRAM1, ROM, Boot ROM, TIMER, Interrupt, SPI, HNDSHK to AHB, GPIO, ANALOG, SAR ADC, AHB-to-APB bridge, TMCTRL, SOCCTRL]
Figure 5.1: Block diagram of the System-on-Chip showing internal bus connections and interfaces
provided to the outside.
6 Simulation
For simulation, a Universal Verification Methodology (UVM) [16] testbench written in
SystemVerilog [18] is used. An overview of UVM testbenches implemented using MetaGen is not
part of this thesis. The simulation software is the Cadence Xcelium Logic Simulator [19],
version 18.09.009, with SimVision as GUI. At Infineon, the simulator setup is provided by the
development flow.
6.1.1 Setup
The test bench is generated using MetaGen and instantiates the following modules and interface
verification components (IVC):
• Device-Under-Test (DUT)
• SPI IVC
• SYSCTRL IVC
• GPIO IVC
The SoC is the DUT and is connected to all IVCs as sketched in figure 6.1. Due to different
naming conventions between DUT and IVCs, all signals must be wired manually. The SPI IVC
is configured to emulate a connected master. It can be used to write and read registers and
to download a program to the instruction RAM. The GPIO IVC is used for checking the input and
output pins. This IVC is fully generated using MetaGen by enabling the sideband mode
of the generator. As a result, the generated IVC provides generic set and get functions for
each signal. The SYSCTRL IVC is instantiated and connected to the DUT to provide the clock and
reset signals. The generated clock has a randomized period between 68 ns and 74 ns with a duty
cycle of 50%.
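The period range translates directly into a frequency range. The following sketch uses hypothetical helper names and converts between both representations. It can be used to check that the clock frequency of 14.482 MHz reported in section 6.2 lies inside the randomization window.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


slowest_mhz = period_ns_to_mhz(74.0)   # longest period gives the lowest frequency
fastest_mhz = period_ns_to_mhz(68.0)   # shortest period gives the highest frequency
```

The window therefore spans roughly 13.5 MHz to 14.7 MHz.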
Figure 6.1: Common testbench top with instantiated DUT connected to two IVCs.
For simulating software, only one UVM test case is needed. The test case, called "test_sandbox",
is generated using MetaGen. Listing 6.1 shows the executed code of the test case. First, the
clock is enabled, followed by the release of the reset signals reset_n_i and por_n_i. After a
waiting time of 200 ns, the startup location is chosen by writing to register IMEM_CTRL0 and
selecting the instruction ROM as startup memory. After another 10 µs, the register
IMEM_CTRL0 is read twice. After a further 10 ms of waiting time, the test case finishes and
the simulation is stopped.
        spi_addr == 'he;
        spi_data == 'h2;
    })
#10us;
`uvm_do_on_with(spi_read_seq,
    p_sequencer.spi_if_i_data_agent_sequencer,
    {
        spi_addr == 'he;
    })
`uvm_do_on_with(spi_read_seq,
    p_sequencer.spi_if_i_data_agent_sequencer,
    {
        spi_addr == 'he;
    })
#10ms;
6.2 Results
All results are generated using the previously mentioned test case test_sandbox. The simulator
is started with the test case, and the seed for the randomization is fixed to 0. This leads to
a randomized clock frequency of 14.482 MHz. Additionally, the instruction ROM is filled at
startup with a binary file holding the two instructions for addition and multiplication. Due
to the size of the waveform figures in the following subsections, a landscape version of each
is provided in Appendix B.
Figure 6.2 shows the start of the simulation. At waveform marker 9, the reset signal
reset_n_i of the SoC is released, leading to invalid data on the instruction bus signal HRDATA.
This bug was discovered during the first run of the simulation. To prevent the propagation
of faulty instruction data, the reset signal rst_pin of the CPU is delayed by one clock cycle.
The release of the CPU reset is marked with the cursor TimeA. At this point, the CPU starts to
fetch data over the instruction bus to fill the pipeline.
Comparing the instruction bus signals HADDR and HRDATA with the snippet of the boot
code in listing 6.2 shows a delay of one clock cycle between address and data. This delay is
caused in the boot ROM module by the AHB-to-CSC bridge. It could be avoided by using an AHB
instance directly in the boot ROM.
Figure 6.2: Simulator waveform showing the initialization of the RISC-V CPU and loading first
instruction from the boot ROM.
Listing 6.2: Initial part of the boot code shown in hexadecimal with disassembler information
801C: 00100613 li a2,1
8018: 06400693 li a3,100
8014: 0040006f j 8018 <main>
8010: 00000013 nop
800C: 00000013 nop
8008: 00000013 nop
8004: 00000013 nop
8000: 00000013 nop
Figure 6.4 shows the jump to the instruction ROM after the bit field STARTUP_LOC in register
IMEM_CTRL0 is set to 1. The SPI transmission with the corresponding bus transaction is shown
in figure 6.3. Because of the SPI clock frequency of 5 MHz in combination with an SPI
transmission size of 32 bits, it is not possible to show both in one figure. Marker 1 shows
when the written data is taken over by the register.
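A rough calculation shows why both events do not fit into one figure. The sketch below assumes one bit per SPI clock cycle without idle gaps and uses the system clock frequency of 14.482 MHz from this simulation run. The function names are hypothetical.

```python
def spi_frame_duration_s(bits, spi_clk_hz):
    """Duration of one SPI frame, assuming one bit per SPI clock cycle."""
    return bits / spi_clk_hz


def duration_in_clock_cycles(duration_s, clk_hz):
    """Number of clock cycles of frequency clk_hz that fit into duration_s."""
    return duration_s * clk_hz


frame_s = spi_frame_duration_s(32, 5e6)                      # 32 bits at 5 MHz
frame_cycles = duration_in_clock_cycles(frame_s, 14.482e6)   # in system clock cycles
```

Under these assumptions one SPI frame lasts 6.4 µs, which corresponds to more than 90 system clock cycles, whereas the jump to the instruction ROM takes only a few cycles.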
Marker 4 in figure 6.4 indicates the start of the data AHB transmission triggered by the CPU
to read the register IMEM_CTRL0. After the register access is completed at marker 5, the
CPU stores the read value in register x15. At marker 7, the CPU changes the instruction
address signal ib_addr to the start of the instruction ROM at 0x00009000. In the next
clock cycle, the stall of instruction decode and execute can be observed. After another four
clock cycles, the pipeline is fully utilized again.
Figure 6.3: Simulator waveform showing the SPI transaction setting the startup location.
Figure 6.4: Simulator waveform showing the selection of startup location and the triggered jump
to the instruction ROM.
6.2.2 Addition
As described above, the CPU jumps to the first address of the instruction ROM, which is
pre-filled with two instructions. The first instruction encodes an addition that adds
register x11 to register x12 and stores the result in register x30.
During "instruction decode", the register values are read from the internal registers. Marker
13 in figure 6.5 marks the readout in the waveform. In the next clock cycle, the ALU performs
the addition. Finally, the data is stored during "write back", which is marked by TimeA.
Figure 6.5: Simulator waveform showing a simple addition with the signals of the RISC-V ALU
and registers.
The second instruction encodes a multiplication of registers x10 and x11 that stores the
result in x29 using the hardware multiplier. Figure 6.6 shows the waveform with the executed
multiplication. However, two problems with the loaded ROM file showed up.
First, the second instruction is expected at 0x00009004. During the simulation, it turned out
that the second instruction is loaded at 0x0000A008. This does not break the simulation but
needs further investigation to find the root cause.
Second, the ROM lacks the load instructions for registers x10 and x11. Therefore, the
registers are forced to x10 = 0x2222 and x11 = 0x5.
At marker 18, "instruction decode" reads the registers. One clock cycle later, during
"execute", the hardware multiplier performs the multiplication. Finally, two clock
cycles later, the result 0x0000AAAA is stored in register x29.
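The two ROM instructions can be related to their machine code with a small encoder for the RISC-V R-type format. This is a sketch based on the standard RV32I/RV32M encoding. The mapping of the operands named in the text to rs1 and rs2 is an assumption, and the helper names are hypothetical.

```python
OPCODE_OP = 0b0110011  # register-register ALU operations (add, mul, ...)


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=OPCODE_OP):
    """Assemble a 32-bit R-type instruction word from its fields."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_add(rd, rs1, rs2):
    """add rd, rs1, rs2 (funct7 = 0, funct3 = 0)."""
    return encode_r_type(0b0000000, rs2, rs1, 0b000, rd)


def encode_mul(rd, rs1, rs2):
    """mul rd, rs1, rs2 from the "M" extension (funct7 = 1, funct3 = 0)."""
    return encode_r_type(0b0000001, rs2, rs1, 0b000, rd)


add_word = encode_add(30, 11, 12)   # assumed operand order: add x30, x11, x12
mul_word = encode_mul(29, 10, 11)   # assumed operand order: mul x29, x10, x11
```

As a cross-check against the thesis itself, encode_add(10, 10, 11) reproduces the word 00b50533 that listing 6.3 shows for "add a0,a0,a1". The forced register values give 0x2222 · 0x5 = 0xAAAA, which matches the result stored in x29.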
Figure 6.6: Simulator waveform showing a multiplication using the hardware multiplier added
by the RISC-V "M" extension.
For this simulation, an implementation of the C math library is used. Listing 6.3 provides a
code snippet of the C math library showing the multiplication function mult_uint16(). The
code running in the simulation performs 8738 · 5 = 43690 = 0xAAAA. Figure 6.7 shows the
simulator waveform with just the instructions needed to perform the multiplication. Marker 1
marks the entry into the provided code snippet, and marker Baseline marks the jump back to the
main routine.
Comparing the waveforms of both implementations immediately shows that the hardware
multiplier performs better. A count of the clock cycles between entering and leaving the
multiplication function gave 64 cycles. Compared with the 6 clock cycles needed by the
hardware multiplier, the hardware multiplication is faster by a factor of about 10.7 (64/6)
when any additional overhead of the software multiplication is neglected.
Listing 6.3: Code snippet of the C math library showing the 16-bit unsigned integer
multiplication.
00004354 <mult_uint16> (File Offset: 0x4354):
mult_uint16():
    4354: 00050793 mv a5,a0
    4358: 00000513 li a0,0
    435c: 00079463 bnez a5,4364 <mult_uint16+0x10> (File Offset: 0x4364)
    4360: 00008067 ret
    4364: 0017f713 andi a4,a5,1
    4368: 00070463 beqz a4,4370 <mult_uint16+0x1c> (File Offset: 0x4370)
    436c: 00b50533 add a0,a0,a1
    4370: 0017d793 srli a5,a5,0x1
    4374: 00159593 slli a1,a1,0x1
    4378: fe5ff06f j 435c <mult_uint16+0x8> (File Offset: 0x435c)
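The disassembly in listing 6.3 follows the classic shift-and-add scheme. The following Python model mirrors it instruction by instruction. It is a sketch for illustration. The 32-bit masking imitates the RV32 register width, and the variable names are chosen freely.

```python
MASK32 = 0xFFFFFFFF  # RV32 registers are 32 bits wide


def mult_uint16(a0, a1):
    """Shift-and-add multiplication modeled after listing 6.3."""
    a5 = a0 & MASK32          # mv a5,a0   (multiplier)
    result = 0                # li a0,0    (accumulator)
    while a5 != 0:            # bnez a5,... / ret when a5 is zero
        if a5 & 1:            # andi a4,a5,1 / beqz a4,...
            result = (result + a1) & MASK32   # add a0,a0,a1
        a5 >>= 1              # srli a5,a5,0x1
        a1 = (a1 << 1) & MASK32               # slli a1,a1,0x1
    return result
```

Calling mult_uint16(8738, 5) returns 43690, that is 0xAAAA, the value computed in the simulation. The loop runs once per significant bit of the first operand, which explains why the software variant needs far more clock cycles than the hardware multiplier.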
7 Conclusion and Outlook
This work aimed to develop a System-On-Chip using Infineon’s code generation framework, which
is based on metamodeling. Research on the RISC-V ISA and code generation was done to gain a
common understanding of the technologies and methodologies used.
All necessary components were generated using MetaGen and MetaRTL and combined with
handwritten modules from DBBL to form an SoC. This SoC was instantiated in a UVM
testbench and connected to IVCs. Finally, a simulator setup including one test case
was implemented. The simulation was used to debug the SoC and to provide simulation waveforms
showing the CPU operation.
As targeted by this thesis, it was shown that MetaGen with MetaRTL is capable of
providing more complex RTL modules with optional extensions and configurations. In addition,
its flexibility in combining generated code with handwritten code was shown. The final
outcome of this work is a simulatable SoC including a RISC-V RV32IMC 5-stage CPU.
7.2 Outlook
The SoC is working, but some issues must be addressed in the future. First, the versioning of
MetaRTL needs to be adapted so that it is checked frequently that all modules can be generated
using the default setup. As already mentioned in chapter 4, versions of some dependencies
changed, and the setup was using wrong or outdated versions, causing generation failures. In
that case, the solution could be to adapt the setup to use only a certain version of the
dependencies or to update the module to support the newest version. The previously mentioned
frequent check
A Registers
A.1 Interrupt Registers
INTERRUPT_CTRL
[Bit-field diagram of INTERRUPT_CTRL; field labels not recoverable. Bit markers: 31, 16; 15, 7, 6, 5, 4, 3, 2, 1, 0.]
A.2 Timer Registers
TIM0CTRLSTAT
[Bit-field diagram; legible field labels: TimEn, OvfIntEn, UNUSED. Bit markers: 31, 16 (-); 15, 8, 7, 6, 1, 0.]
- rw-(0) - rw-(0)
TIM0ACTVAL
Field ACTVAL: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM0MAXVAL
Field MAXVAL: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM0CCUCTRL0
[Bit-field diagram; field labels not recoverable. Bit markers: 31, 16 (-); 15, 6, 5, 3, 2, 0.]
- rw-(0x0) rw-(0x0)
TIM0CCUVAL0
Field CCUVAL0: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM1CTRLSTAT
[Bit-field diagram; legible field labels: TimEn, OvfIntEn, UNUSED. Bit markers: 31, 16; 15, 8, 7, 6, 1, 0.]
- rw-(0) - rw-(0)
TIM1ACTVAL
Field ACTVAL: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM1MAXVAL
Field MAXVAL: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM1CCUCTRL0
[Bit-field diagram; field labels not recoverable. Bit markers: 31, 16 (-); 15, 6, 5, 3, 2, 0.]
- rw-(0x0) rw-(0x0)
TIM1CCUVAL0
Field CCUVAL0: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
TIM1CCUCTRL1
[Bit-field diagram; field labels not recoverable. Bit markers: 31, 16; 15, 6, 5, 3, 2, 0.]
- rw-(0x0) rw-(0x0)
TIM1CCUVAL1
Field CCUVAL1: bits 31-16 rw-(0x0000), bits 15-0 rw-(0x0000)
A.3 Test Mode Control Registers
TM_MODES
[Bit-field diagram of TM_MODES; bit markers: 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0. The fields are described below.]
CRC_MODE CRC mode selection bits. Select the CRC algorithm used in the host communication
interface module.
• 0x0 - CRC24
• 0x1 - CRC40
• 0x2 - NO_CRC
• 0x3 - IGN_CRC
TM_LEVEL3_STATUS Read only. Bits for indicating test mode level 3 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined
TM_LEVEL2_STATUS Read only. Bits for indicating test mode level 2 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined
TM_LEVEL1_STATUS Read only. Bits for indicating test mode level 1 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined
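The three status fields share the same 2-bit encoding, in which only the complementary patterns 0x1 and 0x2 are valid. A minimal decoding sketch with a hypothetical function name. The bit positions of the fields inside TM_MODES are not assumed here, so the caller passes the already extracted 2-bit value.

```python
def decode_tm_level_status(bits):
    """Map a 2-bit TM_LEVELx_STATUS value to its meaning."""
    meanings = {0x1: "disabled", 0x2: "enabled"}
    return meanings.get(bits & 0x3, "undefined")  # 0x0 and 0x3 are undefined
```

With this encoding, a field that is stuck at all zeros or all ones reads as undefined and cannot be mistaken for a valid status.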
TM_LEVEL2_KEY
15 0
rw-(0x0000)
KEY_VALUE Test mode level 2 activation key. A valid key activates level 2. An
invalid key leaves level 2 and all higher active test mode levels.
TM_LEVEL3_KEY
15 0
rw-(0x0000)
KEY_VALUE Test mode level 3 activation key. A valid key activates level 3. An
invalid key leaves level 3.
TM_EXIT_ALL_KEY
15 0
rw-(0x0000)
KEY_VALUE Test mode exit all key. A valid key deactivates the test mode.
TM_OTP_STAT
[Bit-field diagram of TM_OTP_STAT; field labels not recoverable. Bit markers: 15, 8, 7, 4, 3, 2, 1, 0.]
TM_SCAN_CTRL
[Bit-field diagram of TM_SCAN_CTRL; field labels not recoverable. Bit markers: 15, 5, 4, 3, 2, 1, 0.]
TM_CTRL
[Bit-field diagram of TM_CTRL; legible field labels: IGNORE_POR, IGNORE_RES, UNUSED. Bit markers: 15, 10, 9, 8, 7, 6, 5, 0.]
- rw-(0x1) rw-(0x1) -
IGNORE_POR Enables the "Ignore Power On Reset" function. This bit disables
the power-on reset for testing purposes.
• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid
IGNORE_RES Enables the "Ignore Reset" function. This bit disables the reset
for testing purposes.
• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid
TM_TRIM_CTRL
[Bit-field diagram; legible field labels: TRIM_UNLOCK, UNUSED. Bit markers: 15, 1, 0.]
- rw-(0)
TRIM_UNLOCK Unlocks the trimming registers. The trimming bits are locked after the
first write. Therefore, this unlock bit must be set for further modifications.
A.4 SAR ADC Registers
ADC_CTRL
[Bit-field diagram of ADC_CTRL; field labels not recoverable. Bit markers: 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.]
ADC_COMP_CONF
[Bit-field diagram of ADC_COMP_CONF; field labels not recoverable. Bit markers: 15, 14, 13, 12, 0.]
DYN_SAMPLING_CTRL
[Bit-field diagram of DYN_SAMPLING_CTRL; bit markers: 15, 4, 3, 0.]
- rw-(0x0)
EN_DYN_SAMPLING
SAMPLE_CTRL1_X
[Bit-field diagram of SAMPLE_CTRL1_X; field labels not recoverable. Bit markers: 15, 13, 12, 8, 7, 5, 4, 0.]
- rw-(0x00) - rw-(0x00)
SAMPLE_CTRL2_X
[Bit-field diagram; field labels not recoverable. Bit markers: 15, 7, 5, 4, 0.]
- rw-(0) rw-(0x00)
RESULTS_X
[Bit-field diagram; field labels not recoverable. Bit markers: 15, 14, 12, 11, 0.]
- r-(0x0) r-(0x000)
A.5 GPIO Registers
GP_IN0
15 0
r-(0x0000)
DATA Holds input value of gpio pins 0 to 15 when selected as input in GP_CTRL0
GP_IN1
15 0
r-(0x0000)
DATA Holds input value of gpio pins 16 to 31 when selected as input in GP_CTRL1
GP_OUT0
15 0
rw-(0x0000)
GP_OUT1
15 0
rw-(0x0000)
GP_CTRL0
15 0
rw-(0x0000)
OUTP_EN Data flow of gpio pins 0 to 15. A one indicates that the gpio is used
as output.
GP_CTRL1
15 0
rw-(0x0000)
OUTP_EN Data flow of gpio pins 16 to 31. A one indicates that the gpio is used
as output.
GPO_OUT_MUX0
[Bit-field diagram of GPO_OUT_MUX0; field labels not recoverable. Bit markers: 15, 8, 7, 6, 5, 4, 3, 2, 1, 0.]
A.6 SOCCTRL Registers
FRAME_RECEIVED_LO
15 0
r-(0)
DATA Read only. Contains the lower 16 bits of the data received in the last SPI transaction.
FRAME_RECEIVED_HI
15 0
r-(0)
DATA Read only. Contains the upper 16 bits of the data received in the last SPI transaction.
FRAME_TRANSMIT_LO
15 0
w-(0)
DATA Write only. Contains lower 16bit transmit data for the next SPI transaction.
FRAME_TRANSMIT_HI
15 0
w-(0)
DATA Write only. Contains upper 16bit transmit data for the next SPI transaction.
FRAME_STATUS
[Bit-field diagram of FRAME_STATUS; field labels not recoverable. Bit markers: 15, 9, 8, 7, 2, 1, 0.]
IMEM_CTRL0
[Bit-field diagram of IMEM_CTRL0; field labels not recoverable. Bit markers: 15, 5, 4, 3, 2, 1, 0.]
PLL_CTRL0
[Bit-field diagram of PLL_CTRL0; field labels not recoverable. Bit markers: 15, 7, 6, 1, 0.]
GEN_CTRL0
[Bit-field diagram of GEN_CTRL0; field labels not recoverable. Bit markers: 15, 10, 9, 8, 7, 6, 3, 2, 0.]
B Simulation Waveforms
Figure B.1: Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction from the boot ROM.
Figure B.2: Simulator waveform showing the selection of start up location and the triggered jump to the instruction ROM.
Figure B.3: Simulator waveform showing the SPI transaction setting the startup location.
Figure B.4: Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.
Figure B.5: Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V "M" extension.
Figure B.6: Simulator waveform showing a multiplication using a software implementation of the C math library.
Bibliography
[1] A. K. SiFive Inc. (2019, Dec.) The RISC-V Instruction Set Manual Volume I: Unprivileged ISA. [Last visited 29.10.2021]. [Online]. Available: [Link]riscv-isa-manual/releases/download/Ratified-IMAFDQC/[Link]
[3] J. Schreiner, “Automated generation of pipelined risc cpus following the model-driven architecture principle,” Master’s thesis, TU München, 2016.
[4] L. Research. What is mda? why concerns bpmn? [Last visited 15.12.2021]. [Online]. Available: [Link]
[5] ARM Limited, AMBA3 AHB-Lite Protocol Specification, ARM Std., [Last visited 22.11.2021]. [Online]. Available: [Link]5f914801f86e16515cdc2a27?token=
[6] ARM Limited, AMBA3 APB Protocol Specification, ARM Std., [Last visited 22.11.2021]. [Online]. Available: [Link]token=
[7] Wikipedia. Classic risc pipeline. [Last visited 17.12.2021]. [Online]. Available: https://[Link]/wiki/Classic_RISC_pipeline
[8] K. Crystal Chen, Greg Novick. risc vs. cisc. [Last visited 22.11.2021]. [Online]. Available: [Link]
[9] RISC-V International, “History of risc-v,” 2021, [Last visited 29.10.2021]. [Online]. Available: [Link]
[10] F. Truyen. (2006) The basics of model driven architecture. [Last visited 15.12.2021]. [Online]. Available: [Link]
[12] ARM Limited. Amba3 overview. [Last visited 22.11.2021]. [Online]. Available: https://[Link]/architectures/system-architectures/amba
[13] ARM Limited. Amba design kit technical reference manual. [Last visited 15.12.2021]. [Online]. Available: [Link]
[14] AUTOSAR CRC Routines, AUTOSAR Std., [Last visited 12.01.2021]. [Online]. Available: [Link]AUTOSAR_SWS_CRCLibrary.pdf
[15] S. Engineering. Scan test. [Last visited 16.12.2021]. [Online]. Available: https://[Link]/knowledge_centers/test/scan-test-2/
[17] T. Grubelnik, “Verification of an interface module using the universal verification methodology,” 2018.
[18] C. Spear and G. Tumbush, SystemVerilog for Verification: A Guide to Learning the Testbench Language Features, 3rd ed. Springer, 2012.
[19] Cadence. Xcelium logic simulator. [Last visited 15.12.2021]. [Online]. Available: [Link]simulation-and-testbench-verification/[Link]
[20] A. I. Inc. (2011) Ieee p1687 internal jtag (ijtag) tutorial. [Last visited 15.12.2021]. [Online]. Available: [Link][Link]