RISC-V SoC System Integration

using Metamodeling Methodology

Master’s Thesis

written by:
Thomas Grubelnik, BSc

at the master’s degree programme System Test Engineering
of the FH JOANNEUM – University of Applied Sciences, Austria

supervised by:
Patrick Lampl, BSc MSc

externally supervised by:
DI Heimo Hartlieb

Graz, January 17, 2022

Obligatory declaration

I hereby declare that the present Master’s Thesis was composed by myself and that the work
contained herein is my own. I also confirm that I have only used the specified resources. All
formulations and concepts taken verbatim or in substance from printed or unprinted material or
from the Internet have been cited according to the rules of good scientific practice and indicated
by footnotes or other exact references to the original source.
The present thesis has not been submitted to another university for the award of an academic
degree in this form. This thesis has been submitted in printed and electronic form.
I hereby confirm that the content of the digital version is the same as in the printed version.
I understand that the provision of incorrect information may have legal consequences.

(Place, Date) (Signature)


Abstract

The complexity of modern, purely hardwired application-specific integrated circuits (ASICs) is
increasing, which results in high effort whenever their behavior has to be changed. A
System-on-Chip (SoC) combines a processor core with ASIC components and is a suitable option for
making behavioral changes more flexible.
The target of this work is to implement an SoC using Infineon’s code generation framework
MetaGen, which is based on metamodeling, in combination with common handwritten components.
MetaRTL adds a metamodel to the MetaGen framework that provides components and commands to
generate register-transfer level (RTL) code. The generated components include a processor core
implementing the RISC-V instruction set architecture. To enable communication between generated
and handwritten components, multiple bus interfaces were used. All components were integrated
using the integration flow of MetaGen.
The final output of this work is a simulatable SoC including a five-stage RISC-V RV32IMC CPU.
Simulation results are provided to show that MetaGen with MetaRTL is capable of providing more
complex RTL modules and integrating handwritten components.

Kurzfassung (German abstract)

The complexity of modern hardwired application-specific integrated circuits (ASICs) is increasing,
which leads to high effort when behavior has to be changed. A System-on-Chip (SoC) combines a
processor core with ASIC components and is a suitable option for enabling flexibility regarding
behavioral changes.
The goal of this work is the implementation of an SoC using Infineon’s code generation framework
MetaGen, which is based on metamodeling, in combination with common handwritten components.
MetaRTL adds a metamodel to the MetaGen framework to provide components and commands for
generating register-transfer level (RTL) code. This includes a processor core that implements the
RISC-V instruction set architecture. To enable communication between generated and handwritten
components, several bus interfaces were used. All components were integrated using the
integration flow of MetaGen.
The final result of this work is a simulatable SoC with a five-stage RISC-V RV32IMC CPU. The
simulation results show that MetaGen with MetaRTL is capable of providing more complex RTL
modules and integrating handwritten components.

Acknowledgments

Thanks to the whole MetaGen and MetaRTL development team at Infineon for providing the code
generation framework. I would especially like to thank Keerthikumara Devarajegowda, who
supported me whenever questions arose while using MetaRTL. I also want to thank Heimo Hartlieb,
who provided constructive feedback and answered all my questions regarding MetaGen and the DBBL
library blocks. Last but not least, I want to thank Joachim Kahr for reading through this thesis
and giving constructive feedback.


Contents

1 Introduction

2 Research
2.1 RISC-V
2.1.1 Instruction Set Overview
2.1.2 Integer Instruction Set
2.1.3 Integer Multiplication and Division Instruction Extension
2.1.4 Compressed Instruction Extension
2.2 Code Generation
2.2.1 Metamodeling
2.2.2 Metamodel-based Code Automation
2.2.3 Model Driven Architecture for Code Automation
2.2.4 MetaGen and MetaRTL

3 Bus Interfaces and Protocols
3.1 AHB
3.2 APB
3.3 CSC
3.4 HNDSHK

4 Components
4.1 MetaRTL
4.1.1 RISC-V CPU Core
4.1.2 AHB Matrix
4.1.3 AHB-to-APB Bridge
4.1.4 AHB-to-CSC Bridge
4.1.5 HNDSHK-to-AHB Bridge
4.1.6 Data and Instruction SRAM
4.1.7 ROM
4.1.8 Boot ROM
4.1.9 Interrupt Controller
4.1.10 Timer
4.2 Digital Building Block Library
4.2.1 Serial Peripheral Interface
4.2.2 Test Mode Control
4.2.3 Successive-Approximation-Register Analog-to-Digital Converter
4.2.4 General Purpose Input/Output
4.2.5 System-On-Chip Control
4.2.6 System Control

5 Integration
5.1 MetaGen
5.2 Integration Notes

6 Simulation
6.1 Test Bench
6.1.1 Setup
6.1.2 Test Case
6.2 Results
6.2.1 CPU Boot
6.2.2 Addition
6.2.3 Hardware Multiplication
6.2.4 Software Multiplication

7 Conclusion and Outlook
7.1 Conclusion
7.2 Outlook

A Registers
A.1 Interrupt Registers
A.2 Timer Registers
A.3 Test Mode Control Registers
A.4 SAR ADC Registers
A.5 GPIO Registers
A.6 SOCCTRL Registers

B Simulation Waveforms

List of Figures

2.1 RV32I base instruction formats showing immediate variants [1, Page 16]
2.2 Compressed 16-bit RVC instruction formats [1, p. 100, Table 16.1]
2.3 UML class diagram showing an example of a metamodel and a model instance [2]
2.4 Illustration of a simplified utilization of metamodeling at Infineon [3]
2.5 MDA as Y-Chart [4]
2.6 MDA applied to hardware generation [2]

3.1 AHB Lite basic read transfer [5, p. 3-2]
3.2 AHB Lite basic write transfer [5, p. 3-2]
3.3 APB read transfer [6, p. 2-4]
3.4 APB write transfer [6, p. 2-2]
3.5 CSC timing diagram.
3.6 Cross-domain structure diagram between producer and consumer. The flip-flops FF1 and FF2
    synchronize ack and nack to the producer clock domain, whereas flip-flop FF0 synchronizes
    valid to the consumer clock domain.
3.7 HNDSHK timing diagram for valid write and invalid read transfer.

4.1 Block diagram of the wrapped CPU.
4.2 Metamodel of the RISC-V core
4.3 Metamodel of the hardware extensions
4.4 Five-stage pipeline of the RISC-V CPU [7]. The orange column shows the first instruction
    reaching the WB stage.
4.5 Generated schematic of the RISC-V ALU
4.6 Block diagram of the AHB matrix wrapper.
4.7 Metamodel of the AHB matrix
4.8 Block diagram of the AHB-to-APB bridge wrapper.
4.9 Metamodel of the AHB-to-APB bridge
4.10 Block diagram of the AHB-to-CSC bridge wrapper.
4.11 Block diagram of the HNDSHK-to-AHB bridge wrapper.
4.12 Block diagram of the wrapped SRAM for data and instruction storage.
4.13 Metamodel of the SRAM
4.14 Block diagram of the wrapped ROM.
4.15 Metamodel of the ROM
4.16 Block diagram of the wrapped Boot ROM.
4.17 Block diagram of the wrapped interrupt controller.
4.18 Metamodel of the interrupt controller
4.19 Block diagram of the wrapped timer.
4.20 Metamodel of the timer
4.21 Block diagram of the serial peripheral interface module.
4.22 Block diagram of test mode control.
4.23 Block diagram of the SAR ADC.
4.24 Block diagram of the GPIO module.
4.25 Structural overview of the SOCCTRL module.
4.26 Simple block diagram of SYSCTRL.
4.27 Logic circuit for output signal sys_reset_n_o.

5.1 Block diagram of the System-on-Chip showing internal bus connections and interfaces provided
    to the outside.

6.1 Common testbench top with instantiated DUT connected to two IVCs.
6.2 Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction
    from the boot ROM.
6.3 Simulator waveform showing the SPI transaction setting the startup location.
6.4 Simulator waveform showing the selection of startup location and the triggered jump to the
    instruction ROM.
6.5 Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.
6.6 Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V
    "M" extension.
6.7 Simulator waveform showing a multiplication using a software implementation of the C math
    library.

B.1 Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction
    from the boot ROM.
B.2 Simulator waveform showing the selection of startup location and the triggered jump to the
    instruction ROM.
B.3 Simulator waveform showing the SPI transaction setting the startup location.
B.4 Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.
B.5 Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V
    "M" extension.
B.6 Simulator waveform showing a multiplication using a software implementation of the C math
    library.

List of Tables

2.1 Integer register-immediate instructions. All instructions treat the values as signed numbers
    unless explicitly mentioned otherwise. LUI and AUIPC are used for immediate handling of the
    U-type instruction format.
2.2 Integer register-register instructions. All instructions treat the values as signed numbers
    unless explicitly mentioned otherwise.
2.3 Control transfer instructions with unconditional jumps and conditional branches. For all
    conditional branches, the address range is limited to ±4 KiB.
2.4 Load and store instructions, where word is defined for 32-bit, half-word for 16-bit and byte
    for 8-bit values.
2.5 Integer multiplication and division instructions excluding the 64-bit specific instructions
    MULW, DIV[U]W and REM[U]W.
2.6 Common 8 registers accessible by rs1´, rs2´ and rd´ fields in the compressed instruction
    formats CIW, CL, CS, CA and CB [1, p. 100, Table 16.2].
2.7 Compressed load and store instructions with the used instruction format.
2.8 Compressed unconditional jump and conditional branch instructions with the used instruction
    format.
2.9 Integer constant generation instructions with the used instruction format.
2.10 Integer register-immediate instructions with the used instruction format.
2.11 Integer register-register instructions with the used instruction format.

3.1 CSC bus interface signals.
3.2 HNDSHK bus interface signals.

4.1 Base address and address range of the AHB matrix slaves.
4.2 Settings of the MetaRTL specification for the SRAM instance used to store data.
4.3 Settings of the MetaRTL specification for the SRAM instance used to store instructions.
4.4 Settings of the MetaRTL specification for the ROM instance.
4.5 Signal list for interface tim implemented in the wrapper. All signals have a width of 1 bit.
4.6 Signal list of interface SPI_FRAME
4.7 MOSI/MISO frame structure for read and write operations.
4.8 Structure of the MISO error code field. CEC contains predefined codes which are described in
    table 4.9. The error code field contains the transmitted error code from the internal bus.
4.9 List of predefined communication error codes.
4.10 Signal list of interface tm
4.11 Signal list of interface tm_en_ctrl
4.12 Signal list for interface SOCCTRL_IF with a short description providing the signal source.
4.13 Generics of SYSCTRL and its associated signals.

Listings

4.1 Verilog code of RISC-V ALU without the result multiplexer.
4.2 VHDL code of RISC-V ALU without the result multiplexer.
4.3 C code for selecting startup memory
4.4 Hard coded boot code in hexadecimal representation stored in Boot ROM
5.1 Code snippet of the MetaGen specification file for the SoC digital top.
5.2 Code snippet of the SoC digital top SystemVerilog file generated using MetaGen
6.1 Code of test case sandbox.
6.2 Initial part of the boot code shown in hexadecimal with disassembler information
6.3 Code snippet of the C math library showing the 16-bit unsigned integer multiplication.

1 Introduction

The complexity of modern, purely hardwired ASICs means that behavioral changes require more and
more effort. This increase in effort can be countered by including a processor core in the ASIC
and implementing the behavior in software. Depending on the change, only the software must then
be modified. Using a memory that can be written at least once avoids changes to the chip masks
entirely. If a read-only memory (ROM) is used, only the ROM mask must be changed. Nevertheless,
the choice of memory depends on multiple factors such as available space, cost and availability
in the chosen production technology. In addition to the behavioral flexibility mentioned above,
bugs in the software can be fixed easily. This decreases development costs because fewer full
re-designs are needed. Such a re-design can cost several million euros¹ depending on the
technology used. If the chip has a memory that can be written multiple times, such as flash,
these changes can be made in the field through software updates, which reduces the risk of
product recalls.
At Infineon, a code generation framework based on metamodeling is available. It makes it possible
to generate complex register transfer level (RTL) code. One of the components implemented with
this framework is a central processing unit (CPU) that implements the RISC-V instruction set
architecture. As the name RISC-V indicates, the processor core follows the concept of Reduced
Instruction Set Computing (RISC). The second common processor design concept is Complex
Instruction Set Computing (CISC) [8].
A processor using the RISC design approach supports only a limited instruction set, so more
complex operations must be expressed as sequences of simple instructions. The simple instructions
can be implemented directly in hardware without additional overhead, which increases the
execution speed of each instruction. As a downside, the code size for more complex operations
increases.
CISC processors, on the other hand, implement a larger instruction set that contains more, and
more complex, instructions. These instructions are typically implemented using microcode, which
is stored in a memory inside the CISC processor. Before execution, each instruction must be
loaded and split into smaller operations, which causes an overhead that slows down execution.
Additionally, many of the implemented instructions are rarely used, which makes a CISC processor
less efficient.

¹ [Link]

This work focuses on the license-free RISC-V architecture. A comparison with other available RISC
architectures such as ARM is therefore not conducted.
The overall target of this work is to provide a simulatable System-on-Chip (SoC) that combines
generated and manually written components. To increase the development speed, the generator
framework mentioned above is used. In addition, handwritten modules from the digital building
block library (DBBL) are re-used and adapted to suit the application. After the integration of
all components, the focus shifts to debugging and testing. For this purpose, a Universal
Verification Methodology (UVM) test bench is set up. The simulation shall show that the
generated CPU using the RISC-V ISA including the extensions "M" and "C" works.

2 Research

2.1 RISC-V

The RISC-V instruction set is an open-source instruction set architecture (ISA) developed by
Krste Asanović, Yunsup Lee and Andrew Waterman at UC Berkeley [9]. The architecture is
independent of the implementation technology and is split into a small base integer instruction
set and optional extensions. The ISA defines multiple standard extensions, and additional custom
extensions outside of the standard can be added. The base integer instruction set is currently
ratified in a 32-bit and a 64-bit variant.
A RISC-V hardware platform [1, Page 2] may combine any number of RISC-V-compatible processing
cores with non-RISC-V-compatible components or cores. The RISC-V Instruction Set Manual
[1, Page 2] calls a component a core if it has its own instruction fetch unit. Accelerators and
co-processors can be added to each core. An accelerator is defined as a unit that is specialized
for a specific task. In the manual [1], a co-processor is defined as a unit that is attached to
a RISC-V core and is mostly sequenced by a RISC-V instruction stream, but that contains
additional instruction-set extensions and architectural state.

2.1.1 Instruction Set Overview

The base RISC-V integer ISA defined in the RISC-V Instruction Set Manual [1] must be implemented
in any RISC-V core. The manual currently describes four variants of the base integer ISA: the
32-bit and 64-bit variants RV32I and RV64I mentioned above, the 32-bit subset RV32E and a 128-bit
variant named RV128I. The RV32E subset is derived from RV32I and is intended for smaller
microcontrollers. This is achieved by halving the number of integer registers from 32 to 16.
RV128I is a full base integer ISA whose address space is increased to 128 bits. In its current
state, the RISC-V Instruction Set Manual only provides a sketch of a possible variant of this
base integer ISA.
The RISC-V core described in this thesis implements the RV32I base integer instruction set with
the extensions "M" and "C". Therefore, only this configuration is briefly described in the
following sections.

2.1.2 Integer Instruction Set

The RV32I base integer instruction set [1, Page 13] provides 32 registers with a width of 32 bits
each. The registers x1 to x31 are general-purpose registers whose values can be interpreted as
unsigned binary integers, two’s complement binary integers or Boolean values. Register x0 always
reads as zero. It can therefore be used as the destination when the result of an instruction is
to be discarded.
An additional register, the program counter pc, holds the address of the current instruction.
Besides that, the RV32I base integer instruction set [1, Page 13] provides no other dedicated
register, such as a dedicated stack pointer.
Four core instruction formats (R/I/S/U) and two additional variants for further handling of
immediates (B/J) are provided. All instructions have a fixed size of 32 bits and must be aligned
to a four-byte boundary in memory. A violation of the four-byte alignment triggers an
instruction-address-misaligned exception.
To simplify decoding, the register fields rs1, rs2 and rd are kept at the same position in all
formats. To speed up sign extension, the sign bit of all immediates is located in bit 31 of the
instruction. All immediates are sign-extended except for the 5-bit immediates of CSR
instructions. These instructions are described in Chapter 9 "Zicsr" of the RISC-V Instruction Set
Manual [1, Page 55].
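
The fixed field positions and the sign bit in bit 31 can be illustrated with a small software
decoder. The following Python sketch is not taken from the thesis. The function names
sign_extend and decode_i_type are chosen here for illustration, and only the I-type format is
handled.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set -> negative number
        return value - (1 << bits)
    return value


def decode_i_type(instr: int) -> dict:
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": instr & 0x7F,           # bits 6..0
        "rd": (instr >> 7) & 0x1F,        # bits 11..7
        "funct3": (instr >> 12) & 0x7,    # bits 14..12
        "rs1": (instr >> 15) & 0x1F,      # bits 19..15
        # bits 31..20 hold the immediate; bit 31 is its sign bit
        "imm": sign_extend(instr >> 20, 12),
    }


# Encoding of "addi x1, x2, -1"
fields = decode_i_type(0xFFF10093)
print(fields)
```

Because rd and rs1 sit at the same bit positions in every format that uses them, a hardware
decoder can read the register file in parallel with decoding the opcode.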

Figure 2.1: RV32I base instruction formats showing immediate variants. [1, Page 16]

Integer Computational Instructions

The integer computational instructions either use the I-type or R-type format. For register-
immediate operations the I-type format and for register-register operations the R-type format
is used. In both cases, the result is stored in register rd.
For integer arithmetic instructions there is no special support for overflow checks. This
check must be implemented manually using branches. Examples for overflow checking and a
more detailed instruction description can be found in the RISC-V Instruction Set Manual [1,
Page 18].
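
The branch-based overflow checks can be modeled in software. The sketch below is an
illustration, not code from the thesis. It emulates 32-bit registers in Python and mirrors the
instruction sequences suggested in the RISC-V Instruction Set Manual, which are quoted in the
comments. All function names are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit register value as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add32(a: int, b: int) -> int:
    """ADD: 32-bit addition that silently wraps around on overflow."""
    return (a + b) & MASK32


def unsigned_add_overflows(a: int, b: int) -> bool:
    # add  t0, t1, t2
    # bltu t0, t1, overflow
    return add32(a, b) < (a & MASK32)


def signed_add_overflows(a: int, b: int) -> bool:
    # add  t0, t1, t2
    # slti t3, t2, 0         (t3 = 1 if the second operand is negative)
    # slt  t4, t0, t1        (t4 = 1 if the sum is less than the first operand)
    # bne  t3, t4, overflow
    total = to_signed(add32(a, b))
    return (to_signed(b) < 0) != (total < to_signed(a))


print(unsigned_add_overflows(0xFFFFFFFF, 1))
print(signed_add_overflows(0x7FFFFFFF, 1))
```

The signed check relies on the observation that adding a negative number must make the sum
smaller than the first operand, and adding a non-negative number must not.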

Table 2.1: Integer register-immediate instructions. All instructions treat the values as signed
numbers unless explicitly mentioned otherwise. LUI and AUIPC are used for immediate handling of
the U-type instruction format.

Function          Short Description
ADDI              Adds the sign-extended 12-bit immediate to rs1.
SLTI              Sets rd to 1 if rs1 is less than the sign-extended 12-bit immediate,
                  otherwise to 0.
SLTIU             Same as SLTI, but treats the values as unsigned numbers.
ANDI, ORI, XORI   Bitwise logical operations on rs1 and the sign-extended 12-bit immediate.
SLLI              Logical left shift that shifts zeros into the lower bits.
SRLI              Logical right shift that shifts zeros into the upper bits.
SRAI              Arithmetic right shift that keeps the original sign.
LUI               Load upper immediate. The upper 20 bits of rd are filled with the immediate
                  and the lowest 12 bits with zeros.
AUIPC             Adds a 32-bit offset to the address of the AUIPC instruction and stores the
                  result in rd. The upper 20 bits of the offset are filled with the immediate
                  and the lowest 12 bits with zeros.
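
How LUI and ADDI combine to build an arbitrary 32-bit constant can be shown with a short model.
This is a sketch under the assumption of emulated 32-bit registers. The helper split_constant
is hypothetical and only illustrates what an assembler has to do: because ADDI sign-extends its
12-bit immediate, the upper part must be incremented whenever the lower part is negative.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20: int) -> int:
    """LUI: immediate into the upper 20 bits, lowest 12 bits are zero."""
    return (imm20 << 12) & MASK32


def auipc(pc: int, imm20: int) -> int:
    """AUIPC: address of the instruction plus the upper-immediate offset."""
    return (pc + (imm20 << 12)) & MASK32


def addi(rs1: int, imm12: int) -> int:
    """ADDI with an already sign-extended 12-bit immediate."""
    return (rs1 + imm12) & MASK32


def split_constant(value: int) -> tuple:
    """Split a 32-bit constant into a LUI part and a signed ADDI part."""
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:                 # ADDI would treat this as negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


hi, lo = split_constant(0x12345FFF)
print(hex(hi), lo, hex(addi(lui(hi), lo)))
```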

Control Transfer Instructions

For program flow control, RV32I provides unconditional jumps and conditional branches.
Unconditional jumps are implemented using the J-type format (JAL) and the I-type format (JALR).
Conditional branches only use the B-type format. The target address of jumps and branches must
be aligned to a four-byte boundary. An instruction-address-misaligned exception is triggered
when this alignment is violated.

Table 2.2: Integer register-register instructions. All instructions treat the values as signed
numbers unless explicitly mentioned otherwise.

Function        Short Description
ADD             Adds rs1 and rs2.
SLT             Sets rd to 1 if rs1 is less than rs2, otherwise to 0.
SLTU            Same as SLT, but treats the values as unsigned numbers.
AND, OR, XOR    Bitwise logical operations.
SLL             Logical left shift on rs1 that shifts zeros into the lower bits. The shift
                amount is held in the lower 5 bits of rs2.
SRL             Logical right shift on rs1 that shifts zeros into the upper bits. The shift
                amount is held in the lower 5 bits of rs2.
SUB             Subtracts rs2 from rs1.
SRA             Arithmetic right shift on rs1 that keeps the original sign. The shift amount
                is held in the lower 5 bits of rs2.
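
The difference between the three shift instructions is easiest to see on a value with the sign
bit set. The following Python model is a sketch with emulated 32-bit registers. The function
names simply mirror the instruction mnemonics and are not part of any library.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit register value as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sll(rs1: int, rs2: int) -> int:
    """Logical left shift; only the lower 5 bits of rs2 are used."""
    return (rs1 << (rs2 & 0x1F)) & MASK32


def srl(rs1: int, rs2: int) -> int:
    """Logical right shift; zeros are shifted into the upper bits."""
    return (rs1 & MASK32) >> (rs2 & 0x1F)


def sra(rs1: int, rs2: int) -> int:
    """Arithmetic right shift; the sign bit is replicated."""
    return (to_signed(rs1) >> (rs2 & 0x1F)) & MASK32


print(hex(srl(0x80000000, 4)))
print(hex(sra(0x80000000, 4)))
print(sll(1, 33))   # shift amount 33 is reduced to 33 & 0x1F = 1
```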

Table 2.3: Control transfer instructions with unconditional jumps and conditional branches. For
all conditional branches, the address range is limited to ±4 KiB.

Function   Short Description
JAL        "Jump and link" implements a jump relative to the current instruction address in a
           range of ±1 MiB. The signed offset is encoded in the immediate field. The rd
           register stores the address of the instruction following the jump.
JALR       "Jump and link register" implements an absolute jump. The target is obtained by
           adding the immediate to rs1 and clearing the least-significant bit of the result.
           The rd register stores the address of the instruction following the jump.
BEQ        Conditional branch that is taken when rs1 equals rs2.
BNE        Conditional branch that is taken when rs1 is not equal to rs2.
BLT        Conditional branch that is taken when rs1 is less than rs2.
BLTU       Same as BLT, but interprets the values as unsigned.
BGE        Conditional branch that is taken when rs1 is greater than or equal to rs2.
BGEU       Same as BGE, but interprets the values as unsigned.
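
The branch conditions and the JALR target calculation from Table 2.3 can be summarized in a few
lines. The sketch below is an illustrative software model, not the hardware implementation
described later in this thesis. The names branch_taken and jalr_target are assumptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit register value as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(op: str, rs1: int, rs2: int) -> bool:
    """Evaluate the condition of a B-type instruction."""
    u1, u2 = rs1 & MASK32, rs2 & MASK32
    s1, s2 = to_signed(rs1), to_signed(rs2)
    conditions = {
        "BEQ": u1 == u2,
        "BNE": u1 != u2,
        "BLT": s1 < s2,      # signed comparison
        "BGE": s1 >= s2,
        "BLTU": u1 < u2,     # unsigned comparison
        "BGEU": u1 >= u2,
    }
    return conditions[op]


def jalr_target(rs1: int, imm: int) -> int:
    """JALR: add the signed immediate to rs1 and clear bit 0."""
    return (rs1 + imm) & 0xFFFFFFFE


# 0xFFFFFFFF is -1 when read as signed, but the largest value when unsigned
print(branch_taken("BLT", 0xFFFFFFFF, 1), branch_taken("BLTU", 0xFFFFFFFF, 1))
print(hex(jalr_target(0x1001, 4)))
```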


Load and Store Instructions

For memory operations, load and store instructions are provided. Load instructions use the
I-type format, whereas store instructions use the S-type format. In both cases, the memory
address is calculated by adding the sign-extended 12-bit offset stored in the immediate field to
rs1. For a load, the data read from memory is stored in register rd. For a store, register rs2
holds the data that is transferred to memory.

Table 2.4: Load and store instructions, where word is defined for 32-bit, half-word for 16-bit
and byte for 8-bit values.
Function Short Description

LW Loads a word from memory and stores it in register rd.
LH Loads a half-word from memory, sign-extends it to 32 bits and stores it in register rd.
LHU Loads a half-word from memory, zero-extends it to 32 bits and stores it in register rd.
LB Loads a byte from memory, sign-extends it to 32 bits and stores it in register rd.
LBU Loads a byte from memory, zero-extends it to 32 bits and stores it in register rd.
SW Stores all 32 bits of register rs2 to memory.
SH Stores the lower 16 bits of register rs2 to memory.
SB Stores the lower 8 bits of register rs2 to memory.
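
The difference between the sign-extending and zero-extending loads of table 2.4 can be sketched
in Python. This is a minimal model and not part of the thesis flow: the helper names are chosen
here, the memory is a plain bytearray and a little-endian byte order is assumed.

```python
MASK32 = 0xFFFFFFFF


def load(memory, address, size, signed):
    """Read `size` bytes (little-endian, assumed) and extend the value to 32 bits."""
    raw = int.from_bytes(memory[address:address + size], "little")
    if signed and raw & (1 << (8 * size - 1)):
        raw -= 1 << (8 * size)          # sign extension
    return raw & MASK32


def store(memory, address, value, size):
    """Write the lower `size` bytes of value to memory."""
    memory[address:address + size] = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "little")


def lw(mem, addr):  return load(mem, addr, 4, signed=False)
def lh(mem, addr):  return load(mem, addr, 2, signed=True)
def lhu(mem, addr): return load(mem, addr, 2, signed=False)
def lb(mem, addr):  return load(mem, addr, 1, signed=True)
def lbu(mem, addr): return load(mem, addr, 1, signed=False)

def sw(mem, addr, rs2): store(mem, addr, rs2, 4)
def sh(mem, addr, rs2): store(mem, addr, rs2, 2)
def sb(mem, addr, rs2): store(mem, addr, rs2, 1)
```

After storing 0x1234FF80 with sw, lb and lh at the same address return 0xFFFFFF80, whereas lbu
returns 0x00000080 and lhu returns 0x0000FF80.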

Memory Ordering Instructions

Memory ordering instructions, implemented using the I-type format, are used to order the memory
and I/O accesses as seen by other co-processors and RISC-V harts (hardware threads). Any
combination of memory reads/writes (R/W) and device input/output (I/O) can be ordered. This is
needed because the RISC-V ISA uses a relaxed memory model. Additional information about the
memory model can be found in the RISC-V Instruction Set Manual [1, p. 83].
For ordering, the FENCE instruction is provided, whose immediate field has three
sections. The lowest four bits, bit 0 to bit 3, encode the successor accesses. The next
four bits, bit 4 to bit 7, encode the predecessor accesses. The highest four bits, bit 8 to bit 11,
encode the fence mode field, which can be used to implement different semantics of the FENCE
instruction.
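
The layout of the FENCE immediate can be illustrated with a short Python sketch. The function
names are chosen here for illustration. The bit order inside each four-bit section (I, O, R, W
from the highest to the lowest bit) is taken from the manual and is not stated in the text above.

```python
# Bit positions inside one four-bit access set (order as in the manual).
ACCESS_BITS = {"i": 8, "o": 4, "r": 2, "w": 1}


def access_set(accesses):
    """Convert a string such as "rw" or "iorw" into a four-bit mask."""
    mask = 0
    for access in accesses:
        mask |= ACCESS_BITS[access]
    return mask


def fence_immediate(pred, succ, fm=0):
    """Build the 12-bit immediate: fm in bits 11-8, pred in bits 7-4, succ in bits 3-0."""
    return ((fm & 0xF) << 8) | (access_set(pred) << 4) | access_set(succ)
```

For example, fence_immediate("iorw", "iorw") yields 0x0FF and fence_immediate("rw", "rw") yields
0x033.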

Environment Call and Breakpoints

For environment calls and breakpoints, the SYSTEM instructions, encoded using the I-type
format, are provided. The functions ECALL and EBREAK are specified: ECALL triggers a service
request to the execution environment, whereas EBREAK returns control to the debugging
environment.

HINT Instructions

HINT instructions advance the pc without changing any other architecturally visible state.
They are encoded as computational instructions from section 2.1.2 with the constraint rd = x0.
The no-operation instruction (NOP), encoded as ADDI x0, x0, 0, follows the same principle.

2.1.3 Integer Multiplication and Division Instruction Extension

To accelerate integer multiplication and division, the "M" extension is specified in the RISC-V
Instruction Set Manual [1, p. 43]. The instructions perform a multiplication or division on two
register values and are therefore encoded using the R-type format.

Table 2.5: Integer multiplication and division instructions excluding the 64-bit specific instruc-
tions MULW, DIV[U]W and REM[U]W.
Function Short Description

MUL Performs a 32-bit × 32-bit multiplication and stores the lower 32 bits of the result
in rd.
MULH Performs the multiplication of two signed values and stores the upper 32 bits of the
result, including the sign bit, in rd.
MULHU Performs the multiplication of two unsigned values and stores the upper 32 bits
of the result in rd.
MULHSU Performs the multiplication of a signed value (rs1) and an unsigned value (rs2) and
stores the upper 32 bits of the result, including the sign bit, in rd.
DIV Performs a 32-bit by 32-bit signed division and stores the result in rd. The result
is rounded towards zero.
REM Provides the remainder of the signed division and stores it in rd.
DIVU Performs a 32-bit by 32-bit unsigned division and stores the result in rd. The
result is rounded towards zero.
REMU Provides the remainder of the unsigned division and stores it in rd.
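
The semantics of table 2.5 can be captured in a small Python reference model. This is a sketch
with function names chosen here. The handling of a division by zero (quotient with all bits set,
remainder equal to the dividend) follows the manual and is not part of the table above. Note that
Python's // operator rounds towards negative infinity, so the signed division is computed on
absolute values to obtain the rounding towards zero.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rs1, rs2):
    return (rs1 * rs2) & MASK32


def mulh(rs1, rs2):
    return ((to_signed(rs1) * to_signed(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def mulhsu(rs1, rs2):
    return ((to_signed(rs1) * (rs2 & MASK32)) >> 32) & MASK32


def div(rs1, rs2):
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return MASK32                       # division by zero, as defined in the manual
    quotient = abs(a) // abs(b)             # magnitude, rounded towards zero
    return (-quotient if (a < 0) != (b < 0) else quotient) & MASK32


def rem(rs1, rs2):
    a, b = to_signed(rs1), to_signed(rs2)
    if b == 0:
        return a & MASK32
    remainder = abs(a) % abs(b)
    return (-remainder if a < 0 else remainder) & MASK32   # sign follows the dividend


def divu(rs1, rs2):
    a, b = rs1 & MASK32, rs2 & MASK32
    return MASK32 if b == 0 else a // b


def remu(rs1, rs2):
    a, b = rs1 & MASK32, rs2 & MASK32
    return a if b == 0 else a % b
```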

2.1.4 Compressed Instruction Extension

The compressed instruction extension "C" adds the possibility to use 16-bit instructions with
any base instruction set. Therefore, the four-byte alignment boundary is relaxed to a two-byte
boundary, so that no instruction can raise an instruction-address-misaligned exception anymore.
Besides that, the manual [1, p. 97] specifies the generic term "RVC" for the compressed
instruction extension, which is also used in this thesis. Additionally, the manual [1, p. 97]
estimates that the code size is typically reduced by around 25%-30% when 50%-60%
of the RISC-V base instructions are replaced.
To obtain the compressed 16-bit version of a common 32-bit instruction, RVC follows a
simple compression scheme. First, the size of the immediate field can be reduced, resulting in a
smaller possible address offset or immediate value. Second, one register can be fixed to x0
(zero register), x1 (ABI link register) or x2 (ABI stack pointer). Third, source and destination
registers can be identical, and finally, the number of selectable registers can be limited to 8.
Additionally, for many RVC instructions the constraints imm ≠ 0 or rd, rs1, rs2 ≠ x0
are necessary to free up encoding space for other instructions with fewer operand bits [1, p. 100].
As an example, C.ADDI has the constraints rd ≠ x0 and nzimm ≠ 0, whereas the encoding with
rd = x0 and a zero immediate is used for the compressed no operation (C.NOP).
Figure 2.2 shows an overview of the instruction formats used by the RVC instructions,
whereas table 2.6 lists the 8 commonly used registers accessible by rs1´, rs2´ and rd´. As
with the base instructions, the register fields are kept at the same positions to simplify
decoding.

Figure 2.2: Compressed 16-bit RVC instruction formats[1, p. 100, Table 16.1]

Table 2.6: Common 8 registers accessible by the rs1´, rs2´ and rd´ fields in the compressed
instruction formats CIW, CL, CS, CA and CB [1, p. 100, Table 16.2].
RVC Register Number 000 001 010 011 100 101 110 111
Integer Register Number x8 x9 x10 x11 x12 x13 x14 x15
Integer Register ABI Name s0 s1 a0 a1 a2 a3 a4 a5
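
The mapping of table 2.6 is a fixed offset of 8 on the 3-bit register field. A minimal Python
sketch, with a function name chosen here for illustration:

```python
# ABI names of x8 to x15 as listed in table 2.6.
ABI_NAMES = ["s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"]


def rvc_register(field):
    """Map a 3-bit rs1´/rs2´/rd´ field to the integer register and its ABI name."""
    if not 0 <= field <= 0b111:
        raise ValueError("RVC register fields are 3 bits wide")
    return "x%d" % (8 + field), ABI_NAMES[field]
```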

Load and Store Instructions

For loading and storing data, four RVC instructions (see table 2.7) are available in
combination with an RV32I instruction set. All use zero-extended immediates that are scaled by
the data size to increase the reachable address range. For the 32-bit words of RV32I, this
results in a scaling factor of 4.

Table 2.7: Compressed load and store instructions with the used instruction format.
Function Format Short Description

C.LWSP CI Loads a 32-bit value into rd. The memory address is calculated by
adding the offset stored in the immediate field to the stack pointer
x2.
C.SWSP CSS Stores the content of register rs2 into memory. The memory address is
calculated by adding the offset stored in the immediate field to the
stack pointer x2.
C.LW CL Loads a 32-bit value into rd´. The memory address is calculated by
adding the offset stored in the immediate field to rs1´.
C.SW CS Stores the content of register rs2´ into memory. The memory address is
calculated by adding the offset stored in the immediate field to rs1´.

Control Transfer Instructions

To control the program flow RVC provides unconditional jump and conditional branch instruc-
tions, where for both instruction types the offset stored in the immediate field is multiplied by
2.

Table 2.8: Compressed unconditional jump and conditional branch instructions with the used
instruction format.
Function Format Short Description

C.J CJ Performs an unconditional jump. The target address is calculated by
adding the signed offset to pc.
C.JAL CJ Performs an unconditional jump and stores current pc + 2 into link
register x1. The target address is calculated by adding the signed
offset to pc.
C.JR CR Performs an unconditional jump to the target address stored in rs1.
C.JALR CR Performs an unconditional jump to the target address stored in rs1 and
stores current pc + 2 into link register x1.
C.BEQZ CB Performs a conditional branch. Takes the branch when the content of
rs1´ is zero. The target address is calculated by adding the signed
offset to pc.
C.BNEZ CB Same as C.BEQZ but branches when the value is nonzero.

Integer Computational Instructions

For constant generation and integer arithmetic operations, RVC provides multiple instructions.
Tables 2.9, 2.10 and 2.11 provide a short overview of the available instructions. For a more
comprehensive description including all constraints, a look into section 16.5 of the manual [1, p.
106] is recommended.

Table 2.9: Integer constant generation instructions with the used instruction format.
Function Format Short Description

C.LI CI Loads the sign-extended immediate into rd.
C.LUI CI Loads the nonzero immediate into bits 17-12 of rd and clears the lower 12
bits. Bit 17 acts as the sign bit and is extended into all higher bits of rd.

Table 2.10: Integer register-immediate instructions with the used instruction format.
Function Format Short Description

C.ADDI CI Adds the nonzero signed immediate to rd.
C.ADDI16SP CI Adds the scaled nonzero signed immediate to stack pointer x2.
The immediate scaling factor is 16.
C.ADDI4SPN CIW Adds the scaled nonzero immediate to stack pointer x2 and stores
the result in rd´. The immediate scaling factor is 4.
C.SLLI CI Performs a logical left shift of rd. The immediate holds the shift
amount.
C.SRLI CB Performs a logical right shift of rd´. The immediate holds the shift
amount.
C.SRAI CB Performs an arithmetic right shift of rd´. The immediate holds the
shift amount.
C.ANDI CB Bitwise AND of rd´ and the sign-extended immediate.

Table 2.11: Integer register-register instructions with the used instruction format.
Function Format Short Description

C.MV CR Copies rs2 to rd.
C.ADD CR Adds rs2 to rd.
C.AND, C.OR, C.XOR CA Bitwise AND, OR or XOR of rd´ and rs2´.
C.SUB CA Subtracts rs2´ from rd´.

HINT Instructions

The behavior of RVC HINT instructions is the same as for RV32I described in section 2.1.2.
As a short reminder, HINT instructions do not modify any architecturally visible state. As for
RV32I, HINT instructions are implemented as computational instructions, where rd = x0 or rd
is overwritten by itself.

2.2 Code Generation

To increase productivity in research and development, it is common practice to use code
generation. In this thesis, Infineon’s internal code generation framework called "MetaGen" with
"MetaRTL" is used. The framework is based upon metamodeling and concepts of the Model
Driven Architecture (MDA) [4, 10]. Therefore, the following sections provide a brief overview
of the modeling techniques used and of MDA.

2.2.1 Metamodeling

The term "metamodel" consists of two parts. "Meta" is Greek and translates to after or
beyond, whereas a "model" represents a certain level of abstraction of, for example, a system.
As a result, a "metamodel" can be seen as a model that goes beyond a model: a model of a model.
This approach is similar to the class abstraction of object-oriented programming languages.
Figure 2.3 shows the metamodel of an ISA as a Unified Modeling Language (UML) class
diagram. The model instance on the right is an instance of the metamodel definition on the left.
In this example, the metamodel defines a component with four attributes. Each attribute has
its own type and multiplicity. The component can have one or more relations
to the "Instruction" class, which defines its own attributes. As seen in figure 2.3,
the model instance has only one root node with four instructions related to it. All attributes
are filled with valid values; therefore, this instance meets exactly the constraints set by the
metamodel.
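
The relation between a metamodel and a model instance can be sketched with Python dataclasses.
The sketch is purely illustrative: the attribute names (name, xlen, extensions, description,
mnemonic, fmt, opcode) are hypothetical and not taken from figure 2.3, and MetaGen's real classes
are not shown. The classes play the role of the metamodel, the object RV32I_EXAMPLE is a model
instance, and validate checks the types and the multiplicity of at least one instruction.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:                 # metamodel class (attributes are hypothetical)
    mnemonic: str
    fmt: str
    opcode: int


@dataclass
class Component:                   # metamodel root class with four attributes
    name: str
    xlen: int
    extensions: list
    description: str
    instructions: list = field(default_factory=list)   # multiplicity 1..*


def validate(component):
    """Return a list of violated metamodel constraints (empty if the model is valid)."""
    errors = []
    if not isinstance(component.name, str) or not component.name:
        errors.append("name must be a non-empty string")
    if component.xlen not in (32, 64):
        errors.append("xlen must be 32 or 64")
    if not all(isinstance(ext, str) for ext in component.extensions):
        errors.append("extensions must be strings")
    if not isinstance(component.description, str):
        errors.append("description must be a string")
    if len(component.instructions) < 1:
        errors.append("at least one instruction is required")
    for instruction in component.instructions:
        if not isinstance(instruction, Instruction):
            errors.append("instructions must be Instruction objects")
        elif not 0 <= instruction.opcode < 128:
            errors.append("opcode of %s must fit into 7 bits" % instruction.mnemonic)
    return errors


# Model instance: one root node with four related instructions.
RV32I_EXAMPLE = Component(
    name="RV32I", xlen=32, extensions=["M", "C"], description="example ISA model",
    instructions=[
        Instruction("ADD", "R", 0x33),
        Instruction("ADDI", "I", 0x13),
        Instruction("LW", "I", 0x03),
        Instruction("SW", "S", 0x23),
    ],
)
```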

Figure 2.3: UML class diagram showing an example of metamodel and a model instance[2]

2.2.2 Metamodel-based Code Automation

As previously mentioned, Infineon uses metamodel-based code generation for repetitive code to
reduce engineering costs. Figure 2.4 illustrates the workflow and describes the metamodel’s
role in the flow. To enable metamodeling, a metamodel needs to be defined. Therefore, a
metamodel description is created with a UML modeling tool based on certain requirements
and specifications. Afterwards, a Python framework based on the previously defined metamodel
is generated.
For modifying the input specification, the framework provides a graphical user interface
(GUI), in which the user can fill the metamodel with data. Afterwards, the specification is passed
to a reader that reads the input specification into the metamodel framework. As a result,
the model is accessible through a Python Application Programming Interface (API). For generating
target code, the model is passed to writers or a template engine. The template engine is
the most important and most commonly used output mechanism. Because different template files
can be provided, there are hardly any limitations on the target views that can be generated.
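
The step from a model to a target view through a template can be sketched as follows. The
sketch does not use MetaGen or its template engine. It uses string.Template from the Python
standard library as a stand-in, and the model layout (a dictionary with the keys name and ports)
as well as the function name are assumptions made for illustration.

```python
from string import Template

# Template of the target view, here a Verilog module skeleton.
MODULE_TEMPLATE = Template("module ${name} (\n${ports}\n);\nendmodule\n")


def render_module(model):
    """Render a Verilog module skeleton from a dictionary-based model (hypothetical layout)."""
    ports = ",\n".join(
        "    %s wire %s" % (port["direction"], port["name"]) for port in model["ports"]
    )
    return MODULE_TEMPLATE.substitute(name=model["name"], ports=ports)


EXAMPLE_MODEL = {
    "name": "cpu_wrapper",
    "ports": [
        {"direction": "input", "name": "clk_i"},
        {"direction": "input", "name": "reset_n_i"},
        {"direction": "output", "name": "illegal_state_o"},
    ],
}
```

Exchanging MODULE_TEMPLATE for a different template, for example a VHDL entity, changes the
generated view without touching the model.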

Figure 2.4: Illustration of a simplified utilization of metamodeling at Infineon[3]

2.2.3 Model Driven Architecture for Code Automation

MDA, proposed by the Object Management Group² (OMG), is an approach to reducing the growing
productivity gap by using models in software design. Therefore, MDA adds additional steps
before the code is generated. Figure 2.5 illustrates the three main models of MDA with the
additional model of the target platform.

• Computation Independent Model (CIM) is close to the specification without considering
architecture and algorithm implementation.

• Platform Independent Model (PIM) avoids platform details and is the result of transforming
the CIM by adding more details according to the architecture.

• Platform Model (PM) provides the target platform details.

• Platform Specific Model (PSM) combines PIM and PM. From this view, the final code is
generated.

Figure 2.5: MDA as Y-Chart[4]


2
[Link]

For RTL generation, adaptations to the MDA were necessary. Figure 2.6 sketches the enhanced
and adapted MDA for hardware generation, which introduces new terms that describe the involved
hardware-related models [11].

• Model of Things (MoT) corresponds to CIM capturing requirements and specifications.
MoT also defines attributes and relations to intended functionality. An example of a MoT
would be the ISA described in chapter 2.

• Model of Design (MoD) corresponds to PIM and is the transformed MoT using templates
of design (ToD). A memory subsystem could be a possible example of a MoD.

• Model of View (MoV) corresponds to PSM where platform-specific details are added using
templates of view (ToV). As a result, HDL code can be generated depending on the used
target view models.

Figure 2.6: MDA applied to hardware generation[2]

2.2.4 MetaGen and MetaRTL

The implementation at Infineon of the previously described techniques is called "MetaGen".
Metamodels can be captured either textually or graphically. As an example, Extensible Markup
Language (XML) schemas can be used, where objects with attributes and their relations can be
defined. Additionally, metamodels can be used to combine and relate other known metamodels.
MetaGen is completely written in Python and relies heavily on libraries and tools, like the Mako
template engine, which is used for MoV generation.
MetaRTL adds a metamodel to the MetaGen framework to formalize models on the MoD
layer, providing basic components and commands to generate RTL code. MetaRTL provides
multiple component constructors written in Python for defining the MoD.

3 Bus Interfaces and Protocols

For data transfer between modules and components inside the SoC, four different buses are used.
The Advanced High-Performance Bus (AHB) Lite and the Advanced Peripheral Bus (APB) are
taken from the ARM AMBA Specification [12], whereas the Control Status Configuration (CSC)
and Handshake (HNDSHK) buses are bus protocols specified internally at Infineon.

3.1 AHB

AHB Lite [5] is a pipelined high-performance bus interface. It is the reduced implementation of
AHB with the main difference that it supports only a single master. Therefore, no arbitration is
needed. In addition, some signals, slave responses and signal behaviors are missing or differ.
A comprehensive list of all differences can be found in the technical reference manual [13, p.
A-3]. The explanation of the bus signals can be taken from the AHB protocol specification [5,
p. 2-1].
To achieve high performance, features like burst transfers, single-clock edge operation,
non-tristate implementation and wide data bus configurations are implemented. AHB Lite is
commonly used for memory and high-bandwidth peripherals. However, low-performance peripherals
can be added by using an AHB-to-APB bridge.
Figure 3.1 sketches a basic AHB read transfer. In the first clock cycle, the source address is
driven on signal HADDR and signal HWRITE is set to zero, indicating a read access. During
the following clock cycle, the read data is provided on the signal HRDATA. A write transfer,
sketched in figure 3.2, follows the same timing as a read transfer. The only differences are that
HWRITE is driven high, indicating a write access, and that the data to be written is driven on
HWDATA during the following clock cycle. Both figures sketch the transfers without wait states.
Additional timing diagrams, including diagrams with wait states, can be found in chapter
"Transfers" of the AHB protocol specification [5, p. 3-1].
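
The overlap of address and data phase described above can be reproduced with a small
cycle-based Python model. It is a simplified sketch with names chosen here: it covers only
single transfers without wait states, ignores the remaining AHB Lite signals and uses a
dictionary as the slave memory.

```python
def run_ahb_lite(transfers, memory):
    """Simulate pipelined transfers without wait states.

    transfers: list of ("read", address) or ("write", address, data).
    Returns one dictionary per clock cycle with the values seen on the bus.
    """
    trace = []
    pending = None                      # transfer whose address phase was in the last cycle
    for current in list(transfers) + [None]:
        cycle = {"HADDR": None, "HWRITE": None, "HWDATA": None, "HRDATA": None}
        if current is not None:         # address phase of the current transfer
            cycle["HADDR"] = current[1]
            cycle["HWRITE"] = int(current[0] == "write")
        if pending is not None:         # data phase of the previous transfer
            if pending[0] == "write":
                cycle["HWDATA"] = pending[2]
                memory[pending[1]] = pending[2]
            else:
                cycle["HRDATA"] = memory.get(pending[1], 0)
        trace.append(cycle)
        pending = current
    return trace
```

Three back-to-back transfers therefore occupy four clock cycles, because each data phase
coincides with the address phase of the next transfer.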

Figure 3.1: AHB Lite basic read transfer[5, p. 3-2]

Figure 3.2: AHB Lite basic write transfer[5, p. 3-2]

3.2 APB

APB [6] is an unpipelined low-bandwidth bus interface with a focus on reduced complexity. It
supports only a single master, and every transfer needs at least two clock cycles. In combination
with a bridge, communication with high-performance bus interfaces, like AHB, is possible. An
explanation of the bus signals can be taken from the APB protocol specification [6, p. 4-2].
An APB read transfer is sketched in figure 3.3. A high level on PSEL indicates the start of
a transfer. At time T1, the source address is driven on PADDR with PWRITE set to zero,
indicating a read access. One clock cycle later at the earliest, the read data is driven on
PRDATA with PENABLE and PREADY asserted. The transfer finishes by releasing PENABLE. If
PSEL is kept asserted, another transfer is started. A write transfer, sketched in figure
3.4, follows the same timing as a read transfer. The only differences are that PWRITE is driven
high, indicating a write access, and that the data to be written is driven on PWDATA. Both
figures sketch the transfers without wait states. Additional timing diagrams, including diagrams
with wait states, can be found in chapter "Transfers" of the APB protocol specification [6, p. 2-1].
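
The two-phase nature of an APB transfer can be sketched in Python. The phase names SETUP and
ACCESS follow common APB terminology, while the function name, the dictionary-based register
file and the wait_states parameter are assumptions made for this illustration.

```python
def apb_transfer(registers, address, write=False, wdata=None, wait_states=0):
    """Return the bus cycles of one APB transfer on a dictionary-based register file."""
    cycles = [{
        "phase": "SETUP", "PSEL": 1, "PENABLE": 0,
        "PADDR": address, "PWRITE": int(write), "PWDATA": wdata if write else None,
    }]
    for index in range(wait_states + 1):
        ready = index == wait_states            # the slave completes in the last cycle
        cycle = {
            "phase": "ACCESS", "PSEL": 1, "PENABLE": 1, "PREADY": int(ready),
            "PADDR": address, "PWRITE": int(write),
            "PWDATA": wdata if write else None, "PRDATA": None,
        }
        if ready:
            if write:
                registers[address] = wdata
            else:
                cycle["PRDATA"] = registers.get(address, 0)
        cycles.append(cycle)
    return cycles
```

Without wait states the function returns exactly two cycles, which corresponds to the minimum
transfer length stated above.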

Figure 3.3: APB read transfer[6, p. 2-4]

Figure 3.4: APB write transfer[6, p. 2-2]

3.3 CSC

The CSC bus interface is a simple single-cycle data bus protocol defining the signals from table
3.1, where the signals addr, rdata and wdata are commonly 32 bits wide. There
is no size dependency between the address signal addr and the data signals rdata and wdata.
However, rdata and wdata must have the same size. Since CSC works as a single-cycle bus, a data
transfer takes place on every rising clock edge at which cs and either wen or ren are set. No
dedicated clock signal is specified for the bus. Thus, the bus is clocked using the module clock.

[Timing diagram: clk, rst, addr (addr0 to addr5), cs, rdata (rdata0 to rdata2), ren, wdata
(wdata0 to wdata2) and wen over ticks 0 to 20]

Figure 3.5: CSC timing diagram.

Figure 3.5 shows a timing diagram for multiple read and write transfers. The diagram also
includes the module signals clk and rst to illustrate the relation between them and the bus
signals. At tick 2 of the diagram, the reset is released, which immediately triggers the first
read access because the signals cs and ren are set. For the next two clock cycles, address and
read data are updated, with a read access taking place in every cycle. At tick 8 of the timing
diagram, ren is released and wen is set. Therefore, the value of the data signal rdata is kept,
but a change on wdata can be observed. On each rising edge at which cs and wen are set, the data
on wdata is taken over by the slave. At tick 14, cs and wen are released, and the signals addr
and wdata keep their last values until another transmission on the bus modifies them.

Table 3.1: CSC bus interface signals.
Signal Vector Short Description

addr 32 Address for read and write operation.
cs - Chip select that activates the slave and enables the data transfer.
rdata 32 Data read from the slave.
ren - Enables read operation. The signal must not be set together with wen.
wdata 32 Data written to the slave.
wen - Enables write operation. The signal must not be set together with ren.
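
The single-cycle behavior of the CSC bus can be sketched as a Python model of a slave. This is
an illustration under assumptions, not the Infineon specification: the class and method names are
chosen here, the registers are stored in a dictionary and rdata is modeled as keeping its last
value whenever no read takes place.

```python
class CscSlave:
    """Minimal behavioral model of a CSC slave with 32-bit data signals."""

    def __init__(self):
        self.registers = {}
        self.rdata = 0

    def rising_edge(self, cs, ren, wen, addr, wdata=0):
        """Evaluate one rising clock edge and return the current value of rdata."""
        if ren and wen:
            raise ValueError("ren and wen must not be set together")
        if cs and wen:
            self.registers[addr] = wdata & 0xFFFFFFFF   # slave takes over wdata
        elif cs and ren:
            self.rdata = self.registers.get(addr, 0)    # slave provides read data
        return self.rdata                               # otherwise rdata is kept
```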

3.4 HNDSHK

The HNDSHK bus interface is a simple data bus protocol intended to be used across
power and clock domain boundaries. Therefore, the bus interface is split into control and
non-control signals. To enable cross-domain usage, the control signals must be synchronized
to the destination clock domain, whereas the address, data and error signals shall be directly
connected. The HNDSHK specification names the source producer and the destination consumer
instead of master and slave. Figure 3.6 sketches a typical setup with the signals described in
table 3.2, including the synchronization flip-flops. These flip-flops can be omitted
when producer and consumer are in the same clock domain.
Figure 3.7 shows the timing diagram for two transfers with minimum timing. First, a write
transfer is sketched, where valid, addr, wdata and write are asserted by the producer. On the
next clock edge, the consumer takes over the data and asserts ack. After another clock cycle, the
producer releases at least valid and write. The transfer is finished when the consumer de-asserts
ack. Before another transfer can be started, the signal valid must be held at zero for at least
one producer clock cycle.
The second transfer follows the same timing but keeps the write signal de-asserted. However,
the consumer encountered an error while taking over the data and therefore asserts nack and err.
In that case, the signal rdata contains an error code. The transfer is finished when the consumer
de-asserts nack and err.

[Structure diagram: producer and consumer connected by valid (through FF0), ack (through FF1),
nack (through FF2), addr (16 bits), wdata (16 bits), rdata (16 bits), write and err]

Figure 3.6: Cross-domain structure diagram between producer and consumer. The flip-flops
FF1 and FF2 synchronize ack and nack to the producer clock domain, whereas flip-flop FF0
synchronizes valid to the consumer clock domain.

[Timing diagram "ATV-PS Cross Domain Handshake Schema": clk, rst_n, the control signals valid,
ack and nack, and the non-control signals addr (A0, A1), wdata (D0), rdata (e0), write and err
over ticks 0 to 10. Annotations: (i) finish valid handshake, (ii) finish ack/nack handshake;
the next valid can be asserted as soon as ack/nack goes low (comb.)]

Figure 3.7: HNDSHK timing diagram for valid write and invalid read transfer.

Table 3.2: HNDSHK bus interface signals.
Signal Vector Short Description

valid - Signal generated by the producer indicating the start of a transfer. Between
two transfers, this signal must be held at zero for at least one producer clock
cycle. This signal must be synchronized from producer to consumer
in case of a clock domain crossing.
ack - Acknowledge signal set by the consumer after successful access. This
signal must be synchronized from consumer to producer in case of a clock
domain crossing.
nack - Not-Acknowledge signal set by the consumer after failed access. This
signal must be synchronized from consumer to producer in case of a clock
domain crossing.
addr 16 Address for read and write operation set by the producer.
rdata 16 Data read by the producer from the consumer.
wdata 16 Data written to the consumer.
write - Direction indication signal set by the producer.
err - Error signal set by the consumer to indicate that rdata contains an error
code.
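
The minimum-timing handshake described for figure 3.7 can be sketched as a Python trace
generator. The sketch assumes that producer and consumer share one clock domain, so the
synchronization flip-flops are left out. The function names, the dictionary-based consumer and
the error code value are hypothetical and not part of the HNDSHK specification.

```python
ERROR_CODE_BAD_ADDRESS = 0x0001      # hypothetical error code returned on rdata


def make_consumer(registers):
    """Create a consumer that serves 16-bit accesses to a dictionary of registers."""
    def access(addr, write, wdata):
        if addr not in registers:
            return False, ERROR_CODE_BAD_ADDRESS
        if write:
            registers[addr] = wdata & 0xFFFF
            return True, 0
        return True, registers[addr]
    return access


def hndshk_transfer(consumer, addr, write, wdata=0):
    """Return the control signals and rdata for the four cycles of one transfer."""
    idle = {"valid": 0, "ack": 0, "nack": 0, "err": 0, "rdata": 0}
    trace = [dict(idle, valid=1)]                    # producer asserts valid
    ok, rdata = consumer(addr, write, wdata)
    response = dict(idle, ack=int(ok), nack=int(not ok), err=int(not ok), rdata=rdata)
    trace.append(dict(response, valid=1))            # consumer answers with ack or nack
    trace.append(dict(response, valid=0))            # producer releases valid
    trace.append(dict(idle))                         # consumer releases ack/nack and err
    return trace
```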

4 Components

For building the digital part of the SoC, multiple components taken from two libraries were used.
Both libraries are developed in-house by two independent development teams.
The first library uses MetaRTL and provides multiple components which use metamodeling
for code generation. For all components, the generated code can be configured using a GUI.
Afterward, ready-to-use VHDL and/or Verilog code is generated, which is wrapped for
further usage with MetaGen. Nevertheless, errors occurred in some components during generation
and simulation. Where this happened, a short error description including the current
workaround is given.
The second library is the digital building blocks library (DBBL). This library contains
handwritten components with partly generated code using MetaGen. MetaGen is used to generate
component entities, register components and the interconnections that combine multiple
components into a module. All DBBL components and modules are written in SystemVerilog.

4.1 MetaRTL

4.1.1 RISC-V CPU Core

Overview

MetaRTL provides an implementation of the RISC-V ISA briefly described in chapter 2. After
generation of the CPU, a wrapper for further integration into the SoC is added. For this purpose,
the MetaGen flow is used to generate the module entities. Figure 4.1 shows the block diagram of
the wrapped CPU core with all signals and interfaces accessible outside of the wrapper. For
accessing data and instructions, the core provides two AHB Lite master interfaces. Both are
connected to the AHB matrix during the integration. The signals mei_ext_i and mei_ended_o
are used to connect the interrupt controller described in section 4.1.9 with the CPU.

[Block diagram: RISC-V CPU Core with the signals clk_i, reset_n_i, mei_ext_i, mei_ended_o and
illegal_state_o and the interfaces AHB Master Data and AHB Master Instr]

Figure 4.1: Block diagram of the wrapped CPU.

Generation

For generation, the metamodel shown in figure 4.2 with the hardware extensions shown in
figure 4.3 is used. To configure the metamodel for generation, the GUI provided by MetaGen is
used. The RISC-V core is configured to implement RV32IMC without the custom instructions
Custom32 and CustomMAC. The attributes of the metamodel class "Config" are set to implement
a 5-stage CPU with nested non-vectorized exception handling and a startup address of 32768.
For the hardware extensions, only the "ExceptionUnit" is set to be active, where the attribute
CSR is left untouched. The Exception attribute is configured with the exceptions machine
software interrupt (MSI) and machine timer interrupt (MTI) deactivated.
However, the initially described configuration could not be used due to failures during the code
generation. The workaround is to activate the custom instructions Custom32 and CustomMAC.
Additionally, the exceptions MSI and MTI are enabled again.
Verilog was selected as the target view. While setting up the simulation, a compile error
inside the CPU occurred. In the generated code, two ports of the control and status
register instance inside the exception unit instance were missing. A fix for this problem was
issued. As an intermediate solution, the missing ports are hardwired to zero.

Figure 4.2: Metamodel of the RISC-V core

Figure 4.3: Metamodel of the hardware extensions

Pipeline

As previously mentioned, the CPU implements a 5-stage pipeline, sketched in figure 4.4, with the
following stages:

• Instruction Bus (IB) stage is commonly known as instruction fetch. In this stage, the next
instruction is read from memory.

• Instruction Decode (ID) stage decodes the instruction. The logic also checks whether the
instruction is ready to be executed.

• Execute (EX) stage is where the ALU operates. Computation units of extensions, like a
multiplier, also operate in this stage.

• Memory access (MEM) stage accesses data memory if needed.

• Writeback (WB) stage is where the instructions write their results to the register file.

[Pipeline diagram: five consecutive instructions, each passing through the stages IB, ID, EX,
MEM and WB, shifted by one clock cycle per instruction; vertical axis: instructions, horizontal
axis: time]

Figure 4.4: Five-stage pipeline of the RISC-V CPU [7]. The orange column shows the first
instruction reaching the WB stage.
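
The staggered occupation of the pipeline stages can be reproduced with a few lines of Python.
The sketch assumes an ideal pipeline without stalls or flushes, and the function name is chosen
here for illustration.

```python
STAGES = ["IB", "ID", "EX", "MEM", "WB"]


def pipeline_table(instruction_count):
    """Return rows[i][c]: the stage of instruction i in clock cycle c, or "" if idle."""
    cycles = instruction_count + len(STAGES) - 1
    rows = []
    for index in range(instruction_count):
        row = [""] * cycles
        for offset, stage in enumerate(STAGES):
            row[index + offset] = stage     # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows
```

For five instructions, the fifth column of the table contains WB, MEM, EX, ID and IB from top to
bottom. This is the cycle in which the first instruction reaches the WB stage and all five
stages are occupied.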

Arithmetic Logic Unit

In addition to the generated Verilog code, a schematic of the RISC-V CPU is generated. Figure
4.5 shows the ALU instance inside the CPU, where the supported arithmetic functions can
be seen graphically. Furthermore, MetaRTL offers an easy way to compare schematics
with code in different languages. To provide an example, the VHDL code of the RISC-V
CPU is generated as well, where listing 4.1 shows the Verilog implementation and listing 4.2 the
VHDL implementation. Due to the similar naming of the signals in the code, it is easy to
relate them to the schematic. Therefore, it is traceable how the arithmetic functions are
represented and implemented in the different languages.

Figure 4.5: Generated schematic of the RISC-V ALU

Listing 4.1: Verilog code of RISC-V ALU without the result multiplexer.
// Shift amount: lower 5 bits of the second operand
assign SLICE_Outp_s = alu_param_2[4:0];
// Signed views of both operands
assign SIGNEDCAST00_Outp_s = $signed(alu_param_1);
assign SIGNEDCAST01_Outp_s = $signed(alu_param_2);
// Logical left, logical right and arithmetic right shift
assign LS_Outp_s = alu_param_1 << SLICE_Outp_s;
assign RSL_Outp_s = alu_param_1 >> SLICE_Outp_s;
assign RSA_Outp_s = $signed(alu_param_1) >>> SLICE_Outp_s;
// Bitwise operations
assign BAND_Outp_s = alu_param_1 & alu_param_2;
assign BOR_Outp_s = alu_param_1 | alu_param_2;
assign BXOR_Outp_s = alu_param_1 ^ alu_param_2;
// Addition and subtraction
assign HWPLUS_Outp_s = SIGNEDCAST00_Outp_s + SIGNEDCAST01_Outp_s;
assign HWMINUS_Outp_s = alu_param_1 - alu_param_2;
// Unsigned and signed less-than comparison
assign LT00_Outp_s = alu_param_1 < alu_param_2 ? 32'b1 : 32'b0;
assign LT01_Outp_s = SIGNEDCAST00_Outp_s < SIGNEDCAST01_Outp_s ? 32'b1 : 32'b0;

Listing 4.2: VHDL code of RISC-V ALU without the result multiplexer.
SLICE_Outp_s <= alu_param_2(4 downto 0);
SIGNEDCAST00_Outp_s <= alu_param_1;
SIGNEDCAST01_Outp_s <= alu_param_2;
LS_Outp_s <= std_logic_vector(shift_left(unsigned(alu_param_1),
    to_integer(unsigned(std_logic_vector(std_logic_vector'("" & SLICE_Outp_s))))));
RSL_Outp_s <= std_logic_vector(shift_right(unsigned(alu_param_1),
    to_integer(unsigned(std_logic_vector(std_logic_vector'("" & SLICE_Outp_s))))));
RSA_Outp_s <= std_logic_vector(shift_right(signed(alu_param_1),
    to_integer(unsigned(std_logic_vector(std_logic_vector'("" & SLICE_Outp_s))))));
BAND_Outp_s <= std_logic_vector(alu_param_1) AND std_logic_vector(alu_param_2);
BOR_Outp_s <= std_logic_vector(alu_param_1) OR std_logic_vector(alu_param_2);
BXOR_Outp_s <= std_logic_vector(alu_param_1) XOR std_logic_vector(alu_param_2);
HWPLUS_Outp_s <= std_logic_vector(signed(SIGNEDCAST00_Outp_s) + signed(SIGNEDCAST01_Outp_s));
HWMINUS_Outp_s <= std_logic_vector(std_logic_vector(alu_param_1) -
    std_logic_vector(alu_param_2));
LT00_Outp_s <= "00000000000000000000000000000001"
    when std_logic_vector(alu_param_1) < std_logic_vector(alu_param_2)
    else "00000000000000000000000000000000";
LT01_Outp_s <= "00000000000000000000000000000001"
    when signed(SIGNEDCAST00_Outp_s) < signed(SIGNEDCAST01_Outp_s)
    else "00000000000000000000000000000000";
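
For comparing simulation results against both listings, a software reference model can be
helpful. The following Python sketch mirrors the signal names of listings 4.1 and 4.2. It is not
generated by MetaRTL, and it assumes that both operands and all results are 32 bits wide.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu_reference(alu_param_1, alu_param_2):
    """Return the expected value of every ALU signal before the result multiplexer."""
    a, b = alu_param_1 & MASK32, alu_param_2 & MASK32
    shamt = b & 0x1F                                   # SLICE_Outp_s
    return {
        "SLICE_Outp_s": shamt,
        "LS_Outp_s": (a << shamt) & MASK32,
        "RSL_Outp_s": a >> shamt,
        "RSA_Outp_s": (to_signed(a) >> shamt) & MASK32,
        "BAND_Outp_s": a & b,
        "BOR_Outp_s": a | b,
        "BXOR_Outp_s": a ^ b,
        "HWPLUS_Outp_s": (to_signed(a) + to_signed(b)) & MASK32,
        "HWMINUS_Outp_s": (a - b) & MASK32,
        "LT00_Outp_s": int(a < b),                     # unsigned comparison
        "LT01_Outp_s": int(to_signed(a) < to_signed(b)),
    }
```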

4.1.2 AHB Matrix

Overview

The AHB Matrix is an M-to-N AHB-Lite connection matrix in which M masters with fixed priority
can access N slaves. The matrix allows parallel access: multiple masters can operate at the
same time as long as they access different slaves. If two masters access the same slave, the
lower-priority master is stalled.
Figure 4.6 shows the wrapped matrix with all masters and slaves of the current configuration,
where the master for the serial peripheral interface (SPI) has the highest priority, followed by
the master for the CPU instruction bus.
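
The fixed-priority behavior described above can be illustrated with a small behavioral model. This is only a sketch, not the generated RTL: the function name arbitrate and the per-cycle request list are assumptions made for illustration, and the model ignores AHB pipelining.

```python
def arbitrate(requests):
    """Grant each slave to the highest-priority master requesting it.

    requests: list of (master, slave) pairs in priority order, index 0
              being the highest priority. Idle masters are left out.
    Returns (grants, stalled): grants maps slave -> master, stalled lists
    the masters that have to wait in this cycle.
    """
    grants = {}
    stalled = []
    for master, slave in requests:
        if slave in grants:
            # A higher-priority master already owns this slave.
            stalled.append(master)
        else:
            grants[slave] = master
    return grants, stalled


# SPI and the data bus both target the DRAM; the instruction bus reads IRAM0.
grants, stalled = arbitrate([
    ("ahb_master_spi", "DRAM"),
    ("ahb_master_ib", "IRAM0"),
    ("ahb_master_db", "DRAM"),
])
```

In this example the SPI master and the instruction bus master proceed in parallel, while the data bus master is stalled because it competes with the higher-priority SPI master for the same slave.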
[Block diagram: the AHB Matrix with the inputs clk_i and reset_n_i; the AHB masters SPI, IMEM
and DRAM; the AHB slaves APB0, TIMER, INTR, APB1, DRAM, IRAM0, IRAM1, BROM, ROM, OTP and ECC.]

Figure 4.6: Block diagram of the AHB matrix wrapper.

Generation

The AHB Matrix is generated using the metamodel sketched in figure 4.7. The metamodel is
configured using the GUI provided by MetaGen. The attributes of the root node are kept at
their default values. The three master interfaces ahb_master_spi, ahb_master_ib and
ahb_master_db are added. Their position in the list matters because it defines the priority.
In the last step, the list of slave interfaces is specified; table 4.1 provides the slaves with
their base address and address range. Based on that information, the matrix multiplexes the
master interfaces to the slaves.

Figure 4.7: Metamodel of the AHB matrix

Table 4.1: Base address and address range of the AHB matrix slaves.
Slave              Base Address   Address Range
AHB Slave APB0     0              2942
AHB Slave TIMER    2944           512
AHB Slave INTR     3456           256
AHB Slave APB1     3712           4480
AHB Slave DRAM     8192           8192
AHB Slave IRAM0    16384          8192
AHB Slave IRAM1    24576          8192
AHB Slave BROM     32768          4096
AHB Slave ROM      36864          16384
AHB Slave OTP      53248          4096
AHB Slave ECC      57344          80
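
To make the address-based multiplexing concrete, the following sketch decodes an address into a slave and a relative offset using the values of table 4.1. The function decode and the tuple layout are illustrative assumptions; the generated matrix performs the equivalent selection in hardware.

```python
# (slave, base address, address range) copied from table 4.1.
SLAVES = [
    ("APB0", 0, 2942),
    ("TIMER", 2944, 512),
    ("INTR", 3456, 256),
    ("APB1", 3712, 4480),
    ("DRAM", 8192, 8192),
    ("IRAM0", 16384, 8192),
    ("IRAM1", 24576, 8192),
    ("BROM", 32768, 4096),
    ("ROM", 36864, 16384),
    ("OTP", 53248, 4096),
    ("ECC", 57344, 80),
]


def decode(addr):
    """Return (slave, offset inside the slave) or None if nothing is mapped."""
    for name, base, size in SLAVES:
        if base <= addr < base + size:
            return name, addr - base
    return None


rom_target = decode(0x9000)   # ROM entry address used by the boot code
iram_target = decode(0x4000)  # instruction RAM entry address used by the boot code
```

The two example addresses are the jump targets of the boot code in listing 4.3; they resolve to the first word of the ROM and of IRAM0.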


4.1.3 AHB-to-APB Bridge

Overview

To connect modules from DBBL with the RISC-V and MetaRTL components, an AHB-to-APB bridge is
used. The bridge converts an AHB transaction into an APB transaction. The generation uses the
metamodel shown in figure 4.9. As the metamodel shows, only the instance name can be modified
using the GUI. After generation, a wrapper created with MetaGen is added around the generated
code, resulting in the block diagram shown in figure 4.8.
However, while setting up the simulation, the generated code of the AHB-to-APB bridge caused a
compilation error. The cause is a faulty implementation of the AHB signal HREADYOUT: the
simulator reported multiple drivers. As a quick solution, the additional driver was removed
manually.

[Block diagram: the AHB-to-APB Bridge with the inputs clk_i and reset_n_i, an AHB Slave
interface and an APB Master interface.]

Figure 4.8: Block diagram of the AHB-to-APB bridge wrapper.

Figure 4.9: Metamodel of the AHB-to-APB bridge


4.1.4 AHB-to-CSC Bridge

The AHB-to-CSC bridge metamodel is the same as the metamodel for the AHB-to-APB bridge.
After the instance name is set using the GUI, the code can be generated. As for the AHB-to-APB
bridge, a wrapper is added using MetaGen. The AHB-to-CSC bridge is used to connect CSC register
instances, which MetaRTL components commonly use when they contain registers or memory.

[Block diagram: the AHB-to-CSC Bridge with the inputs clk_i and reset_n_i, an AHB Slave
interface and a CSC Master interface.]

Figure 4.10: Block diagram of the AHB-to-CSC bridge wrapper.

4.1.5 HNDSHK-to-AHB Bridge

The HNDSHK-to-AHB bridge metamodel is the same as the metamodel for the AHB-to-APB
bridge. After the instance name is set using the GUI, the code can be generated. As for the
AHB-to-APB bridge, a wrapper is added using MetaGen. This bridge is used to connect the
SPI module provided by DBBL as an AHB master to the AHB matrix.
During testing, a deviation from the AHB protocol specification was found. The simulation
waveform showed that the state machine sets the AHB signal HTRANS incorrectly. The signal is
kept high for too long, which caused the signal HREADY to toggle continuously. As a result, the
data written to a register was lost. The solution was to stop setting HTRANS to 1 in the states
where it should not be set.

[Block diagram: the HNDSHK-to-AHB Bridge with the inputs clk_i and reset_n_i, a HNDSHK Slave
interface and an AHB Master interface.]

Figure 4.11: Block diagram of the HNDSHK-to-AHB bridge wrapper.


4.1.6 Data and Instruction SRAM

Overview

One type of memory used inside the SoC is static random-access memory (SRAM) for storing
data and instructions. The SoC contains multiple SRAM instances. The generated code implements
the AHB slave interface and connects it to an SRAM IP provided by the production technology in
use, where each SRAM IP instance provides space for 2048 words. Further details about the
SRAM IP cannot be provided due to company restrictions.
To use the generated code in the integration flow, another wrapper is added on top. This is
necessary because the code generator implements interface signals without using SystemVerilog
(SV) interfaces. The wrapper generated by MetaGen defines the SV interface and connects all
separate AHB signals to it internally. Figure 4.12 shows the block diagram of an SRAM module
with the signals and interfaces of the wrapper.

[Block diagram: the SRAM wrapper with the inputs clk_i and reset_n_i, the SRAM IP and an AHB
Slave interface.]

Figure 4.12: Block diagram of the wrapped SRAM for data and instruction storage.

Code Generation

The metamodel for code generation in figure 4.13 is kept simple and provides only a single class.
Tables 4.2 and 4.3 provide the settings used to generate the instances for data and instruction
SRAM. This results in three generated SRAM instances.


Figure 4.13: Metamodel of the SRAM

Table 4.2: Settings of the MetaRTL specification for the SRAM instance used to store data.
Field        Value       Short Description
Name         AhbDataRam  Name of the generated component.
BaseAddress  8192        Base address of the generated component used to calculate the relative address.
SizeInBytes  8192        Size of the SRAM in bytes.
FileName     [Link]      File loaded at simulation start up.

Table 4.3: Settings of the MetaRTL specification for the SRAM instances used to store
instructions.
Field        Value                Short Description
Name         AhbIRam0 / AhbIRam1  Name of the generated component.
BaseAddress  16384 / 24576        Base address of the generated component used to calculate the relative address.
SizeInBytes  8192                 Size of the SRAM in bytes.
FileName     [Link] / [Link]      File loaded at simulation start up.
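
The relation between BaseAddress, SizeInBytes and the 2048 words of one SRAM IP instance can be sketched as follows. The 32-bit word width (4 bytes per word) is an assumption derived from 8192 bytes / 2048 words, and the helper sram_word_index is hypothetical, not part of the generated code.

```python
WORD_BYTES = 4  # assumption: 8192 bytes / 2048 words = 4 bytes per word


def sram_word_index(addr, base_address, size_in_bytes):
    """Map an absolute bus address to the word index inside one SRAM instance."""
    offset = addr - base_address  # relative address, as described for BaseAddress
    if not 0 <= offset < size_in_bytes:
        raise ValueError("address outside this SRAM instance")
    return offset // WORD_BYTES


first_word = sram_word_index(8192, 8192, 8192)   # first byte of AhbDataRam
last_word = sram_word_index(16383, 8192, 8192)   # last byte of AhbDataRam
```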


4.1.7 ROM

Overview

The second type of memory used inside the SoC is a ROM for storing data and/or instructions
using a metal mask. The ROM component is generated using MetaRTL and implements a
CSC slave connected to a ROM IP provided by the production technology in use. Compared to
the SRAM IP, the ROM IP provides twice the capacity per IP instance, which is 4096 instead of
2048 words. As for the SRAM IP, further details about the ROM IP cannot be provided due to
company restrictions.
To simplify the integration of the module into the SoC, the wrapper instantiates the generated
ROM and an AHB-to-CSC bridge. As a result, the module already provides the AHB slave
interface needed for integration. Figure 4.14 shows the wrapper with the two components, the
connection inside the wrapper and the provided signals.

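[Block diagram: the ROM wrapper with the inputs clk_i and reset_n_i; the ROM (ROM IP with CSC
Slave) is connected to the AHB-to-CSC Bridge, which provides the AHB Slave interface.]

Figure 4.14: Block diagram of the wrapped ROM.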

Generation and Wrapping

The metamodel for code generation in figure 4.15 is kept simple and provides only a single class.
Table 4.4 provides the settings used to generate the ROM instance.
The currently available implementation of the MetaRTL library needs manual modifications.
First, the generator Python script must be updated. During setup, the package manager provides
the newest version of the library. However, the function addInterface() of the MetaRTL
library metartl_core has changed: it previously had a parameter for setting a prefix, which has
since been removed. Therefore, the function call inside the generation script is updated to
drop this prefix parameter.
The second modification is the replacement of the IP used inside the generated code. Directly
after setup, including the Python modifications and configuration, the generated code
instantiates an SRAM IP in read-only configuration. This IP must be manually replaced by the
ROM IP.

Figure 4.15: Metamodel of the ROM

Table 4.4: Settings of the MetaRTL specification for the ROM instance.
Field        Value           Short Description
Name         ROM             Name of the generated component.
BaseAddress  36864           Base address of the generated component used to calculate the relative address.
SizeInBytes  16384           Size of the ROM in bytes.
FileName     rom4kx33_0.cod  File loaded at simulation startup.


4.1.8 Boot ROM

Overview

The third type of memory inside the SoC is the initial boot ROM. This component is generated
using MetaRTL with the same approach as for the normal ROM. However, the ROM IP is
removed and instead a simple memory array is implemented. These changes are not visible
outside of the module and the boot ROM behaves like the normal ROM. To enable further
usage in the MetaGen integration flow a wrapper as shown in figure 4.16 is added. As for the
ROM, the wrapper contains the AHB-to-CSC bridge.

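[Block diagram: the Boot ROM wrapper with the inputs clk_i and reset_n_i; the BOOT ROM
(ROM_ARRAY with CSC Slave) is connected to the AHB-to-CSC Bridge, which provides the AHB Slave
interface.]

Figure 4.16: Block diagram of the wrapped Boot ROM.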

Generation

The boot ROM uses the same metamodel as the ROM, shown in figure 4.15. Therefore, similar
settings can be used for generation, and only the base address and the size must be adapted.
The base address is set to 32768, which is the startup address set during the generation of the
RISC-V CPU. For the boot ROM, the size in bytes is limited to 4096.

Code

As previously mentioned, the ROM IP was replaced by a memory array. This array must be
filled with initial code. Therefore, the C code from listing 4.3 is compiled using the RISC-V
toolchain, with hex chosen as the output representation. Listing 4.4 shows the final hex
file with manual modifications before it is converted to binary. The conversion is done using a
Python script that converts hex to binary and slices it into 8-bit blocks. The results are
afterward copied into the array.
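
The conversion step can be sketched as follows. This is not the script used in the thesis; the function name hex_words_to_bytes and the little-endian byte order are assumptions made for illustration, and the order actually required depends on how the memory array is indexed.

```python
def hex_words_to_bytes(lines, byteorder="little"):
    """Convert 32-bit hex words into a list of 8-bit binary strings.

    lines: iterable of strings, one 32-bit instruction word per line.
    byteorder: order in which the four bytes of a word are emitted
               (assumption: little-endian).
    """
    blocks = []
    for line in lines:
        text = "".join(line.split())  # tolerate stray whitespace
        if not text:
            continue
        word = int(text, 16)
        for byte in word.to_bytes(4, byteorder):
            blocks.append(format(byte, "08b"))
    return blocks


# The NOP and the jump instruction from lines 4 and 5 of listing 4.4.
blocks = hex_words_to_bytes(["00000013", "fefff06f"])
```

Each returned string is one 8-bit block that can be copied into the memory array.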
During the simulation setup, while bringing the SoC up far enough to at least loop the initial
boot code, some issues with the natively compiled code were discovered. First, the initially
provided boot code tried to use the data RAM, which should be avoided. The C code shown in
listing 4.3 already contains the fix: the variables used in lines 12 and 16 are replaced by values.
The second issue is due to a delay in the CPU. When a jump instruction is followed by another
jump instruction, the second instruction is executed with a delay of 4 clock cycles. In certain
cases, the memory array is then too small, which leads to a read failure. The MetaRTL
developers are already working on an improved, faster jump and branch implementation for the
RISC-V CPU. The current workaround is to add NOP instructions to the code. In listing 4.4,
line 5 is the relative jump back to line 25; lines 1 to 4 are added to prevent the program
counter from reading from an unavailable memory address.

Listing 4.3: C code for selecting startup memory


 1 uint32_t cnt, result;
 2 while (1)
 3 {
 4     for (cnt = 0; cnt < 100; cnt++)
 5     {
 6         if (cnt == 1)
 7         {
 8
 9             result = *(uint16_t *)(SYS_CTRL_ADR);
10             if (result == 0x2) // ready to boot from ROM
11             {
12                 goto *((void *)0x9000);
13             }
14             else if (result == 0x4) // ready to boot from instruction RAM
15             {
16                 goto *((void *)0x4000);
17             }
18         }
19     }
20 }


Listing 4.4: Hard coded boot code in hexadecimal representation stored in Boot ROM
 1 00000013
 2 00000013
 3 00000013
 4 00000013
 5 fefff06f
 6 00000013
 7 000047b7
 8 fca79ae3
 9 00078067
10 00000013
11 000097b7
12 00b79863
13 0107e7b3
14 00779793
15 00e04803
16 01004783
17 fec71ce3
18 fed70ce3
19 00170713
20 00000713
21 00400513
22 00200593
23 00100613
24 06400693
25 0040006f
26 00000013
27 00000013
28 00000013
29 00000013
30 00000013


4.1.9 Interrupt Controller

Overview

The interrupt controller is a programmable interrupt controller (PIC) that connects multiple
interrupts to a single host device. The generated code provides interrupt control registers
accessible over CSC. Therefore, for further use in the MetaGen integration flow, a wrapper that
instantiates the AHB-to-CSC bridge is implemented. Figure 4.17 shows the block diagram of
the wrapper including all visible signals. Multiple input signals are implemented to handle the
interrupt signals from the GPIO, SAR ADC and timer components. A non-vectorized signal
intr_cpu_o forwards the interrupt requests to the RISC-V CPU. To clear the pending interrupt
request, the CPU provides the signal intr_ended_i.

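[Block diagram: the interrupt controller wrapper with the inputs clk_i and reset_n_i; the
Interrupt Controller (CSC Register with CSC Slave) is connected to the AHB-to-CSC Bridge, which
provides the AHB Slave interface. Interrupt inputs: gpio0_i, gpio1_i, gpio2_i, gpio3_i,
sar_adc0_i, timer0_i and timer1_i. Further signals: intr_cpu_o and intr_ended_i.]

Figure 4.17: Block diagram of the wrapped interrupt controller.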

Generation

The interrupt controller is generated using the metamodel in figure 4.18. The metamodel is
configured using the GUI provided by MetaGen. After the name attribute of the root node is
set, the attribute IRQDisable of "Generalconfiguration" is changed to 1. Afterwards, the
interrupts for the timer, SAR ADC and GPIO are specified. The two timer interrupts have the
highest priority, followed by the SAR ADC interrupt. The GPIO interrupts have the lowest
priority. All generated interrupts are maskable by the bits in register INTERRUPT_CTRL.
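
The configured priority scheme can be modeled in a few lines. This is a behavioral sketch only: the source names follow the wrapper inputs in figure 4.17, the relative order inside the timer group and inside the GPIO group is an assumption, and the enabled set stands in for the mask bits of INTERRUPT_CTRL, whose bit positions are not given here.

```python
# Highest priority first. Only the group order (timer, SAR ADC, GPIO) is
# taken from the text; the order inside each group is an assumption.
PRIORITY = ["timer0", "timer1", "sar_adc0", "gpio0", "gpio1", "gpio2", "gpio3"]


def select_interrupt(pending, enabled):
    """Return the highest-priority source that is pending and not masked."""
    for source in PRIORITY:
        if source in pending and source in enabled:
            return source
    return None


def intr_cpu(pending, enabled):
    """Model of the non-vectorized request line towards the CPU."""
    return select_interrupt(pending, enabled) is not None


winner = select_interrupt({"gpio2", "sar_adc0"}, set(PRIORITY))
```

With all sources enabled, the SAR ADC request wins over the GPIO request; masking a source removes it from the selection without clearing its pending state.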


Figure 4.18: Metamodel of the interrupt controller


4.1.10 Timer

Overview

The timer module provides two timer instances with optional capture compare units (CCU). As
is typical for MetaRTL components, the register access is implemented using the CSC interface.
Therefore, to connect the timer to the AHB matrix, the AHB-to-CSC bridge is instantiated
inside the wrapper. Additionally, the wrapper combines the timer signals from table 4.5 into an
interface called tim. The signals timer0_intr_o and timer1_intr_o forward timer interrupts to
the interrupt controller. A list of all registers can be found in Appendix A.2.

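[Block diagram: the timer wrapper with the inputs clk_i and reset_n_i; the Timer (CSC Register
with CSC Slave) is connected to the AHB-to-CSC Bridge, which provides the AHB Slave interface.
Further signals: timer0_intr_o, timer1_intr_o and the interface tim.]

Figure 4.19: Block diagram of the wrapped timer.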

Generation

The timer is generated using the metamodel in figure 4.20. The metamodel is configured using
the GUI provided by MetaGen. First, a single timer with two channels is specified. Both
channels provide a counter width of 32 bits. One capture compare unit is specified for
channel 0 and two for channel 1. All attributes for software and hardware access are set to 1.


Table 4.5: Signal list for interface tim implemented in the wrapper. All signals have a width of
1 bit.
Signal             Direction  Short Description
TIM0_ExtRes        input      External reset to set the counter to MaxValue
TIM0_ExtCnt        input      External count input
TIM0_OvfInt        output     Shows that the counter has an overflow
TIM0_OvfIntRes     input      External reset impulse for overflow trigger
TIM0_CCU0_ExtCap   input      External trigger for capture mode
TIM0_CCU0_ExtComp  output     Shows the result of the comparison
TIM1_ExtRes        input      External reset to set the counter to MaxValue
TIM1_ExtCnt        input      External count input
TIM1_OvfInt        output     Shows that the counter has an overflow
TIM1_OvfIntRes     input      External reset impulse for overflow trigger
TIM1_CCU0_ExtCap   input      External trigger for capture mode
TIM1_CCU0_ExtComp  output     Shows the result of the comparison
TIM1_CCU1_ExtCap   input      External trigger for capture mode
TIM1_CCU1_ExtComp  output     Shows the result of the comparison


Figure 4.20: Metamodel of the timer


4.2 Digital Building Block Library

4.2.1 Serial Peripheral Interface

For external communication, a serial peripheral interface (SPI) is used. Figure 4.21 sketches the
block diagram of the SPI module. However, the SPI is not used directly as a single module;
inside the SoC, the SPI bridge and the SPI frame encoder/decoder are instantiated separately.
The SPI bridge converts the serial data stream from the Master-Out-Slave-In (MOSI) signal
into parallel data. For transmitting data to the master via the Master-In-Slave-Out (MISO)
signal, this process is reversed. Additionally, the SPI bridge checks the number of shifted bits
and triggers a frame error when it differs from 32. If the received frame matches either of the
two defined test mode entry keys, a key-received trigger is sent over the corresponding signal
in interface tm_en_ctrl, and the frame is discarded without being passed to the frame
encoder/decoder.
The SPI frame encoder/decoder checks the parallel data provided by the SPI bridge according
to the protocol described in section SPI Protocol. If the provided data is correct, a HNDSHK
bus transmission is triggered.
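[Block diagram: the SPI module with the inputs clk_i, reset_n_i, scan_mode_i, crcmd_i and
sys_stat_i; the SPI BRIDGE with the SPI Slave interface is connected to the SPI FRAME
Encoder/Decoder, which provides the HNDSHK Master interface. Further interfaces: tm_en_ctrl
and SPI_FRAME.]

Figure 4.21: Block diagram of the serial peripheral interface module.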

SPI Protocol

Each SPI transmission consists of a MOSI and a MISO frame, both with a fixed size of 32 bits.
Table 4.7 shows the common frame structure for MOSI and MISO. In the current version of the
module, two cyclic redundancy check (CRC) modes, CRC24 and IGN_CRC, are implemented.
The mode CRC24 implements a CRC according to AUTOSAR standard SAE J1850[14], which
calculates the CRC over the 24 bits of the MOSI frame excluding the CRC field. IGN_CRC, in
contrast, skips the CRC calculation. In that case, an extended write address space is available,
where the CRC field represents the upper 8 bits of the address. For both implementations, a read
operation is indicated by 0xFF in the address field.

Table 4.6: Signal list of interface SPI_FRAME
Signal        Direction  Short Description
spi_in_data   output     32-bit signal containing the received MOSI data.
spi_out_data  input      32-bit signal containing the MISO data transmitted at the next SPI transaction.
ready         output     Access over SPI is done.
frame_error   output     During the access a frame error occurred.
access_done   input      Signals the SPI bridge that the data was processed.

Bits  31 ... 0 (fields listed from the most significant bit downwards)
MOSI  CRC | ADDRESS / READ INDICATOR | WRITE DATA / READ ADDRESS
MISO  CRC | OFC | LFOK | SYSTEM STATUS | READ DATA / ERROR CODE

Table 4.7: MOSI/MISO frame structure for read and write operations.
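
As an illustration of the CRC24 mode, the following sketch assembles and checks a MOSI frame. Several details are assumptions and not taken from the module: the exact bit positions (CRC in bits 31 to 24, address in bits 23 to 16, data in bits 15 to 0), the order in which the 24 protected bits are fed to the CRC, and the use of the commonly published SAE J1850 CRC-8 parameters (polynomial 0x1D, initial value 0xFF, final XOR 0xFF). The function names are hypothetical.

```python
READ_INDICATOR = 0xFF  # address field value that marks a read operation


def crc8_sae_j1850(data):
    """CRC-8 with polynomial 0x1D, initial value 0xFF and final XOR 0xFF."""
    crc = 0xFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x1D) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0xFF


def build_mosi_frame(address, data):
    """Pack the assumed layout CRC[31:24] | ADDRESS[23:16] | DATA[15:0]."""
    if not 0 <= address <= 0xFF or not 0 <= data <= 0xFFFF:
        raise ValueError("field out of range")
    payload = bytes([address, data >> 8, data & 0xFF])  # the 24 protected bits
    return (crc8_sae_j1850(payload) << 24) | (address << 16) | data


def check_mosi_frame(frame):
    """Recompute the CRC over the lower 24 bits and compare it with the CRC field."""
    payload = (frame & 0xFFFFFF).to_bytes(3, "big")
    return crc8_sae_j1850(payload) == (frame >> 24)


write_frame = build_mosi_frame(0x12, 0x3456)             # write 0x3456 to address 0x12
read_frame = build_mosi_frame(READ_INDICATOR, 0x0012)    # read from address 0x0012
```

A receiver following this sketch would reject any frame whose recomputed CRC differs from the transmitted CRC field, which corresponds to the CRC error case of the protocol.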

The MISO field OFC (One Frame Corrupted) indicates that at least one frame since the
last system reset or OFC bit readout was faulty. Last Frame OK (LFOK) indicates that the
last transmitted MOSI frame was valid. Otherwise, the data field of MISO contains an error
code. As seen in table 4.8, the upper 3 bits contain a predefined communication error code
(CEC). The lower bits contain error codes from the internal bus. A CEC value of 0x7 indicates
that the lower bits are in use.

Bits 15 to 13  CEC
Bits 12 to 0   ERROR CODE

Table 4.8: Structure of the MISO error code field. CEC contains predefined codes which are
described in table 4.9. The error code field contains the transmitted error code from the internal
bus.


Encoding (hex)  Description
0x0             Interframe delay time violation
0x1             Frame error detected
0x2             CRC error detected
0x3             Write to read-only register or parity error on internal bus access
0x4             Access to non-existing register
0x5             Internal bus contention detected
0x6             Internal bus data read parity fail
0x7             Internal bus error message

Table 4.9: List of predefined communication error codes.
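
On the SPI master side, the 16-bit error code field can be decoded as sketched below. The split into a 3-bit CEC and a 13-bit bus error code follows table 4.8, and the descriptions are copied from table 4.9; the function decode_error_field itself is a hypothetical helper.

```python
CEC_DESCRIPTIONS = {
    0x0: "Interframe delay time violation",
    0x1: "Frame error detected",
    0x2: "CRC error detected",
    0x3: "Write to read-only register or parity error on internal bus access",
    0x4: "Access to non-existing register",
    0x5: "Internal bus contention detected",
    0x6: "Internal bus data read parity fail",
    0x7: "Internal bus error message",
}


def decode_error_field(field):
    """Split the 16-bit MISO error field into CEC, description and bus error code.

    The bus error code is only reported when CEC is 0x7, because only then
    are the lower bits in use.
    """
    cec = (field >> 13) & 0x7
    bus_code = field & 0x1FFF if cec == 0x7 else None
    return cec, CEC_DESCRIPTIONS[cec], bus_code


crc_error = decode_error_field(0x4000)  # CEC = 0x2
bus_error = decode_error_field(0xE005)  # CEC = 0x7, bus error code 0x0005
```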


4.2.2 Test Mode Control

The test mode control (TMCTRL) provides design for test (DFT) features such as registers to
control test mode behavior and to limit access to the chip. One key feature of the module is
that it provides three test mode levels. To enter test mode level 1, two consecutive predefined
keys must be received over SPI. The entry keys are defined in the SPI module, and only
the information that a test mode key was received is transmitted to TMCTRL by two signals
of the interface tm_en_ctrl. TMCTRL only enters the test mode when test mode entry key 1
is received before key 2 and both keys are received consecutively. As soon as the test
mode is entered, the test mode control increases the available address space for read and write
operations over the SPI module by setting the corresponding signals in interface tm_en_ctrl to
their maximum value.
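
The entry condition can be expressed as a small state machine. The following sketch models only the key sequence check; the class name EntryKeyTracker and the string event names are assumptions, and the three events correspond to the strobes tm_key_1_received, tm_key_2_received and the non-key message strobe of interface tm_en_ctrl.

```python
class EntryKeyTracker:
    """Behavioral model of the test mode entry sequence (key 1, then key 2)."""

    def __init__(self):
        self.key1_seen = False
        self.tm_enabled = False

    def on_message(self, kind):
        """kind is "key1", "key2" or "other", one event per received SPI frame."""
        if self.tm_enabled:
            return  # leaving the test mode is not modeled in this sketch
        if kind == "key1":
            self.key1_seen = True
        elif kind == "key2" and self.key1_seen:
            self.tm_enabled = True
        else:
            # Any other frame, or key 2 without key 1, breaks the sequence.
            self.key1_seen = False


tracker = EntryKeyTracker()
for event in ["key1", "key2"]:
    tracker.on_message(event)
entered = tracker.tm_enabled
```

A frame of any other kind between the two keys resets the sequence, which reflects the requirement that both keys are received consecutively.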
The second defined interface tm with the signals from table 4.10 is used to distribute the
active test mode level and enabled test mode features to other modules. The most commonly
used test mode feature is indicated by the signal scan_mode. It indicates that the device is set
into a state for testing[15].
[Block diagram: TMCTRL with its registers, the inputs clk_i and reset_n_i, and the interfaces
tm_en_ctrl, APB, tm, OTP_CTRL and OTP_STAT.]

Figure 4.22: Block diagram of test mode control.

Figure 4.22 sketches the TMCTRL with its signals and interfaces, including the interfaces
OTP_STAT and OTP_CTRL. Both are defined by the One-Time-Programmable (OTP)
memory module. However, the current implementation of the SoC does not implement the OTP.
Therefore, these interfaces are left unconnected or tied to fixed values.


Table 4.10: Signal list of interface tm
Signal             Direction  Short Description
level_1_test_mode  output     Signal indicating that test mode level 1 is active.
level_2_test_mode  output     Signal indicating that test mode level 2 is active.
level_3_test_mode  output     Signal indicating that test mode level 3 is active.
tm_reset_n         output     Test mode reset signal triggered by reset signal or active test mode level is below level 2.
ignore_por         output     Ignore system power on reset signal enabled by register when in test mode level 2.
ignore_reset       output     Ignore system reset signal enabled by register when in test mode level 2.
scan_mode          output     Scan mode active signal enabled by register.
iddq_mode          output     Iddq mode active signal enabled by register when in test mode level 3.

Table 4.11: Signal list of interface tm_en_ctrl
Signal                       Direction  Short Description
tm_key_1_received            input      Strobe signal indicating that test mode entry key 1 was received.
tm_key_2_received            input      Strobe signal indicating that test mode entry key 2 was received.
tm_non_key_message_received  input      Strobe signal indicating that a message other than a test mode entry key was received.
tm_enabled                   output     Indication signal that the test mode is active. At least test mode level 1 is enabled.
tm_max_read_addr             output     16-bit signal for the maximum address that can be read. SPI must prevent read access on the internal bus for read addresses above that value.
tm_max_write_addr            output     16-bit signal for the maximum address that can be written. SPI must prevent write access on the internal bus for write addresses above that value.


4.2.3 Successive-Approximation-Register Analog-to-Digital Converter

A Successive-Approximation-Register Analog-to-Digital Converter (SAR ADC) is used to sample
analog inputs. Figure 4.23 sketches the digital part including a SAR ADC IP. Due to
company restrictions, only limited information about the IP can be provided. The IP provides
an 11-bit SAR ADC with one additional bit for over-conversion. The interface analog_if
provides all analog inputs for the IP, such as the positive and negative reference voltages.
The IP can combine multiple samples in one measurement and deliver the averaged result. This
function can be enabled by a register. A list of all available registers can be found in
Appendix A.4.
As sketched in the block diagram, the registers are accessed via APB. For testing purposes,
the interface tm and the signals scan_clk_i and scan_reset_n_i must be provided.
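[Block diagram: the SAR ADC module with the inputs clk_i, reset_n_i, scan_clk_i, scan_reset_n_i
and por_n_i; the SAR ADC IP with the interface analog_if; the interface tm; the APB REGISTERS
with an APB Slave interface.]

Figure 4.23: Block diagram of the SAR ADC.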


4.2.4 General Purpose Input/Output

The General-Purpose Input/Output (GPIO) module provides the control logic for the general
inputs and outputs of the SoC. The SoC has 32 pins, of which pins 0 to 3 provide additional
features by being connected to a multiplexer. A register block accessible via APB provides the
registers for controlling the direction, the multiplexing and the storing/setting of the pin value.
As previously mentioned, pins 0 to 3 can be multiplexed to connect to interrupt inputs
and timer outputs. The register GPO_OUT_MUX0 selects the multiplexed signal. Note that for
interrupt inputs, the direction of the pin must be configured as an input.
The direction of all pins is controlled by the registers GP_CTRL0 and GP_CTRL1. Depending
on the settings, the current input of a pin can be read in register GP_IN0 or GP_IN1.
The output value of a pin can be set using registers GP_OUT0 and GP_OUT1.

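[Block diagram: the GPIO module with the inputs clk_i and reset_n_i; the interfaces tm, tim and
gpio; the APB REGISTERS with an APB Slave interface; the interrupt outputs intr0_o, intr1_o,
intr2_o and intr3_o.]

Figure 4.24: Block diagram of the GPIO module.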


4.2.5 System-On-Chip Control

The System-On-Chip control (SOCCTRL) module provides SoC-specific register fields for
control and software frame handling. Four 16-bit registers store the data for software frame
handling: two registers store transmission data from the SoC to the SPI master, and the other
two registers store the raw 32 bits of the received SPI frame. The interpretation and
consistency check of the data must be done in software. A status register provides information
about the current data status. It includes flags for new data available and frame error, both
provided by the SPI module described in section 4.2.1. For the selection of the boot memory, the
bit field STARTUP_LOC in register IMEM_CTRL0 is provided. Additional trimming registers
can be used for trimming the PLL clock, bandgap and current reference. A complete list of
available registers can be found in appendix A.6.
The block diagram in figure 4.25 shows all interfaces and sideband signals. The module
registers are connected as APB slaves via the APB bus to the AHB-to-APB bridge. All control
signals stated in table 4.12 are bundled into the interface SOCCTRL_IF. For software frame
handling and data status, a connection to the SPI module is needed. This connection is
implemented using the SPI_FRAME interface provided by the SPI module.

[Block diagram: the SOCCTRL module with the inputs clk_i and reset_n_i; the interfaces
SPI_FRAME and SOCCTRL_IF; the APB REGISTERS with an APB Slave interface.]

Figure 4.25: Structural overview of the SOCCTRL module.


Table 4.12: Signal list for interface SOCCTRL_IF with a short description providing the signal
source.
Signal                  Direction  Short Description
imem_ram_ready          input      Directly tied to bit field in register IMEM_CTRL0
imem_startup_loc        output     2-bit signal directly tied to bit field in register IMEM_CTRL0
imem_ram_ahb_access_en  output     Directly tied to bit field in register IMEM_CTRL0
imem_rom_ahb_read_en    output     Directly tied to bit field in register IMEM_CTRL0
soft_dec_en             output     Directly tied to bit field in register FRAME_STATUS
pll_n_div               output     9-bit signal directly tied to bit field in register PLL_CTRL0
pll_r_div               output     6-bit signal directly tied to bit field in register PLL_CTRL0
pll_en                  output     Directly tied to bit field in register PLL_CTRL0
bg_trim                 output     3-bit signal directly tied to bit field in register GEN_CTRL0
iref_trim               output     4-bit signal directly tied to bit field in register GEN_CTRL0
all_pd_en               output     Directly tied to bit field in register GEN_CTRL0
io_bias_en              output     Directly tied to bit field in register GEN_CTRL0


4.2.6 System Control

The system control module (SYSCTRL) handles reset and clock signals provided by analog
and scan. Figure 4.26 shows the block diagram with all inputs and outputs. The power-on-reset
(por_n_i) is an input vector whose size depends on a generic. The same applies to the
output signals en_clk_o, clk_delta_early_o and clk_div_o. Table 4.13 provides an overview
and description of all generics, including the signals whose sizes depend on generics.

[Block diagram: SYSCTRL with the inputs por_n_i, reset_n_i, clk_i, scan_clk_i and
scan_reset_n_i, the interface tm, and the outputs en_clk_o, clk_delta_early_o, clk_div_o,
sys_reset_n_o, sys_clk_o and rst_o.]

Figure 4.26: Simple block diagram of SYSCTRL.

The generation of the internal resets sys_reset_n_o and rst_o is shown in figure 4.27,
where the external asynchronous resets reset_n_i and por_n_i are synchronized using a two-
stage synchronizer resulting in the synchronized signals reset_n_s and por_n_s. Both signals
can be masked when the chip is in test mode (see register TM_CTRL).
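[Logic circuit: the signals por_n_s, tm.ignore_por, tm.scan_mode, reset_n_s (multiplexer
input 0), scan_reset_n_i (multiplexer input 1) and tm.ignore_reset are combined into the
outputs sys_reset_n_o and rst_o.]

Figure 4.27: Logic circuit for output signal sys_reset_n_o.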

SYSCTRL provides multiple clock sources to the system. The signal sys_clk_o provides
the fastest clock because it is a pass-through of the input signal clk_i. The signal
clk_div_o in combination with en_clk_o provides clock speeds that depend on the value of
the generic clk_div_cnt_g. The output clock frequency can be calculated as

\[ \mathrm{clk\_div\_o}[\mathit{index}] = \frac{\mathrm{clk\_i}}{2^{\mathit{index}+1}} \tag{4.1} \]

where index is a number between 0 and clk_div_cnt_g-1.
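
Equation 4.1 can be evaluated for all divider outputs at once, as in the following sketch. The helper name and the 16 MHz input clock are hypothetical example values, not taken from the SoC.

```python
def clk_div_frequencies(clk_i_hz, clk_div_cnt_g):
    """Return the frequency of clk_div_o[index] for every valid index.

    Implements equation 4.1: f[index] = clk_i / 2**(index + 1)
    for index = 0 .. clk_div_cnt_g - 1.
    """
    return [clk_i_hz / 2 ** (index + 1) for index in range(clk_div_cnt_g)]


# Example: a hypothetical 16 MHz input clock with three clock divisions.
frequencies = clk_div_frequencies(16_000_000, 3)
```

Each additional index halves the frequency again, so clk_div_o[0] runs at half the input clock and the slowest output at clk_i divided by 2 to the power of clk_div_cnt_g.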


Table 4.13: Generics of SYSCTRL and its associated signals.
Signal               Generic             Description
-                    gen_reset_g         Enables reset generation. Setting it to zero causes the signal reset_n_i to be ignored and treated as always released (reset_n_i=1).
por_n_i              no_of_por_inputs_g  Number of power-on-reset inputs
en_clk_o, clk_div_o  clk_div_cnt_g       Number of clock divisions
clk_delta_early_o    clk_delta_cnt_g     Number of clock deltas

5 Integration
5.1 MetaGen

The integration of all components described in chapter 4 is done using MetaGen. The
connections between multiple MetaGen components and modules are specified in a clear-text
file. Listing 5.1 provides an example of two MetaGen component instances and interface
connections. This file is used as input for the code generation, which results in SystemVerilog
code. Listing 5.2 contains a snippet of the code generated from the previously described
specification.
To use this integration flow, it was necessary to wrap all generated components to match
the MetaGen flow for integration. Manually written modules, like SOCCTRL, need no wrapping
because they already follow the MetaGen flow.

Listing 5.1: Code snippet of the MetaGen specification file for the SoC digital top.
component ahb_matrix IFX:ATVPTS:AHB_MATRIX_RISCV_TC:100;
component ahb2apb0 IFX:ATVPTS:AHB2APB:110;

//AHBMATRIX SLAVES
connection ahbmatrix_s_apb0_con = ahb_matrix.ahb_s_apb0, ahb2apb0.ahb_s;
//AHBMATRIX MASTER
connection ahbmatrix_m_spi_con = ahb_matrix.ahb_m_spi, hndshk2ahb.ahb_m;

Listing 5.2: Code snippet of the SoC digital top SystemVerilog file generated using MetaGen
// Interface Instances
ahb_slave_ifd ahbmatrix_s_apb0_con();

d_ahb_matrix_riscv_tc
inst_ahb_matrix
(
    .reset_n_i(sys_reset_n_con_s),
    .clk_i(sys_clk_con_s),
    .ahb_s_apb0(ahbmatrix_s_apb0_con.MirroredSlave),
    .ahb_s_timer(ahbmatrix_s_timer_con.MirroredSlave),
    .ahb_s_intr(ahbmatrix_s_interrupt_con.MirroredSlave),
    .ahb_s_apb1(ahbmatrix_s_apb1_con.MirroredSlave),
    .ahb_s_data_ram(ahbmatrix_s_dram_con.MirroredSlave),
    .ahb_s_instr_ram(ahbmatrix_s_iram_con.MirroredSlave),
    .ahb_s_brom(ahbmatrix_s_brom_con.MirroredSlave),
    .ahb_s_rom(ahbmatrix_s_rom_con.MirroredSlave),
    .ahb_s_otp(ahbmatrix_s_otp_con.MirroredSlave),
    .ahb_s_ecc(ahbmatrix_s_ecc_con.MirroredSlave),
    .ahb_m_spi(ahbmatrix_m_spi_con.MirroredMaster),
    .ahb_m_imem(ahbmatrix_m_instr_con.MirroredMaster),
    .ahb_m_dram(ahbmatrix_m_data_con.MirroredMaster)
);

d_ahb2apb
inst_ahb2apb0
(
    .reset_n_i(sys_reset_n_con_s),
    .clk_i(sys_clk_con_s),
    .ahb_s(ahbmatrix_s_apb0_con.Slave),
    .apb_m(apb0_bus_con.Master)
);

5.2 Integration Notes

The AHB matrix provides more AHB slaves than are depicted in figure 5.1. However, unconnected
AHB slaves cause compilation errors during simulation. Therefore, all unused AHB slaves are
connected to dummies that tie the signals to defined values. The same applies to unused
input signals, which are tied to constant signals inside the MetaGen specification.

[Figure 5.1, block labels: SoC Digital Top containing SYSCTRL, RISC-V Core, AHB Matrix, AHB DRAM, AHB IRAM0, AHB IRAM1, AHB ROM, AHB Boot ROM, AHB TIMER, AHB Interrupt, SPI, HNDSHK to AHB, GPIO, ANALOG SAR ADC, APB to AHB, TMCTRL, SOCCTRL]

Figure 5.1: Block diagram of the System-on-Chip showing internal bus connections and interfaces
provided to the outside.

6 Simulation
For simulation, a Universal Verification Methodology (UVM) [16] testbench written in
SystemVerilog [18] is used. An overview of UVM testbenches implemented using MetaGen is,
however, not part of this thesis. The simulation software is Cadence Xcelium Logic Simulator [19]
version 18.09.009 with SimVision as GUI. At Infineon, the simulator setup is provided by the
development flow.

6.1 Test Bench

6.1.1 Setup

The test bench is generated using MetaGen and instantiates the following modules and interface
verification components (IVC):

• Device-Under-Test (DUT)

• SPI IVC

• SYSCTRL IVC

• GPIO IVC

The SoC is the DUT that is connected to all IVCs as sketched in figure 6.1. Due to different
naming conventions between DUT and IVCs, all signals must be wired manually. The SPI IVC
is configured to emulate a connected master. It can be used to write and read registers and
to download a program to the instruction RAM. The GPIO IVC is used for checking the input and
output pins. This IVC is fully generated using MetaGen by enabling the sideband mode
of the generator. As a result, the generated IVC provides generic set and get functions for each
signal. To provide clock and reset signals, the SYSCTRL IVC is instantiated and connected
to the DUT. The generated clock has a randomized period between 68 ns and 74 ns with a duty
cycle of 50%.
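
The randomized period range can be translated into a frequency range with a short calculation. The sketch below is only an illustration; the helper names are assumptions. It also checks that the clock frequency of 14.482 MHz reported in section 6.2 corresponds to a period inside the stated range.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1e3 / period_ns


def mhz_to_period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1e3 / freq_mhz


f_min = period_ns_to_mhz(74.0)  # longest period gives the lowest frequency
f_max = period_ns_to_mhz(68.0)  # shortest period gives the highest frequency
print(f"frequency range: {f_min:.3f} MHz to {f_max:.3f} MHz")
print(f"14.482 MHz corresponds to {mhz_to_period_ns(14.482):.2f} ns")
```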

[Figure 6.1, block labels: TB Top containing IVC, DUT, IVC]

Figure 6.1: Common testbench top with instantiated DUT connected to two IVCs.

6.1.2 Test Case

For simulating software, only one UVM test case is needed. The test case, called "test_sandbox",
is generated using MetaGen. Listing 6.1 shows the executed code of the test case. First, the
clock is enabled, followed by the release of the reset signals reset_n_i and por_n_i. After a waiting
time of 200 ns, the startup location is chosen by writing to register IMEM_CTRL0, which selects
the instruction ROM as the memory used for startup. After a further 10 us, the register
IMEM_CTRL0 is read two times. After another 10 ms of waiting time, the test case finishes and
the simulation is stopped.

Listing 6.1: Code of test case sandbox.


sys_if_reset_release_seq rst_release_seq;
sys_if_por_release_seq por_release_seq;
sys_if_clock_on_seq clk_on_seq;

spi_if_v2_data_read_seq spi_read_seq;
spi_if_v2_data_write_seq spi_write_seq;

`uvm_do_on(clk_on_seq, p_sequencer.sys_if_i_agent_sequencer);
#1us;
`uvm_do_on(rst_release_seq, p_sequencer.sys_if_i_agent_sequencer);
#100ns;
`uvm_do_on(por_release_seq, p_sequencer.sys_if_i_agent_sequencer);
#200ns;
`uvm_do_on_with(spi_write_seq,
    p_sequencer.spi_if_i_data_agent_sequencer,
    {
        spi_addr == 'he;
        spi_data == 'h2;
    })
#10us;
`uvm_do_on_with(spi_read_seq,
    p_sequencer.spi_if_i_data_agent_sequencer,
    {
        spi_addr == 'he;
    })
`uvm_do_on_with(spi_read_seq,
    p_sequencer.spi_if_i_data_agent_sequencer,
    {
        spi_addr == 'he;
    })
#10ms;
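
To see why the data value 'h2 in listing 6.1 selects the ROM, the written value can be split into the bit fields of IMEM_CTRL0. The following sketch assumes the field positions as read from the register diagram in Appendix A.6 (ROM_AHB_READ_EN at bit 4, RAM_AHB_ACCESS_EN at bit 3, STARTUP_LOC at bits 2:1, RAM_READY at bit 0); the dictionary and function names are hypothetical.

```python
# Assumed layout of IMEM_CTRL0: field name -> (least significant bit, width).
IMEM_CTRL0_FIELDS = {
    "ROM_AHB_READ_EN": (4, 1),
    "RAM_AHB_ACCESS_EN": (3, 1),
    "STARTUP_LOC": (1, 2),
    "RAM_READY": (0, 1),
}

# Encoding of STARTUP_LOC as listed in Appendix A.6.
STARTUP_LOC_NAMES = {0x0: "Undefined", 0x1: "ROM", 0x2: "RAM", 0x3: "Undefined"}


def decode_imem_ctrl0(value):
    """Split a 16-bit register value into its bit fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in IMEM_CTRL0_FIELDS.items()
    }


fields = decode_imem_ctrl0(0x2)  # value written by spi_write_seq
print(fields)
print("startup location:", STARTUP_LOC_NAMES[fields["STARTUP_LOC"]])
```

With this layout, the value 0x2 sets only bit 1, which gives STARTUP_LOC = 0x1 and therefore the ROM.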

6.2 Results

All results are generated using the previously mentioned test case test_sandbox. The simulator
is started with the test case, and the seed for the randomization is fixed to 0. This leads to a
randomized clock frequency of 14.482 MHz. Additionally, the instruction ROM is filled at startup
with a binary file holding two instructions, one for addition and one for multiplication. Due to
the size of the waveform figures in the following subsections, a landscape version of each is
provided in Appendix B.

6.2.1 CPU Boot

In figure 6.2 the start of the simulation is shown. At waveform marker 9, the reset signal
reset_n_i of the SoC is released, leading to invalid data on the instruction bus signal HRDATA.
This bug was discovered during the first simulation run. To prevent the faulty instruction data
from propagating, the reset signal rst_pin of the CPU is delayed by one clock cycle. The
release of the CPU reset is marked with the cursor TimeA. At this point, the CPU starts to
fetch data over the instruction bus to fill the pipeline.
Comparing the instruction bus signals HADDR and HRDATA with the snippet of the boot
code from listing 6.2, it can be observed that a delay of one clock cycle between data and address
is present. This delay is caused in the boot ROM module by the AHB-to-CSC bridge. The
additional delay could be avoided by using an AHB instance directly in the boot ROM.


Figure 6.2: Simulator waveform showing the initialization of the RISC-V CPU and loading first
instruction from the boot ROM.

Listing 6.2: Initial part of the boot code shown in hexadecimal with disassembler information
801C: 00100613  li   a2, 1
8018: 06400693  li   a3, 100
8014: 0040006f  j    8018 <main>
8010: 00000013  nop
800C: 00000013  nop
8008: 00000013  nop
8004: 00000013  nop
8000: 00000013  nop
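
The hexadecimal words in listing 6.2 can be checked against the standard RISC-V I-type instruction format (imm[11:0], rs1, funct3, rd, opcode). The sketch below is an illustration with a hypothetical function name; it decodes the two li instructions and the nop, which are all encoded as addi.

```python
def decode_i_type(word):
    """Split a 32-bit RISC-V I-type instruction word into its fields."""
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }


# li a2, 1 / li a3, 100 / nop from listing 6.2
for word in (0x00100613, 0x06400693, 0x00000013):
    print(f"{word:08x}", decode_i_type(word))
```

Opcode 0x13 with funct3 0 is addi; rd 12 and rd 13 are the registers a2 (x12) and a3 (x13), and the nop is addi x0, x0, 0.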

Figure 6.4 shows the jump to the instruction ROM after bit field STARTUP_LOC in register
IMEM_CTRL0 is set to 1. The SPI transmission with the corresponding bus transaction is shown
in figure 6.3. Due to the SPI clock frequency of 5 MHz in combination with an SPI transmission
size of 32 bits, it is not possible to show both in one figure. Nevertheless, marker 1 shows when
the written data is taken over by the register.
Marker 4 in figure 6.4 indicates the start of the data AHB transmission triggered by the CPU to
read the register IMEM_CTRL0. After the register access is done, indicated by marker 5, the
CPU stores the read value in register x15. At marker 7, the CPU changes the instruction
address signal ib_addr to the start of the instruction ROM located at 0x00009000. In the next
clock cycle, the stall of instruction decode and execute can be observed. After another four clock
cycles, the pipeline is fully utilized again.
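
The difference in time scale between the SPI transmission and the bus transaction can be estimated with a short calculation. The sketch assumes one bit per SPI clock cycle without any gaps between bits, which is a simplification; the function names are hypothetical. The system clock of 14.482 MHz is the value given in section 6.2.

```python
def spi_frame_duration_us(bits, spi_clk_mhz):
    """Duration of an SPI frame in microseconds, assuming one bit per SPI clock cycle."""
    return bits / spi_clk_mhz


def system_clock_cycles(duration_us, sys_clk_mhz):
    """Number of system clock cycles that fit into the given duration."""
    return duration_us * sys_clk_mhz


duration = spi_frame_duration_us(32, 5.0)
cycles = system_clock_cycles(duration, 14.482)
print(f"32-bit SPI frame: {duration:.1f} us, about {cycles:.0f} system clock cycles")
```

Under these assumptions, a single SPI frame spans roughly 93 system clock cycles, whereas the events discussed for figure 6.4 are only a few clock cycles apart.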


Figure 6.3: Simulator waveform showing the SPI transaction setting the startup location.

Figure 6.4: Simulator waveform showing the selection of startup location and the triggered jump
to the instruction ROM.


6.2.2 Addition

As previously described, the CPU jumps to the first address of the instruction ROM, which is
pre-filled with two instructions. The first instruction encodes an addition that adds register
x11 to register x12 and stores the result in register x30.
During "instruction decode", the register values are read from the internal registers. Marker
13 in figure 6.5 marks the readout in the waveform. In the next clock cycle, the ALU performs
the addition. Finally, the data is stored during "write back", which is marked by TimeA.

Figure 6.5: Simulator waveform showing a simple addition with the signals of the RISC-V ALU
and registers.

6.2.3 Hardware Multiplication

The second instruction encodes a multiplication of registers x10 and x11 that stores the result
in x29 using the hardware multiplier. Figure 6.6 shows the waveform with the executed
multiplication. However, two problems with the loaded ROM file showed up.
First, the second instruction is expected at 0x00009004. During the simulation, it turned out
that the second instruction is loaded at 0x0000A008. This does not break the simulation, but it
needs further investigation to find the root cause.
Second, the ROM lacks the load instructions for registers x10 and x11. Therefore, the registers
are forced to x10 = 0x2222 and x11 = 0x5.
At marker 18, "instruction decode" reads the registers. One clock cycle later, during "execute",
the hardware multiplier performs the multiplication. Finally, two clock cycles later, the result
0x0000AAAA is stored in register x29.


Figure 6.6: Simulator waveform showing a multiplication using the hardware multiplier added
by the RISC-V "M" extension.

6.2.4 Software Multiplication

For this simulation, an implementation of the C math library is used. Listing 6.3 provides a
code snippet of the C math library that shows the multiplication function mult_uint16(). The
code running in the simulation performs 8738 · 5 = 43690 = 0xAAAA. Figure 6.7 shows the
simulator waveform with only the instructions needed to perform the multiplication. Marker 1
marks the entry into the provided code snippet, whereas marker Baseline marks the jump back
to the main routine.
Comparing the waveforms of both implementations, it is immediately visible that the hardware
multiplier performs better. To quantify this, the clock cycles between entering and leaving the
multiplication function were counted, with a result of 64 cycles. Compared with the 6 clock
cycles needed by the hardware multiplier, the hardware multiplication is faster by a factor of
about 10.7 (64/6) when any additional overhead of the software multiplication is neglected.

Figure 6.7: Simulator waveform showing a multiplication using a software implementation of
the C math library.

Listing 6.3: Code snippet of the C math library showing the 16-bit unsigned integer
multiplication.
00004354 <mult_uint16> (File Offset: 0x4354):
mult_uint16():
4354: 00050793  mv    a5, a0
4358: 00000513  li    a0, 0
435c: 00079463  bnez  a5, 4364 <mult_uint16+0x10> (File Offset: 0x4364)
4360: 00008067  ret
4364: 0017f713  andi  a4, a5, 1
4368: 00070463  beqz  a4, 4370 <mult_uint16+0x1c> (File Offset: 0x4370)
436c: 00b50533  add   a0, a0, a1
4370: 0017d793  srli  a5, a5, 0x1
4374: 00159593  slli  a1, a1, 0x1
4378: fe5ff06f  j     435c <mult_uint16+0x8> (File Offset: 0x435c)
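
The instruction sequence in listing 6.3 is a shift-and-add multiplication. The following sketch mirrors it step by step, with the register names kept as variable names; it is a model for illustration and not the source code of the C math library. The returned iteration count is an addition for this sketch, and it is an assumption that the 32-bit register width is the only wrap-around that matters.

```python
MASK32 = 0xFFFFFFFF  # registers of the RV32 CPU are 32 bits wide


def mult_uint16(a0, a1):
    """Shift-and-add multiplication following listing 6.3.

    Returns the product and the number of loop iterations.
    """
    a5 = a0 & MASK32                  # mv   a5, a0
    a0 = 0                            # li   a0, 0
    iterations = 0
    while a5 != 0:                    # bnez a5, ... / ret
        if a5 & 1:                    # andi a4, a5, 1 / beqz a4, ...
            a0 = (a0 + a1) & MASK32   # add  a0, a0, a1
        a5 >>= 1                      # srli a5, a5, 0x1
        a1 = (a1 << 1) & MASK32       # slli a1, a1, 0x1
        iterations += 1               # j    435c
    return a0, iterations


print(mult_uint16(8738, 5))
print(mult_uint16(5, 8738))
```

The loop runs once per significant bit of the first operand, so the run time depends on the operand order. Which operand order was used in the simulation is not stated in the text.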

7 Conclusion and Outlook

7.1 Conclusion

This work aimed to develop a System-on-Chip using Infineon’s code generation framework, which
is based on metamodeling. Research on the RISC-V ISA and code generation was done to gain a
common understanding of the technologies and methodologies used.
All necessary components were generated using MetaGen and MetaRTL and combined with
handwritten modules from DBBL to form an SoC. This SoC was instantiated in a UVM
testbench and connected to IVCs. Finally, a simulator setup including one test case
was implemented. The simulation was used to debug the SoC and to provide simulation waveforms
showing the CPU operation.
As targeted by this thesis, it was shown that MetaGen with MetaRTL is capable of
providing more complex RTL modules with optional extensions and configurations. In addition,
its flexibility in combining generated code with handwritten code was shown. The final
outcome of this work is a simulatable SoC including a RISC-V RV32IMC 5-stage CPU.

7.2 Outlook

The SoC is working, but some issues must be addressed in the future. First, the versioning of
MetaRTL needs to be adapted so that it is checked frequently whether all modules can be
generated using the default setup. As already mentioned in chapter 4, versions of some
dependencies changed and the setup was using wrong or outdated versions, causing generation
failures. In that case, the solution could be to adapt the setup to use only a certain version of
the dependencies, or to update the module to support the newest version. The frequent check
mentioned above could be implemented using a Jenkins setup (see footnote 3).
Second, some combinations of generation options inside the GUI produce failures during
generation. This could be observed especially during the generation of the RISC-V CPU. If
not all combinations are intended to be used, more comprehensive documentation would be
needed to overcome that issue.
Third, some modules produce RTL code that does not compile. This was also encountered during
generation of the RISC-V CPU. In that case, it is not clear whether the cause is an invalid
option combination or a missing default implementation inside MetaRTL.
For the future, the SoC shall be extended with an internal Joint Test Action Group (iJTAG) [20]
module. The iJTAG module shall be provided by MetaRTL and implement JTAG for use inside
the SoC. Common JTAG implementations were designed to test an SoC or integrated circuit (IC)
on a printed circuit board (PCB). Nowadays, ICs like SoCs integrate multiple modules and
components inside one chip. Therefore, JTAG could be insufficient to test their behavior.
In the future, it is also planned to prepare the SoC for use on an FPGA or for manufacturing
as an ASIC. For an FPGA, some modifications to the code must be implemented. One example
of such a code change is the replacement of the ROM and RAM IPs with IPs provided by the
FPGA manufacturer in order to use the available memory of the FPGA. If an ASIC is to be
manufactured, a scan implementation is needed and additional linting checks must be done.

3
[Link]

A Registers
A.1 Interrupt Registers

INTERRUPT_CTRL

Memory Set SFR


Base Address 0x0D80
Register Offset 0x0000

Bits    Field          Access (reset)
31:16   UNUSED         -
15:7    UNUSED         -
6       GPIO3_EN       rw-(0)
5       GPIO2_EN       rw-(0)
4       GPIO1_EN       rw-(0)
3       GPIO0_EN       rw-(0)
2       SAR_ADC0_EN    rw-(0)
1       TIMER1_EN      rw-(0)
0       TIMER0_EN      rw-(0)

GPIO3_EN Enables the interrupt for GPIO3

GPIO2_EN Enables the interrupt for GPIO2

GPIO1_EN Enables the interrupt for GPIO1

GPIO0_EN Enables the interrupt for GPIO0

SAR_ADC0_EN Enables the interrupt for SAR ADC

TIMER1_EN Enables the interrupt for Timer1


TIMER0_EN Enables the interrupt for Timer0

A.2 Timer Registers

TIM0CTRLSTAT

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0000

Bits    Field       Access (reset)
31:16   UNUSED      -
15:8    UNUSED      -
7       OvfIntEn    rw-(0)
6:1     UNUSED      -
0       TimEn       rw-(0)

OvfIntEn Enables reset from external interrupt controller

TimEn Timer enable

TIM0ACTVAL

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0004

Bits    Field     Access (reset)
31:16   ACTVAL    rw-(0x0000)
15:0    ACTVAL    rw-(0x0000)

ACTVAL Actual counter value of first timer


TIM0MAXVAL

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0008

Bits    Field     Access (reset)
31:16   MAXVAL    rw-(0x0000)
15:0    MAXVAL    rw-(0x0000)

MAXVAL Maximum counter value of first timer

TIM0CCUCTRL0

Memory Set SFR


Base Address 0x0B80
Register Offset 0x000C

Bits    Field     Access (reset)
31:16   UNUSED    -
15:6    UNUSED    -
5:3     CCM0      rw-(0x0)
2:0     CapIM0    rw-(0x0)

CCM0 Capture Compare Mode 0.
• 0x00 - Write ACTVAL to CCUVAL on ExtCap trigger
• 0x01 - Compare: ACTVAL < CCUVAL
• 0x02 - Compare: ACTVAL <= CCUVAL
• 0x03 - Compare: ACTVAL > CCUVAL
• 0x04 - Compare: ACTVAL >= CCUVAL
• 0x05 - Compare: ACTVAL == CCUVAL
• 0x06 - Compare: ACTVAL != CCUVAL
• 0x07 - Capture/Compare disabled

CapIM0 Input mode for all external input signals.
• 0x00 - Input has no effect on timer
• 0x01 - Enable when signal changes from 0 to 1
• 0x02 - Enable when signal changes from 1 to 0
• 0x03 - Signal triggers as long as it is high
• 0x04 - Signal triggers as long as it is low
• 0x05 - Signal triggers on any edge
• 0x06 - Reserved for future use
• 0x07 - Reset counter when 0 is reached (only for ExtRes, undefined for others)
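
The compare modes 0x01 to 0x06 of CCM0 can be summarized as a small lookup from mode value to comparison. The sketch below only models the comparison itself; what the hardware does with the result (for example driving an output or raising an interrupt) is not described here, and the names are hypothetical.

```python
import operator

# Compare modes 0x01 .. 0x06 as listed for CCM0.
CCM_COMPARE = {
    0x01: operator.lt,  # ACTVAL <  CCUVAL
    0x02: operator.le,  # ACTVAL <= CCUVAL
    0x03: operator.gt,  # ACTVAL >  CCUVAL
    0x04: operator.ge,  # ACTVAL >= CCUVAL
    0x05: operator.eq,  # ACTVAL == CCUVAL
    0x06: operator.ne,  # ACTVAL != CCUVAL
}


def ccu_compare(mode, actval, ccuval):
    """Return the comparison result, or None for capture (0x00) and disabled (0x07)."""
    compare = CCM_COMPARE.get(mode & 0x7)
    return None if compare is None else compare(actval, ccuval)


print(ccu_compare(0x01, 3, 5))
print(ccu_compare(0x07, 3, 5))
```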

TIM0CCUVAL0

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0010

Bits    Field      Access (reset)
31:16   CCUVAL0    rw-(0x0000)
15:0    CCUVAL0    rw-(0x0000)

CCUVAL0 Capture compare value 0

TIM1CTRLSTAT

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0018

Bits    Field       Access (reset)
31:16   UNUSED      -
15:8    UNUSED      -
7       OvfIntEn    rw-(0)
6:1     UNUSED      -
0       TimEn       rw-(0)

OvfIntEn Enables reset from external interrupt controller

TimEn Timer enable

TIM1ACTVAL

Memory Set SFR


Base Address 0x0B80
Register Offset 0x001C

Bits    Field     Access (reset)
31:16   ACTVAL    rw-(0x0000)
15:0    ACTVAL    rw-(0x0000)

ACTVAL Actual counter value of second timer

TIM1MAXVAL

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0020

Bits    Field     Access (reset)
31:16   MAXVAL    rw-(0x0000)
15:0    MAXVAL    rw-(0x0000)

MAXVAL Maximum counter value of second timer

TIM1CCUCTRL0

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0024

Bits    Field     Access (reset)
31:16   UNUSED    -
15:6    UNUSED    -
5:3     CCM0      rw-(0x0)
2:0     CapIM0    rw-(0x0)

CCM0 Capture Compare Mode 0.
• 0x00 - Write ACTVAL to CCUVAL on ExtCap trigger
• 0x01 - Compare: ACTVAL < CCUVAL
• 0x02 - Compare: ACTVAL <= CCUVAL
• 0x03 - Compare: ACTVAL > CCUVAL
• 0x04 - Compare: ACTVAL >= CCUVAL
• 0x05 - Compare: ACTVAL == CCUVAL
• 0x06 - Compare: ACTVAL != CCUVAL
• 0x07 - Capture/Compare disabled

CapIM0 Input mode for all external input signals.
• 0x00 - Input has no effect on timer
• 0x01 - Enable when signal changes from 0 to 1
• 0x02 - Enable when signal changes from 1 to 0
• 0x03 - Signal triggers as long as it is high
• 0x04 - Signal triggers as long as it is low
• 0x05 - Signal triggers on any edge
• 0x06 - Reserved for future use
• 0x07 - Reset counter when 0 is reached (only for ExtRes, undefined for others)

TIM1CCUVAL0

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0028

Bits    Field      Access (reset)
31:16   CCUVAL0    rw-(0x0000)
15:0    CCUVAL0    rw-(0x0000)

CCUVAL0 Capture compare value 0

TIM1CCUCTRL1

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0024

Bits    Field     Access (reset)
31:16   UNUSED    -
15:6    UNUSED    -
5:3     CCM1      rw-(0x0)
2:0     CapIM1    rw-(0x0)

CCM1 Capture Compare Mode 1.
• 0x00 - Write ACTVAL to CCUVAL on ExtCap trigger
• 0x01 - Compare: ACTVAL < CCUVAL
• 0x02 - Compare: ACTVAL <= CCUVAL
• 0x03 - Compare: ACTVAL > CCUVAL
• 0x04 - Compare: ACTVAL >= CCUVAL
• 0x05 - Compare: ACTVAL == CCUVAL
• 0x06 - Compare: ACTVAL != CCUVAL
• 0x07 - Capture/Compare disabled

CapIM1 Input mode for all external input signals.
• 0x00 - Input has no effect on timer
• 0x01 - Enable when signal changes from 0 to 1
• 0x02 - Enable when signal changes from 1 to 0
• 0x03 - Signal triggers as long as it is high
• 0x04 - Signal triggers as long as it is low
• 0x05 - Signal triggers on any edge
• 0x06 - Reserved for future use
• 0x07 - Reset counter when 0 is reached (only for ExtRes, undefined for others)

TIM1CCUVAL1

Memory Set SFR


Base Address 0x0B80
Register Offset 0x0028

Bits    Field      Access (reset)
31:16   CCUVAL1    rw-(0x0000)
15:0    CCUVAL1    rw-(0x0000)

CCUVAL1 Capture compare value 1

A.3 Test Mode Control Registers

TM_MODES

Memory Set TM_USER


Base Address 0x0010
Register Offset 0x0000

Bits    Field               Access (reset)
15:10   UNUSED              -
9:8     CRC_MODE            rw-(0x0)
7:6     UNUSED              -
5:4     TM_LEVEL3_STATUS    r-(0x1)
3:2     TM_LEVEL2_STATUS    r-(0x1)
1:0     TM_LEVEL1_STATUS    r-(0x1)

CRC_MODE CRC mode selection bits. Select the CRC algorithm used in the host
communication interface module.
• 0x0 - CRC24
• 0x1 - CRC40
• 0x2 - NO_CRC
• 0x3 - IGN_CRC
TM_LEVEL3_STATUS Read only. Bits for indicating test mode level 3 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined


TM_LEVEL2_STATUS Read only. Bits for indicating test mode level 2 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined
TM_LEVEL1_STATUS Read only. Bits for indicating test mode level 1 status.
• 0x0 - Undefined
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Undefined

TM_LEVEL2_KEY

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0100

Bits    Field        Access (reset)
15:0    KEY_VALUE    rw-(0x0000)

KEY_VALUE Test mode level 2 activation key. A valid key activates level 2. An
invalid key leaves level 2 and all higher active test mode levels.

TM_LEVEL3_KEY

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0101

Bits    Field        Access (reset)
15:0    KEY_VALUE    rw-(0x0000)

KEY_VALUE Test mode level 3 activation key. A valid key activates level 3. An
invalid key leaves level 3.


TM_EXIT_ALL_KEY

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0102

Bits    Field        Access (reset)
15:0    KEY_VALUE    rw-(0x0000)

KEY_VALUE Test mode exit all key. A valid key deactivates the test mode.

TM_OTP_STAT

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0103

Bits    Field          Access (reset)
15:8    FAIL_ADDR      rw-(0x00)
7:4     UNUSED         -
3       OVWR_ERROR     rw-(0)
2       VIRGIN         rw-(0)
1       ECC_ERROR      rw-(0)
0       ECC_WARNING    rw-(0)

FAIL_ADDR Address of the last occurred ECC warning/error. Write all zero to clear.

OVWR_ERROR Over voltage write error. Write zero to clear.

VIRGIN OTP was never written. Write zero to clear.

ECC_ERROR An ECC error occurred. The fail address is stored in FAIL_ADDR. Write zero to clear.

ECC_WARNING An ECC warning occurred. The fail address is stored in FAIL_ADDR. Write zero to clear.


TM_SCAN_CTRL

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0104

Bits    Field               Access (reset)
15:5    UNUSED              -
4:3     IDDQ_EN             rw-(0x1)
2       OTP_WRITE_LOCK_N    rw-(0)
1:0     SCAN_EN             rw-(0x1)

IDDQ_EN Enables the IDDQ mode.


• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid
OTP_WRITE_LOCK_N Locks the OTP for further write
operations.

SCAN_EN Enables the scan mode.


• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid

TM_CTRL

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0105

Bits    Field         Access (reset)
15:10   UNUSED        -
9:8     IGNORE_POR    rw-(0x1)
7:6     IGNORE_RES    rw-(0x1)
5:0     UNUSED        -

IGNORE_POR Enables the "Ignore Power On Reset" function. This bit disables
the power on reset for testing purpose.
• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid
IGNORE_RES Enables the "Ignore Reset" function. This bit disables the reset
for testing purpose.
• 0x0 - Invalid
• 0x1 - Disabled
• 0x2 - Enabled
• 0x3 - Invalid

TM_TRIM_CTRL

Memory Set TM_SFR


Base Address 0x0010
Register Offset 0x0106

Bits    Field          Access (reset)
15:1    UNUSED         -
0       TRIM_UNLOCK    rw-(0)

TRIM_UNLOCK Unlocks trimming registers. The trimming bits are locked after
first write, therefore, for further modifications this unlock must
be set.

A.4 SAR ADC Registers

ADC_CTRL

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x0000

[Register bit-field diagram: bits 15:10 unused; four 2-bit fields at bits 9:8, 7:6, 5:4 and 3:2, each rw-(0x3); two 1-bit fields at bits 1 and 0, each rw-(1); field names not recoverable]

ADC_COMP_CONF

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x0001

[Register bit-field diagram: 1-bit fields at bits 15 and 14, each rw-(0); bit 13 unused; one field at bits 12:0 with rw-(0x0001); field names not recoverable]

DYN_SAMPLING_CTRL

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x0002

Bits    Field              Access (reset)
15:4    UNUSED             -
3:0     EN_DYN_SAMPLING    rw-(0x0)

EN_DYN_SAMPLING

SAMPLE_CTRL1_X

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x0003 + X

[Register bit-field diagram: bits 15:13 unused; one field at bits 12:8 with rw-(0x00); bits 7:5 unused; one field at bits 4:0 with rw-(0x00); field names not recoverable]

SAMPLE_CTRL2_X

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x0008 + X

[Register bit-field diagram: bit positions printed as 15, 7, 5, 4, 0; access entries -, rw-(0), rw-(0x00); field names not recoverable]

RESULTS_X

Memory Set GPIO_SFR


Base Address 0x0030
Register Offset 0x000C + X

Bits    Field          Access (reset)
15      UNUSED         -
14:12   TRIGGER_CNT    r-(0x0)
11:0    VALUE          r-(0x000)

A.5 GPIO Registers

GP_IN0

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0000

Bits    Field    Access (reset)
15:0    DATA     r-(0x0000)

DATA Holds input value of gpio pins 0 to 15 when selected as input in GP_CTRL0

GP_IN1

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0001

Bits    Field    Access (reset)
15:0    DATA     r-(0x0000)

DATA Holds input value of gpio pins 16 to 31 when selected as input in GP_CTRL1

GP_OUT0

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0004

Bits    Field    Access (reset)
15:0    DATA     rw-(0x0000)

DATA Holds output value of gpio pins 0 to 15 when selected as output in GP_CTRL0

GP_OUT1

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0005

Bits    Field    Access (reset)
15:0    DATA     rw-(0x0000)

DATA Holds output value of gpio pins 16 to 31 when selected as output in GP_CTRL1

GP_CTRL0

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0008

Bits    Field      Access (reset)
15:0    OUTP_EN    rw-(0x0000)

OUTP_EN Data flow of gpio pins 0 to 15. A one indicates that the gpio is used
as output.

GP_CTRL1

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x0009

Bits    Field      Access (reset)
15:0    OUTP_EN    rw-(0x0000)

OUTP_EN Data flow of gpio pins 16 to 31. A one indicates that the gpio is used
as output.
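
Because the 32 GPIO pins are spread over two 16-bit registers per function, software has to map a pin number to a register and a bit position. The sketch below assumes that pin 0 and pin 16 sit at bit 0 of their respective register; the function names are hypothetical.

```python
def gpio_location(pin):
    """Return (register index, bit position) for GPIO pin 0 .. 31.

    Register index 0 stands for GP_IN0/GP_OUT0/GP_CTRL0, index 1 for the
    registers with suffix 1.
    """
    if not 0 <= pin <= 31:
        raise ValueError("pin must be in the range 0 .. 31")
    return pin // 16, pin % 16


def set_output_enable(gp_ctrl, pin):
    """Return a copy of [GP_CTRL0, GP_CTRL1] with the OUTP_EN bit of the pin set."""
    reg, bit = gpio_location(pin)
    updated = list(gp_ctrl)
    updated[reg] |= 1 << bit
    return updated


print(gpio_location(17))
print([hex(value) for value in set_output_enable([0x0000, 0x0000], 17)])
```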

GPO_OUT_MUX0

Memory Set GPIO_SFR


Base Address 0x0020
Register Offset 0x000A

Bits    Field         Access (reset)
15:8    UNUSED        -
7:6     SEL3_value    rw-(0x0)
5:4     SEL2_value    rw-(0x0)
3:2     SEL1_value    rw-(0x0)
1:0     SEL0_value    rw-(0x0)

SEL3_value
• 0x0 - Invalid
• 0x1 - Invalid
• 0x2 - GPIO2 interrupt
• 0x3 - GPIO3 interrupt
SEL2_value
• 0x0 - Invalid
• 0x1 - Timer 1 CCU1 Output mapped to pin 2
• 0x2 - GPIO2 interrupt
• 0x3 - GPIO3 interrupt
SEL1_value
• 0x0 - Invalid
• 0x1 - Timer 1 CCU0 Output mapped to pin 1
• 0x2 - GPIO0 interrupt
• 0x3 - GPIO1 interrupt
SEL0_value
• 0x0 - Invalid
• 0x1 - Timer 0 CCU0 Output mapped to pin 0
• 0x2 - GPIO0 interrupt
• 0x3 - GPIO1 interrupt

A.6 SOCCTRL Registers

FRAME_RECEIVED_LO

Memory Set SFR


Base Address 0x0000
Register Offset 0x0000

Bits    Field    Access (reset)
15:0    DATA     r-(0)

DATA Read only. Contains lower 16bit received data from the last SPI transaction.

FRAME_RECEIVED_HI

Memory Set SFR


Base Address 0x0000
Register Offset 0x0001

Bits    Field    Access (reset)
15:0    DATA     r-(0)

DATA Read only. Contains upper 16bit received data from the last SPI transaction.

FRAME_TRANSMIT_LO

Memory Set SFR


Base Address 0x0000
Register Offset 0x0002

Bits    Field    Access (reset)
15:0    DATA     w-(0)

DATA Write only. Contains lower 16bit transmit data for the next SPI transaction.

FRAME_TRANSMIT_HI

Memory Set SFR


Base Address 0x0000
Register Offset 0x0003

Bits    Field    Access (reset)
15:0    DATA     w-(0)

DATA Write only. Contains upper 16bit transmit data for the next SPI transaction.
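
Since an SPI frame is 32 bits wide and the frame registers are 16 bits wide, each frame is split into a lower and an upper half. The following sketch shows the combination of FRAME_RECEIVED_LO/HI into one value and the split of a transmit frame into FRAME_TRANSMIT_LO/HI. The function names are hypothetical and no register access mechanism is modeled.

```python
def combine_frame(frame_lo, frame_hi):
    """Build the 32-bit frame from the lower and upper 16-bit register values."""
    return ((frame_hi & 0xFFFF) << 16) | (frame_lo & 0xFFFF)


def split_frame(frame):
    """Split a 32-bit frame into (lower 16 bits, upper 16 bits)."""
    return frame & 0xFFFF, (frame >> 16) & 0xFFFF


print(hex(combine_frame(0x5678, 0x1234)))
print([hex(part) for part in split_frame(0x12345678)])
```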

FRAME_STATUS

Memory Set SFR


Base Address 0x0000
Register Offset 0x0004

Bits    Field            Access (reset)
15:9    UNUSED           -
8       SOFT_DEC_EXCL    rw-(0)
7:2     UNUSED           -
1       FRAME_ERROR      rw-(0)
0       DATA_AVAIL       rw-(0)

SOFT_DEC_EXCL Enables software handling of the SPI frames.

FRAME_ERROR Status bit indicating that a frame error occurred. Must be cleared by user.

DATA_AVAIL Status bit indicating that a new frame was received. Must
be cleared by user after each read-out of the received data.

IMEM_CTRL0

Memory Set SFR


Base Address 0x0000
Register Offset 0x0005

Bits    Field                Access (reset)
15:5    UNUSED               -
4       ROM_AHB_READ_EN      rw-(0)
3       RAM_AHB_ACCESS_EN    rw-(0)
2:1     STARTUP_LOC          rw-(0)
0       RAM_READY            r-(0)

ROM_AHB_READ_EN Enable ROM read access over AHB.

RAM_AHB_ACCESS_EN Enable RAM read write access over AHB.

STARTUP_LOC Selection of the instruction memory location.


• 0x00 - Undefined
• 0x01 - ROM
• 0x02 - RAM
• 0x03 - Undefined
RAM_READY Read only. Status bit indicating that RAM is ready
for use.

PLL_CTRL0

Memory Set SFR


Base Address 0x0000
Register Offset 0x0006

Bits    Field     Access (reset)
15:7    N_DIV     rw-(0)
6:1     R_DIV     rw-(0)
0       ENABLE    rw-(0)

N_DIV Numerical divider for the PLL clock.

R_DIV Rational divider for the PLL clock.

ENABLE Enable the PLL clock.

GEN_CTRL0

Memory Set SFR


Base Address 0x0000
Register Offset 0x0007

Bits    Field         Access (reset)
15:10   UNUSED        -
9       IO_BIAS_EN    rw-(0)
8       ALL_PD_EN     rw-(0)
7       UNUSED        -
6:3     IREF_TRIM     rw-(0)
2:0     BG_TRIM       rw-(0)

IO_BIAS_EN Enable IO bias.

ALL_PD_EN Enable all pull-down resistors.

IREF_TRIM Trim register for current reference.

BG_TRIM Trim register for band gap.

B Simulation Waveforms

Figure B.1: Simulator waveform showing the initialization of the RISC-V CPU and loading first instruction from the boot ROM.

Figure B.2: Simulator waveform showing the selection of start up location and the triggered jump to the instruction ROM.

Figure B.3: Simulator waveform showing the SPI transaction setting the startup location.

Figure B.4: Simulator waveform showing a simple addition with the signals of the RISC-V ALU and registers.

Figure B.5: Simulator waveform showing a multiplication using the hardware multiplier added by the RISC-V "M" extension.

Figure B.6: Simulator waveform showing a multiplication using a software implementation of the C math library.


Bibliography

[1] A. K. SiFive Inc. (2019, Dec.) The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA. [Last visit 29.10.2021]. [Online]. Available: [Link]
riscv-isa-manual/releases/download/Ratified-IMAFDQC/[Link]

[2] K. Devarajegowda, “Model-based generation of assertions for pre-silicon verification,” Ph.D.
dissertation, TU Kaiserslautern, 2021.

[3] J. Schreiner, “Automated generation of pipelined risc cpus following the model-driven
architecture principle,” Master’s thesis, TU München, 2016.

[4] L. Research. What is mda? why concerns bpmn? [last visited 15.12.2021]. [Online].
Available: [Link]

[5] ARM Limited, AMBA3 AHB-Lite Protocol Specification, ARM Std., [Last vis-
ited 22.11.2021]. [Online]. Available: [Link]
5f914801f86e16515cdc2a27?token=

[6] ——, AMBA3 APB Protocol Specification, ARM Std., [Last visited 22.11.2021]. [Online].
Available: [Link]
token=

[7] Wikipedia. Classic risc pipeline. [last visited 17.12.2021]. [Online]. Available: https:
//[Link]/wiki/Classic_RISC_pipeline

[8] K. Crystal Chen, Greg Novick. risc vs. cisc. [Last visit 22.11.2021]. [Online]. Available:
[Link]

[9] RISC-V International, “History of risc-v,” 2021, [Last visit 29.10.2021]. [Online]. Available:
[Link]


[10] F. Truyen. (2006) The basics of model driven architecture. [last visited 15.12.2021]. [Online].
Available: [Link]

[11] W. Ecker and J. Schreiner, “Introducing model-of-things (mot) and model-of-design
(mod) for simpler and more efficient hardware generators,” in 2016 IFIP/IEEE
International Conference on Very Large Scale Integration (VLSI-SoC), 2016. [Online].
Available: [Link]

[12] ARM Limited. Amba3 overview. [Last visited 22.11.2021]. [Online]. Available: https:
//[Link]/architectures/system-architectures/amba

[13] ——. Amba design kit technical reference manual. [last visited 15.12.2021]. [Online]. Avail-
able: [Link]

[14] AUTOSAR CRC Routines, AUTOSAR Std., [Last access on 12.01.2021]. [On-
line]. Available: [Link]
AUTOSAR_SWS_CRCLibrary.pdf

[15] S. Engineering. Scan test. [last visited 16.12.2021]. [Online]. Available: https:
//[Link]/knowledge_centers/test/scan-test-2/

[16] K. A. Meade and S. Rosenberg, A Practical Guide to Adopting the Universal Verification
Methodology (UVM), 2nd ed. San Jose, CA, USA: Cadence Design Systems, 2013.

[17] T. Grubelnik, “Verification of an interface module using the universal verification method-
ology,” 2018.

[18] C. Spear and G. Tumbush, SystemVerilog for Verification - A Guide to Learning the Testbench
Language Features, 3rd ed. Springer, 2012.

[19] Cadence. Xcelium logic simulator. [last visited 15.12.21]. [Online]. Avail-
able: [Link]
simulation-and-testbench-verification/[Link]


[20] A. I. Inc. (2011) Ieee p1687 internal jtag (ijtag) tutorial. [last visited 15.12.2021]. [On-
line]. Available: [Link]
[Link]
