Embedded Software Design & Field Programmable Gate Arrays (FPGA)
SERDAR DURAN
ITU Embedded System Design Laboratory
1 gstl.itu.edu.tr
Embedded Software Design
• Embedded software engineers are tasked with determining the best
implementation platform to get a project to market.
• An implementation platform includes:
• Programming languages (C, C++, Python, etc.)
• Programming environments (IDEs, tools)
• Operating systems, if any (e.g. Linux distributions)
• Hardware platforms (microprocessors, development boards/kits, SoCs and heterogeneous hardware)
SoC: System on a Chip (all in one chip: peripherals, memory and uP).
Heterogeneous hardware: contains co-processors and/or programmable logic in addition to the microprocessors and peripherals (can also be an SoC).
Raspberry Pi 4 Development Board
• Arm-based microprocessor
• SDRAM off-chip memory
• Bluetooth
• Wireless support
• USB interfaces
• GPIO ports
• Lots of peripherals
Xilinx Zynq-7000 SoC includes:
• Arm-based microprocessor
• On-chip memory
• Programmable Logic (FPGA)
• Lots of interfaces
Avnet ZedBoard includes:
• Zynq-7000 SoC
• External DDR3 RAM
• LEDs, Switches
• Lots of peripherals
Embedded Software Design
• To select an implementation platform, several parameters must be evaluated:
• Size-throughput-power tradeoffs,
• Latency,
• Flexibility (ability to modify),
• Cost (cash is king!)
• Due to physical limits on transistor size, semiconductor technology is no longer advancing as fast as it did in past decades.
• However, computer algorithms are getting more and more complicated (especially AI and ML applications) and demand more resources and speed.
• As a result, in recent years the question of where to run an algorithm has increasingly focused on parallelization and concurrency.
number of cores increasing
❖ Semiconductor advancement has slowed
industry-wide since around 2010.
Embedded Software Design
• In addition, application-specific processors have been developed for a large set of applications, especially real-time applications:
• Digital Signal Processors (DSP),
• Graphics Processing Unit (GPU)
• Some Real-Time Applications:
• Communications,
• Radar systems,
• Video processing,
• Avionic systems
Embedded Software Design
• DSPs and GPUs can execute algorithms written in high-level languages, e.g. C, C++, Python.
• They have function-specific accelerators and function-specific instructions to improve overall performance.
• For example: Multiply and Accumulate (MAC) units, used to perform convolutions, dot products, matrix multiplications and the Fast Fourier Transform (FFT).
Figure: A basic MAC unit
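As a rough software illustration of what a MAC unit does, the sketch below builds a dot product and a small FIR filter (a convolution) out of a single multiply-accumulate step. The function names `mac`, `dot_product` and `fir_filter` are hypothetical and chosen only for this example; a real DSP performs the same step in dedicated hardware, typically one MAC per clock cycle.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot_product(xs, ws):
    """Dot product computed as a chain of MAC steps."""
    if len(xs) != len(ws):
        raise ValueError("inputs must have the same length")
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc


def fir_filter(samples, taps):
    """Direct-form FIR filter: every output sample is a short chain of MACs."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = mac(acc, tap, samples[n - k])
        out.append(acc)
    return out


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
    print(fir_filter([1, 2, 3, 4], [1, 1]))
```

Every operation named on the slide (convolution, dot product, matrix multiplication, FFT) reduces to many such accumulate steps, which is why a hardware MAC unit pays off.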
Texas Instruments 66AK2Hxx Multicore
DSP+ARM Keystone II SoC
• Arm-based microprocessors
• TI multicore DSPs,
• On-chip Memory,
• Specific memory architecture and memory
management system,
• Lots of interfaces
Typical applications:
• Communications,
• Networking,
• Radar (Mission Critical),
• Audio processing
FPGAs and ASICs
• To get more performance than general-purpose or specialized processors can deliver, custom integrated circuits (ASICs) and programmable logic devices (FPGAs) can be used.
• To benefit from the flexibility of microprocessors along with the performance of FPGAs and ASICs, heterogeneous computing platforms have been developed.
              Cost   Time to market   Flexibility   Performance
uProcessors   ☺      ☺                ☺
FPGA
ASIC                                                ☺
What is an FPGA?
• In a microprocessor, the computation hardware (circuit) is fixed, and the compiler determines how best to map the software application onto the available instructions.
• An FPGA is a box of logic blocks. The job of the FPGA tools is to build the logic circuit that best fits the HDL code by making interconnections between these logic blocks.
• Hardware Description Languages (HDLs) and specific FPGA tools/IDEs are used for programming FPGAs.
• HDLs describe the structure and behavior of digital logic circuits.
• Verilog and VHDL are the most common HDLs.
FPGA Parallelism
• In general, each instruction of a microprocessor must go through the following
stages (an instruction cycle):
1. Instruction fetch (IF)
2. Instruction decode (ID)
3. Execute (EXE)
4. Memory operations (MEM)
5. Write back (WB)
• However, there is no need for these overhead stages in an FPGA, because the computation is built directly into the circuit.
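To put a number on the instruction-cycle overhead, the following sketch counts clock cycles for the five stages listed above. It assumes an idealized processor in which every stage takes exactly one cycle and there are no stalls or hazards; the function names are hypothetical and only illustrate the arithmetic.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]


def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """Ideal pipeline: a new instruction enters every cycle once the pipe is full."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_schedule(n_instructions):
    """Map each cycle to the (instruction index, stage) pairs active in it."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


if __name__ == "__main__":
    print(cycles_unpipelined(4), cycles_pipelined(4))
    for cycle, active in sorted(pipeline_schedule(4).items()):
        print(cycle, active)
```

Even in the ideal pipelined case, only the EXE stage performs the actual computation; the other stages are bookkeeping that a dedicated circuit does not need.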
FPGA Parallelism
Figure: pipelined instruction cycles for a uProcessor vs. the FPGA equivalent
FPGA Architecture & Design Flow
FPGA Architecture
• An FPGA is an integrated circuit (IC) that can be programmed to implement different logic circuits after fabrication.
• FPGA devices consist of thousands of logic cells that can be reconfigured at any time.
• The advantages of FPGA devices over ASICs are flexibility and low cost.
• The basic structure of an FPGA is composed of the following elements:
• Look-up table (LUT): Basic element of the FPGA; performs logic operations.
• Flip-flop (FF): Stores the output of a LUT.
• Wires and switches: Connect elements to one another.
• Input/Output (I/O) pads: Physical ports that get data in and out of the FPGA.
Look-up table (LUT)
• The LUT is the basic building block of an FPGA and is capable of implementing any logic function of its inputs.
• It is essentially a truth table: the stored values define which function is implemented, and each combination of the inputs selects one stored value as the output.
Figure: a LUT built from RAM cells (the truth table) and multiplexers; x1-x0 are the selection bits.
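A LUT can be modelled in a few lines of software. The sketch below is only an illustration, with the hypothetical helper `make_lut`: the list passed in plays the role of the RAM cells, and the input bits form the address that the multiplexers would use to select one cell.

```python
def make_lut(truth_table):
    """Build an N-input LUT from a list of 2**N output bits (the RAM cells)."""
    n_inputs = len(truth_table).bit_length() - 1
    if n_inputs < 1 or len(truth_table) != 2 ** n_inputs:
        raise ValueError("truth table length must be a power of two (>= 2)")
    cells = tuple(truth_table)

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} input bits")
        address = 0
        for bit in inputs:  # inputs are given MSB first, e.g. lut(x1, x0)
            address = (address << 1) | (bit & 1)
        return cells[address]  # the multiplexers select this RAM cell

    return lut


# Same structure, different stored contents, different logic function.
and2 = make_lut([0, 0, 0, 1])
xor2 = make_lut([0, 1, 1, 0])

if __name__ == "__main__":
    for x1 in (0, 1):
        for x0 in (0, 1):
            print(x1, x0, and2(x1, x0), xor2(x1, x0))
```

Reprogramming the FPGA corresponds to loading different values into `cells`; the lookup logic itself never changes.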
Look-up table (LUT)
• The size of the truth table is limited by N, where N is the number of inputs to the LUT.
• N inputs (selection bits) -> 2^N memory locations -> 2^(2^N) logic functions
• An FF is necessary for building sequential circuits.
Figure: LUT with FF and MUX.
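The counting argument can be checked by brute force for small N. An N-input LUT has 2^N memory locations, and since each location independently holds 0 or 1, there are 2^(2^N) distinct truth tables, that is, logic functions. The helper names below are hypothetical.

```python
from itertools import product


def lut_memory_locations(n_inputs):
    """Number of RAM cells in an N-input LUT."""
    return 2 ** n_inputs


def lut_function_count(n_inputs):
    """Number of distinct logic functions an N-input LUT can implement."""
    return 2 ** lut_memory_locations(n_inputs)


def enumerate_lut_functions(n_inputs):
    """List every possible truth table (only practical for small N)."""
    return list(product((0, 1), repeat=lut_memory_locations(n_inputs)))


if __name__ == "__main__":
    for n in (1, 2, 3):
        tables = enumerate_lut_functions(n)
        print(n, lut_memory_locations(n), len(tables), lut_function_count(n))
```

For a 2-input LUT this gives 4 memory locations and 16 functions, including AND, OR, XOR and their complements.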
Switch Blocks
❖ Switch blocks and interconnections between LUTs are also programmed.
Modern designs may also contain:
• DSP blocks
• Specific arithmetic blocks (multipliers)
• Block RAMs
Xilinx Artix-7 FPGA Chip
• 15,850 logic slices, each with
four 6-input LUTs and 8 flip-flops
• 4,860 Kbits of block RAM
• Six clock management tiles, each
with phase-locked loop (PLL)
• 240 DSP slices
• Internal clock speeds exceeding
450 MHz
Nexys 4 DDR Development Kit
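The slice figures above translate into device totals with simple arithmetic. The short sketch below only multiplies out the numbers quoted on this slide; the variable names are hypothetical.

```python
LOGIC_SLICES = 15_850   # from the slide
LUTS_PER_SLICE = 4      # four 6-input LUTs per slice
FFS_PER_SLICE = 8       # eight flip-flops per slice
LUT_INPUTS = 6

total_luts = LOGIC_SLICES * LUTS_PER_SLICE
total_ffs = LOGIC_SLICES * FFS_PER_SLICE
cells_per_lut = 2 ** LUT_INPUTS               # truth-table entries in one LUT
total_lut_cells = total_luts * cells_per_lut  # configuration bits held in LUTs

if __name__ == "__main__":
    print(f"LUTs:            {total_luts:,}")
    print(f"Flip-flops:      {total_ffs:,}")
    print(f"Cells per LUT:   {cells_per_lut}")
    print(f"LUT truth cells: {total_lut_cells:,}")
```

This works out to 63,400 LUTs and 126,800 flip-flops, with 64 truth-table entries per 6-input LUT.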
FPGA Design Flow
Behavioral Algorithm (Simulate -> Test Results)
• Coding in HDL, C/C++, MATLAB or Python.
• Describing the behavior of the circuit.
• Textual representation.
Register Transfer Level (Simulate -> Test Results)
• High-level representation of the circuit.
• Describing registers and combinatorial logic.
• Schematic representation at a high level.
Gate Level Netlist (Simulate -> Test Results)
• Generating gates and their interconnections using synthesis tools.
• Schematic representation.
Place + Route
• Generating the layout (physical chip/circuit) using placement/routing tools.
• Physical representation.
Behavioral Algorithm (Simulate -> Test Results)
   | Manual
Register Transfer Level (Simulate -> Test Results)
   | Logic Synthesis
Gate Level (Simulate -> Test Results)
   | Auto Place + Route
Layout (Simulate -> Test Results)
❖ HDL tools are used for automatic translation.
❖ In each step we need to simulate and verify our design:
▪ Behavioral simulation
▪ RTL simulation
▪ Post-synthesis simulation
▪ Post-implementation simulation
RTL Description
Verilog Code
// 2-input AND gate: z is high only when both x and y are high
module and2(output z, input x, input y);
  assign z = x & y;
endmodule
Layout
Gate-Level Netlist
Taken from Vivado; Digilent Nexys 4 DDR board with Xilinx Artix-7 FPGA chip XC7A100T-1CSG324C
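The "simulate and verify at each step" idea can be sketched in software for the and2 example. Below, `and2_behavioral` is an algorithm-level model, while `and2_gate_level` is only a Python stand-in for a synthesized netlist (assumed here to be a single 2-input LUT holding the AND truth table; the actual netlist comes from Vivado). Both names, and the helpers `simulate` and `verify`, are hypothetical.

```python
from itertools import product


def and2_behavioral(x, y):
    """Behavioral model of the and2 module: z = x & y."""
    return x & y


def and2_gate_level(x, y):
    """Stand-in for the synthesized netlist: one LUT with the AND truth table."""
    lut_cells = (0, 0, 0, 1)
    return lut_cells[(x << 1) | y]


def simulate(model):
    """Exhaustive simulation: apply every input combination, record the output."""
    return {(x, y): model(x, y) for x, y in product((0, 1), repeat=2)}


def verify(reference, candidate):
    """Two design levels agree if their test results match for all inputs."""
    return simulate(reference) == simulate(candidate)


if __name__ == "__main__":
    print(simulate(and2_behavioral))
    print("levels match:", verify(and2_behavioral, and2_gate_level))
```

Exhaustive simulation is feasible here because the circuit has only two inputs; larger designs rely on a testbench with selected test vectors instead.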
Design Tools
• Intel (Altera) FPGAs: Quartus, ModelSim (old)
• Xilinx FPGAs: Vitis, Vivado
• For ASIC designs: Cadence Virtuoso
• Xcelium (RTL simulation)
• Genus (synthesis)
• Innovus (place and route)
• Calibre (DRC), Quantus (parasitic extraction), Tempus (timing signoff), Voltus (power integrity) and others.
References
• Xilinx - Introduction to FPGA Design with Vivado High-Level Synthesis
• https://www.raspberrypi.com/products/raspberry-pi-4-model-b/
• https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
• Müştak Erhan Yalçın - EHB326E
• https://en.wikipedia.org/wiki/Moore%27s_law
• https://www.ti.com/lit/ds/symlink/66ak2h12.pdf?ts=1665592634446&ref_url=https%253A%252F%252Fwww.ti.com%252Fproduct%252F66AK2H12
• https://www.avnet.com/wps/portal/us/products/avnet-boards/avnet-board-families/zedboard/