FPGA Architecture
Abstract Architecture
[Figure: LOGIC and INSTR blocks connected to Interconnect + Storage]
• Three components of all computing elements
– Control
– Compute elements
– Communication
Custom Hardware
• No control store
• Not general purpose
• All “computing” is through spatial connections
[Figure: LOGIC connected to Interconnect + Storage]
Traditional µP
• Control store only controls logic
• Communication is in time
• Registers, memory, etc.
[Figure: LOGIC and INSTR connected to Interconnect + Storage]
Programmable Devices
• Prefabricated Silicon
• Logic implemented by programming the
basic cells and the interconnect
• Very fast turnaround time
• Limited design flexibility
• Low development time and cost
FPGA
• Combines PLDs and MPGAs
• Densities: 2K to 1000K+ gates
• Array of logic blocks and programmable interconnect
• Logic block
– Universal gates, multiplexors, RAMs, etc.
• Programmable element
– SRAM, EEPROM, or antifuse
[Figure: SRAM programming bit (Read or Write, Data, Q) and a 2-input LUT with programming bits P1 to P4, inputs I1 and I2, and output Out]
Where are FPGAs Used
Stage            Time to market   Price      Volume
Emulation        Very high        X          Low
Prototyping      Very high        X          Low
Pre-production   Very high        Critical   Moderate
Production       Very high        Critical   High
Changing Market
FPGA
Generic 2D FPGA
SRAM Based FPGA - XILINX
Xilinx 4000 CLB
The Basic Building Block
• Logic block
– Lookup table based (Xilinx)
– Multiplexor based (Actel)
– Transistor based
– Universal gate based
LUT Mapping
• An N-LUT is a direct implementation of a truth table: it can
realize any function of N inputs.
• An N-LUT requires 2^N storage elements (latches)
• The N inputs select one latch location (like a memory address)
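The truth-table view above can be sketched in a few lines of Python. This is only an illustrative model: the class name `LUT` and its `read` method are made up for this example and do not correspond to any vendor tool. It stores 2^N bits and uses the N inputs as an address.

```python
from itertools import product


class LUT:
    """Hypothetical model of an N-input lookup table (not a vendor API)."""

    def __init__(self, n, func):
        self.n = n
        # "Program" the 2**n storage bits by enumerating the truth table.
        # The first input is treated as the most significant address bit.
        self.bits = [int(bool(func(*inputs)))
                     for inputs in product((0, 1), repeat=n)]

    def read(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


# A 2-LUT programmed as XOR, and a 4-LUT programmed as a 4-input AND.
xor2 = LUT(2, lambda a, b: a ^ b)
and4 = LUT(4, lambda a, b, c, d: a and b and c and d)
```

Changing the function only changes the stored bits; the lookup hardware (here, `read`) stays the same, which is why an N-LUT covers every function of N inputs.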
Implementing Combinational
Logic
Two 4-input functions with register outputs
and one 2-input function
5 input Function
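One common way to realize a 5-input function with two 4-input LUTs and one 3-input LUT is Shannon expansion on the fifth input. The sketch below assumes that arrangement; the names `f_lut`, `g_lut`, `h_lut`, and `build_five_input` are illustrative and not taken from the data sheet.

```python
from itertools import product


def make_lut(n, func):
    """Return the truth table of func as a dict keyed by input tuples."""
    return {inputs: int(bool(func(*inputs)))
            for inputs in product((0, 1), repeat=n)}


def build_five_input(func5):
    # The two 4-input LUTs hold the cofactors of func5 for e = 0 and e = 1.
    f_lut = make_lut(4, lambda a, b, c, d: func5(a, b, c, d, 0))
    g_lut = make_lut(4, lambda a, b, c, d: func5(a, b, c, d, 1))
    # The 3-input LUT acts as a 2:1 multiplexer controlled by e.
    h_lut = make_lut(3, lambda f, g, e: g if e else f)

    def evaluate(a, b, c, d, e):
        return h_lut[(f_lut[(a, b, c, d)], g_lut[(a, b, c, d)], e)]

    return evaluate


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


mapped_parity = build_five_input(parity5)
```

Because any 5-input function can be split into its two cofactors, the same structure works for every choice of `func5`.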
Single Port RAM
http://www.xilinx.com/bvdocs/publications/4000.pdf Page 9
Platform Computing
The Virtex Architecture
• CLBs
• IOBs
• General Routing
Matrix (GRM)
• BRAMs
• DLL
Virtex II Architecture
Virtex II CLB
V2 CLB Configuration
V2 Slice Configuration
Virtex II CLB (Half Slice)
Adder
Carry Chain
Other Features
The latest entry – Virtex II Pro
• Embedded high-speed serial transceivers enable data
bit rates up to 3.125 Gb/s per channel (RocketIO) or
10.3125 Gb/s (RocketIO X).
• Embedded IBM PowerPC 405 RISC processor blocks
provide performance up to 400 MHz.
• SelectIO-Ultra blocks provide the interface between
package pins and the internal configurable logic. Most
popular and leading-edge I/O standards are supported
by the programmable IOBs.
• Configurable Logic Blocks (CLBs) provide functional
elements for combinatorial and synchronous logic,
including basic storage elements. BUFTs (3-state
buffers) associated with each CLB element drive
dedicated segmentable horizontal routing resources.
• Block SelectRAM+ memory modules provide large
18 Kb storage elements of True Dual-Port RAM.
• Embedded multiplier blocks are 18-bit x 18-bit
dedicated multipliers.
• Digital Clock Manager (DCM) blocks provide self-calibrating,
fully digital solutions for clock distribution
delay compensation, clock multiplication and division,
and coarse- and fine-grained clock phase shifting.
FPGA Technology Mapping
Outline
• Technology mapping
– Definition & Examples
– Algorithms
• FPGA structure & simple mapping
• FPGA technology mapping
– Issues
– Algorithms
Definition
Technology mapping is also referred to as
library binding.
Given a Boolean network and a
characterized cell library, generate a
mapping of the network components onto
cell library components with the objective
of cost optimization or delay optimization.
Input & Library
• Input: a Boolean network, that is, a technology-independent
optimized network; typically a
multi-level network
• Library:
– Characterization in terms of area, delay and
power
– Enumerated or implicit library cells
Typical Library
A typical simple library cell:
• a single output combinational logic function
• cost in terms of area
• delay in terms of propagation delays for each
input/output pair and as a function of load
and/or fanout. Sometimes only the worst case
values are stored.
• power in terms of average current
Network Covering
Network covering replaces
sub-networks of the original network with
cell library instances. Covering entails
recognizing that a library cell is equivalent
to an identified sub-network and selecting
an adequate number of such cells to cover the
whole network.
Example 1
Cell library consists of two and three input gates
Example: First Mapping
Cell library consists of two and three input gates
Example: Second Mapping
Cell library consists of two and three input gates
Example 2
Cell library consists of
Component Area Delay
AND2 3 2
OR2 3 2
OA21 5 3
Example: First Mapping
Area = 9, Delay = 4
Example: Second Mapping
Area = 10, Delay = 3
Example
Cell library consists of
Component Area Delay
AND2 3 2
OR2 3 2
OA21 5 3
[Figure: network with candidate matches m1, m2, m3, m4, m5]
(m1 + m4 + m5)(m2 + m4)(m3 + m5)(m2' + m1)(m3' + m1) = 1
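The covering clause above can be explored by brute force. The sketch below enumerates all 32 assignments of m1 to m5 and keeps the cheapest one that satisfies the clause. The per-match areas in `COST` are an assumption: m1 is taken to be an OR2, m2 and m3 AND2 cells, and m4 and m5 OA21 cells, which agrees with the areas 9 and 10 quoted for the two mappings but is not stated on the slide.

```python
from itertools import product

MATCHES = ["m1", "m2", "m3", "m4", "m5"]

# Assumed areas: m1 = OR2, m2 = m3 = AND2, m4 = m5 = OA21 (hypothetical).
COST = {"m1": 3, "m2": 3, "m3": 3, "m4": 5, "m5": 5}


def satisfies(s):
    """Evaluate (m1+m4+m5)(m2+m4)(m3+m5)(m2'+m1)(m3'+m1)."""
    m1, m2, m3, m4, m5 = (s[m] for m in MATCHES)
    return ((m1 or m4 or m5) and (m2 or m4) and (m3 or m5)
            and ((not m2) or m1) and ((not m3) or m1))


def min_cost_cover():
    """Return (area, chosen matches) for the cheapest satisfying assignment."""
    best = None
    for values in product((False, True), repeat=len(MATCHES)):
        s = dict(zip(MATCHES, values))
        if not satisfies(s):
            continue
        chosen = [m for m in MATCHES if s[m]]
        cost = sum(COST[m] for m in chosen)
        if best is None or cost < best[0]:
            best = (cost, chosen)
    return best


best_area, best_matches = min_cost_cover()
```

Exhaustive search is only practical for tiny examples like this one; real mappers use branch-and-bound or tree-covering heuristics instead.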
FPGA Structures & Mapping
FPGA Structures
• Multiplexer based (ACTEL)
– Mapping techniques similar to library based
– Library is created by enumerating all possible
“patterns”
• LUT based (XILINX)
– Significantly different mapping techniques
LUT Based FPGAs
In LUT-based FPGAs (for example, XILINX
FPGAs) the building blocks are LUTs and
flip-flops. An n-input LUT can implement
any function of n variables.
The FPGA itself is composed of CLBs,
each containing multiple LUTs
and flip-flops, which makes the technology
mapping problem more complex.
XC3000 CLB
[Figure: one LUT configurable as 2X4 or 1X5, with two flip-flops (FF)]
XC4000 CLB
[Figure: two 4-input LUTs, one 3-input LUT, and two flip-flops (FF)]
Mapping Objectives
• Cost optimal mapping
– Minimizing the number of LUTs
– Minimizing the number of CLBs
• Delay optimal mapping
– Minimizing the number of LUT levels
– Minimizing the delays (including routing
delays)
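Two of these objectives, LUT count and number of LUT levels, are easy to measure on a mapped netlist. The sketch below assumes a hypothetical representation in which a dict maps each LUT output to its list of input signals; any signal that is not a key is treated as a primary input. Routing delay is not modelled.

```python
def lut_count(netlist):
    """Cost metric: number of LUTs in the mapped netlist."""
    return len(netlist)


def lut_levels(netlist):
    """Delay metric: longest chain of LUTs from any primary input."""
    memo = {}

    def level(signal):
        if signal not in netlist:      # primary input
            return 0
        if signal not in memo:
            memo[signal] = 1 + max(
                (level(i) for i in netlist[signal]), default=0)
        return memo[signal]

    return max((level(s) for s in netlist), default=0)


# Illustrative 4-LUT mapping: two first-level LUTs feed a third LUT.
example = {
    "t1": ["a", "b", "c", "d"],
    "t2": ["e", "f", "g", "h"],
    "out": ["t1", "t2", "i"],
}
```

For `example`, the mapping uses three LUTs arranged in two levels.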
Cost Optimal Mapping
Mapping onto k-input LUTs can be
cast as a bin-packing problem: minimize
the number of bins, each
with a capacity of k inputs.
Assume the starting point is a gate-level
netlist in which each gate has at most
k inputs.
Each gate can then be packed into one bin.
Example: Simple Mapping
Sum of Products: Bin Packing
• Select the product term with the largest
number of variables and place it in the first table
where it fits; if it fits nowhere,
add a new table
• Declare the table with the fewest unused
inputs final
• Associate the output of that table with the first
remaining table that can accept it
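The packing steps above can be sketched as a first-fit-decreasing procedure. This is a simplified illustration, not a complete mapper: each product term is represented only by its number of variables, the terms are assumed to share no inputs, and the function name `pack_sop` is made up for this example.

```python
def pack_sop(term_sizes, k):
    """Pack product terms into k-input tables; return the finished tables.

    Each table is a list of box sizes; a box of size 1 added during the
    second phase stands for the output of a previously finished table.
    """
    if k < 2 or any(size > k for size in term_sizes):
        raise ValueError("need k >= 2 and every term must fit in one table")

    tables = []

    def first_fit(size):
        for table in tables:
            if sum(table) + size <= k:
                table.append(size)
                return
        tables.append([size])

    # Phase 1: largest product terms first, first table where they fit.
    for size in sorted(term_sizes, reverse=True):
        first_fit(size)

    # Phase 2: finalize the fullest table and feed its output (one input)
    # into the first remaining table that can accept it.
    finished = []
    while len(tables) > 1:
        fullest = max(tables, key=sum)   # fewest unused inputs
        tables.remove(fullest)
        finished.append(fullest)
        first_fit(1)
    finished.extend(tables)
    return finished


# f = abc + de + fg with K = 4: terms of 3, 2, and 2 variables.
tables_for_f = pack_sop([3, 2, 2], 4)
```

Here `de` and `fg` fill one table, and its output joins `abc` in a second table, so the function needs two 4-input LUTs.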
Example: 4-input LUT
Example: Overlapping Inputs
[Figure: inputs a, b, c, a, d, e, f, g; K = 4]
Example: Decomposition
[Figure: inputs a, b, c, h, d, e, f, g; K = 4]
Example: 3 input LUT
FPGA Technology Mapping:
Issues
LUT Mapping
Starting from a technology-independent
optimized circuit, produce a minimal LUT
cover for the circuit. The difficulty
arises from the following:
• Fanout nodes
• Reconvergence
• Node decomposition and packing
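Why fanout and reconvergence matter can be seen by counting the distinct inputs of a candidate cone of gates. In the sketch below, the netlist format, the gate names, and the helper names `cone_inputs` and `is_k_feasible` are all hypothetical. Gate g1 fans out to g2 and g3, and the two paths reconverge at g4.

```python
def cone_inputs(netlist, cone):
    """Distinct signals that enter a cone of gates from outside it.

    netlist maps each gate output to its list of fanin signals;
    cone is a set of gate outputs to be covered by one LUT.
    """
    inputs = set()
    for gate in cone:
        for signal in netlist[gate]:
            if signal not in cone:
                inputs.add(signal)
    return inputs


def is_k_feasible(netlist, cone, k):
    """A cone fits in one k-input LUT if it has at most k distinct inputs."""
    return len(cone_inputs(netlist, cone)) <= k


netlist = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g1", "d"],
    "g4": ["g2", "g3"],   # the paths from g1 reconverge here
}

whole_cone = {"g1", "g2", "g3", "g4"}
upper_cone = {"g2", "g3", "g4"}
```

Treating the network as a tree would count g1 once per path. Counting distinct inputs shows that the whole reconvergent cone depends only on a, b, c, and d, so it fits in a single 4-input LUT.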
Area vs. Delay
Decomposition
Decomposition
Fanout: Replication
DAG, not a tree
Fanout: Replication
Fanout: Reconvergence
Fanout: Reconvergence
CLB Mapping
Direct mapping of a technology-independent
circuit onto CLBs would
involve function decomposition.
Alternatively, one can start from a circuit
already mapped onto LUTs and then pack those LUTs
into CLBs.
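The second approach, packing already-mapped LUTs into CLBs, can be sketched as a greedy pairing. The model below is an assumption made for illustration: each CLB holds at most two LUTs, and the two LUTs together may use at most `max_clb_inputs` distinct signals (5 would correspond to a "2X4 or 1X5" style block). The function name `pack_into_clbs` is hypothetical, and the greedy choice is not guaranteed to be optimal.

```python
def pack_into_clbs(luts, max_clb_inputs):
    """Greedily pair LUTs into CLBs.

    luts maps a LUT name to the set of signals it reads.
    Returns a list of tuples, one tuple of LUT names per CLB.
    """
    # Handle the widest LUTs first; break ties by name for determinism.
    remaining = sorted(luts, key=lambda name: (-len(luts[name]), name))
    clbs = []
    while remaining:
        first = remaining.pop(0)
        partner = None
        for candidate in remaining:
            if len(luts[first] | luts[candidate]) <= max_clb_inputs:
                partner = candidate
                break
        if partner is None:
            clbs.append((first,))
        else:
            remaining.remove(partner)
            clbs.append((first, partner))
    return clbs


mapped_luts = {
    "L1": {"a", "b", "c", "d"},
    "L2": {"a", "b", "c", "e"},
    "L3": {"f", "g"},
    "L4": {"h", "i", "j"},
}
clbs = pack_into_clbs(mapped_luts, max_clb_inputs=5)
```

LUTs that share inputs (L1 and L2 here) pair well because their union stays within the CLB input limit. Finding the best pairing in general is a matching problem, which this greedy pass only approximates.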
Thank You