Module 2
3 Logical Effort
In this section we explore a delay model based on logical effort, a term coined by
Ivan Sutherland and Robert Sproull [1991], that has as its basis the time-constant
analysis of Carver Mead, Chuck Seitz, and others.
We add a catch-all nonideal component of delay, t q , to Eq. 3.2 that includes:
(1) delay due to internal parasitic capacitance; (2) the time for the input to reach
the switching threshold of the cell; and (3) the dependence of the delay on the
slew rate of the input waveform. With these assumptions we can express the
delay as follows:
t PD = R ( C out + C p ) + t q . (3.10)
(The input capacitance of the logic cell is C , but we do not need it yet.)
We will use a standard-cell library for a 3.3 V, 0.5 µm (0.6 µm drawn)
technology (from Compass) to illustrate our model. We call this technology C5;
it is almost identical to the G5 process from Section 2.1 (the Compass library
uses a more accurate and more complicated SPICE model than the generic
process). The equation for the delay of a 1X drive, two-input NAND cell is in the
form of Eq. 3.10 ( C out is in pF):
t PD = (0.07 + 1.46 C out + 0.15) ns . (3.11)
The delay due to the intrinsic output capacitance (0.07 ns, equal to RC p ) and the
nonideal delay ( t q = 0.15 ns) are specified separately. The nonideal delay is a
considerable fraction of the total delay, so we can hardly ignore it. If data books
do not specify these components of delay separately, we have to estimate the
fractions of the constant part of a delay equation to assign to RC p and t q (here
the ratio t q / RC p is approximately 2).
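Equation 3.11 is simple to evaluate numerically. As a quick sketch (Python; the coefficients are the C5 1X NAND values quoted above, and the function name is my own):

```python
def nand2_1x_delay_ns(c_out_pf):
    """Eq. 3.11: t_PD = R*C_out + R*C_p + t_q for the C5 1X two-input NAND."""
    r_ns_per_pf = 1.46   # pull resistance R, expressed in ns/pF
    rcp_ns = 0.07        # intrinsic (parasitic) delay R*C_p
    tq_ns = 0.15         # nonideal delay t_q
    return r_ns_per_pf * c_out_pf + rcp_ns + tq_ns

print(round(nand2_1x_delay_ns(0.036), 3))  # 0.273 ns driving one standard load
```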
The data book tells us the input trip point is 0.5 and the output trip points are 0.35
and 0.65. We can use Eq. 3.11 to estimate the pull resistance for this cell as
R ≈ 1.46 ns/pF, or about 1.5 kΩ. Equation 3.11 is for the falling delay; the data
book equation for the rising delay gives slightly different values (but within 10
percent of the falling delay values).
We can scale any logic cell by a scaling factor s (transistor gates become s times
wider, but the gate lengths stay the same), and as a result the pull resistance R
will decrease to R / s and the parasitic capacitance C p will increase to sC p .
Since t q is nonideal, by definition it is hard to predict how it will scale. We shall
assume that t q scales linearly with s for all cells. The total cell delay then scales
as follows:
t PD = ( R / s )·( C out + sC p ) + st q . (3.12)
For example, the delay equation for a 2X drive ( s = 2), two-input NAND cell is
t PD = (0.03 + 0.75 C out + 0.51) ns . (3.13)
Compared to the 1X version (Eq. 3.11 ), the output parasitic delay has decreased
to 0.03 ns (from 0.07 ns), whereas we predicted it would remain constant (the
difference is because of the layout); the pull resistance has decreased by a factor
of 2 from 1.5 kΩ to 0.75 kΩ, as we would expect; and the nonideal delay has
increased to 0.51 ns (from 0.15 ns). The differences between our predictions and
the actual values give us a measure of the model accuracy.
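Eq. 3.12 predicts the 2X coefficients from the 1X ones: RC p unchanged, R halved, t q doubled. A small sketch (Python) comparing that prediction with the library values of Eq. 3.13, whose mismatches are the model errors just discussed:

```python
def scale_coeffs(rcp_ns, r_ns_per_pf, tq_ns, s):
    # Eq. 3.12: t_PD = (R/s)(C_out + s*C_p) + s*t_q, so
    # R*C_p stays constant, R -> R/s, and t_q -> s*t_q
    return (rcp_ns, r_ns_per_pf / s, s * tq_ns)

predicted = scale_coeffs(0.07, 1.46, 0.15, s=2)
actual = (0.03, 0.75, 0.51)    # 2X NAND2 coefficients from Eq. 3.13
print(predicted)               # (0.07, 0.73, 0.3)
print(actual)
```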
We rewrite Eq. 3.12 using the input capacitance of the scaled logic cell, C in = sC ,
t PD = RC ( C out / C in ) + RC p + st q . (3.14)
Finally we normalize the delay using the time constant formed from the pull
resistance R inv and the input capacitance C inv of a minimum-size inverter:
d = [ ( RC ) ( C out / C in ) + RC p + st q ] / t = f + p + q . (3.15)
Thus t q (for the inverter) = 0.1 ns and R inv = 1.60 kΩ. The input capacitance of the 1X
inverter (the standard load for this library) is specified in the data book as C inv =
0.036 pF; thus t = R inv C inv = (0.036 pF)(1.60 kΩ) = 0.06 ns for the C5 technology.
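These constants follow directly from the data-book values; for example (Python; the small rounding to 0.06 ns matches the text):

```python
r_inv_kohm = 1.60    # inverter pull resistance
c_inv_pf = 0.036     # inverter input capacitance (one standard load)
tau_ns = r_inv_kohm * c_inv_pf   # kOhm * pF gives ns
q_inv = 0.1 / tau_ns             # nonideal inverter delay, in units of tau

print(round(tau_ns, 3))  # 0.058, quoted as 0.06 ns
print(round(q_inv, 1))   # 1.7, the q_inv value used later
```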
The use of logical effort consists of rearranging and understanding the meaning
of the various terms in Eq. 3.15 . The delay equation is the sum of three terms,
d = f + p + q . (3.18)
The effort delay f we write as a product of logical effort, g , and electrical effort,
h:
f = gh . (3.20)
What size of logic cell do the R and C refer to? It does not matter, because the R
and C will change as we scale a logic cell but the RC product stays the same; the
logical effort is independent of the size of a logic cell. We can find the logical
effort by scaling down the logic cell so that it has the same drive capability as the
1X minimum-size inverter. Then the logical effort, g , is the ratio of the input
capacitance, C in , of the 1X version of the logic cell to C inv (see Figure 3.8 ).
FIGURE 3.8 Logical effort. (a) The input capacitance, C inv , looking into the
input of a minimum-size inverter in terms of the gate capacitance of a
minimum-size device. (b) Sizing a logic cell to have the same drive strength as a
minimum-size inverter (assuming a logic ratio of 2). The input capacitance
looking into one of the logic-cell terminals is then C in . (c) The logical effort of
a cell is C in / C inv . For a two-input NAND cell, the logical effort, g = 4/3.
The electrical effort h depends only on the load capacitance C out connected to
the output of the logic cell and the input capacitance of the logic cell, C in ; thus
h = C out / C in . (3.23)
Table 3.2 shows the logical efforts for single-stage logic cells. Suppose the
minimum-size inverter has an n -channel transistor with W/L = 1 and a p
-channel transistor with W/L = 2 (logic ratio, r , of 2). Then each two-input
NAND logic cell input is connected to an n -channel transistor with W/L = 2 and
a p -channel transistor with W/L = 2. The input capacitance of the two-input
NAND logic cell divided by that of the inverter is thus 4/3. This is the logical
effort of a two-input NAND when r = 2. Logical effort depends on the ratio of the
logic. For an n -input NAND cell with ratio r , the p -channel transistors are W/L
= r /1, and the n -channel transistors are W/L = n /1. For a NOR cell the n
-channel transistors are 1/1 and the p -channel transistors are nr /1.
TABLE 3.2 Cell effort, parasitic delay, and nonideal delay (in units of t ) for
single-stage CMOS cells.
Cell            Cell effort         Cell effort           Parasitic delay/ t      Nonideal delay/ t
                (logic ratio = 2)   (logic ratio = r )
inverter        1 (by definition)   1 (by definition)     p inv (by definition)   q inv (by definition)
n -input NAND   ( n + 2)/3          ( n + r )/( r + 1)    n p inv                 n q inv
n -input NOR    (2 n + 1)/3         ( nr + 1)/( r + 1)    n p inv                 n q inv
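The formulas in Table 3.2 are easy to evaluate. A sketch (Python, using exact fractions; the function names are mine):

```python
from fractions import Fraction

def g_nand(n, r=2):
    # n-input NAND with logic ratio r: g = (n + r)/(r + 1)
    return Fraction(n + r, r + 1)

def g_nor(n, r=2):
    # n-input NOR with logic ratio r: g = (n*r + 1)/(r + 1)
    return Fraction(n * r + 1, r + 1)

print(g_nand(2))  # 4/3, as in Figure 3.8
print(g_nor(3))   # 7/3, the three-input NOR used later
print(g_nand(1))  # 1, since a one-input "NAND" is an inverter
```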
The parasitic delay arises from parasitic capacitance at the output node of a
single-stage logic cell and most (but not all) of this is due to the source and drain
capacitance. The parasitic delay of a minimum-size inverter is
p inv = C p / C inv . (3.25)
The parasitic delay is a constant, independent of the scale s of the cell. For our C5 technology we
know RC p = 0.06 ns and, using Eq. 3.17 for a minimum-size inverter, we can
calculate p inv = RC p / t = 0.06/0.06 = 1 (this is purely a coincidence). Thus C p
is about equal to C inv and is approximately 0.036 pF. There is a large error in
calculating p inv from extracted delay values that are so small. Often we can
calculate p inv more accurately by estimating the parasitic capacitance from the
layout.
Because RC p is constant, the parasitic delay is equal to the ratio of parasitic
capacitance of a logic cell to the parasitic capacitance of a minimum-size
inverter. In practice this ratio is very difficult to calculate; it depends on the
layout. We can approximate the parasitic delay by assuming it is proportional to
the sum of the widths of the n -channel and p -channel transistors connected to
the output. Table 3.2 shows the parasitic delay for different cells in terms of p inv
.
The nonideal delay q is hard to predict and depends mainly on the physical size
of the logic cell (proportional to the cell area in general, or width in the case of a
standard cell or a gate-array macro),
q = st q / t . (3.26)
(Notice that g cancels out in this equation; we shall discuss this in the next
section.)
The delay of the NOR logic cell, in units of t , is thus
d = gh + p + q = (0.3 × 10^-12) / [ (2)·(0.036 × 10^-12) ] + (3)·(1) + (3)·(1.7)
= 4.1666667 + 3 + 5.1
= 12.266667 . (3.28)
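We can check the arithmetic of Eq. 3.28 directly. Note that g cancels into the input capacitance ( C in = gsC inv ), so gh = C out /( sC inv ). A Python sketch with the values from the text:

```python
c_out_pf = 0.3        # load capacitance
s = 2                 # 2X drive
c_inv_pf = 0.036      # one standard load
n = 3                 # three-input NOR
p_inv, q_inv = 1.0, 1.7

gh = c_out_pf / (s * c_inv_pf)  # g*h = C_out/(s*C_inv); g has cancelled
d = gh + n * p_inv + n * q_inv  # d = gh + p + q, in units of tau
print(round(d, 6))              # 12.266667
print(round(d * 0.06, 2))       # 0.74 ns, using tau = 0.06 ns
```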
The delay for a 2X drive, three-input NOR logic cell in the C5 library is
t PD = (0.03 + 0.72 C out + 0.60) ns , (3.29)
which gives t PD = 0.85 ns for C out = 0.3 pF, compared to our prediction of 0.74 ns. Almost all of the error here comes from
the inaccuracy in predicting the nonideal delay. Logical effort gives us a method
to examine relative delays, not to calculate absolute delays accurately. More
important, logical effort gives us insight into why logic has the delay it
does.
We can calculate the area of the transistors in a logic cell (ignoring the routing
area, drain area, and source area) in units of a minimum-size n -channel transistor;
we call these units logical squares . We call the transistor area the logical area .
For example, the logical area of a 1X drive cell, OAI221X1, is calculated as
follows:
● n -channel transistor sizes: 3/1 + 4 × (3/1)
Figure 3.10 shows a single-stage AOI221 cell, with g = (8/3, 8/3, 7/3). The
calculation of the logical area (for an AOI221X1) is as follows:
● n -channel transistor sizes: 1/1 + 4 × (2/1)
● p -channel transistor sizes: 6/1 + 4 × (6/1)
● logical area = 1 + (4 × 2) + (5 × 6) = 39 logical squares
FIGURE 3.10 An
AND-OR-INVERT cell,
an AOI221, with
logical-effort vector, g =
(8/3, 8/3, 7/3). The
logical area is 39 logical
squares.
D = Σ (i ∈ path) g i h i + Σ (i ∈ path) ( p i + q i ) . (3.32)
Figure 3.11 (a) shows this implementation with each input driven by a
minimum-size inverter so we can measure the effect of the cell input capacitance.
Each of the logic cells in Figure 3.11 has a 1X drive strength. This means that
the input capacitance of each logic cell is given, as shown in the figure, by gC inv .
Using Eq. 3.32 we can calculate the delay from the input of the inverter driving
A1 to the output ZN as
d 1 = (1)·(1.4) + 1 + 1.7 + (1.4)·(1) + 2 + 3.4
+ (1.4)·(0.7) + 2 + 3.4 + (1)· C L + 1 + 1.7
= (20 + C L ) . (3.35)
This gives the delay from an inverter driving the A input to the output ZN of the
single-stage logic cell as
d1 = ((1)·(2.6) + 1 + 1.7 + (1)· C L + 5 + 8.5 )
= 18.8 + C L . (3.37)
The single-stage delay is very close to the delay for the multistage version of this
logic cell. In some ASIC libraries the AOI221 is implemented as a multistage
logic cell instead of using a single stage. It raises the question: Can we make the
multistage logic cell any faster by adjusting the scale of the intermediate logic
cells?
The path electrical effort H is the product of the electrical efforts on the path,
H = Π (i ∈ path) h i = C out / C in , (3.39)
where C out is the last output capacitance on the path (the load) and C in is the
first input capacitance on the path.
The path effort F is the product of the path electrical effort and logical efforts,
F = GH . (3.40)
The optimum effort delay for each stage is found by minimizing the path delay D
by varying the electrical efforts of each stage h i , while keeping H , the path
electrical effort, fixed. The optimum effort delay is achieved when each stage
operates with equal effort,
f^ i = g i h i = F 1/ N . (3.41)
Thus F = GH = 1.95 and the optimum stage effort is 1.95^(1/3) = 1.25, so that the
optimum effort delay is NF^(1/N) = (3)·(1.25) = 3.75. From Figure 3.11 (a) we see that
g 0 h 0 + g 2 h 2 + g 3 h 3 = 1.4 + 1.4 + 1 = 3.8 . (3.45)
This means that even if we scale the sizes of the cells to their optimum values, we
only save a fraction of a t (3.8 − 3.75 = 0.05). This is a useful result (and one that
is true in general): the delay is not very sensitive to the scale of the cells. In this
case it means that we can reduce the size of the two NAND cells in the multicell
implementation of an AOI221 without sacrificing speed. We can use logical
effort to predict what the change in delay will be for any given cell sizes.
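The optimum-effort calculation is a one-liner. A sketch (Python) reproducing the numbers above:

```python
F = 1.95              # path effort F = G*H from Eq. 3.40
N = 3                 # number of stages on the path
f_hat = F ** (1 / N)  # optimum stage effort, Eq. 3.41
D_opt = N * f_hat     # optimum path effort delay, N*F**(1/N)

print(round(f_hat, 2))   # 1.25
print(round(D_opt, 2))   # 3.75, versus the actual 3.8 of Eq. 3.45
```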
We can use logical effort in the design of logic cells and in the design of logic
that uses logic cells. If we do have the flexibility to size each logic
cell continuously (which in ASIC design we normally do not; we usually have to choose from
1X, 2X, or 4X drive strengths), each logic stage can be sized using the equation for
the individual stage electrical efforts,
h^ i = F^(1/ N ) / g i . (3.46)
For example, even though we know that it will not improve the delay by much,
let us size the cells in Figure 3.11 (a). We shall work backward starting at the
fixed load capacitance at the input of the last inverter.
For NAND cell 3, gh = 1.25; thus (since g = 1.4), h = C out / C in = 0.893. The
output capacitance, C out , for this NAND cell is the input capacitance of the
inverter, fixed as 1 standard load, C inv . This fixes the input capacitance, C in , of
NAND cell 3 at 1/0.893 = 1.12 standard loads. Thus, the scale of NAND cell 3 is
1.12/1.4 or 0.8X.
Now for NAND cell 2, gh = 1.25; C out for NAND cell 2 is the C in of NAND
cell 3. Thus C in for NAND cell 2 is 1.12/0.893 = 1.254 standard loads. This
means the scale of NAND cell 2 is 1.254/1.4 or 0.9X.
The optimum sizes of the NAND cells are not very different from 1X in this case
because H = 1 and we are only driving a load no bigger than the input
capacitance. This raises the question: What is the optimum stage effort if we have
to drive a large load, H >> 1? Notice that, so far, we have only calculated the
optimum stage effort when we have a fixed number of stages, N . We have said
nothing about the situation in which we are free to choose N , the number of
stages.
h h/(ln h)
1.5 3.7
2 2.9
2.7 2.7
3 2.7
4 2.9
5 3.1
10 4.3
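The table above tabulates h /(ln h ), which is proportional to the total delay of an inverter chain when we are free to choose the number of stages, N = ln H / ln h . A sketch (Python) that reproduces the entries and shows the minimum at h = e ≈ 2.718:

```python
import math

def delay_factor(h):
    # total effort delay of a chain is N*h with N = ln(H)/ln(h),
    # i.e. ln(H) * h/ln(h); only the h/ln(h) factor depends on h
    return h / math.log(h)

for h in (1.5, 2, 2.7, 3, 4, 5, 10):
    print(h, round(delay_factor(h), 1))
# the minimum is at h = e, where h/ln(h) = e, about 2.7
```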
Figure 3.12 shows us how to minimize delay regardless of area or power and
neglecting parasitic and nonideal delays. More complicated equations can be
derived, including nonideal effects, when we wish to trade off delay for smaller
area or reduced power.
1. For the Compass 0.5 µm technology (C5): p inv = 1.0, q inv = 1.7, R inv = 1.5
kΩ, C inv = 0.036 pF.
3.4 Library-Cell Design
The optimum cell layout for each process generation changes because the design
rules for each ASIC vendor's process are always slightly different, even for the
same generation of technology. For example, two companies may have very
similar 0.35 µm CMOS process technologies, but the third-level metal spacing
might be slightly different. If a cell library is to be used with both processes, we
could construct the library by adopting the most stringent rules from each
process. A library constructed in this fashion may not be competitive with one
that is constructed specifically for each process. Even though ASIC vendors
guard their design rules as secrets, it turns out that they are similar, except for a few
details. Unfortunately, it is the details that stop us moving designs from one
process to another. Unless we are a very large customer it is difficult to have an
ASIC vendor change or waive design rules for us. We would like all vendors to
agree on a common set of design rules. This is, in fact, easier than it sounds. The
reason that most vendors have similar rules is because most vendors use the same
manufacturing equipment and a similar process. It is possible to construct a
highest common denominator library that extracts the most from the current
manufacturing capability. Some library companies and the large Japanese ASIC
vendors are adopting this approach.
Layout of library cells is either hand-crafted or uses some form of symbolic
layout . Symbolic layout is usually performed in one of two ways: using either
interactive graphics or a text layout language. Shapes are represented by simple
lines or rectangles, known as sticks or logs , in symbolic layout. The actual
dimensions of the sticks or logs are determined after layout is completed in a
postprocessing step. An alternative to graphical symbolic layout uses a text
layout language, similar to a programming language such as C, that directs a
program to assemble layout. The spacing and dimensions of the layout shapes are
defined in terms of variables rather than constants. These variables can be
changed after symbolic layout is complete to adjust the layout spacing to a
specific process.
Mapping symbolic layout to a specific process technology uses 10 to 20 percent
more area than hand-crafted layout (though this can then be further reduced to
5 to 10 percent with compaction). Most symbolic layout systems do not allow 45°
layout and this introduces a further area penalty (my experience shows this is
about 5 to 15 percent). As libraries get larger, and the capability to quickly move
libraries and ASIC designs between different generations of process technologies
becomes more important, the advantages of symbolic layout may outweigh the
disadvantages.
PROGRAMMABLE ASIC LOGIC CELLS
All programmable ASICs or FPGAs contain a basic logic cell replicated in a
regular array across the chip (analogous to a base cell in an MGA). There are
three different types of basic logic cell: (1) multiplexer based, (2)
look-up table based, and (3) programmable array logic. The choice among these
depends on the programming technology. We shall see examples of each in this
chapter.
5.1 Actel ACT
The basic logic cells in the Actel ACT family of FPGAs are called Logic
Modules . The ACT 1 family uses just one type of Logic Module and the ACT 2
and ACT 3 FPGA families both use two different types of Logic Module.
FIGURE 5.1 The Actel ACT architecture. (a) Organization of the basic logic
cells. (b) The ACT 1 Logic Module. (c) An implementation using pass
transistors (without any buffering). (d) An example logic macro. (Source: Actel.)
5.1.2 Shannon's Expansion Theorem
In logic design we often have to deal with functions of many variables. We need
a method to break down these large functions into smaller pieces. Using the
Shannon expansion theorem, we can expand a Boolean logic function F in terms
of (or with respect to) a Boolean variable A,
F = A · F (A = '1') + A' · F (A = '0'),(5.1)
where F (A = '1') represents the function F evaluated with A set equal to '1'.
For example, we can expand the following function F with respect to (I shall use
the abbreviation wrt ) A,
F = A' · B + A · B · C' + A' · B' · C
= A · (B · C') + A' · (B + B' · C).(5.2)
We have split F into two smaller functions. We call F (A = '1') = B · C' the
cofactor of F wrt A in Eq. 5.2 . I shall sometimes write the cofactor of F wrt A as
F A (the cofactor of F wrt A' is F A' ). We may expand a function wrt any of its
variables. For example, if we expand F wrt B instead of A,
F = A' · B + A · B · C' + A' · B' · C
= B · (A' + A · C') + B' · (A' · C).(5.3)
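Both expansions are easy to verify exhaustively. A sketch (Python) that checks Eqs. 5.2 and 5.3 against the original F over all eight input combinations:

```python
from itertools import product

def F(a, b, c):
    # F = A'*B + A*B*C' + A'*B'*C
    return (not a and b) or (a and b and not c) or (not a and not b and c)

def F_wrt_A(a, b, c):
    # Eq. 5.2: F = A*(B*C') + A'*(B + B'*C)
    return (a and (b and not c)) or (not a and (b or (not b and c)))

def F_wrt_B(a, b, c):
    # Eq. 5.3: F = B*(A' + A*C') + B'*(A'*C)
    return (b and (not a or (a and not c))) or (not b and (not a and c))

ok = all(F(*v) == F_wrt_A(*v) == F_wrt_B(*v)
         for v in product((False, True), repeat=3))
print(ok)  # True: both expansions agree with F everywhere
```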
We can continue to expand a function as many times as it has variables until we
reach the canonical form (a unique representation for any Boolean function that
uses only minterms; a minterm is a product term that contains all the variables of
F, such as A · B' · C). Expanding Eq. 5.3 again, this time wrt C, gives
F2 = A + D = (A · 1) + (A' · D),(5.7)
F1 = C + D = (C · 1) + (C' · D).(5.8)
From Eqs. 5.6–5.8 we see that we may implement F by arranging for A, B, C to
appear on the select lines and '1' and D to be the data inputs of the MUXes in the
ACT 1 Logic Module. This is the implementation shown in Figure 5.1 (d), with
connections: A0 = D, A1 = '1', B0 = D, B1 = '1', SA = C, SB = A, S0 = '0', and S1
= B.
Now that we know that we can implement Boolean functions using MUXes, how
do we know which functions we can implement and how to implement them?
● BUF. The MUX just passes one of the MUX inputs directly to the output.
Figure 5.3 (a) shows how we might view a 2:1 MUX as a function wheel , a
three-input black box that can generate any one of six functions of two input
variables: BUF, INV, AND-11, AND1-1, OR, AND. We can write the output of
a function wheel as
F1 = WHEEL1 (A, B),(5.9)
where I define the wheel function as follows:
WHEEL1 (A, B) = MUX (A0, A1, SA).(5.10)
The MUX function is not unique; we shall define it as
MUX (A0, A1, SA) = A0 · SA' + A1 · SA.(5.11)
The inputs (A0, A1, SA) are described using the notation
A0, A1, SA = {A, B, '0', '1'}(5.12)
to mean that each of the inputs (A0, A1, and SA) may be any of the values: A, B,
'0', or '1'. I chose the name of the wheel function because it is rather like a dial
that you set to your choice of function. Figure 5.3 (b) shows that the ACT 1
Logic Module is a function generator built from two function wheels, a 2:1
MUX, and a two-input OR gate.
FIGURE 5.3 The ACT 1 Logic Module as a Boolean function generator. (a) A
2:1 MUX viewed as a function wheel. (b) The ACT 1 Logic Module viewed as
two function wheels, an OR gate, and a 2:1 MUX.
We can describe the ACT 1 Logic Module in terms of two WHEEL functions:
F = MUX [ WHEEL1, WHEEL2, OR (S0, S1) ](5.13)
Now, for example, to implement a two-input NAND gate, F = NAND (A, B) =
(A · B)', using an ACT 1 Logic Module we first express F as the output of a 2:1
MUX. To split up F we expand it wrt A (or wrt B, since F is symmetric in A and
B):
F = A · (B') + A' · ('1').(5.14)
Thus to make a two-input NAND gate we assign WHEEL1 to implement INV
(B), and WHEEL2 to implement '1'. We must also set the select input to the
MUX connecting WHEEL1 and WHEEL2 to S0 + S1 = A; we can do this with S0 =
A, S1 = '0'.
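We can check this implementation exhaustively. A sketch (Python; MUX is Eq. 5.11, and the wheel outputs '1' and INV(B) are steered by A):

```python
def MUX(a0, a1, sa):
    # Eq. 5.11: MUX(A0, A1, SA) = A0*SA' + A1*SA
    return (a0 and not sa) or (a1 and sa)

def nand2_via_mux(a, b):
    # Eq. 5.14: F = A*(B') + A'*('1'); when A = 0 pass '1',
    # when A = 1 pass INV(B)
    return MUX(True, not b, a)

ok = all(nand2_via_mux(a, b) == (not (a and b))
         for a in (False, True) for b in (False, True))
print(ok)  # True: the MUX form matches NAND(A, B)
```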
Before we get too carried away, we need to realize that we do not have to worry
about how to use Logic Modules to construct combinational logic functions; this
has already been done for us. For example, if we need a two-input NAND gate,
we just use a NAND gate symbol and software takes care of connecting the
inputs in the right way to the Logic Module.
How did Actel design its Logic Modules? One of Actel's engineers wrote a
program that calculates how many functions of two, three, and four variables a
given circuit would provide. The engineers tested many different circuits and
chose the best one: a small, logically efficient circuit that implemented many
functions. For example, the ACT 1 Logic Module can implement all two-input
functions, most functions with three inputs, and many with four inputs.
Apart from being able to implement a wide variety of combinational logic
functions, the ACT 1 module can implement sequential logic cells in a flexible
and efficient manner. For example, you can use one ACT 1 Logic Module for a
transparent latch or two Logic Modules for a flip-flop. The use of latches rather
than flip-flops does require a shift to a two-phase clocking scheme using two
nonoverlapping clocks and two clock trees. Two-phase synchronous design using
latches is efficient and fast, but handling the timing complexities of two clocks
requires changes to synthesis and simulation software that have not yet occurred.
This means that most people still use flip-flops in their designs, and these require
two Logic Modules.
The S-Module seems like good value: we get all the combinational logic functions
of a C-Module (with delay t PD of 3 ns) as well as the setup time for a flip-flop for
only 0.8 ns? Not really. Next I will explain why not.
Figure 5.5 (b) shows what is happening inside an S-Module. The setup and hold
times, as measured inside (not outside) the S-Module, of the flip-flop are t' SUD
and t' H (a prime denotes parameters that are measured inside the S-Module). The
clock-to-Q propagation delay is t' CO . The parameters t' SUD , t' H , and t' CO are
measured using the internal clock signal CLKi. The propagation delay of the
combinational logic inside the S-Module is t' PD . The delay of the combinational
logic that drives the flip-flop clock signal ( Figure 5.4 d) is t' CLKD .
From outside the S-Module, with reference to the outside clock signal CLK1:
t SUD = t' SUD + (t' PD − t' CLKD ),(5.15)
Figure 5.5 (c) shows an example of flip-flop timing. We have no way of knowing
what the internal flip-flop parameters t' SUD , t' H , and t' CO actually are, but we
can assume some reasonable values (just for illustration purposes):
t' SUD = 0.4 ns, t' H = 0.1 ns, t' CO = 0.4 ns.(5.16)
We do know the delay, t' PD , of the combinational logic inside the S-Module. It
is exactly the same as the C-Module delay, so t' PD = 3 ns for the ACT 3. We do
not know t' CLKD ; we shall assume a reasonable value of t' CLKD = 2.6 ns (the
exact value does not matter in the following argument).
Next we calculate the external S-Module parameters from Eq. 5.15 as follows:
These are the same as the ACT 3 S-Module parameters shown in Figure 5.5 (a),
and I chose t' CLKD and the values in Eq. 5.16 so that they would be the same. So
now we see where the combinational logic delay of 3.0 ns has gone: 0.4 ns went
into increasing the setup time and 2.6 ns went into increasing the clock-to-output
delay, t CO .
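With the assumed internal values of Eq. 5.16, the external parameters follow by simple arithmetic (a Python sketch; the t CO relation, t CO = t' CO + t' CLKD , is the one implied by the accounting above):

```python
t_sud_int = 0.4    # internal setup time (assumed, Eq. 5.16)
t_co_int = 0.4     # internal clock-to-Q delay (assumed)
t_pd_int = 3.0     # combinational delay inside the S-Module
t_clkd_int = 2.6   # assumed delay of the logic driving the internal clock

t_sud_ext = t_sud_int + (t_pd_int - t_clkd_int)  # Eq. 5.15
t_co_ext = t_co_int + t_clkd_int                 # clock-to-output delay

print(round(t_sud_ext, 1), round(t_co_ext, 1))   # 0.8 3.0
```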
From the outside we can say that the combinational logic delay is buried in the
flip-flop setup time. FPGA vendors will point this out as an advantage that they
have. Of course, we are not getting something for nothing here. It is like
borrowing money; you have to pay it back.
Table 5.2 shows the ACT 3 commercial worst-case timing. 6 In this table Actel
has included some estimates of the variable routing delay shown in Figure 5.5
(a). These delay estimates depend on the number of gates connected to a gate
output (the fanout).
When you design microelectronic systems (or design anything) you must use
worst-case figures (just as you would design a bridge for the worst-case load).
To convert nominal or typical timing figures to the worst case (or best case), we
use measured, or empirically derived, constants called derating factors that are
expressed either as a table or a graph. For example, Table 5.3 shows the ACT 3
derating factors from commercial worst-case to industrial worst-case and military
worst-case conditions (assuming T J = T A ). The ACT 1 and ACT 2 derating
factors are approximately the same. 7
TABLE 5.2 ACT 3 timing parameters. 8

                                       Fanout
Family                   Delay 9       1      2      3      4      8
ACT 3-3 (data book)      t PD          2.9    3.2    3.4    3.7    4.8
ACT 3-2 (calculated)     t PD /0.85    3.41   3.76   4.00   4.35   5.65
ACT 3-1 (calculated)     t PD /0.75    3.87   4.27   4.53   4.93   6.40
ACT 3-Std (calculated)   t PD /0.65    4.46   4.92   5.23   5.69   7.38
Source: Actel.
TABLE 5.3 ACT 3 derating factors. 10

             Temperature T J (junction) / °C
V DD / V     −55     −40     0       25      70      85      125
4.5          0.72    0.76    0.85    0.90    1.04    1.07    1.17
4.75         0.70    0.73    0.82    0.87    1.00    1.03    1.12
5.00         0.68    0.71    0.79    0.84    0.97    1.00    1.09
5.25         0.66    0.69    0.77    0.82    0.94    0.97    1.06
5.5          0.63    0.66    0.74    0.79    0.90    0.93    1.01
Source: Actel.
If this were the slowest path between flip-flops (very unlikely since we have only
one stage of combinational logic in this path), our estimated critical path delay
between registers , t CRIT , would be the combinational logic delay plus the
flip-flop setup time plus the clock-to-output delay:
t CRIT (w-c commercial) = t PD + t SUD + t CO = 9.5 ns .
Let us jump ahead a little and assume that we can calculate that T J = T A + 20 °C
= 105 °C in our application. To find the derating factor at 105 °C we linearly
interpolate between the values for 85 °C (1.07) and 125 °C (1.17) from Table 5.3 .
The interpolated derating factor is 1.12 and thus
t CRIT (w-c industrial, T J = 105 °C) = 1.12 × 9.5 ns = 10.6 ns ,(5.21)
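The interpolation is worth writing down, since it is the routine step in every worst-case timing estimate (a Python sketch; the 1.07 and 1.17 factors are from the V DD = 4.5 V row of Table 5.3):

```python
def derate(d85, d125, tj_c):
    # linear interpolation between the 85 C and 125 C columns of Table 5.3
    return d85 + (tj_c - 85.0) / (125.0 - 85.0) * (d125 - d85)

k = derate(1.07, 1.17, 105.0)   # V_DD = 4.5 V row, T_J = 105 C
print(round(k, 2))              # 1.12
print(round(k * 9.5, 1))        # 10.6 ns, the derated critical path
```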
1. The minterm numbers are formed from the product terms of the canonical
form. For example, A · B' = '10' = 2.
2. The minterm code is formed from the minterms. A '1' denotes the presence of
that minterm.
3. The function number is the decimal version of the minterm code.
4. Connections to a two-input MUX: A0 and A1 are the data inputs and SA is the
select input (see Eq. 5.11 ).
FIGURE 5.6 The Xilinx XC3000 CLB (configurable logic block). (Source:
Xilinx.)
A 32-bit look-up table ( LUT ), stored in 32 bits of SRAM, provides the ability to
implement combinational logic. Suppose you need to implement the function F =
A · B · C · D · E (a five-input AND). You set the contents of LUT cell number 31
(with address '11111') in the 32-bit SRAM to a '1'; all the other SRAM cells are
set to '0'. When you apply the input variables as an address to the 32-bit SRAM,
only when ABCDE = '11111' will the output F be a '1'. This means that the CLB
propagation delay is fixed, equal to the LUT access time, and independent of the
logic function you implement.
There are seven inputs for the combinational logic in the XC3000 CLB: the five
CLB inputs (A–E), and the flip-flop outputs (QX and QY). There are two outputs
from the LUT (F and G). Since a 32-bit LUT requires only five variables to form
a unique address (32 = 2^5), there are several ways to use the LUT:
● You can use five of the seven possible inputs (A–E, QX, QY) with the
entire 32-bit LUT. The CLB outputs (F and G) are then identical.
● You can split the 32-bit LUT in half to implement two functions of four
variables each. You can choose four input variables from the seven inputs
(A–E, QX, QY). You have to choose two of the inputs from the five CLB
inputs (A–E); then one function output connects to F and the other output
connects to G.
● You can split the 32-bit LUT in half, using one of the seven input variables
as a select input to a 2:1 MUX that switches between F and G. This allows
you to implement some functions of six and seven variables.
The inclusion of flip-flops and combinational logic inside the basic logic cell
leads to efficient implementation of state machines, for example. The
coarse-grain architecture of the Xilinx CLBs maximizes performance given the
size of the SRAM programming technology element. As a result of the increased
complexity of the basic logic cell we shall see (in Section 7.2, Xilinx LCA) that
the routing between cells is more complex than in other FPGAs that use a simpler
basic logic cell.
1. Xilinx decided to use Logic Cell as a trademark in 1995 rather as if IBM were
to use Computer as a trademark today. Thus we should now only talk of a Xilinx
Logic Cell (with capital letters) and not Xilinx logic cells.
2. October 1995 (Version 3.0) data sheet.
5.3 Altera FLEX
Figure 5.10 shows the basic logic cell, a Logic Element ( LE ), that Altera uses in
its FLEX 8000 series of FPGAs. Apart from the cascade logic (which is slightly
simpler in the FLEX LE) the FLEX cell resembles the XC5200 LC architecture
shown in Figure 5.8 . This is not surprising since both architectures are based on
the same SRAM programming technology. The FLEX LE uses a four-input LUT,
a flip-flop, cascade logic, and carry logic. Eight LEs are stacked to form a Logic
Array Block (the same term as used in the MAX series, but with a different
meaning).
FIGURE 5.10 The Altera FLEX architecture. (a) Chip floorplan. (b) LAB
(Logic Array Block). (c) Details of the LE (Logic Element). ( Source: Altera
(adapted with permission).)
5.4 Altera MAX
Suppose we have a simple two-level logic circuit that implements a sum of products
as shown in Figure 5.11 (a). We may redraw any two-level circuit using a regular
structure ( Figure 5.11 b): a vector of buffers, followed by a vector of AND gates
(which construct the product terms) that feed OR gates (which form the sums of the
product terms). We can simplify this representation still further ( Figure 5.11 c), by
drawing the input lines to a multiple-input AND gate as if they were one horizontal
wire, which we call a product-term line . A structure such as Figure 5.11 (c) is called
programmable array logic , first introduced by Monolithic Memories as the PAL
series of devices.
FIGURE 5.11 Logic arrays. (a) Two-level logic. (b) Organized sum of products.
(c) A programmable-AND plane. (d) EPROM logic array. (e) Wired logic.
Because the arrangement of Figure 5.11 (c) is very similar to a ROM, we sometimes
call a horizontal product-term line, which would be the bit output from a ROM, the bit
line . The vertical input line is the word line . Figure 5.11 (d) and (e) show how to
build the programmable-AND array (or product-term array) from EPROM transistors.
The horizontal product-term lines connect to the vertical input lines using the EPROM
transistors as pull-downs at each possible connection. Applying a '1' to the gate of an
unprogrammed EPROM transistor pulls the product-term line low to a '0'. A
programmed n -channel transistor has a threshold voltage higher than V DD and is
therefore always off . Thus a programmed transistor has no effect on the product-term
line.
Notice that connecting the n -channel EPROM transistors to a pull-up resistor as
shown in Figure 5.11 (e) produces a wired-logic function: the output is high only if all
of the outputs are high, resulting in a wired-AND function of the outputs. The
product-term line is low when any of the inputs are high. Thus, to convert the
wired-logic array into a programmable-AND array, we need to invert the sense of the
inputs. We often conveniently omit these details when we draw the schematics of
logic arrays, usually implemented as NORNOR arrays (so we need to invert the
outputs as well). They are not minor details when you implement the layout, however.
Figure 5.12 shows how a programmable-AND array can be combined with other logic
into a macrocell that contains a flip-flop. For example, the widely used 22V10 PLD,
also called a registered PAL, essentially contains 10 of the macrocells shown in
Figure 5.12 . The part number, 22V10, denotes that there are 22 inputs (44 vertical
input lines for both true and complement forms of the inputs) to the programmable
AND array and 10 macrocells. The PLD or registered PAL shown in Figure 5.12 has
a 2 i × jk programmable-AND array.
FIGURE 5.12 A registered PAL with i inputs, j product terms, and k macrocells.
The disadvantage of the shared expanders is the extra logic delay incurred because of
the second pass that you need to take through the product-term array. We usually do
not know before the logic tools assign logic to macrocells ( logic assignment )
whether we need to use the logic expanders. Since we cannot predict the exact timing,
the Altera MAX architecture is not strictly deterministic . However, once we do know
whether a signal has to go through the array once or twice, we can simply and
accurately predict the delay. This is a very important and useful feature of the Altera
MAX architecture.
The expander terms are sometimes called helper terms when you use a PAL. If you
use helper terms in a 22V10, for example, you have to go out to the chip I/O pad and
then back into the programmable array again, using two-pass logic .
FIGURE 5.14 Use of programmed inversion to simplify logic: (a) The function F =
A · B' + A · C' + A · D' + A' · C · D requires four product terms (P1–P4) to implement,
while (b) the complement, F ' = A · B · C · D + A' · D' + A' · C', requires only three
product terms (P1–P3).
Another common feature in complex PLDs is shown in
Figure 5.13 . Programming one input of the XOR gate at the macrocell output allows
you to choose whether or not to invert the output (a '1' for inversion, a '0' for no
inversion). This programmable inversion can reduce the required number of product
terms by using a de Morgan equivalent representation instead of a conventional
sum-of-products form, as shown in Figure 5.14 .
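We can verify the example of Figure 5.14 by exhaustive comparison (a Python sketch): the three-term F ' really is the complement of the four-term F, so programming the XOR input to '1' recovers F.

```python
from itertools import product

def F(a, b, c, d):
    # four product terms: A*B' + A*C' + A*D' + A'*C*D
    return (a and not b) or (a and not c) or (a and not d) or (not a and c and d)

def F_comp(a, b, c, d):
    # three product terms: A*B*C*D + A'*D' + A'*C'
    return (a and b and c and d) or (not a and not d) or (not a and not c)

ok = all(F(*v) == (not F_comp(*v))
         for v in product((False, True), repeat=4))
print(ok)  # True: F' is the complement of F, with one fewer product term
```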
The Altera MAX macrocell is more like a PLD than the other FPGA architectures
discussed here; that is why Altera calls the MAX architecture a complex PLD. This
means that the MAX architecture works well in applications for which PLDs are most
useful: simple, fast logic with many inputs or variables.