Standard Cell VLSI Design - A Tutorial
Figure 1: Cell layout and logic symbol of a standard cell (pins A1, A2, B1, B2, C1, C2). Cell types listed: NAND, NOR, AND-OR-INVERT, OR-AND-INVERT, full adder, INVERT, XOR, XNOR, static and dynamic flip-flop, transmission gate, I/O pads.
JANUARY 1985 19
Figure 2: Block diagram of the standard cell VLSI design process (not reproduced).
around time from design to samples. A gate array manufacturer has gate array wafers premade and needs only to add a few mask levels to implement the circuit. A standard cell design requires a complete set of new masks and fabrication from scratch. However, the trend has been toward shorter and shorter turnaround schedules for standard cell designs.
The area needed to route signals in a standard cell design is usually 50-80 percent of the chip size. One way to determine the size of the chip before layout is done is to take the combined area of the standard cells and multiply by 3-5. For example, cells totaling 5 mm² would suggest a chip of roughly 15 to 25 mm². This routing penalty is the main disadvantage compared with a full custom implementation of the circuit. However, by using automated layout and routing tools, the confidence level that the circuit will work the first time is much higher. A full custom implementation is crucial for a microprocessor or memory that is a very cost conscious effort. Other designs that are in the 1000-6000 gate range are better candidates for standard cell design because the cost penalty will not be that great, yet the turnaround will be shorter, and the likelihood that the first samples of the circuit work will be high. On the other hand, a 100-parts-a-year application involving 1000 gates of circuitry may be an ideal candidate for a gate array.
Design Rules
Before embarking on a VLSI design project, it is important that the designer understand a few basic design rules that apply to this specific type of circuit implementation. This is especially important for designers coming from the discrete and TTL world of design. The rules are by no means difficult to adhere to but must be kept in mind from the word go for the design to work the first time and be manufacturable.
Synchronous Design
The most important rule in standard cell design, and probably in all levels of VLSI implementation, is that the design be synchronous. This means that all storage devices (i.e., flip-flops) have the same clock. In silicon, there is no guarantee of how long signals will take to propagate through gates and down paths. Design tricks that are used in TTL designs, such as clocking off of glitches and adding extra stages of inverters to slow down a signal, just don't work in VLSI design and should be avoided.
A synchronous design will clock all data at the same time on the same edge of the clock. This allows all signals to propagate from the output of one flip-flop to the input of the next (through whatever random logic is in between them), settle down to their correct values, and be latched. This design method is robust, as the system clock can be set so that the circuit works correctly over a wide range of processing, operating conditions, and skew.
Another design constraint is for all clears on flip-flops to be synchronous as well (i.e., have the clear
20 IEEE CIRCUITS AND DEVICES MAGAZINE
Figure 3: Graphical Schematic Capture (schematic not reproduced; legible labels include NOR1, NOR6, MCK, MCKN, SCK, and SCKN).
signal be presented to the D input of the flip-flop). The asynchronous PRESET and CLEAR leads on flip-flop standard cells should be saved for a master clear or power-on clear, when timing is not critical at all.
The clock inputs to all flip-flop standard cells should come from the same clock signal, taken as close to a circuit input as possible. It is possible to boost this signal with buffers or have a count-down circuit on the input clock, but care should be taken when laying out these functions. The clock input to the standard cell should never be gated with some other signal, as this may cause strange clock skews that could cause the circuit to malfunction.
Limited I/O
Another design rule should be noted when the circuit is partitioned: VLSI circuits are normally pin limited. That is, if you have 40 pins, someone will come up with an implementation that requires 41. Both inputs and outputs to the device may be multiplexed, but this may cause the system implementation to be complex. Pins are considered a valuable resource and should be assigned and used wisely. It is advised that the pin assignments be formulated early in the design process with little chance of changing. In the event of extra pins being available, use them to enhance the testability of the circuit.
Finite Gate Delays
If the clock input to the VLSI device is fixed, it is important to know how many levels of gates can exist between flip-flops. This is an important design consideration but not an insurmountable one, as extra flip-flops can be added to break up propagations with large gate counts. As an example, imagine a device with an input clock of 8 MHz (a 125 ns period). Accounting for some clock skew, this gives a maximum time between stages of about 110 ns. Also assuming a maximum gate delay of 7 ns and a worst-case routing delay of 3 ns (10 ns per gate level), this allows for about 11 gate delays between the output of one flip-flop and the input of the next.
Analog Worries
Good analog design practice is something that is achieved over a period of years. The novice designer would go a long way by carefully understanding the critical parameters of his functional blocks and their interactions with their neighbors. If your design involves crucial matching, isothermal design, or ultra high performance, an hour or two spent with an experienced designer is worth its weight in gold. Analog circuitry tends to be more sensitive to parasitic coupling and hostile digital transitions from other circuitry on the same chip. Sampled data systems that have other on-chip asynchronous clocks should be avoided as much as possible in noninsulator based bulk-CMOS technologies.
Overall Methodology
Figure 2 gives an overall idea of what the standard cell VLSI design process is. The different steps in the methodology include:
• Schematic Capture of Design Intent
• Functional Vector Generation
• Logic Simulation
• Viewing and Interpreting Simulation Results
• Model Generation and Functional Testing of the Model
• Manufacturing Test Generation
• Place and Route Activities
Once these designer-intensive steps are complete, the design is ready for completing the mask generation, fabrication, and device testing steps.
Schematic Capture of Design Intent
Once the internal requirements and the pin outs of the circuit are decided upon, and the correct design rules are understood, it is time to specify the actual design itself. This requires getting the design into a machine-readable data base so that CAD tools can operate on the design for simulation, layout, etc. In the past, this meant producing hand-drawn schematics and entering a file (or even a card deck) that describes the design into a computer system. This was a cumbersome and error-prone job.
Today a designer can sit down at a graphics terminal or a CAE workstation and use a graphical schematic capture program that allows one to enter the design electronically and automatically produce the necessary files used by later programs. In standard cell design, the designer calls to the screen the cells to be used and connects them together with "wires." Figure 3 shows an example of a graphically entered schematic.
The advantages to this approach are as follows:
• The design is stored in a data base, and changes are very easy to implement.
• The graphic capture programs are usually human engineered and quite easy to use.
• Design partitioning between designers can be implemented because each designer can enter their own section of the circuit and combine designs later in the process.
• The fact that the design is in a graphical data base can be utilized by sophisticated design aids to perform advanced functions.
• Very often, since circuit extraction for simulation is derived from such schematic capture tools, it is difficult to change the circuit description without changing the schematic appropriately.
More Thoughts on Hierarchical Design
Most schematic capture tools enable a user to enter designs hierarchically. This allows for designs to be entered as connected functions, such as counters, adders, biquads, references, memory, and even microprocessor blocks. The user enters these blocks once and can add them to different sections of the design. On many systems, not only does a library of standard cells exist, but a rather extensive collection of functional blocks exists as well, freeing the designer from this level of design.
For each of the blocks, there can be an internal and an external view of the function. The design can be entered from the top down, the bottom up, or, more practically, from the middle sideways! To enter a circuit from the top down, the designer may use the following methodology:
• Enter an external view of the chip, basically a box with all the I/O pins labeled and numbered.
• Enter an internal view of the next level of hierarchy, for example, four blocks connected together as well as the external I/O pins connected to the blocks. This also requires entering the external views of the four functional blocks.
• Repeat the above steps for each of the functional blocks, replacing the connection of the external I/O pins with the input and output nodes of the function block. The steps are repeated until the design is down to the lowest level specifiable, i.e., the standard cells or predesigned subnetworks such as counters, shift registers, biquads, etc.
Functional Design of Elements in Hierarchy
The subnetworks mentioned above allow a designer to operate at a somewhat higher level than standard cells. Much design time is wasted on redesigning a 4-bit synchronous counter with low clear and load. If one does not exist in a library of subnetworks, the designer must come up with one. Programs are available to do functional design of counters, shift registers, adders, and random logic. The designer is prompted for the necessary parameters of the subnetwork, and the function is designed from the library of standard cells. These programs can also be interfaced to the schematic capture tools, creating both the internal and external views of the functionally designed elements for the designer's use. In our opinion, the complexity of today's ICs can no longer support the luxury of a "my way because it is the right way" type of attitude.
Creation of Functional Vectors for Design Prove-In
At this point of the design process, it is assumed that a connectivity file exists describing the connection of the standard cells comprising the circuit or subnetwork. This is required input to a logic simulator. The other entity needed by the logic simulator is a set of input stimuli, called vectors, to exercise the circuit. For each of the inputs to the circuit, a logic state needs to be specified for each time step. The simulator takes this information and returns the state of internal nodes and circuit outputs.
Manual Input
The simplest method of entering vectors is to manually create a file containing logic states that get mapped to the input pins. A vector is a line of logic values, and each pin would have its state specified in
a column of 1's and 0's. The vectors do not have a unit of time but are applied to the input pins by the simulator at some given vector period, for example, every 50 ns. The following is an example of a vector file:
Column 1: CLOCK
Column 2: CLEAR
Column 3: START
Column 4: MODE
Column 5: TEST
1—00110
2—10110
3—01010
4—11010
Another method of specifying vectors is with clock formats. Each input is specified in terms of state transitions and when they occur. Fancier clock specification formats allow for repeating patterns and repeating clock definitions. The following is an example:
CLOCK: 1L 1H Repeat
CLEAR: 2L 1H
START: 2H 100L 25H 15L
MODE: 10 (10H 5L) 35H
TEST: 50L 65H
Higher Level Vector Compilers
Compilers exist that allow a user to specify vectors in a higher level language instead of 1's and 0's and clock definitions. Functions for count up, count down, pattern generation, walking 1's and 0's, etc., are available. Bus definitions allow signals to be grouped together, and programming constructs such as loops and subroutines may be added to allow for some complex vectors to be written with ease. An example:
Bus (ABUS, [A0:A7]);
Clock_Def (CLOCK, start_time=0, period=50, duty_cycle=40%);
Count (ABUS, direction=UP, start_value=1F, count_period=100);
Walk (ABUS, func=1THRU0, start_value=00, walk_period=200);
Analog Stimuli
Analog circuits usually have stimuli that are represented either by a frequency-domain response, as is the case with filters, or by a time-domain requirement, as is the case with the acquisition time of a PLL. Some criteria like PSRR and noise are simulated and then analyzed to find out whether they are within design margins.
Logic Simulation
With a connectivity file describing the circuit and a set of functional vectors to exercise the circuit description, the next step is to feed this information into a black box called a logic simulator. The logic simulator simulates the logic states of all the nodes of the circuit. Many varieties of logic simulators exist that perform various degrees of simulation at different levels, but the designers needn't concern themselves with the details of logic simulators. The designer's concern is with feeding in the correct information and interpreting the results. The rest of this section will be a brief description of the kinds of logic simulators that are important to standard cell logic designers. More feature-rich simulators are not as important at this stage of the game, as they are more meaningful for post-layout simulation.
Simulation Level
There are three basic levels of logic simulation: functional, gate, and transistor level. Each level is a finer detail of simulation, requiring more information from the user and more computer time. The user can start at the highest level and run finer and finer simulations as needed to prove the functionality and, later, some worst-case simulations of the design.
Functional level: Circuits simulated at the functional level return logic values at the inputs and outputs of the function blocks. What is required is a library of function block models, usually written in a higher level language. The logic simulator supplies the inputs to the block, and the model supplies the outputs. The user cannot look inside at the internal nodes of the function block because they do not exist. Functional simulations are usually very fast, as the simulator does not have to keep track of much information.
Gate level: At the gate level, a circuit is described as a collection of NAND, NOR, AND, OR, INVERTER, and other simple logic gate functions. The standard cells are composed of these gates, so gate-level simulation is a reasonable choice for simulating standard cell designs. A library must exist describing the standard cells in terms of these gates, and, usually, other information is given in these libraries, such as input capacitances and other parameters useful for multiple delay and timing simulation (see the next section). Gate-level simulation returns logic values for internal nodes of the circuit, including some nodes internal to the standard cells, obviously only down to the gate level. This should give the designer plenty of information as to the functionality of the circuit.
Transistor level: When detailed simulations are required of critical paths in the circuit, as well as simulation of special cells, a transistor-level simulation is required. This type of simulation takes as input the connection of the circuit at the device level (transistors, metal paths, poly paths, etc.) and constructs a model of the circuit in terms of resistance and capacitance. The simulator returns timing waveforms describing the characteristics of the circuit. For large collections of standard cells, this level of simulation is impractical, as the designer should not care about the characteristics of every node, only those that are deemed critical. Transistor-level simulations usually
Figure 4: Logic Simulation (levels shown include functional level and gate level; diagram not reproduced).
Figure 5: Viewing of Simulation Results (diagram not reproduced).
Figure 6: Simulation Output (tabular listing of node states, one row per vector for about 50 vectors; not reproduced).
Figure 7: Timeline display with a horizontal axis marked from 10 to 80 (not reproduced).
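As a rough illustration of what a gate-level simulator does with a connectivity file and a set of vectors, the sketch below evaluates a tiny netlist with zero gate delay. It is a minimal sketch under stated assumptions, not any particular simulator: the netlist format, the internal node name n1, the output name OUT, and the choice of gates are all hypothetical. Only the pin order and the four vectors are taken from the manual vector file example.

```python
# Gate functions operate on a list of 0/1 input values.
GATES = {
    "NAND": lambda ins: int(not all(ins)),
    "NOR": lambda ins: int(not any(ins)),
    "INV": lambda ins: int(not ins[0]),
}

# Pin order from the vector file example (column 1 is CLOCK, and so on).
PINS = ["CLOCK", "CLEAR", "START", "MODE", "TEST"]

# Hypothetical connectivity list: (output node, gate type, input nodes).
# Entries must be listed so that every input is computed before it is used.
NETLIST = [
    ("n1", "NAND", ["START", "MODE"]),
    ("OUT", "NOR", ["n1", "CLEAR"]),
]

VECTORS = ["00110", "10110", "01010", "11010"]


def simulate(netlist, pins, vectors):
    """Return one dict of node states per vector (zero-delay evaluation)."""
    results = []
    for vector in vectors:
        # Map each column of the vector onto its input pin.
        state = {pin: int(bit) for pin, bit in zip(pins, vector)}
        for out, gate, inputs in netlist:
            state[out] = GATES[gate]([state[name] for name in inputs])
        results.append(state)
    return results


states = simulate(NETLIST, PINS, VECTORS)
for number, state in enumerate(states, start=1):
    print(number, "n1 =", state["n1"], "OUT =", state["OUT"])
```

Because internal nodes such as n1 are kept in the returned state, the designer can inspect them as well as the outputs, which is the extra visibility that gate-level simulation offers over the functional level.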
the circuit. This mode used in conjunction with a feature to list high fanout gates should be adequate for standard cell design. See the section on capacitance extraction for more discussion on this subject.
Timing simulation: The formula for voltage is
$$v = \frac{1}{C} \int_{t_1}^{t_2} i \, dt$$
The amount of current that a MOS transistor can supply is finite, so the voltage a node reaches in a given time is inversely proportional to its capacitance. The capacitance of a node in a VLSI device is the sum of the input capacitances of all the gates in the fanout list plus the parasitic routing capacitances for all the routing paths. The libraries from which the simulations are run should list all the input capacitances of gates. The routing capacitances are not really known until layout time, so prelayout simulations require a guess at the routing capacitance. Once this information is at hand, the simulator merely applies the above formula in finite time steps and produces an accurate representation of the node voltages for the circuit. For example, the simulator can output voltages between zero and five volts in increments of a tenth of a volt and construct a waveform from these numbers.
Timing simulation is very important for critical path analysis. A signal may not have enough time to reach its logic level before a latching clock comes along. The designer can easily determine this by looking at full timing waveforms for the node. Timing simulation is the most computer intensive because it must break up the simulation into very small time steps and do a large number of calculations at each of these steps.
Mixing simulation modes: A useful feature of a logic simulator is to allow for use of the various modes for different nodes of the circuit. This way, a critical section can be run in full timing simulation and the rest in multiple delay mode, therefore not slowing down the execution time of the simulation appreciably. In our opinion, this may be the most cost-effective technique for verifying large mixed mode analog/digital circuits.
Viewing and Interpretation of Simulation Output
Once a simulation of the circuit has been run, it is the responsibility of the designer to interpret the results and figure out what it all means. Verification that the circuit is functionally correct is critical. It is important that the vectors exercise as much of the circuit as possible. In addition, all performance aspects that are directly and indirectly specified should be verified. The simulator will output the states of the circuit outputs and internal nodes for each time step. A medium size circuit could have a few thousand internal nodes and be simulated for tens of thousands of vectors. This is a lot of information and can be very baffling when trying to figure out what the circuit is really doing. Figure 5 covers some of today's modes of viewing simulation results.
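Returning to the timing simulation formula, the finite-time-step procedure can be sketched in a few lines. This is a simplified illustration, not how any particular simulator works: it assumes a constant charging current (a real MOS device's current varies with voltage), a 5 V supply, and hypothetical values of 100 µA, 1 pF, and 2 pF chosen so that each 1 ns step on the lighter node adds about a tenth of a volt.

```python
def charge_node(capacitance_f, current_a, dt_s, n_steps, v_max=5.0):
    """Apply v = (1/C) * integral(i dt) in fixed time steps.

    Assumes a constant current and clamps the node at the supply rail v_max.
    Returns the list of node voltages, starting from 0 V.
    """
    v = 0.0
    waveform = [v]
    for _ in range(n_steps):
        v = min(v_max, v + current_a * dt_s / capacitance_f)
        waveform.append(v)
    return waveform


def steps_to_reach(waveform, level_v, tol=1e-9):
    """Index of the first time step at which the node reaches level_v.

    The small tolerance guards against floating-point rounding.
    Returns None if the level is never reached.
    """
    for step, v in enumerate(waveform):
        if v >= level_v - tol:
            return step
    return None


# Same driver, two different loads, 1 ns steps for 60 ns.
light = charge_node(1e-12, 100e-6, 1e-9, 60)  # 1 pF node
heavy = charge_node(2e-12, 100e-6, 1e-9, 60)  # 2 pF node

print("1 pF node reaches 2.5 V after", steps_to_reach(light, 2.5), "steps")
print("2 pF node reaches 2.5 V after", steps_to_reach(heavy, 2.5), "steps")
```

Doubling the capacitance doubles the time needed to reach the same level, which is why a heavily loaded node may miss the latching clock edge.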
Figure 8: Schematic fragment (not reproduced; legible labels include CLOCK, CLEAR, Output, and inv2).
Figure 9: Block diagram with the labels Connectivity File, Functional Vectors, MOS Simulation, Expected Outputs, TTL Model of Device, Digital Test, Actual Outputs, and Compare.
people in the system area will be very appreciative to get a model or models of the device to prove-in some of their own designs as well. Another important reason to generate a model of the circuit is for testing purposes. The model can be tested with the same vectors that simulated the MOS standard cell circuit and differences noted. In addition, when the production test of the device is being generated on the big-chip testers off the fab line, having a model of the circuit will save many hours of test prove-in time.
Analog circuitry is very rarely breadboarded, except for functional verification, because of the inability to accurately emulate its hostile integrated environment. Predefined integrated blocks are very rarely used for this purpose, since they usually lack the capability to drive the parasitic loading resulting from a breadboard implementation.
Methods of Model Generation
There are many ways to generate a model of a VLSI device. Someone can take the schematics of the standard cell design and convert them on paper to a TTL design. The designer can re-enter the design with a schematic capture program, generate a connectivity list, and filter this list to generate a wire wrap or a printed circuit board. This method is very tedious, especially for larger designs, and is also very error prone, as there is no guarantee that the TTL design will match the MOS design. This could add a large amount of time to the design cycle, which should, of course, be avoided.
Because standard cells are being used, and a data base exists describing what cells have been used and how they are connected, it should not be too much trouble to access this data base and do a conversion of the design from MOS to TTL. Each standard cell should have a fairly straightforward functional substitution in a TTL part. More complex cells may require more than one TTL DIP. The conversion process can be semiautomatic in that the user may have a subnetwork that is functionally equivalent to a TTL part. For example, the designer has a counter subnetwork that is a 74xx; the program doing the conversion is seeded with this information, and that subnetwork is always replaced with that one DIP. This method of conversion almost guarantees an exact functional mapping between MOS and TTL. A perfect conversion may not be possible, depending on the design, and the user may have to cheat the conversion program to get the TTL model to functionally match. This is better in the long run than doing the whole conversion by hand.
Functional Testing
Pass/fail of model: Once the model is constructed and stuffed with parts, it must be tested. If a digital tester is available, the same vectors used to simulate the MOS circuit can be used. Comparisons can be
Figure 10: Block diagram with the labels Digital Tester, Functional Vectors, Wire Wrap Machine, and PASS/FAIL.
Figure 11: Waveform display of simulated signals (not reproduced; legible signal names include RDEC.PXM2, RDEC.PXM3, RDI, RDN, RD, RTI, RTIN, NDB, and PDB).
Figure 12: Block diagram with the labels Place and Route Activities, Capacitance Extraction, Simulation (Timing), and Functional Vectors.
Figure 13: Subset of a finished placed-and-routed circuit (layout not reproduced).
This is usually done in poly-silicon in one direction (horizontal or vertical) and in metal in the other; Fig. 13 gives a subset of a finished placed-and-routed circuit. Routing is also a nontrivial problem, as the size of the device is directly affected by how good a routing job is done. Many algorithms are implemented in different routers, and this is not the place for analysis of different schemes.
The routing task is also interactive in that a shot at routing is done and then can be optimized, or the placement can be optimized and then the router can run again. The goal is to reduce dead space on the silicon. A typical routing job will cause the routing area to be less than four times the area that the standard cells require.
Correct by Construction
This mode of place and route via design aids not only yields a fairly optimized silicon design but should also yield a mask set that is "correct by construction." The router should not cross any wires; nor should it route any signals to the wrong place. This is a very important aspect of the layout process, as a lot of time need not be wasted verifying the layout with design rule checkers because of the assumption that the cells are correct and the routing is correct. It was done without the intervention of human hands, and if the design aid is bug free, the layout should be correct.
Capacitance Extraction
Once the layout is complete, the circuit should be resimulated. The delay of every node has several components: the delay of the rise or fall time of the gate, the delay due to the sum of the input capacitances of all the gates fanned out to, and the delay due to the routing capacitance. The first two delays are table lookups, as these numbers should be part of the standard cell library. There are two methods of extracting the routing capacitance. One is to run an extractor on the routing layout and produce a number for the capacitance of each routing path. The other, simpler method is to keep track of the capacitance as the routing paths are "laid down." Again, because the routing is done automatically, the router should have all the information in hand to produce the capacitance lists for each node.
Identify Cells with Large Fanout and High Routing Capacitances
The capacitance information from the layout should be fed back to the simulator for a finer detailed simulation. Some sort of multiple delay or timing simulation is run on the circuit, taking into account all the possible sources of capacitance. This simulation is then compared to the original simulations of the circuit.
However, if the pre- and post-layout simulations disagree, the circuit should be looked at in detail. The usual cause for disagreement is that a signal did not have time to reach its desired logic level before the latching edge of the system clock. The extra delay due to the routing capacitance or large fanout causes the signal propagation time to increase to an unacceptable level. The cure for this problem is either to redo the layout, paying particular attention to these failing nodes, or to replace the cell sourcing the signal with a higher powered version of the cell. If this second method is chosen, the circuit will have to be re-placed as well as rerouted with the new high-powered cells. If the place and route programs are sophisticated enough, they will store their optimized placement and routing variables so that the addition of high-powered cells will not cause too many problems when the programs are run again.
Clock Skew
Another concern at layout time is that the system clock be given precedence at routing time. The system clock will be routed to every flip-flop in the circuit, and the idea of a synchronous design is that all outputs change at the same time. The routing and fanout capacitance of the clock signal will obviously be very high, and the clock signal should be boosted with high-powered buffers. Care should be taken to ensure that minimal skew is introduced by the layout. Too much clock skew could cause the circuit to misbehave, especially in a circuit with a fast clock and very short time between latching edges. A transistor-level simulation is best run on the system clock node to ensure that it will not be delayed to any great extent. Cures for this problem are to boost the clock signal and further optimize the clock signal routing.
Fabrication
The verified layout, which includes line-width control and alignment features, is compensated on the appropriate mask levels for alignment tolerances, processing, etc.; this is best done by the foundry that is going to produce the device. If several designs are being proven-in at the same time, it is usually a good idea to put them on the same wafer in a multiproject fashion. Even though this may mean some wasted silicon area, ultimately it could result in lower development cost and faster turnaround time. Once the designs are proven-in, an individual mask set can be ordered for each chip. The process cycle is normally a 6-to-10-week time period. On a prove-in lot, it is usually good practice to hold a few wafers out of the lot at each mask step. This enables one to recover quickly if the first parts are less than fully functional and need a one or two mask level change. It is also a good idea, if the first parts work, to go and exercise the other wafers for the extremes of process spread.
cuit. If the simulations match, the circuit is in good
shape and it's time to order a mask set and start pop Device Testing
ping out chips. The foundry will usually provide tested wafers or
tested prototypes. It is very important that the designer test the part in the system for proper operation and exercise it thoroughly for its functionality. Very often, comparison of the measured data with simulated results points out clearly the robustness of the simulation/modeling techniques and is a good metric for future designs. Extremes of temperature and power supply should be carefully studied for compliance with the specifications. Contrary to popular claims, even the simplest of designs that produce perfectly functional prototypes go through at least one mask iteration in manufacture, and it is a good idea to ensure that all of your changes are discussed carefully with your foundry before the final design is approved for large-volume production. In the case of mixed analog/digital circuitry, a week or so spent in reviewing the test program and ensuring all aspects of performance are adequately tested is essential to the ultimate success of the program. Early lot-test statistics should be carefully reviewed with the foundry for reordering the test sequence to ensure that the test time (this could be more than half the cost of the final product) is minimized. Our experience indicates that inadequate review of testing with the foundry is the dominant cause of yield problems in large-volume manufacture. It is our feeling that making one of a kind is usually a much easier task than making several of them in large volume; hence, we would like to repeat that the test data be carefully reviewed for any design sensitivities that were overlooked.

Concluding Remarks
In this paper, we have attempted to expose the practicing engineer to the process of designing an integrated circuit and getting it into large-volume manufacture. No claim is made that all the issues have been covered exhaustively; the issues are too complex and numerous to cover in any single work like this one. In fact, every day we learn a few more ways of making our design tasks a little less painful. It is our hope that after reading this work, at least some of you would say: "Gee... It isn't that bad after all."
Included is a list of valuable references that helped in the writing of this tutorial. We would like to thank the reviewers for pointing out that the methodology suggested may not always be economically viable, particularly in a "small company" environment.

References
[1] L. A. Fajardo, C. C. Liaw, and M. Tong, "A System for High Level Design Capture and Synthesis," AT&T Bell Laboratories Technical Journal, to appear.
[2] B. R. Chawla, H. K. Gummel, and P. Kozak, "MOTIS—An MOS Timing Simulator," IEEE Trans. Circuits and Systems, vol. CAS-22, no. 12, pp. 901-910, Dec. 1975.
[3] E. Frey, "ESIM: A Functional Level Simulation Tool," Proc. 1984 International Conference on Computer-Aided Design, Santa Clara, Calif., Nov. 12-15, 1984.
[4] A. K. Bose, "A System of Computer Aids for LSI Design," The 3rd International Conference on Semi-Custom IC's, London, England, Nov. 1-3, 1983.
[5] V. D. Agrawal, A. K. Bose, P. Kozak, H. N. Nham, and E. Pacas-Skewes, "A Mixed-Mode Simulator," Proc. 17th Design Automation Conference, Minneapolis, Minn., June 23-25, 1980.
[6] J. Dussault, C. C. Liaw, M. M. Tong, "A High Level Synthesis Tool for MOS Chip Design," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.
[7] L. W. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," University of California, Berkeley, ERL Memo no. ERL-M520, May 1975.
[8] A. E. Dunlop, "SLIM—The Translation of Symbolic Layouts into Mask Data," Proc. 17th Design Automation Conference, pp. 595-602, June 1980.
[9] A. Kessler, "SIMULYZER: A Design through Testing CAE Workstation," 1983 Bell System Conference on Electronic Testing, Princeton, NJ, Oct. 4-6, 1983.
[10] T. Yoshimura, "An Efficient Channel Router," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.
[11] M. Burstein and R. Pelavin, "Hierarchical Wire Routing," IEEE Trans. Computer-Aided Des. of ICS, vol. CAD-2, no. 4, pp. 223-234, 1983.
[12] E. I. Muehldorf and A. D. Savkar, "LSI Logic Testing—An Overview," IEEE Trans. on Electronic Computers, vol. C-30, no. 1, pp. 1-17, Jan. 1981.
[13] S. DasGupta, P. Goel, R. G. Walther, and T. W. Williams, "A Variation of LSSD and Its Implication on Design and Test Generation," 1982 International Test Conf., pp. 63-66, Nov. 1982.
[14] M. S. Abadir and H. K. Reghbati, "Test Generation for LSI: A Case Study," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.
[15] N. H. J. Weste, "MULGA—An Interactive Symbolic Layout System for the Design of Integrated Circuits," Bell System Technical Journal, July-Aug., 1981.
[16] S. Y. H. Su and T. Lin, "Functional Testing Techniques for Digital LSI/VLSI Systems," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.
[17] B. D. Richard, "A Standard Cell Initial Placement Strategy," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.
[18] M. Palczewski, "Performance of Algorithms for Initial Placement," Proc. 21st Design Automation Conference, Albuquerque, New Mexico, June 25-27, 1984.

Acknowledgments
We apologize to the many whose works have not been referenced. The material in this paper is the summary of the design experiences of many a contributor with whom the authors have been associated over a period of ten years. We would like to thank K. R. Laker and A. S. Sedra for giving us this opportunity and T. M. Dennis for his support.
Andy Kessler has been working in the area of computer-aided engineering and VLSI design for the past four years. He graduated from Cornell University in 1980 with a B.S.E.E. and the University of Illinois in 1981 with an M.S.E.E. Mr. Kessler has presented papers at the 1982 International Conference of Circuits and Computers and at the 1983 Bell System Conference on Electronic Testing. He is a member of the Eta Kappa Nu and Tau Beta Pi honor societies, and his interests include music, sports, and Siberian Huskies. Mr. Kessler is currently with AT&T Information Systems.
A. Ganesan