Dynamic Inverter Design for EE476

This document discusses techniques for designing logic gates for improved speed performance in VLSI circuits. It covers: 1) Reducing load capacitance and increasing transistor widths to improve speed. 2) The dependence of delay on input patterns, with different patterns producing different delays. 3) How propagation delay deteriorates rapidly with increasing fan-in. 4) Techniques like progressive transistor sizing and input reordering to reduce delay in complex gates.

EE476

VLSI

Lecture 6: Designing for Speed

CSE477 L10 Inverter, Dynamic.1 Irwin&Vijay, PSU, 2002


Cray was a legend in computers … said that he
liked to hire inexperienced engineers right out of
school, because they do not usually know what’s
supposed to be impossible.
The Soul of a New Machine, Kidder, pg. 77



Review: CMOS Inverter: Dynamic

[Figure: CMOS inverter with Vin = VDD. The NMOS pull-down, modeled as Rn, discharges CL at Vout.]

tpHL = f(Rn, CL)
tpHL = 0.69 Reqn CL
tpHL = 0.69 (3/4 (CL VDD)/ IDSATn )
     = 0.52 CL / ((W/L)n k’n VDSATn )
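
As a quick numeric check of the first-order model tpHL = 0.69 Reqn CL, here is a minimal Python sketch. The function name and the component values (Reqn = 10 kΩ, CL = 10 fF) are assumptions chosen for illustration; they are not taken from the lecture.

```python
def t_phl(r_eqn_ohm: float, c_load_farad: float) -> float:
    """First-order high-to-low propagation delay: tpHL = 0.69 * Reqn * CL."""
    return 0.69 * r_eqn_ohm * c_load_farad


# Assumed example values (not from the lecture): Reqn = 10 kOhm, CL = 10 fF.
R_EQN = 10e3
C_LOAD = 10e-15

base = t_phl(R_EQN, C_LOAD)
# Doubling W/L roughly halves Reqn; self-loading (the growth of CL) is ignored here.
wider = t_phl(R_EQN / 2, C_LOAD)

print(f"tpHL (base)     = {base * 1e12:.1f} ps")
print(f"tpHL (2x width) = {wider * 1e12:.1f} ps")
```

The second call only illustrates the trend. In a real gate the diffusion capacitance grows with the transistor width, so the gain is smaller than the factor of two shown.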


Review: Designing Inverters for Performance
 Reduce CL
   internal diffusion capacitance of the gate itself
   interconnect capacitance
   fanout
 Increase W/L ratio of the transistor
   the most powerful and effective performance optimization tool in the hands of the designer
   watch out for self-loading!
 Increase VDD
   only minimal improvement in performance at the cost of increased energy dissipation
 Slope engineering - keeping signal rise and fall times smaller than or equal to the gate propagation delays and of approximately equal values
   good for performance
   good for power consumption

Switch Delay Model

[Figure: switch-level RC models of the NAND, NOR, and INVERTER. Each transistor is replaced by a switch with resistance Rp or Rn, the output drives CL, and the internal node of a series stack carries Cint.]

Input Pattern Effects on Delay

[Figure: 2-input NAND switch model with two parallel Rp pull-ups (inputs A, B), two series Rn pull-downs, internal node capacitance Cint, and load CL.]

 Delay is dependent on the pattern of inputs
 Low to high transition
   both inputs go low
    - delay is 0.69 Rp/2 CL since two p-resistors are on in parallel
   one input goes low
    - delay is 0.69 Rp CL
 High to low transition
   both inputs go high
    - delay is 0.69 2Rn CL
 Adding transistors in series (without sizing) slows down the circuit
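
The three cases above can be tabulated with a short Python sketch. The function name and the values of Rp, Rn, and CL are assumptions for illustration, and the internal node capacitance Cint is ignored, as it is in the slide's first-order expressions.

```python
def nand2_delays(r_p: float, r_n: float, c_load: float) -> dict:
    """First-order delays of a 2-input NAND for the three input patterns."""
    return {
        # Two PMOS devices conduct in parallel: Rp/2.
        "low-to-high, both inputs go low": 0.69 * (r_p / 2) * c_load,
        # Only one PMOS device conducts: Rp.
        "low-to-high, one input goes low": 0.69 * r_p * c_load,
        # Two NMOS devices conduct in series: 2*Rn.
        "high-to-low, both inputs go high": 0.69 * (2 * r_n) * c_load,
    }


# Assumed example values (not from the lecture).
delays = nand2_delays(r_p=20e3, r_n=10e3, c_load=10e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

With unsized devices the series pull-down path is the slow one, which is the motivation for sizing the stacked transistors.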


Delay Dependence on Input Patterns
2-input NAND with NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 10 fF

[Figure: output voltage (V, -0.5 to 3) versus time (psec, 0 to 400) for the transitions A=B=1→0; A=1→0, B=1; and A=1, B=1→0.]

Input Data Pattern    Delay (psec)
A=B=0→1               69
A=1, B=0→1            62
A=0→1, B=1            50
A=B=1→0               35
A=1, B=1→0            76
A=1→0, B=1            57


Fan-In Considerations

[Figure: 4-input NAND with inputs A, B, C, D. In the series pull-down chain, A is nearest the output (CL) and D is nearest ground; the internal nodes carry C3, C2, and C1 from top to bottom.]

Distributed RC model (Elmore delay):
tpHL = 0.69 Reqn (C1 + 2C2 + 3C3 + 4CL)

Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
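
The Elmore expression generalizes to an N-input pull-down chain in which every transistor has the same resistance Reqn. The Python sketch below evaluates it; the function name and the capacitance values are assumptions for illustration only.

```python
def elmore_tphl(r_eqn: float, internal_caps: list, c_load: float) -> float:
    """tpHL = 0.69 * Reqn * (C1 + 2*C2 + ... + N*CL).

    internal_caps = [C1, C2, ..., C(N-1)], with C1 the node nearest ground.
    """
    caps = list(internal_caps) + [c_load]
    # Node k (counted from ground) discharges through k series resistors.
    return 0.69 * r_eqn * sum(k * c for k, c in enumerate(caps, start=1))


# Assumed values: Reqn = 10 kOhm, 2 fF per internal node, CL = 10 fF.
for fan_in in (2, 4, 8):
    t = elmore_tphl(10e3, [2e-15] * (fan_in - 1), 10e-15)
    print(f"fan-in {fan_in}: tpHL = {t * 1e12:.0f} ps")
```

The internal-node term grows as N(N-1)/2, which is the source of the quadratic dependence on fan-in.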


tp as a Function of Fan-In

[Figure: tp (psec, 0 to 1250) versus fan-in (2 to 16). tpHL is a quadratic function of fan-in; tpLH is a linear function of fan-in.]

 Gates with a fan-in greater than 4 should be avoided.

Fast Complex Gates: Design Technique 1
 Transistor sizing
   as long as fan-out capacitance dominates
 Progressive sizing
   M1 > M2 > M3 > … > MN (the FET closest to the output should be the smallest)
   Can reduce delay by more than 20%; decreasing gains as technology shrinks

[Figure: pull-down chain modeled as a distributed RC line. Transistor MN (input InN) is nearest the output and CL; M3, M2, and M1 (inputs In3, In2, In1) follow toward ground, with internal node capacitances C3, C2, and C1.]

While progressive sizing is easy in a schematic, in a real layout it may not pay off due to design-rule considerations that force the designer to push the transistors apart, increasing internal capacitance.

Fast Complex Gates: Design Technique 2
 Input re-ordering
   when not all inputs arrive at the same time

[Figure: two orderings of a three-transistor pull-down chain, with M3 nearest the output and M1 nearest ground. The critical-path input In1 switches 0→1 while the other two inputs are already 1.]

 Critical input In1 drives M1 (nearest ground), In2 = In3 = 1: CL, C1 and C2 are all charged, so the delay is determined by the time to discharge CL, C1 and C2
 Critical input In1 drives M3 (nearest the output), In2 = In3 = 1: C1 and C2 are already discharged, so the delay is determined by the time to discharge CL
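
The effect of input re-ordering can be estimated with the same Elmore-style model used for fan-in. The Python sketch below is a rough first-order comparison under stated assumptions: equal transistor resistances, charge sharing ignored, and made-up values for R, C1, C2, and CL. The function name is hypothetical.

```python
def discharge_delay(r: float, charged_caps: list) -> float:
    """Elmore-style estimate of the pull-down delay.

    charged_caps[k] is the capacitance that must be discharged at the node
    k+1 transistors above ground; use 0.0 for a node that is already low.
    """
    return 0.69 * r * sum((k + 1) * c for k, c in enumerate(charged_caps))


# Assumed values (not from the lecture).
R, C1, C2, CL = 10e3, 2e-15, 2e-15, 10e-15

# Late input on M1 (nearest ground): C1, C2 and CL all start charged.
late_near_ground = discharge_delay(R, [C1, C2, CL])
# Late input on M3 (nearest the output): C1 and C2 are already discharged.
late_near_output = discharge_delay(R, [0.0, 0.0, CL])

print(f"late input near ground: {late_near_ground * 1e12:.0f} ps")
print(f"late input near output: {late_near_output * 1e12:.0f} ps")
```

Under these assumptions the ordering with the late input closest to the output is faster, because only CL remains to be discharged when that input arrives.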


Sizing and Ordering Effects

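[Figure: 4-input NAND with CL = 100 fF. The PMOS pull-ups A, B, C, D each have size 3. The NMOS pull-down chain runs from A (nearest the output) to D (nearest ground), with internal nodes C3, C2, C1; the NMOS sizes change from 4, 4, 4, 4 to 4, 5, 6, 7 with progressive sizing.]

 Progressive sizing in pull-down chain gives up to a 23% improvement.
 Input ordering saves 5%
   critical path A: 23%
   critical path D: 17%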


Fast Complex Gates: Design Technique 3
 Alternative logic structures

F = ABCDEFGH

Reduced fan-in -> deeper logic depth


Reduction in fan-in offsets, by far, the extra delay incurred by the NOR gate (second configuration).
Only simulation will tell which of the last two configurations is faster or lower power.

Fast Complex Gates: Design Technique 4
 Isolating fan-in from fan-out using buffer insertion

[Figure: a large fan-in gate driving CL directly, and the same gate driving CL through inserted buffers.]

 Real lesson is that optimizing the propagation delay of a gate in isolation is misguided.

Reduce CL on large fan-in gates, especially for large CL, and size the inverters progressively to handle the CL more effectively.

Fast Networks: Design Technique 5 - Logical Effort
 The optimum fan-out for a chain of N inverters driving a load CL is
   $f = \sqrt[N]{C_L/C_{in}}$
 so, if we can, keep the fan-out per stage around 4.
 Can the same approach (logical effort) be used for any combinational circuit?
 For a complex gate, we expand the inverter equation
   tp = tp0 (1 + Cext/ γCg) = tp0 (1 + f/γ)
 to
   tp = tp0 (p + g f/γ)
- tp0 is the intrinsic delay of an inverter
- f is the effective fan-out (Cext/Cg), also called the electrical effort
- p is the ratio of the intrinsic (unloaded) delay of the complex gate and a simple inverter (a function of the gate topology and layout style)
- g is the logical effort
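
Both expressions on this slide are easy to evaluate numerically. The Python sketch below is a minimal illustration: the function names are hypothetical, γ = 1 is an assumed default, and the example numbers (a load 64 times the input capacitance, three stages, a 2-input NAND with p = 2 and g = 4/3) are chosen for illustration.

```python
def optimum_fanout(c_load: float, c_in: float, n_stages: int) -> float:
    """Per-stage fan-out of an N-inverter chain: f = (CL/Cin)**(1/N)."""
    return (c_load / c_in) ** (1.0 / n_stages)


def gate_delay(tp0: float, p: float, g: float, f: float, gamma: float = 1.0) -> float:
    """Complex-gate delay: tp = tp0 * (p + g*f/gamma)."""
    return tp0 * (p + g * f / gamma)


# Assumed example: CL/Cin = 64 split over 3 stages gives f close to 4.
f = optimum_fanout(64.0, 1.0, 3)

# Delay in units of tp0, assuming gamma = 1.
inverter = gate_delay(1.0, p=1, g=1, f=f)
nand2 = gate_delay(1.0, p=2, g=4 / 3, f=f)
print(f"f = {f:.2f}, inverter = {inverter:.2f} tp0, NAND2 = {nand2:.2f} tp0")
```

For an inverter (p = 1, g = 1) the second function reduces to the inverter equation tp0 (1 + f/γ), which is a useful sanity check.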

Intrinsic Delay Term, p
 The more involved the structure of the complex gate, the
higher the intrinsic delay compared to an inverter

Gate Type        p
Inverter         1
n-input NAND     n
n-input NOR      n
n-way mux        2n
XOR, XNOR        n 2^(n-1)

Ignoring second order effects such as internal node capacitances


Logical Effort Term, g
 g represents the fact that, for a given load, complex gates have to work harder than an inverter to produce a similar (speed) response
   the logical effort of a gate tells how much worse it is at producing an output current than an inverter (how much more input capacitance a gate presents to deliver the same output current)

Gate Type    g (for 1 to 4 input gates), by number of inputs
             1      2      3      n
Inverter     1
NAND                4/3    5/3    (n+2)/3
NOR                 5/3    7/3    (2n+1)/3
mux                 2      2      2
XOR                 4      12
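
The closed-form entries of the p and g tables can be captured in two small lookup functions. This Python sketch is illustrative only: the function names and the gate-name strings are assumptions, and XOR is left out of the logical-effort function because the slide lists only its 2-input and 3-input values (4 and 12).

```python
from fractions import Fraction


def logical_effort(gate: str, n: int = 1) -> Fraction:
    """Logical effort g from the table: NAND (n+2)/3, NOR (2n+1)/3, mux 2."""
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n + 2, 3)
    if gate == "nor":
        return Fraction(2 * n + 1, 3)
    if gate == "mux":
        return Fraction(2)
    raise ValueError(f"no closed form tabulated for gate type {gate!r}")


def intrinsic_delay(gate: str, n: int = 1) -> int:
    """Intrinsic delay ratio p from the table, ignoring second-order effects."""
    table = {
        "inverter": 1,
        "nand": n,
        "nor": n,
        "mux": 2 * n,
        "xor": n * 2 ** (n - 1),
    }
    return table[gate]


for gate, n in (("nand", 2), ("nand", 3), ("nor", 2), ("nor", 3)):
    print(f"{n}-input {gate}: g = {logical_effort(gate, n)}, p = {intrinsic_delay(gate, n)}")
```

Using `Fraction` keeps the values in the same rational form as the table, which makes them easy to compare by eye.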


Example of Logical Effort
 Assuming a pmos/nmos ratio of 2, the input capacitance
of a minimum-sized inverter is three times the gate
capacitance of a minimum-sized nmos (Cunit)

[Figure: inverter, 2-input NAND (output A•B) and 2-input NOR (output A+B), sized as listed below.]

Gate            PMOS width (each)   NMOS width (each)   Input capacitance (Cunit)
Inverter        2                   1                   3
2-input NAND    2                   2                   4
2-input NOR     4                   1                   5
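
The logical effort values follow directly from these widths: each gate's input capacitance is divided by that of the inverter. A minimal Python sketch, assuming gate capacitance is proportional to transistor width (the function and constant names are hypothetical):

```python
from fractions import Fraction

# Inverter input capacitance in Cunit: PMOS width 2 + NMOS width 1.
C_INVERTER = 2 + 1


def logical_effort_from_widths(pmos_width: int, nmos_width: int) -> Fraction:
    """g = (input capacitance of one gate input) / (inverter input capacitance)."""
    return Fraction(pmos_width + nmos_width, C_INVERTER)


print("NAND2:", logical_effort_from_widths(2, 2))
print("NOR2: ", logical_effort_from_widths(4, 1))
```

The results, 4/3 for the NAND and 5/3 for the NOR, agree with the 2-input column of the logical effort table.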


Delay as a Function of Fan-Out

[Figure: normalized delay (0 to 7) versus fan-out f (0 to 5), with the intrinsic delay and effort delay components marked.]

 The slope of the line is the logical effort of the gate
 The y-axis intercept is the intrinsic delay
 Can adjust the delay by adjusting the effective fan-out (by sizing) or by choosing a gate with a different logical effort
 Gate effort: h = fg


Path Delay of Complex Logic Gate Network
 Total path delay through a combinational logic block
tp = ∑ tp,j = tp0 ∑(pj + (fj gj)/γ )
 So, the minimum delay through the path determines that
each stage should bear the same gate effort
f1g1 = f2g2 = . . . = fNgN

 Consider optimizing the delay through the logic network

[Figure: four-stage path driving CL = 5. The first gate has size 1; the following gates have sizes a, b, and c.]

how do we determine a, b, and c sizes?

Path Delay Equation Derivation
 The path logical effort, G = ∏ gi
 And the path effective fan-out (path electrical effort) is F = CL/Cg1
 The branching effort accounts for fan-out to other gates in the network
   b = (Con-path + Coff-path)/Con-path
 The path branching effort is then B = ∏ bi
 And the total path effort is then H = GFB
 So, the minimum delay through the path is
   $D = t_{p0}\left(\sum_j p_j + \frac{N \sqrt[N]{H}}{\gamma}\right)$

Path Delay of Complex Logic Gates, con’t
 For gate i in the chain, its size is determined by
   $s_i = \frac{g_1 s_1}{g_i} \prod_{j=1}^{i-1} \frac{f_j}{b_j}$

[Figure: the same four-stage path, with gate sizes 1, a, b, c and CL = 5.]

 For this network
   F = CL/Cg1 = 5
   G = 1 x 5/3 x 5/3 x 1 = 25/9
   B = 1 (no branching)
   H = GFB = 125/9, so the optimal stage effort is $\sqrt[4]{H}$ = 1.93
    - Fan-out factors are f1 = 1.93, f2 = 1.93 x 3/5 = 1.16, f3 = 1.16, f4 = 1.93
   So the gate sizes are a = f1g1/g2 = 1.16, b = f1f2g1/g3 = 1.34 and c = f1f2f3g1/g4 = 2.60
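
The sizing procedure can be reproduced in a few lines of Python. This is a sketch under the slide's assumptions (no branching, first gate of size 1); the function name `size_path` and its argument layout are hypothetical.

```python
from math import prod


def size_path(g: list, b: list, path_fanout: float, s1: float = 1.0):
    """Return (stage effort h, per-stage fan-outs f, gate sizes s).

    g and b hold the logical and branching efforts of each stage;
    path_fanout is F = CL/Cg1.
    """
    n = len(g)
    total_effort = prod(g) * path_fanout * prod(b)  # H = G F B
    h = total_effort ** (1.0 / n)                   # optimal stage effort
    f = [h / gi for gi in g]                        # from f_i * g_i = h
    # s_i = (g1 s1 / g_i) * product over j < i of (f_j / b_j)
    sizes = [
        g[0] * s1 / g[i] * prod(f[j] / b[j] for j in range(i))
        for i in range(n)
    ]
    return h, f, sizes


h, f, sizes = size_path(g=[1, 5 / 3, 5 / 3, 1], b=[1, 1, 1, 1], path_fanout=5.0)
print(f"h = {h:.2f}")
print("f =", [round(x, 2) for x in f])
print("sizes (1, a, b, c) =", [round(s, 2) for s in sizes])
```

The computed stage effort is about 1.93, and the sizes come out close to 1.16, 1.34, and 2.59. The last value differs from the slide's 2.60 only because the slide multiplies already-rounded fan-out factors.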

Fast Complex Gates: Design Technique 6
 Reducing the voltage swing

tpHL = 0.69 (3/4 (CL VDD)/ IDSATn )

= 0.69 (3/4 (CL Vswing)/ IDSATn )

 linear reduction in delay


 also reduces power consumption
 requires use of “sense amplifiers” on the receiving end to
restore the signal level (will look at their design when covering
memory design)
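
The linear dependence of delay on swing can be seen by evaluating the expression for two swing values. In this Python sketch the function name and all numbers (CL = 100 fF, IDSATn = 100 µA, swings of 2.5 V and 0.5 V) are assumptions for illustration, and the delay of the sense amplifier is not modeled.

```python
def t_phl_swing(c_load: float, v_swing: float, i_dsat_n: float) -> float:
    """tpHL = 0.69 * (3/4) * CL * Vswing / IDSATn."""
    return 0.69 * 0.75 * c_load * v_swing / i_dsat_n


# Assumed example values (not from the lecture).
C_LOAD = 100e-15
I_DSAT = 100e-6

full = t_phl_swing(C_LOAD, 2.5, I_DSAT)      # full rail-to-rail swing
reduced = t_phl_swing(C_LOAD, 0.5, I_DSAT)   # reduced swing

print(f"full swing:    {full * 1e12:.0f} ps")
print(f"reduced swing: {reduced * 1e12:.0f} ps")
```

Cutting the swing by a factor of five cuts this first-order delay by the same factor, which is the "linear reduction" the slide refers to.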



TG Logic Performance
 Effective resistance of the TG is modeled as a parallel connection of Rp (= (VDD – Vout)/(-IDp)) and Rn (= (VDD – Vout)/IDn)

[Figure: resistance (kΩ, 0 to 30) versus Vout (0 to 2 V) for a transmission gate passing 2.5 V, with the NMOS gate at 2.5 V and the PMOS gate at 0 V, (W/L)n = (W/L)p = 0.50/0.25. Curves show Rn, Rp, and Req = Rn || Rp.]

 So, the assumption that the TG switch has a constant resistive value, Req, is acceptable

Delay of a TG Chain
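[Figure: chain of N transmission gates, all turned on (gate voltages 0 and 5), from Vin through nodes V1, …, Vi, Vi+1, …, VN, with a capacitance C at each node. The equivalent model is a chain of N resistors Req with a capacitor C at each node.]

 Delay of the RC chain (N TG’s in series) is

   $t_p(V_N) = 0.69 \sum_{k=1}^{N} k\,C R_{eq} = 0.69\, C R_{eq}\, \frac{N(N+1)}{2} \approx 0.35\, C R_{eq} N^2$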
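
A short Python sketch shows how quickly the chain delay grows with N. The function names and the values of C and Req are assumptions for illustration.

```python
def tg_chain_delay(n: int, c: float, r_eq: float) -> float:
    """Exact sum: tp = 0.69 * sum_{k=1..N} k * C * Req."""
    return 0.69 * c * r_eq * sum(range(1, n + 1))


def tg_chain_delay_approx(n: int, c: float, r_eq: float) -> float:
    """Large-N approximation: tp ~ 0.35 * C * Req * N**2."""
    return 0.35 * c * r_eq * n ** 2


# Assumed example values (not from the lecture): C = 5 fF, Req = 10 kOhm.
C, R_EQ = 5e-15, 10e3
for n in (2, 4, 8, 16):
    exact = tg_chain_delay(n, C, R_EQ)
    approx = tg_chain_delay_approx(n, C, R_EQ)
    print(f"N = {n:2d}: exact {exact * 1e12:6.0f} ps, approx {approx * 1e12:6.0f} ps")
```

The approximation drops the linear term of N(N+1)/2, so it underestimates the delay for short chains and converges for long ones.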


TG Delay Optimization
 Can speed it up by inserting buffers every M switches
[Figure: TG chain from Vin to VN with a buffer inserted after every M transmission gates; each TG output node is loaded by C.]

 Delay of buffered chain (M TG’s between buffer)

   $t_p = 0.69\, \frac{N}{M}\, C R_{eq}\, \frac{M(M+1)}{2} + \left(\frac{N}{M} - 1\right) t_{pbuf}$

   $M_{opt} = 1.7 \sqrt{\frac{t_{pbuf}}{C R_{eq}}} \approx 3 \text{ or } 4$
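
The buffered-chain expression and the optimum segment length can be checked against each other numerically. The Python sketch below uses assumed values (N = 16, C Req normalized to 1, tpbuf = 4 in the same units) and hypothetical function names.

```python
from math import sqrt


def buffered_tg_delay(n: int, m: int, c: float, r_eq: float, tp_buf: float) -> float:
    """tp = 0.69 * (N/M) * C*Req * M(M+1)/2 + (N/M - 1) * tpbuf."""
    return 0.69 * (n / m) * c * r_eq * m * (m + 1) / 2 + (n / m - 1) * tp_buf


def optimal_m(c: float, r_eq: float, tp_buf: float) -> float:
    """Mopt = 1.7 * sqrt(tpbuf / (C*Req))."""
    return 1.7 * sqrt(tp_buf / (c * r_eq))


# Assumed, normalized example values (not from the lecture).
N, C, R_EQ, TP_BUF = 16, 1.0, 1.0, 4.0

m_opt = optimal_m(C, R_EQ, TP_BUF)
best_m = min(range(1, N + 1), key=lambda m: buffered_tg_delay(N, m, C, R_EQ, TP_BUF))
print(f"Mopt (formula) = {m_opt:.1f}, best integer M by search = {best_m}")
```

The factor 1.7 comes from setting the derivative of tp with respect to M to zero, which gives M = sqrt(tpbuf / (0.345 C Req)). The brute-force search over integer M lands next to the value predicted by the formula.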
