
Lecture 5

The document discusses process-tolerant low-power design strategies for nano-meter scale electronics, highlighting the exponential increase in leakage power and the need for innovative design methods to mitigate performance degradation. It emphasizes the impact of process variations on device reliability and memory yield, proposing solutions such as self-repairing SRAM arrays and on-die leakage monitoring to enhance design robustness. Additionally, it introduces the CRISTA approach for low-voltage, variation-tolerant circuit synthesis, aiming to balance power efficiency and performance reliability.

Uploaded by

Phan Huong

Process-Tolerant Low-Power Design for the Nano-meter Regime

Kaushik Roy
Electrical & Computer Engineering
Purdue University
Exponential Increase in Leakage
[Figure: technology timeline, 1970 to 2020, scaling from 5 µm through 1 µm and 100 nm to 10 nm. Silicon micro-electronics, silicon nano-electronics, and non-silicon technology have I_ON/I_OFF = 10^6, 10^3, and ~10^(2~6) respectively.]
[Figure: leakage power (% of total, 0% to 50%) vs. technology (µm), 1.5 down to 0.05, annotated "Must stop at 50%"; inset of a bulk MOSFET showing subthreshold leakage, gate leakage, and junction leakage. A. Grove, IEDM 2002]
Technology Trend
[Figure: device roadmap, 2003 / 2009 / 2020. Single gate devices: Bulk-CMOS, PD/SOI (floating body), FD/SOI (fully-depleted body, buried oxide). Multi-gate devices: DGMOS, FinFET, Trigate. Nano devices: carbon nanotube, III-V devices, nano-wires, spintronics.]
Design methods to exploit the advantages of technology innovations
Variation in Process Parameters
[Figure: delay and leakage spread, normalized frequency (0.9 to 1.4) vs. normalized leakage (Isb, 1 to 5), 130nm; 30% frequency spread and 5X leakage spread. Source: Intel]
[Figure: random dopant fluctuation, number of dopant atoms (10 to 10000) vs. technology node (1000 nm to 32 nm). Source: Intel]
[Figure: channel length of Device 1 and Device 2, inter- and intra-die variations]
Device parameters are no longer deterministic
Reliability
Temporal degradation of performance -- NBTI
[Figure: failure probability vs. time, showing defects and lifetime degradation across technology generations]
Pessimistic Design Hurts Performance
[Figure: histogram of number of dies vs. normalized IOFF (150nm CMOS measurements, 110°C), with the nominal corner and the worst-case corner marked]
• Substantial variation in leakage across dies
• 4X variation between nominal and worst-case leakage
• Performance determined at nominal leakage
• Robustness determined at worst-case leakage

Global and Local Variations


Random Dopant Fluctuation

[Figure: intra-die distribution of δVt-LOCAL (spread σ_LOCAL) offset by the inter-die shift ΔVt-GLOBAL (spread σ_GLOBAL)]
$\delta V_t = \Delta V_{t\text{-GLOBAL}} + \delta V_{t\text{-LOCAL}}$
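The decomposition above can be illustrated with a small sampling sketch. It assumes both components are zero-mean Gaussians, which the slide does not state; the function name and the sigma values are hypothetical.

```python
import random

def sample_vt_shifts(n_dies, cells_per_die, sigma_global, sigma_local, seed=0):
    """Return one list of total Vt shifts (V) per die.

    Assumption: global and local components are independent zero-mean Gaussians.
    """
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        # Inter-die (global) shift: one value shared by every cell on the die.
        d_global = rng.gauss(0.0, sigma_global)
        # Intra-die (local) shift: an independent value per cell.
        dies.append([d_global + rng.gauss(0.0, sigma_local)
                     for _ in range(cells_per_die)])
    return dies

if __name__ == "__main__":
    dies = sample_vt_shifts(n_dies=3, cells_per_die=4,
                            sigma_global=0.03, sigma_local=0.03)
    for d in dies:
        print([round(v, 4) for v in d])
```

Setting `sigma_local` to zero makes every cell on a die share the same shift, which is the pure inter-die case.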

Process Tolerance: Memories

S. Mukhopadhyay, Mahmoodi, Roy
VLSI Circuit Symposium 2006, JSSC 2006, TCAD
Parametric Failures: Read Failure
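[Figure: 6T SRAM cell (PL, PR, NL, NR, AXL, AXR, wordline WL, bitlines BL and BR) storing VL = '1' and VR = '0', with ±Δ Vt shifts marked on the transistors; read waveforms of WL, VL, and VR vs. time, with VR = VREAD compared against the trip point VTRIPRD]
$P_{RF} = P(V_{READ} > V_{TRIPRD})$
Read failure => Flipping of Cell Data while Reading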

Parametric Failures in SRAM


[Figure: 6T SRAM cell storing '1' / '0'; chips spread from the high-Vt to the low-Vt corner are separated into faulty chips and working chips by test and repair using redundancy]
Parametric failures
– Read Failures
– Write Failures
– Access Failures
– Hold Failures
Parametric failures can degrade SRAM yield

Process Variations in On-chip SRAM


[Figure: fault statistics, chip count (0 to 350) vs. number of faulty cells (NFaulty-Cells, 0 to 1049); yield ≈ 33%]
σVt ≈ 30 mV, using BPTM 45nm technology
Simulation of a 64KB cache
A. Agarwal et al., JSSC 2005
Parametric failures → Yield degradation

Inter-die Variation & Cell Failures

[Figure: distribution of the inter-die Vt shift (ΔVth-GLOBAL) with spread σ_GLOBAL; SRAM cells storing “1” / “0” shown at both corners]
Low-Vt corners: read failure ↑, hold failure ↑
High-Vt corners: access failure ↑, write failure ↑
Inter-die Variation & Memory Failure
[Figure: memory failure probability vs. inter-die Vt shift, from LVT through nominal Vt to HVT; BPTM 70nm devices]
Memory failure probabilities are high when inter-die shift in process is high
Self-Repairing SRAM Array
[Figure: failure probability vs. inter-die Vt corner, divided into Region A (LVT), Region B (nominal Vt), and Region C (HVT)]
Region A (LVT corner): read & hold failures dominate; reduce RF & HF
Region C (HVT corner): access & write failures dominate; reduce AF & WF
Reduce the dominant failures at different inter-die corners to increase width of low failure region

Post-Silicon Repair: Proposed Approach

[Figure: two intra-die distributions (σ_LOCAL) placed at different positions within the inter-die distribution (σ_GLOBAL)]
Apply correction to the global variation to reduce number of failures due to local variations
Self-Repairing SRAM Array
[Figure: body bias applied per region. Region A (LVT): RBB; Region B (nominal Vt): ZBB; Region C (HVT): FBB]
Reduce the dominant failures at different inter-die corners to increase width of low failure region

How to identify the inter-die Vt corner under a large intra-die variation?
[Figure: 6T SRAM cell with WL, VDD, BL, BR, and GND]
Monitor circuit parameters, e.g. leakage current
Effect of inter-die variation can be masked by intra-die variation

Array Leakage Monitoring

$Y = \sum_{i=1}^{N} X_i \;\Rightarrow\; \frac{\sigma_Y}{\mu_Y} = \frac{1}{\sqrt{N}} \frac{\sigma_X}{\mu_X}$
• Adding a large number of random variables reduces the effect of intra-die variation
Leakage of entire SRAM array is a reliable indicator of the inter-die Vt corner
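The averaging argument can be checked numerically. The sketch below assumes independent, identically distributed lognormal cell leakages; the distribution and its parameters are illustrative choices, not taken from the slides.

```python
import random
import statistics

def relative_spread(n_cells, n_trials=2000, sigma_ln=0.5, seed=1):
    """Estimate sigma/mu of the summed leakage of n_cells cells.

    Assumption: each cell leakage is an independent lognormal sample.
    """
    rng = random.Random(seed)
    totals = [sum(rng.lognormvariate(0.0, sigma_ln) for _ in range(n_cells))
              for _ in range(n_trials)]
    return statistics.pstdev(totals) / statistics.mean(totals)

if __name__ == "__main__":
    for n in (1, 100):
        print(n, round(relative_spread(n), 4))
```

Under these assumptions the relative spread of a 100-cell sum should be roughly one tenth of the single-cell value, which is why the whole-array leakage tracks the inter-die corner rather than individual cells.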
Self-Repair using Leakage Monitoring
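[Figure: block diagram. An on-chip leakage monitor in the VDD path of the SRAM array, with a bypass switch controlled by the calibrate signal, produces VOUT. Comparators compare VOUT against VREF1 and VREF2, and the body-bias selection block applies FBB, ZBB, or RBB to the SRAM array body.]
Entire array leakage is monitored to detect inter-die corner and proper body-bias is selected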
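The selection step can be sketched as a two-threshold comparison. This assumes the monitor output rises with array leakage and that VREF1 < VREF2; the slides do not give the polarity or the reference values, so both are labeled assumptions.

```python
def select_body_bias(v_out, v_ref1, v_ref2):
    """Pick a body bias from the leakage monitor output.

    Assumptions (not from the slides): v_out increases with array leakage,
    and v_ref1 < v_ref2 bracket the nominal corner.
    """
    if v_ref1 >= v_ref2:
        raise ValueError("expected v_ref1 < v_ref2")
    if v_out > v_ref2:
        return "RBB"  # high leakage: low-Vt corner, raise Vt with reverse bias
    if v_out < v_ref1:
        return "FBB"  # low leakage: high-Vt corner, lower Vt with forward bias
    return "ZBB"      # nominal corner: zero body bias

if __name__ == "__main__":
    for v in (0.2, 0.5, 0.8):
        print(v, select_body_bias(v, v_ref1=0.4, v_ref2=0.6))
```

The mapping of corners to biases follows the region diagram: RBB at the LVT corner, ZBB at nominal, FBB at the HVT corner.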

Yield Enhancement using Self-Repair

Self-Repairing SRAM using body-bias can significantly improve design yield

Test-Chip of Self-Repairing SRAM


[Figure: test-chip layout with VCOs, isolated cell block, 16 KB and 64 KB arrays, LVT array, sensor + reference generator, and body-bias generator]
Technology: IBM 0.13 µm
128KB SRAM
Dual-Vt triple-well tech.
Number of transistors: ~7 million
Die size: 16 mm²
Simulation results for 1MB array designed in IBM 0.13 µm
VLSI CKT Symp. 2006, ITC 2005

Continuous vs Quantized Body Bias

Quantized (3 level: FBB, ZBB, RBB) body bias scheme is a cost-effective solution with good yield enhancement possibility
Process Tolerance: Register Files

Kim et al., VLSI Circuit Symposium 2004

Process Compensating Dynamic Circuit Technology
Conventional Static Keeper
[Figure: dynamic local bitline circuit with clk, static keeper, read selects RS0 ... RS7, data inputs D0 ... D7, and nodes LBL0, LBL1, N0]
• Keeper upsizing degrades average performance

Process Compensating Dynamic Circuit Technology
3-bit programmable keeper
[Figure: the same dynamic circuit with three keeper legs of width Ws, 2Ws, and 4Ws enabled by b[2:0]]
C. Kim et al., VLSI Circuits Symp. '03
• Opportunistic speedup via keeper downsizing
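The binary-weighted keeper can be sketched as a width calculation. The bit-to-leg mapping (b[0] enables Ws, b[1] enables 2Ws, b[2] enables 4Ws) is an assumption; the slide shows only the three widths and the 3-bit control.

```python
def keeper_width(b, w_s=1.0):
    """Effective keeper width for a 3-bit code b[2:0].

    Assumption: bit 0 enables the Ws leg, bit 1 the 2Ws leg, bit 2 the 4Ws leg.
    """
    if not 0 <= b <= 7:
        raise ValueError("b must be a 3-bit code")
    legs = (1, 2, 4)  # leg widths in units of Ws
    return w_s * sum(w for bit, w in enumerate(legs) if (b >> bit) & 1)

if __name__ == "__main__":
    for code in range(8):
        print(format(code, "03b"), keeper_width(code))
```

Under this mapping a leaky die would be programmed with a larger code for robustness, while a low-leakage die can use a smaller code to gain speed.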


Robustness Squeeze
[Figure: histogram of number of dies vs. normalized DC robustness (0.7 to 1.2), conventional vs. this work, with the noise floor and the saved dies marked]
• 5X reduction in robustness failing dies


Delay Squeeze
[Figure: histogram of number of dies vs. normalized delay (0.8 to 1.2), conventional vs. this work; average delay μ = 1.00 for conventional and μ = 0.90 for PCD]
• 10% opportunistic speedup
On-Die Leakage Sensor For Measuring Process Variation
[Figure: sensor layout, 83 μm by 73 μm, with current reference, VBIAS generator, current mirrors, comparators, NMOS device, and test interface]
C. Kim et al., VLSI Circuits Symp. '04
• High leakage sensing gain – 90nm dual-Vt, Vdd=1.2V, 7 level resolution, 0.66 mW @80°C
Leakage Binning Results

[Figure: leakage bins labeled by the output codes from the leakage sensor, 001 to 111]


Self-Contained Process Compensation
Fab → Wafer test → Assembly → Burn in → Package test → Customer
Wafer test:
• Process detection: leakage measurement using on-die leakage sensor
• Program PCD using fuses


Self-Repair: Architecture Level

Agarwal, Roy, TVLSI 2005


Fault-Tolerant Cache Architecture

[Figure: cache with faulty blocks marked]
• BIST detects the faulty blocks
• Config Storage stores the fault information
Idea is to resize the cache to avoid faulty blocks during regular operation
Mapping Issue

• More than one INDEX are mapped to same block
• Include column address bits into TAG bits
[Figure: address fields TAG, INDEX, Off; the column address part of INDEX is merged into the new TAG]
Resizing is transparent to processor → same memory address


Fault Tolerant Capability
[Figure: fault statistics, chip count (Nchip, 0 to 350) vs. NFaulty-Cells (0 to 1049), showing chips saved by the proposed architecture + redundancy (R=8, r=3) and chips saved by ECC + redundancy (R=16). The proposed scheme saves more chips than ECC; at high fault counts ECC fails to save any chips.]
• Proposed architecture can handle more faulty cells than ECC, as many as 890 faulty cells
• Saves more chips than ECC for a given NFaulty-Cells
CPU Performance Loss
[Figure: % CPU performance loss (0.0 to 2.5) vs. NFaulty-Cells (0 to 839), for a 64K cache, averaged over SPEC 2000 benchmarks]
• Increase in miss rate due to downsizing of cache
Average CPU performance loss over all SPEC 2000 benchmarks for a cache with 890 faulty cells is ~ 2%
Logic: Process Tolerance

Logic: A New Paradigm for Low-Voltage, Variation Tolerant Circuit Synthesis Using Critical Path Isolation (CRISTA)

Ghosh, Bhunia, Roy -- ICCAD 2006

Razor Approach
[Figure: Razor flip-flop with a standard latch (D, Q, CLK) and a shadow latch; labels E and Delay]
RAZOR: Dan Ernst et al., MICRO 2003.
• Post-silicon technique for dynamic supply scaling and timing error detection/correction
• Error correction overhead is 1% for a 10% error rate
Vdd Scaling and Process Tolerance: Conventional Solutions
• Low power:
– Reduce the supply voltage
• Error rate increases
– Dual-Vt/dual-VDD assignment
• Number of critical paths increases

• Robustness:
– Increase supply voltage
• Power dissipation increases
– Upsize the gates
• Switching capacitance increases

Low power and robustness: conflicting requirements

CRISTA: Basic Idea


[Figure: clock waveform with period Tc. At VDD = 1V every path evaluates within Tc. At VDD < 1V a non-critical path activation still evaluates within Tc, while a critical path activation evaluates over 2Tc based on prediction.]
• Important points:
– Scale down the supply while making delay failures predictable
– Avoid the failures by adaptive clock stretching
– Ensure that critical paths are activated rarely

Design Considerations for CRISTA


Design A: conventional design
Design B: proposed design; critical paths are predictable and restricted to a logic section having low activation probability
[Figure: number of paths vs. path delay, relative to the clock period Tc, for Design A at VDD = 1V, Design B at VDD = 1V (path sets S1, S2, S3), and Design B at VDD < 1V]
• Few predictable critical paths
• Low activation probability of critical paths
• Slack between critical and non-critical paths under variations

Case Study: Adder


[Figure: four full adders (FA) in a carry chain with propagate/generate signals Pi, Gi, carry-in Ci,0, and carry-outs Co,0 to Co,3]

• Interesting features:
– Single critical path (activated by P0P1P2P3=1 & Ci,0=1)
– Low activation probability of critical path

VDD = 1V, TCLK = 260ps: crit. path delay = 260ps; longest non-crit. path delay = 165ps; P = 13uW (1-cycle)
VDD = 0.8V, TCLK = 260ps: crit. path delay = 330ps; longest non-crit. path delay = 260ps; P = 7.4uW (rare 2-cycles, decoder)

44% power saving by reducing voltage and operating critical path at 2-cycle and other paths at 1-cycle

Can we apply same technique to any random logic?
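How rarely the critical path is activated can be estimated by enumeration. The sketch assumes uniformly random, independent operands and carry-in, and the usual propagate definition Pi = Ai XOR Bi; neither assumption is stated on the slide.

```python
from itertools import product

def critical_path_active(a, b, cin, n=4):
    """True when all propagate bits are 1 and the carry-in is 1."""
    props = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    return cin == 1 and all(props)

def activation_probability(n=4):
    """Fraction of all (a, b, cin) combinations that activate the critical path."""
    hits = total = 0
    for a, b, cin in product(range(2 ** n), range(2 ** n), (0, 1)):
        total += 1
        hits += critical_path_active(a, b, cin, n)
    return hits / total

def average_cycles(p_crit):
    """Average cycles per operation: 2 cycles with probability p_crit, else 1."""
    return 1.0 + p_crit

if __name__ == "__main__":
    p = activation_probability(4)
    print(p, average_cycles(p))
```

For the 4-bit case this gives 1/32, so under these assumptions two-cycle operations add only about 3% to the average cycle count.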
Carry Select Adder
[Figure: carry select adder, stages 1 to 10. Two rows of adder blocks (A5 A4 A3 A2 A2 A5 A4 A3 A2 A2), one with carry-in Ci0 = 0 and one with Ci1 = 1, are selected by MUXes M1 to M10 (Ai: i-bit adder, Mk: k-stage MUX), with Cin = 0 and output Cout. A latency predictor block separates the long latency path (LLP) from the short latency paths (SLP1, SLP2).]
• ~20% power saving with ~6% area overhead
Carry Save Multiplier
[Figure: carry save multiplier array of half adders (HA) and full adders (FA) feeding a vector merging adder, with a latency predictor block; critical path and longest off-critical path marked]
• 25% power saving with ~5% area overhead
Wallace Tree Multiplier
[Figure: Wallace tree multiplier. Partial products are reduced by full adders and half adders in stages 1 to 3, then a vector merging adder produces the final product; critical path and longest off-critical path marked]
• 29% power saving with ~4% area overhead
Simulation Results
[Figure: % power savings and % area overhead vs. operand width. Ripple carry adder (12, 16, 32 bits), ISO yield = 92%; carry save multiplier (12, 16, 32 bits), ISO yield = 90%; Wallace tree multiplier (8, 12, 16 bits), ISO yield = 96%]
[Figure: performance penalty (WTM), % throughput penalty and % area overhead vs. operand width (6, 8, 10, 12, 20 bits)]
Random Logic: Shannon’s Expansion
$f(x_1,\dots,x_i,\dots,x_n) = x_i \cdot f(x_1,\dots,x_i=1,\dots,x_n) + x_i' \cdot f(x_1,\dots,x_i=0,\dots,x_n) = x_i \cdot CF_1 + x_i' \cdot CF_2$
$CF_1 = f(x_1,\dots,x_i=1,\dots,x_n); \quad CF_2 = f(x_1,\dots,x_i=0,\dots,x_n)$
[Figure: f split by a MUX on xi into cofactors CF1 (xi=1) and CF2 (xi=0), each with activation probability 50%; a second MUX on xj splits further into cofactors such as CF11 and CF12, each with probability 25%]
Activation probability of cofactors can be reduced
How to choose Control Variable?
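The expansion can be verified on a truth table. The sketch below builds both cofactors of a function and checks that the MUX form reproduces it; the example function and all names are hypothetical.

```python
from itertools import product

def cofactor(f, i, value):
    """Return f with input i fixed to value (the other inputs pass through)."""
    def cf(*xs):
        ys = list(xs)
        ys[i] = value
        return f(*ys)
    return cf

def shannon(f, i):
    """Rebuild f as xi * CF1 + xi' * CF2, using input i as the control variable."""
    cf1, cf2 = cofactor(f, i, 1), cofactor(f, i, 0)
    return lambda *xs: (xs[i] & cf1(*xs)) | ((1 - xs[i]) & cf2(*xs))

def example(x0, x1, x2):
    # Hypothetical 3-input function used only for the check.
    return (x0 & x1) | (x1 ^ x2)

if __name__ == "__main__":
    for i in range(3):
        g = shannon(example, i)
        ok = all(g(*xs) == example(*xs) for xs in product((0, 1), repeat=3))
        print("control variable x%d:" % i, ok)
```

Only one cofactor is selected for any input vector, which is why each cofactor's activation probability drops to 50% for a control variable with signal probability 0.5.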

Further Isolation and Slack Creation by Sizing

[Figure: original circuit partitioned into cofactor blocks (CF11, CF21, CF32, CF42, CF53, CF63) driven by the inputs, with a MUX network selecting the primary outputs (PO)]
• Slack creation strategy
– Lagrangian Relaxation based sizing (B.C. Paul et al., DAC 2004) is used
– Non-critical paths are selectively made faster
– Critical paths are slightly slowed down
Simulation Results
[Figure: MCNC benchmarks (cht, sct, pcle, mux, decod, cm150a, x2, alu2, count), 70nm process. Power: % improvement in power with switching activity = 0.2 and 0.5. Area (x10^3 um^2): original design vs. proposed design]
• Average power saving = ~50%
• Average area overhead = 18%
• Avg performance penalty = 5.9% (with 4 control variables) for signal prob = 0.5
Two-Stage Pipeline with Test Logic
[Figure: GDS layout and block diagrams of the test chip. Conventional pipeline: fixed vectors, SFFs, carry-look-ahead adder, comparator, 2:1 mux, supply VDDo. Low power robust (proposed) pipeline: LFSR / fixed vectors, pre-decoders, stalling logic generating gclk and freeze, SFFs, carry-look-ahead adder, comparator, 4:1 mux, supply VDDm. Shared test logic, clock generator, and test modes TM1, TM2.]
[Figure: power measurement of proposed pipeline: 20% reduction in conventional pipeline, 40% extra reduction using CRISTA]
~40% power saving with ~13% performance penalty
VDD Scaling, Process Variation, and Quality Trade-off: DCT

Banerjee, Karakonstantis, Roy

Design Automation and Test in Europe (DATE) 2007
Basic Idea
• All computations are “not equally important” for determining outputs
• Identify important and unimportant computations based on output “sensitivity”
• Compute important computations with “higher priority”
• Delay errors due to variations / Vdd scaling “affect only” non-important computations
• “Gradual degradation” in output with voltage scaling and process variations
DCT Based Image Compression Process
[Figure: JPEG encoder block diagram. A 512×512 source image is split into 8×8 blocks X; the FDCT computes $Z = T \cdot X \cdot T'$; the quantizer (Round, Q) produces V; the entropy encoder produces the compressed image data.]
[Figure: 2D DCT built from two 1D DCT stages: X → 1D DCT → W → transpose memory → Y → 1D DCT → Z]
• DCT is used in current international image/video coding standards
- JPEG, MPEG, H.261, H.263
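The separable structure in the block diagram can be sketched in plain Python. The slides do not define T, so the orthonormal DCT-II matrix is an assumption, and the final transpose in `dct_2d_row_column` is added here only so the result matches the matrix form Z = T·X·T'.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix T (assumed; not given on the slides)."""
    t = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        t.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return t

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(x):
    """Direct matrix form: Z = T * X * T'."""
    t = dct_matrix(len(x))
    return matmul(matmul(t, x), transpose(t))

def dct_2d_row_column(x):
    """Two 1D DCT passes with a transpose in between."""
    t_tr = transpose(dct_matrix(len(x)))
    w = matmul(x, t_tr)          # first 1D DCT pass: transform every row of X
    y = transpose(w)             # transpose memory
    z_tr = matmul(y, t_tr)       # second 1D DCT pass: transform every row of Y
    return transpose(z_tr)       # undo the transpose so the result equals T X T'

if __name__ == "__main__":
    block = [[1.0] * 8 for _ in range(8)]
    z = dct_2d_row_column(block)
    print(round(z[0][0], 6))
```

For a constant block all of the energy lands in the DC coefficient, which is the extreme case of the energy compaction the next slide relies on.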
Energy Distribution of a 2D-DCT Output
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64

High energy components (important outputs, 75% energy)
Low energy components (less important outputs)

Can important components be computed with higher priority?
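The numbering in the 8×8 table matches the standard zigzag scan of a DCT block, where low numbers correspond to low spatial frequencies. A minimal sketch that regenerates the table, assuming that interpretation:

```python
def zigzag_order(n=8):
    """Return an n x n table of 1-based zigzag scan positions."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals (r + c constant); odd diagonals go down-left
    # (row increasing), even diagonals go up-right (row decreasing).
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    rank = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(cells, start=1):
        rank[r][c] = k
    return rank

if __name__ == "__main__":
    for row in zigzag_order():
        print(" ".join("%2d" % v for v in row))
```

Which positions count as high energy is a design choice of the architecture; the sketch only reproduces the ordering.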
Design Methodology
[Figure: (a) input block x0 ... x7; (b) 1D-DCT intermediate outputs w0 ... w7 (further columns w8, w16, ..., w56), split into faster computation and slower computation; (c) transpose memory y0 ... y7; (d) final DCT outputs z0 ... z4 ... after the second 1D-DCT, again split into faster and slower computation]


Path Delays for 1D-DCT outputs
[Figure: adder and shift dataflow graphs for the 1D-DCT outputs w0 to w7, built from products of input sums and differences such as (x0 + x7)·d and (x0 - x7)·a with coefficients a, d, e, f. Annotated path delays range from 2 adder delays (w0) through 3 adder delays to 4 adder delays.]
Proposed DCT under Vdd scaling
Vdd3 < Vdd2 < Vdd1 (nominal)
[Figure: proposed design with high/low delay paths at Vdd1. The important computations w0 to w4 have delay = D1; w5 to w7 are the paths with longer delays.]
[Figure: scaled Vdd, longer paths affected. w0 to w4 finish within D1 @Vdd2; the paths with D2 > D1 @Vdd2 are not computed.]
[Figure: extreme scaled Vdd, shorter paths affected. Only the DC component w0 finishes within D1 @Vdd3; the paths with D3 > D1 and D4 > D1 @Vdd3 are not computed.]
1D-DCT Path Delay Comparisons
[Figure: delay (ns, 0 to 4) of computation paths Path1 (w0) to Path8 (w7), conventional DCT vs. proposed DCT]
Effect of Vdd Scaling
Different Architectures at Nominal Voltage (1.0 V)
Architectures: conventional WTM DCT, CSHM DCT (2 alphabets), proposed DCT with WTM. Values as listed on the slide, three per row:
Power (mW)   25.1    29.8     26
Delay (ns)   3.2     3.64     3.57
Area (um2)   80490   108738   90337
PSNR (dB)    21.97   33.23    33.22
[Figure: output images at 1.0 V, 0.9 V, and 0.8 V; two of the architectures are marked FAILS at 0.9 V and at 0.8 V]

Proposed Architecture at Reduced Voltage
             Proposed DCT, Vdd=0.9V   Proposed DCT, Vdd=0.8V
Power (mW)   17.53 (41.2%)            11.09 (62.8%)
PSNR (dB)    29                       23.41

• Graceful degradation of proposed DCT architecture under Vdd scaling (Vdd can be scaled to 0.75V)
• Conventional architectures fail
Temporal Degradation: NBTI

Kang, Roy, et al. – TCAD, DAC-07

Temporal Reliability Issues in CMOS Technology
• HCI – Hot Carrier Injection
• NBTI – Negative Bias Temperature Instability
– Increase in VT of PMOS with time
– The dominant reliability factor in scaled tech.
• TDDB, etc.
NBTI: Negative Bias Temperature Instability

Interface trap generation due to Si-H bond breaking

• Interface trap (NIT) generation at the channel interface due to Si-H bond breaking when negative gate bias is applied
• With time, VT increases, subthreshold slope (S) increases, and mobility degrades
• Drive current (IDS) reduces, which slows the PMOS
• Overall, the lifetime of the PMOS is reduced
NBTI: Experimental Data
[Figure: VTh shift [V] (10^-3 to 10^-1) vs. stress time [s] (10^0 to 10^4), experimental and simulation, with a ~t^0.17 trend line]
• PMOS VT degrades as a power of time due to NBTI
• Fixed exponent of 1/6 matches the simulation data*
* V. Huard and M. Denais, IRPS 2004
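A power law with a fixed exponent makes extrapolation in time a one-line calculation. The sketch below assumes the shift scales as t^n with n = 1/6 and no saturation or recovery; the function name and reference values are hypothetical.

```python
def extrapolate_shift(shift_ref, t_ref, t, n=1.0 / 6.0):
    """Scale a measured shift from time t_ref to time t, assuming shift ~ t**n."""
    if t_ref <= 0 or t < 0:
        raise ValueError("times must be positive")
    return shift_ref * (t / t_ref) ** n

if __name__ == "__main__":
    # With n = 1/6, doubling the shift takes 2**6 = 64 times longer.
    print(extrapolate_shift(1.0, 1.0, 64.0))
    # Going from a 3 year to a 10 year lifetime scales the shift by about 1.22.
    print(round(extrapolate_shift(1.0, 3.0, 10.0), 4))
```

The small exponent is why a short accelerated stress measurement can say something about multi-year lifetimes, provided the power law holds over the whole range.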
Power-law VT degradation Model
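Reaction rate: $\frac{dN_{IT}}{dt} = k_F [N_0 - N_{IT}] - k_R N_{IT} N_{H(0)} \approx 0$

$\frac{k_F}{k_R} [N_0 - N_{IT}] = N_{IT} \cdot N_{H(0)}$

[Figure: hydrogen concentration N_H vs. distance into oxide y, falling from N_H(0) at y = 0 to zero at y = \sqrt{D_H t}]

Conservation of hydrogen: $N_{IT}(t) = \int_0^{\sqrt{D_H t}} N_{H(0)}(y,t) \, dy$

$N_{IT} = \frac{1}{2} N_{H(0)} \sqrt{D_H t}$

$N_{IT}(t) = \sqrt{\frac{k_F N_0}{2 k_R}} \, (D_H t)^{1/4}$

$\Delta V_T = \frac{q \cdot \Delta N_{IT}}{C_{OX}}$
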
NBTI degrades in time of exponent 1/6


Mobility degradation factor

• Mobility degradation due to NBTI is expressed as an additional VT shift, denoted m

• Overall temporal VT shift model is expressed as

$\Delta V_T = (1 + m) \, \frac{q \chi E_{ox} \, e^{E_{ox}/E_0}}{C_{OX}} \cdot t^{0.25}$
Impact of NBTI on circuit performance
Circuit Performance Degradation
[Figure: % change (10^-2 to 10^2) in Vt, inverter delay, and ROSC delay vs. time (s), 10^0 to 10^9]
• Performance (delay) degradation also follows the power-law trend with the same 0.17 exponent
• In CMOS logic, only the rising (L2H) delays are affected
Circuit Performance degradation cont.
[Figure: % change (10^-2 to 10^2) in delay and Vth vs. time (s), 10^0 to 10^8, for Si = 1 (worst case) and Si < 1 (with activity)]
• Delay degradation in ISCAS c432
• Activity factor (switching activity) has little effect on the delay degradation
– In reality, activity factors are balanced in normal operation
Design method considering the NBTI degradation
NBTI-aware design method
[Figure: max ckt. delay vs. time against the delay constraint, showing the reduced lifetime due to NBTI degradation and the NBTI-aware over-design that meets the required lifetime of the design]
• Over-design is required to guarantee lifetime stability of the circuit
• LR sizing is used to optimize the circuit
– Size the circuit considering the worst-case VT degradation over the lifetime
LR Sizing considering NBTI
Inputs:
1. Delay Constraint (DMAX)
2. Required Lifetime (TLife): lifetime constraint, a new design constraint

Flow:
Calibrate switching activities (Si) at each node
→ Compute VT shift at each node (power-law VT model considering Si)
→ LR sizing with delay constraint DMAX (optimal sizing from Lagrangian Relaxation*)
→ NBTI-aware design (guarantees lifetime stability under NBTI degradation)
* C. Chen et al., TCAD 1999
Simulation results

1. Delay degradation in ISCAS85 benchmark circuits after 10 years

Circuit   No. of Trans.   Nominal delay (ps)   % delay degrad. (10 yrs)
                                               Si = 1    Si < 1
c432      590             525                  8.90      7.32
c499      1816            368                  9.20      8.06
c1908     1582            513.5                9.18      8.53
c3540     3638            597.3                9.00      7.86
c74181    372             194.6                9.89      8.68
c74182    92              77.2                 10.35     9.63
c74283    188             131.9                7.90      6.83
c74L85    148             115.1                9.50      7.60
* All benchmarks are synthesized in BPTM 70nm technology
Simulation results cont.

2. Area overhead in NBTI-aware sizing

Circuit   Nominal delay (ps)   Nominal area (um)   % Area overhead
                                                   Si = 1    Si < 1
c432      385                  196.7               14.8      13.6
c499      340                  581.47              7.82      6.71
c1908     470                  489.67              7.13      6.68
c3540     500                  1146.5              3.44      3.31
c74181    180                  111.1               9.45      9.0
c74182    80                   31.1                11.3      11.2
c74283    125                  66.71               10.0      10.0
c74L85    120                  42.59               5.85      5.8
* All benchmarks are synthesized in BPTM 70nm technology
Negative Bias Temperature Instability
[Figure: PMOS cross-section (P+ source and drain in n-sub, gate oxide, gate) under stress with VGS < 0V, VS = 0V, VD = 0V, VBODY = 0V. Si-H bonds at the interface are broken, H2 moves into the gate oxide, and interface traps (x) remain after NBTI degradation.]
• PMOS specific aging effect
• Generation of (+) traps
• Reaction-Diffusion (RD) model*
• Time exponent ~ 1/6
$N_{IT}(t) = \frac{k_F N_0}{2 k_R} (D_H t)^{1/6}$
$\Delta V_T = \frac{q \cdot \Delta N_{IT}}{C_{OX}}$
*M. A. Alam, IEDM'03
NBTI in Digital Circuits
[Figure: a logic circuit (inputs i1 to i3, outputs o1 and o2, gates 1 to 8) and a 6T SRAM cell (WL, BL, BLB, VL, VR)]
Logic circuits:
• fMAX decreases ↓
• Timing failure with time
Memory circuits:
• Static Noise Margin (SNM) ↓
• Read & Write Stability
• Parametric Yield ↓

Temporal VTh increase in PMOS affects critical performance factors of digital VLSI circuits
NBTI: Random Logic Circuits
Delay degradation of standard cells:
Logic cell   fanin   Delay (ps), t=0   Delay (ps), 3 years   Δ (%)
INV          1       13.77             16.77                 21.8
NAND         2       16.86             19.88                 17.9
NAND         3       19.57             22.45                 14.8
NOR          2       17.26             21.89                 26.8
NOR          3       23.80             30.19                 26.9
[Figure: fMAX reduction (%) vs. time (s), 10^0 to 10^8, for ISCAS'85 benchmarks c2670, c5315, c1908, c499, c3540, c74181 in PTM 65nm; n ~ 1/6 trend, 8% fMAX decrease at the 3 years lifetime]

• ISCAS'85 benchmark circuits, PTM 65nm
• Gate delay: analytical delay model considering NBTI
• Circuit delay: NBTI-aware Static Timing Analysis (STA)
• Circuit fMAX → time exponent n ~ 1/6
NBTI: 6T SRAM Cell
[Figure: static noise margin (SNM): degradation in SNM (%) vs. time (s), 10^4 to 10^8, with a 1/6 trend line; PTM 60nm, 125°C, CR = 1.33, PR = 0.67]
[Figure: distribution in write margin: CDF (0 to 1) vs. write margin (V), 0.43 to 0.49, at t = 0, 10^6, 10^7, and 10^8; WM improves with time]

• SNM degrades by more than 10% in 3 years
• % SNM degradation → time exponent n ~ 1/6
• WM improves with time under NBTI
Design for Reliability under NBTI
[Figure: c1908, area (um, 550 to 800) vs. delay (ps, 580 to 720) for the initial design (INIT), TR-based sizing, and cell-based sizing; 10ps annotation and the area saving of TR-based over cell-based sizing marked]
• Simulation setup
– Synthesized in PTM 65nm
– 1/6 VTh degradation model
– 125°C stress temperature
– 50% signal probability at PIs
• Gate sizing applied to guarantee lifetime functionality of design
– 11.7% overhead for cell-based sizing
– 6.13% overhead for TR-based sizing
– 45% improvement in area overhead
• Runtime complexity for TR-based sizing is identical to that of cell-based sizing
IDDQ based NBTI Characterization
[Figure: layout and microphotograph of the test chip; inverter chain of 1000 stages between VDD and GND with input Vin and IDDQ measurement]
Technology: CMOS 130nm
Die size: 20 mm²
I/O pins: 209
Tox: 1.6 nm
VDD: 1.2 V
• Test circuit fabricated
• 1000 stage INV chain
• DC stress signal @Vin
• IDDQ measurement @GND
Correlation between IDDQ & fMAX
[Figure: % IDDQ degradation vs. time (s) for IDDQ (Vin=0.0) and IDDQ (Vin=1.7), Vstress = 1.7V @150°C; stress phases S1 (n ~ 1/6) and S2 (n > 1/6)]
[Figure: IDDQ reduction (%) vs. fMAX reduction (%) for ISCAS benchmarks c2670, c5315, c1908, c499, c3540, c74181 (75°C, 65nm PTM); linear relationship, MAX – MIN < 1.2%; % IDDQ decrease and % fMAX increase annotated with n ~ 1/6]

• DM < 3ms, Temp=125°C, Vstress=1.7V
• IDDQ degradation → n ~ 1/6
• Clear signature of NBTI
• Correlation between IDDQ and fMAX can be used to predict circuit performance degradation under NBTI

$R_{leak} = \frac{\Delta I_{leak}(t)}{I_{leak}(0)} \propto t^{1/6} \propto \frac{\Delta f_{MAX}(t)}{f_{MAX}(0)} = R_{freq}$
$R_{freq} = K \times R_{leak}$ (K: constant)
IDDQ based Characterization Technique

Flow:
Initial Characterization
• Compute Rleak, RfMAX
• Compute KfMAX = RfMAX / Rleak
Reliability Extrapolation
• For each IDDQ measurement sample, Rleak is computed
• RfMAX = KfMAX × Rleak
• Estimate fMAX degradation
Lifetime Projection
• Project IDDQ using KfMAX
Temp. Dependency
NBTI Characterization Report

Design phase
• Circuit-level NBTI reliability characterization
• IDDQ test is used
• Expensive fMAX testing is avoided (or minimized)
• Accurate circuit level performance degradation can be predicted

Post-silicon phase
• IC specific burn-in to qualify the target product
• Efficient way of field monitoring: dynamic local signature of product usage
• Possible usage in other reliability sources; HCI
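The characterization flow can be sketched in a few lines, using the relation R_freq = K × R_leak. Fitting K by least squares through the origin is an illustrative choice, and all function names and sample values are hypothetical.

```python
def relative_change(initial, current):
    """Magnitude of the relative change, e.g. Rleak = |dIleak(t)| / Ileak(0)."""
    return abs(current - initial) / initial

def fit_k(r_leak_samples, r_freq_samples):
    """Fit K in R_freq = K * R_leak by least squares through the origin."""
    num = sum(l * f for l, f in zip(r_leak_samples, r_freq_samples))
    den = sum(l * l for l in r_leak_samples)
    return num / den

def predict_fmax_degradation(k, iddq_initial, iddq_now):
    """Estimate the relative fMAX degradation from an IDDQ measurement alone."""
    return k * relative_change(iddq_initial, iddq_now)

if __name__ == "__main__":
    # Initial characterization: paired measurements (hypothetical numbers).
    k = fit_k([0.02, 0.04, 0.06], [0.01, 0.02, 0.03])
    # Reliability extrapolation: later samples need only an IDDQ measurement.
    print(k, predict_fmax_degradation(k, 100.0, 92.0))
```

Once K is known, the expensive fMAX measurement is needed only during the initial characterization.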


Conclusions

• Process variation and process tolerance are becoming important
• There is a need to optimize designs considering power/performance/yield