Lecture 5

The document discusses process-tolerant low-power design strategies for nano-meter scale electronics, highlighting the exponential increase in leakage power and the need for innovative design methods to mitigate performance degradation. It emphasizes the impact of process variations on device reliability and memory yield, proposing solutions such as self-repairing SRAM arrays and on-die leakage monitoring to enhance design robustness. Additionally, it introduces the CRISTA approach for low-voltage, variation-tolerant circuit synthesis, aiming to balance power efficiency and performance reliability.

Process-Tolerant Low-Power Design for the Nano-meter Regime

Kaushik Roy
Electrical & Computer Engineering
Purdue University
Exponential Increase in Leakage
[Figure: technology timeline, 1970 to 2020, with feature size shrinking from 5 µm through 1 µm and 100 nm to 10 nm. ION/IOFF = 10^6 for silicon micro-electronics, 10^3 for silicon nano-electronics, and roughly 10^2 to 10^6 for non-silicon technology.]
[Figure: leakage power (% of total) versus technology (1.5, 0.7, 0.35, 0.18, 0.09, 0.05 µm); leakage "must stop at 50%". Inset: bulk MOSFET cross-section marking subthreshold leakage, gate leakage, and junction leakage. Source: A. Grove, IEDM 2002.]
Technology Trend
[Figure: device roadmap, 2003 to 2009 to 2020. Single-gate devices: bulk CMOS, PD/SOI (floating body on buried oxide), FD/SOI (fully-depleted body on buried oxide). Multi-gate devices: DGMOS, FinFET, Trigate. Nano devices: carbon nanotube, III-V devices, nano-wires, spintronics.]

Design methods to exploit the advantages of technology innovations
Variation in Process Parameters
[Figure: channel-length variation between Device 1 and Device 2. Caption: inter- and intra-die variations.]
[Figure: normalized frequency (0.9 to 1.4) versus normalized leakage Isb (1 to 5) at 130nm, showing a 30% frequency spread and a 5X leakage spread. Source: Intel. Caption: delay and leakage spread.]
[Figure: number of dopant atoms (10 to 10000) versus technology node (1000 to 32 nm). Source: Intel. Caption: random dopant fluctuation.]
Device parameters are no longer deterministic
Reliability
Temporal degradation of performance -- NBTI
[Figure: failure probability versus time across technology generations, with early failures due to defects and late failures due to lifetime degradation.]
Pessimistic Design Hurts Performance
[Figure: histogram of number of dies (0 to 200) versus normalized IOFF (0 to 7), 150nm CMOS measurements at 110°C, with the nominal corner and the worst-case corner marked.]
Substantial variation in leakage across dies
4X variation between nominal and worst-case leakage
Performance determined at nominal leakage
Robustness determined at worst-case leakage
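
Global and Local Variations

[Figure: threshold-voltage distributions. Random dopant fluctuation gives the intra-die (local) spread σ_LOCAL, with local shift δVt-LOCAL; the inter-die (global) spread σ_GLOBAL shifts each die by ΔVt-GLOBAL.]
δVt = ΔVt-GLOBAL + δVt-LOCAL
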
Process Tolerance: Memories

S. Mukhopadhyay, Mahmoodi, Roy

VLSI Circuit Symposium 2006, JSSC 2006, TCAD
Parametric Failures: Read Failure
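[Figure: 6T SRAM cell (PL, PR, NL, NR, AXL, AXR; bitlines BL, BR; wordline WL) storing VL = '1' and VR = '0', with ±Δ Vt shifts marked on the transistors. Waveforms of VL and VR versus time during a read: while WL is high, VR rises from '0' to VREAD, and the cell flips if VREAD exceeds the trip point VTRIPRD.]
PRF = P(VREAD > VTRIPRD)
Read failure => Flipping of Cell Data while Reading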
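The failure criterion PRF = P(VREAD > VTRIPRD) can be evaluated in closed form under a simplifying assumption. The sketch below assumes, purely for illustration, that VREAD and VTRIPRD are independent Gaussian variables; the slides do not state a distribution, and the function name and all voltages are hypothetical.

```python
import math

def read_failure_probability(mu_read, sigma_read, mu_trip, sigma_trip):
    """P(VREAD > VTRIPRD) for independent Gaussian VREAD and VTRIPRD (assumed)."""
    # VREAD - VTRIPRD is Gaussian with this mean and standard deviation.
    mean_diff = mu_read - mu_trip
    sigma_diff = math.hypot(sigma_read, sigma_trip)
    # Standard normal CDF evaluated at mean_diff / sigma_diff.
    return 0.5 * (1.0 + math.erf(mean_diff / (sigma_diff * math.sqrt(2.0))))

# Hypothetical voltages in volts, not values from the slides.
p_equal = read_failure_probability(0.30, 0.03, 0.30, 0.04)   # equal means
p_margin = read_failure_probability(0.20, 0.03, 0.40, 0.04)  # 4-sigma margin
```

With equal means the cell flips half the time; widening the gap between the trip point and VREAD relative to the combined spread drives the probability down quickly.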

Parametric Failures in SRAM

[Figure: 6T SRAM cell (PL, PR, NL, NR, AXL, AXR; bitlines BL, BR) storing '1' and '0', with high-Vt and low-Vt devices marked. Test and repair using redundancy separates faulty chips from working chips.]
Parametric failures
– Read Failures
– Write Failures
– Access Failures
– Hold Failures
Parametric failures can degrade SRAM yield

Process Variations in On-chip SRAM

[Figure: fault statistics. Chip count (0 to 350) versus number of faulty cells (NFaulty-Cells, 0 to 1049) from simulation of a 64KB cache with σVt ≈ 30 mV in BPTM 45nm technology; yield ≈ 33%. Source: A. Agarwal et al., JSSC 2005.]

Parametric failures → Yield degradation

Inter-die Variation & Cell Failures

[Figure: SRAM cells storing "1" and "0" at the low-Vt and high-Vt corners of the inter-die Vt shift (ΔVth-GLOBAL) distribution, whose width is σGLOBAL.]
Low-Vt corners:
– Read failure ↑
– Hold failure ↑
High-Vt corners:
– Access failure ↑
– Write failure ↑
Inter-die Variation & Memory Failure
[Figure: memory failure probability versus inter-die Vt shift, from LVT through nominal Vt to HVT; BPTM 70nm devices.]

Memory failure probabilities are high when the inter-die shift in process is high
Self-Repairing SRAM Array
[Figure: failure probability versus inter-die Vt shift, divided into Region A (LVT corner), Region B (nominal Vt), and Region C (HVT corner).]
Region A (LVT corner): read and hold failures dominate; reduce RF and HF
Region C (HVT corner): access and write failures dominate; reduce AF and WF

Reduce the dominant failures at different inter-die corners to increase the width of the low-failure region

Post-Silicon Repair: Proposed Approach

[Figure: two intra-die distributions (σLOCAL) inside the wider inter-die distribution (σGLOBAL).]

Apply correction to the global variation to reduce the number of failures due to local variations
Self-Repairing SRAM Array
[Figure: body bias applied per region: Region A (LVT) uses RBB, Region B (nominal Vt) uses ZBB, Region C (HVT) uses FBB.]

Reduce the dominant failures at different inter-die corners to increase the width of the low-failure region

How to identify the inter-die Vt corner under a large intra-die variation?
[Figure: SRAM cell with WL, VDD, BL, BR, and GND.]

Monitor circuit parameters, e.g. leakage current

Effect of inter-die variation can be masked by intra-die variation

Array Leakage Monitoring

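$Y = \sum_{i=1}^{N} X_i \;\Rightarrow\; \frac{\sigma_Y}{\mu_Y} = \frac{1}{\sqrt{N}}\,\frac{\sigma_X}{\mu_X}$ (for independent, identically distributed cell leakages $X_i$)

• Adding a large number of random variables reduces the effect of intra-die variation
• Leakage of the entire SRAM array is a reliable indicator of the inter-die Vt corner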
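The averaging argument can be checked numerically. The sketch below compares the analytic ratio with a small Monte Carlo run; it assumes independent Gaussian cell leakages for simplicity (real subthreshold leakage is closer to lognormal), and the cell statistics and array size are hypothetical.

```python
import math
import random
import statistics

def relative_spread_of_sum(rel_spread_cell, n_cells):
    """sigma_Y / mu_Y for Y = sum of n_cells i.i.d. cell leakages."""
    return rel_spread_cell / math.sqrt(n_cells)

def simulate_array_spread(mu_cell, sigma_cell, n_cells, n_dies, seed=0):
    """Monte Carlo estimate of sigma_Y / mu_Y over n_dies simulated arrays."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mu_cell, sigma_cell) for _ in range(n_cells))
              for _ in range(n_dies)]
    return statistics.pstdev(totals) / statistics.mean(totals)

# Hypothetical cell statistics: 50% relative spread, 1024 cells per array.
analytic = relative_spread_of_sum(0.5, 1024)            # 0.5 / 32
simulated = simulate_array_spread(1.0, 0.5, 1024, 400)  # close to analytic
```

A 50% per-cell spread shrinks to under 2% for a 1024-cell array, which is why the summed leakage tracks the inter-die corner rather than the local variation.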
Self-Repair using Leakage Monitoring
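[Figure: block diagram. An on-chip leakage monitor sits between VDD and the SRAM array, in parallel with a bypass switch, and is enabled by a calibrate signal. Its output VOUT goes to a comparator with references VREF1 and VREF2, which drives the body-bias selection (FBB, ZBB, or RBB) applied to the SRAM array body.]

Entire array leakage is monitored to detect the inter-die corner, and the proper body bias is selected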
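The comparator-based selection can be sketched as a three-way decision. The region-to-bias mapping (LVT to RBB, nominal to ZBB, HVT to FBB) follows the slides; the assumption that VOUT rises with array leakage, the function name, and the threshold values are hypothetical.

```python
def select_body_bias(v_out, v_ref_low, v_ref_high):
    """Pick a body bias from the leakage monitor output (hypothetical thresholds).

    Assumes v_out rises with array leakage, so a high reading means a
    low-Vt (leaky) die and a low reading means a high-Vt die.
    """
    if v_ref_low >= v_ref_high:
        raise ValueError("v_ref_low must be below v_ref_high")
    if v_out > v_ref_high:
        return "RBB"   # low-Vt corner: raise Vt to cut read and hold failures
    if v_out < v_ref_low:
        return "FBB"   # high-Vt corner: lower Vt to cut access and write failures
    return "ZBB"       # nominal corner: leave the body unbiased

choices = [select_body_bias(v, 0.3, 0.7) for v in (0.9, 0.5, 0.1)]
```

Two references are enough for the three-level quantized scheme; a continuous scheme would need a finer comparator ladder.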

Yield Enhancement using Self-Repair

Self-Repairing SRAM using body-bias can significantly improve design yield

Test-Chip of Self-Repairing SRAM

[Figure: test-chip die photo with labels: VCO (two), 16 KB isolated cell block, 64 KB array, LVT array, sensor + reference generator, body-bias generator.]

Technology: IBM 0.13 µm, dual-Vt, triple-well
128KB SRAM
Number of transistors: ~7 million
Die size: 16 mm²
Simulation results for 1MB array designed in IBM 0.13 µm
VLSI Circuits Symp. 2006, ITC 2005

Continuous vs Quantized Body Bias

Quantized (3-level: FBB, ZBB, RBB) body bias scheme is a cost-effective solution with good yield enhancement possibility
Process Tolerance: Register Files

Kim et al., VLSI Circuit Symposium 2004

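Process Compensating Dynamic Circuit Technology
Conventional Static Keeper
[Figure: dynamic local bitline LBL0 with clocked precharge (clk), a static keeper on node N0, and pull-down legs RS0/D0, RS1/D1, ..., RS7/D7; a second local bitline LBL1.]

Keeper upsizing degrades average performance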

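Process Compensating Dynamic Circuit Technology
3-bit programmable keeper
[Figure: dynamic local bitline LBL0 (clk, node N0, pull-down legs RS0/D0 ... RS7/D7, second bitline LBL1) with three keeper legs of width Ws, 2Ws, and 4Ws enabled by b[2:0].]

C. Kim et al., VLSI Circuits Symp. '03

Opportunistic speedup via keeper downsizing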
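The binary-weighted keeper legs can be sketched as follows. The leg widths Ws, 2Ws, 4Ws are from the slide; the assignment of b[0] to the Ws leg and the function name are assumptions made for illustration.

```python
def keeper_width(code, unit_width):
    """Effective keeper width for a 3-bit code b[2:0] with legs W, 2W, 4W."""
    if not 0 <= code <= 7:
        raise ValueError("code must fit in 3 bits")
    legs = (1, 2, 4)  # relative widths of the three keeper legs
    # Bit i of the code enables leg i (assumed mapping: b[0] -> Ws).
    return unit_width * sum(w for bit, w in enumerate(legs) if (code >> bit) & 1)

strong = keeper_width(0b111, 0.5)  # all legs on, for leaky dies
weak = keeper_width(0b001, 0.5)    # smallest keeper, for low-leakage dies
```

Leaky dies get a larger code for robustness, while low-leakage dies can run with a weaker keeper and recover speed.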

Robustness Squeeze
[Figure: histogram of number of dies (0 to 250) versus normalized DC robustness (0.7 to 1.2), conventional versus this work, with the noise floor and the saved dies marked.]

5X reduction in robustness failing dies

Delay Squeeze
[Figure: histogram of number of dies (0 to 300) versus normalized delay (0.8 to 1.2), conventional versus this work. Average delay μ: PCD μ = 0.90, conventional μ = 1.00.]
10% opportunistic speedup
On-Die Leakage Sensor For Measuring Process Variation

[Figure: sensor layout, 83 μm by 73 μm, with current reference, VBIAS generator, current mirrors, comparators, NMOS device, and test interface.]

C. Kim et al., VLSI Circuits Symp. '04

High leakage sensing gain: 90nm dual-Vt, Vdd=1.2V, 7-level resolution, 0.66 mW @ 80°C
Leakage Binning Results

[Figure: dies binned by output code from the leakage sensor: 001, 010, 011, 100, 101, 110, 111.]

Self-Contained Process Compensation
Flow: Fab → Wafer test → Assembly → Burn in → Package test → Customer
At wafer test:
– Process detection: leakage measurement using on-die leakage sensor
– Program PCD using fuses

Self-Repair: Architecture Level

Agarwal, Roy, TVLSI 2005

Fault-Tolerant Cache Architecture

[Figure: cache block diagram with faulty blocks marked.]

BIST detects the faulty blocks
Config Storage stores the fault information
Idea is to resize the cache to avoid faulty blocks during regular operation
Mapping Issue

Problem: more than one INDEX is mapped to the same block
Solution: include the column address bits in the TAG bits
[Figure: address fields TAG, INDEX, Offset; the column address bits are added to the TAG to form the new TAG.]

Resizing is transparent to processor → same memory address

Fault Tolerant Capability
[Figure: fault statistics. Chip count (Nchip, 0 to 350) versus NFaulty-Cells (0 to 1049), showing chips saved by the proposed architecture + redundancy (R=8, r=3) and chips saved by ECC + redundancy (R=16). The proposed scheme saves more chips than ECC, and at high fault counts ECC fails to save any chips.]
Proposed architecture can handle more faulty cells than ECC, as many as 890 faulty cells
Saves more chips than ECC for a given NFaulty-Cells
CPU Performance Loss
[Figure: % CPU performance loss (0 to 2.5) versus NFaulty-Cells (0 to 839) for a 64K cache, averaged over SPEC 2000 benchmarks.]

Increase in miss rate due to downsizing of cache

Average CPU performance loss over all SPEC 2000 benchmarks for a cache with 890 faulty cells is ~2%
Logic: Process Tolerance

Logic: A New Paradigm for Low-Voltage, Variation Tolerant Circuit Synthesis Using Critical Path Isolation (CRISTA)

Ghosh, Bhunia, Roy -- ICCAD 2006

Razor Approach
[Figure: Razor flip-flop with a standard latch (D, Q, CLK), a shadow latch on a delayed clock, and an error output E.]
RAZOR: Dan Ernst et al., MICRO 2003.

• Post-silicon technique for dynamic supply scaling and timing error detection/correction
• Error correction overhead is 1% for a 10% error rate
Vdd Scaling and Process Tolerance:
Conventional Solutions
• Low power:
– Reduce the supply voltage
• Error rate increases
– Dual-Vt/dual-VDD assignment
• Number of critical paths increases

• Robustness:
– Increase supply voltage
• Power dissipation increases
– Upsize the gates
• Switching capacitance increases

Low power and robustness: conflicting requirements

CRISTA: Basic Idea

[Figure: timing diagram of CLK over Tc and 2Tc. At VDD=1V every path evaluates within Tc. At VDD<1V a non-critical path activation still evaluates within Tc, while a critical path activation evaluates over 2Tc, based on prediction.]

• Important points:
– Scale down the supply while making delay failures predictable
– Avoid the failures by adaptive clock stretching
– Ensure that critical paths are activated rarely

Design Considerations for CRISTA

Design A: conventional design
Design B: proposed design, with critical paths predictable and restricted to a logic section having low activation probability
[Figure: number of paths versus path delay, relative to the clock period Tc, for Design A at VDD=1V and for Design B (path sets S1, S2, S3) at VDD=1V and at VDD<1V.]

• Few predictable critical paths
• Low activation probability of critical paths
• Slack between critical and non-critical paths under variations

Case Study: Adder

[Figure: 4-bit ripple-carry adder built from four full adders (FA), with propagate and generate signals P0 to P3 and G0 to G3, carry-in Ci,0, and carry-outs Co,0 to Co,3.]

• Interesting features:
– Single critical path (activated by P0P1P2P3=1 & Ci,0=1)
– Low activation probability of critical path

                                  VDD = 1V, TCLK = 260ps   VDD = 0.8V, TCLK = 260ps
Critical path delay               260ps                    330ps
Longest non-critical path delay   165ps                    260ps
Power                             13uW (1-cycle)           7.4uW (rare 2-cycles, decoder)

44% power saving by reducing the voltage and operating the critical path at 2 cycles and the other paths at 1 cycle

Can we apply the same technique to any random logic?
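The activation condition for the adder's critical path can be turned into a small predictor. The sketch below assumes the usual definition Pi = ai XOR bi and uniformly random inputs; neither is stated on the slide, and the function names are hypothetical.

```python
from itertools import product

def needs_two_cycles(a, b, carry_in, width=4):
    """True when the full carry chain is exercised (all propagate bits set, Cin = 1)."""
    mask = (1 << width) - 1
    propagate = (a ^ b) & mask          # Pi = ai XOR bi (assumed definition)
    return carry_in == 1 and propagate == mask

def average_cycles(width=4):
    """Average cycles per addition over all equally likely inputs."""
    total = cycles = 0
    for a, b, cin in product(range(1 << width), range(1 << width), (0, 1)):
        cycles += 2 if needs_two_cycles(a, b, cin, width) else 1
        total += 1
    return cycles / total

avg = average_cycles()   # 1 + 1/32 for the 4-bit case
```

Under these assumptions only 1 in 32 additions needs the stretched clock, so the throughput cost of running the critical path in two cycles is about 3%.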
Carry Select Adder
[Figure: carry-select adder in 10 stages (Stage 1 to Stage 10). Two rows of i-bit adders Ai (A2, A2, A3, A4, A5, repeated) compute the sums for Ci0=0 and Ci1=1, and k-stage multiplexers Mk (M1 to M10) select between them to produce Cout. A latency predictor block separates the long latency path (LLP) from the short latency paths (SLP1, SLP2).]

• ~20% power saving with ~6% area overhead
Carry Save Multiplier
[Figure: carry-save multiplier array of half adders (HA) and full adders (FA) feeding a vector merging adder, with the critical path, the longest off-critical path, and the latency predictor block marked.]

• 25% power saving with ~5% area overhead
Wallace Tree Multiplier
[Figure: Wallace tree multiplier. Partial products are reduced through Stage 1, Stage 2, and Stage 3 of full adders and half adders, then summed by the vector merging adder into the final product. The critical path and the longest off-critical path are marked.]

• 29% power saving with ~4% area overhead
Simulation Results
[Figure: four panels.
Ripple Carry Adder: % power savings and % area overhead for 12, 16, and 32 bits; ISO yield = 92%.
Carry Save Multiplier: % power savings and % area overhead for 12, 16, and 32 bits; ISO yield = 90%.
Wallace Tree Multiplier: % power savings and % area overhead for 8, 12, and 16 bits; ISO yield = 96%.
Performance penalty (WTM): % throughput penalty and % area overhead for 6, 8, 10, 12, and 20 bits.]
Random Logic: Shannon’s Expansion
$f(x_1,\dots,x_i,\dots,x_n) = x_i \cdot f(x_1,\dots,x_i=1,\dots,x_n) + x_i' \cdot f(x_1,\dots,x_i=0,\dots,x_n) = x_i \cdot CF_1 + x_i' \cdot CF_2$
$CF_1 = f(x_1,\dots,x_i=1,\dots,x_n); \quad CF_2 = f(x_1,\dots,x_i=0,\dots,x_n)$

[Figure: the inputs feed cofactors CF1 (xi=1) and CF2 (xi=0), each active with probability 50%, and a MUX controlled by xi produces f. Expanding again on xj gives cofactors such as CF11 and CF12, each active with probability 25%.]
Activation probability of cofactors can be reduced
How to choose the control variable?
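Shannon's expansion is easy to verify on a small function. The sketch below builds the two cofactors for a chosen control variable and checks that the MUX form reproduces the original truth table; the majority function is a hypothetical example, not a circuit from the slides.

```python
from itertools import product

def cofactors(f, i):
    """Return (CF1, CF2): f with input i fixed to 1 and to 0."""
    def fix(value):
        return lambda *xs: f(*xs[:i], value, *xs[i + 1:])
    return fix(1), fix(0)

def shannon(f, i):
    """Rebuild f as xi*CF1 + xi'*CF2 (a 2:1 MUX controlled by xi)."""
    cf1, cf2 = cofactors(f, i)
    return lambda *xs: cf1(*xs) if xs[i] else cf2(*xs)

def majority(x0, x1, x2):   # hypothetical example function
    return int(x0 + x1 + x2 >= 2)

rebuilt = shannon(majority, 1)
matches = all(rebuilt(*xs) == majority(*xs) for xs in product((0, 1), repeat=3))
```

Only one cofactor is selected for any input vector, which is why each is active with probability 50% when the control variable is equally likely to be 0 or 1.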

Further Isolation and Slack Creation by Sizing

[Figure: the original circuit (inputs to primary output PO) is decomposed into cofactors CF11, CF21, CF32, CF42, CF53, and CF63, whose outputs pass through a MUX network to the PO.]

• Slack creation strategy
– Lagrangian Relaxation based sizing (B.C. Paul et al., DAC 2004) is used
– Non-critical paths are selectively made faster
– Critical paths are slightly slowed down
Simulation Results
[Figure: two bar charts for MCNC benchmarks (cht, sct, pcle, mux, decod, cm150a, x2, alu2, count) in a 70nm process. Power: % improvement in power with switching activity = 0.2 and with switching activity = 0.5. Area (×10³ um²): original design versus proposed design.]
• Average power saving = ~50%
• Average area overhead = 18%
• Average performance penalty = 5.9% (with 4 control variables) for signal probability = 0.5
Two-Stage Pipeline with Test Logic
[Figure: GDS layout marking the regular pipeline, the proposed pipeline, and the test logic, with block diagrams of both pipelines.
Low-power robust (proposed) pipeline, supply VDDm: LFSR or fixed vectors, clock generator, pre-decoders, scan flip-flops (SFFs), carry-look-ahead adder, comparator, 4:1 mux, and stalling logic that derives gclk and freeze from CLK; test-mode pins TM1 and TM2.
Conventional pipeline, supply VDDo: fixed vectors, SFFs, carry-look-ahead adder, comparator, and 2:1 mux; test-mode pin TM1.]
Power measurement: 20% reduction in the conventional pipeline, 40% extra reduction using CRISTA
~40% power saving with ~13% performance penalty
VDD Scaling, Process Variation, and Quality Trade-off: DCT

Banerjee, Karakonstantis, Roy

Design Automation and Test in Europe (DATE) 2007
Basic Idea
• All computations are "not equally important" for determining outputs
• Identify important and unimportant computations based on output "sensitivity"
• Compute important computations with "higher priority"
• Delay errors due to variations / Vdd scaling "affect only" non-important computations
• "Gradual degradation" in output with voltage scaling and process variations
DCT Based Image Compression Process
[Figure: JPEG encoder block diagram. The 512×512 source image is split into 8×8 blocks X. The FDCT computes Z = T · X · T', the quantizer rounds Z using Q to give V, and the entropy encoder produces the compressed image data.]
[Figure: the 2D DCT as two 1D DCTs: X → 1D DCT → W → transpose memory → Y → 1D DCT → Z.]

• DCT is used in current international image/video coding standards: JPEG, MPEG, H.261, H.263
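The row-column structure Z = T · X · T' can be sketched in a few lines. The code below assumes T is the orthonormal 8-point DCT-II matrix, which is the usual choice but is not spelled out on the slide; the names w and y mirror the intermediate signals in the diagram, and the helper functions are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix T (assumed form), so that Z = T * X * T'."""
    t = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        t.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return t

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct_2d(x):
    """Row-column 2D DCT: 1D DCT, transpose, 1D DCT, equal to T X T'."""
    t = dct_matrix(len(x))
    w = matmul(t, x)                 # first 1D DCT pass: W = T X
    y = transpose(w)                 # transpose memory: Y = W'
    return transpose(matmul(t, y))   # second pass, transposed back: T X T'

flat_block = [[10.0] * N for _ in range(N)]
z = dct_2d(flat_block)   # all energy lands in the DC term z[0][0]
```

For a constant block, every coefficient except the DC term is zero, which is the extreme case of the energy compaction that the design exploits.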
Energy Distribution of a 2D-DCT Output
 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64

High energy components (important outputs, 75% energy)
Low energy components (less important outputs)

Can important components be computed with higher priority?
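The numbering in the 8×8 grid is the standard zigzag scan order, which visits low-frequency coefficients first. A minimal sketch that regenerates the grid (the function name is hypothetical):

```python
def zigzag_positions(n=8):
    """n x n grid whose entries are the 1-based zigzag scan positions."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals (r + c constant), alternating direction on each one.
    cells.sort(key=lambda rc: (rc[0] + rc[1],
                               rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    grid = [[0] * n for _ in range(n)]
    for position, (r, c) in enumerate(cells, start=1):
        grid[r][c] = position
    return grid

grid = zigzag_positions()
```

Small position numbers cluster in the top-left corner, where the DCT concentrates most of the block's energy.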
Design Methodology
[Figure: (a) input block with rows x0 to x7; (b) the first 1D-DCT produces the intermediate outputs w0 to w7 (w0, w8, w16, ..., w56 along the first row), ordered from faster to slower computation; (c) transpose memory holding y1 to y7; (d) the second 1D-DCT produces the final DCT outputs z0 to z4 and onward, again ordered from faster to slower computation.]

Path Delays for 1D-DCT outputs
[Figure: adder trees for the eight 1D-DCT outputs w0 to w7. Even outputs are built from the sums (x0+x7), (x1+x6), (x2+x5), (x3+x4) and odd outputs from the differences (x0-x7), (x1-x6), (x2-x5), (x3-x4), each multiplied by a coefficient a, d, e, or f and combined with shifts (<<, >>), additions, and subtractions. The annotated path delays are 2 adder delays for the shortest path, 3 adder delays for four outputs, and 4 adder delays for three outputs.]
Proposed DCT under Vdd scaling
[Figure: three panels for outputs w0 to w7, with Vdd3 < Vdd2 < Vdd1 (nominal).
Proposed design with high/low delay paths, at Vdd1: the important computations have delay D1; the other paths have longer delays.
Scaled Vdd (Vdd2): longer paths are affected. The important computations still finish in D1; paths with delay D2 > D1 are not computed.
Extreme scaled Vdd (Vdd3): shorter paths are affected. Only the DC component w0 finishes in D1; paths with delays D3 > D1 and D4 > D1 are not computed.]
1D-DCT Path Delay Comparisons
[Figure: bar chart of delay (ns, 0 to 4) for computation paths Path1 (w0) through Path8 (w7), conventional DCT versus proposed DCT.]
Effect of Vdd Scaling
Different Architectures at Nominal Voltage (1.0 V)

             CSHM DCT (2 alphabets)   DCT with WTM   Proposed DCT
Power (mW)   25.1                     29.8           26
Delay (ns)   3.2                      3.64           3.57
Area (um2)   80490                    108738         90337
PSNR (dB)    21.97                    33.23          33.22

[Figure: output images at 1.0 V, 0.9 V, and 0.8 V for the conventional WTM DCT, the CSHM DCT (2 alphabets), and the proposed DCT. The conventional WTM DCT and the CSHM DCT both fail at 0.9 V and 0.8 V.]

Proposed Architecture at Reduced Voltage

             Vdd = 0.9V       Vdd = 0.8V
Power (mW)   17.53 (41.2%)    11.09 (62.8%)
PSNR (dB)    29               23.41

• Graceful degradation of the proposed DCT architecture under Vdd scaling (Vdd can be scaled to 0.75V)
• Conventional architectures fail
Temporal Degradation: NBTI

Kang, Roy, et al. -- TCAD, DAC-07

Temporal Reliability Issues in CMOS Technology
• HCI: Hot Carrier Injection
• NBTI: Negative Bias Temperature Instability
  – Increase in VT of PMOS with time
  – The dominant reliability factor in scaled technologies
• TDDB, etc.
NBTI: Negative Bias Temperature Instability

[Figure: interface trap generation due to Si-H bond breaking.]

• Interface traps (NIT) are generated at the channel interface by Si-H bond breaking when a negative gate bias is applied
• With time, VT increases, subthreshold slope (S) increases, and mobility degrades
• Drive current (IDS) reduces, which affects the PMOS speed
• Overall, this reduces the lifetime of the PMOS
NBTI: Experimental Data
[Figure: log-log plot of VTh shift [V] (10^-3 to 10^-1) versus stress time [s] (10^0 to 10^4), experimental and simulation data, with a ~t^0.17 trend line.]

• PMOS VT degrades as a power of time due to NBTI
• Fixed exponent of 1/6 matches the simulation data*
* V. Huard and M. Denais, IRPS 2004
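A power-law exponent of this kind is typically read off as the slope on log-log axes. The sketch below fits shift = A·t^n by least squares on synthetic data generated from an assumed 1/6 law; the data points and the amplitude are hypothetical, not the measurements in the plot.

```python
import math

def fit_power_law(times, shifts):
    """Least-squares fit of shift = A * t**n on log-log axes; returns (A, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in shifts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return math.exp(y_mean - slope * x_mean), slope

# Synthetic (not measured) data generated from an assumed 1/6 power law.
times = [1.0, 10.0, 100.0, 1000.0, 10000.0]
shifts = [2e-3 * t ** (1.0 / 6.0) for t in times]
amplitude, exponent = fit_power_law(times, shifts)
```

With real stress data the fitted exponent would scatter around the trend line; an exponent near 0.17 is what the slide reports.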
Power-law VT degradation Model
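Reaction rate: $\frac{dN_{IT}}{dt} = k_F\,[N_0 - N_{IT}] - k_R\,N_{IT}\,N_H(0) \approx 0 \;\Rightarrow\; \frac{k_F}{k_R}\,[N_0 - N_{IT}] = N_{IT}\,N_H(0)$

[Figure: hydrogen concentration $N_H$ versus distance $y$ into the oxide, falling from $N_H(0)$ at the interface to zero at $y = \sqrt{D_H t}$.]

Conservation of hydrogen: $N_{IT}(t) = \int_0^{\sqrt{D_H t}} N_H(y,t)\,dy = \tfrac{1}{2}\,N_H(0)\,\sqrt{D_H t}$

Combining the two relations (with $N_{IT} \ll N_0$): $N_{IT}(t) = \sqrt{\frac{k_F N_0}{2 k_R}}\,(D_H t)^{1/4}$

$\Delta V_T = \frac{q\,\Delta N_{IT}}{C_{OX}}$

NBTI degradation follows a power law in time (exponent 1/4 in this derivation; about 1/6 in the measured data)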

Mobility degradation factor

• Mobility degradation due to NBTI is expressed as an additional VT shift, denoted m
• The overall temporal VT shift model is expressed as

$\Delta V_T = (1 + m)\,\frac{q\,\chi\,E_{ox}\,e^{E_{ox}/E_0}}{C_{OX}}\cdot t^{0.25}$
Impact of NBTI on circuit performance
Circuit Performance Degradation
[Figure: log-log plot of % change (10^-2 to 10^2) versus time (10^0 to 10^9 s) for Vt, inverter delay, and ring-oscillator (ROSC) delay.]

• Performance (delay) degradation also follows the power-law trend with the same 0.17 exponent
• In CMOS logic, only the rising (L2H) delays are affected
Circuit Performance degradation cont.
[Figure: log-log plot of % change (10^-2 to 10^2) versus time (10^0 to 10^8 s) for Vth and delay, with Si = 1 (worst case) and Si < 1 (with activity).]
• Delay degradation in ISCAS c432
• Activity factor (switching activity) has little effect on the delay degradation
  – In reality, activity factors are balanced in normal operation
Design method considering the NBTI degradation
NBTI-aware design method

[Figure: maximum circuit delay versus time against the delay constraint. Without over-design the delay crosses the constraint early (reduced lifetime due to NBTI degradation); with NBTI-aware over-design it stays within the constraint for the required lifetime of the design.]

• Over-design is required to guarantee lifetime stability of the circuit
• LR sizing is used to optimize the circuit
  – Size the circuit considering the worst-case VT degradation over the lifetime
LR Sizing considering NBTI
Inputs:
1. Delay constraint (DMAX)
2. Required lifetime (TLife): the lifetime constraint, a new design constraint

Flow:
– Calibrate switching activities (Si) at each node
– Compute the VT shift at each node (power-law VT model considering Si)
– LR sizing with delay constraint DMAX (optimal sizing from Lagrangian Relaxation*)
– NBTI-aware design: guarantees lifetime stability under NBTI degradation
* C. Chen et al., TCAD 1999
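The lifetime constraint amounts to leaving enough timing margin at time zero. The sketch below shows only that guard-band arithmetic, not the Lagrangian Relaxation sizing itself; it assumes delay degradation scales as t^(1/6) and, for illustration, treats the c432 nominal delay (525 ps) as DMAX with the reported 8.90% degradation after 10 years. The function names are hypothetical.

```python
def degradation_at(t_years, deg_ref, t_ref_years=10.0, exponent=1.0 / 6.0):
    """Fractional delay degradation at t_years, scaled from a reference point."""
    return deg_ref * (t_years / t_ref_years) ** exponent

def required_fresh_delay(d_max_ps, t_life_years, deg_ref):
    """Largest time-zero delay that still meets d_max_ps at end of life."""
    return d_max_ps / (1.0 + degradation_at(t_life_years, deg_ref))

# c432 figures: 525 ps nominal delay, 8.90% degradation after 10 years (Si = 1).
fresh = required_fresh_delay(525.0, 10.0, 0.0890)
```

In this example the circuit would have to be sized for roughly 482 ps when new in order to stay within 525 ps after ten years, and that tightened target is what the sizing step has to meet.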
Simulation results

1. Delay degradation in ISCAS85 benchmark circuits after 10 years

Circuit   No. of trans.   Nominal delay (ps)   % delay degrad. (10 yrs), Si = 1   % delay degrad. (10 yrs), Si < 1
c432      590             525                  8.90                               7.32
c499      1816            368                  9.20                               8.06
c1908     1582            513.5                9.18                               8.53
c3540     3638            597.3                9.00                               7.86
c74181    372             194.6                9.89                               8.68
c74182    92              77.2                 10.35                              9.63
c74283    188             131.9                7.90                               6.83
c74L85    148             115.1                9.50                               7.60
* All benchmarks are synthesized in BPTM 70nm technology
Simulation results cont.

2. Area overhead in NBTI-aware sizing

Circuit   Nominal delay (ps)   Nominal area (um)   % area overhead, Si = 1   % area overhead, Si < 1
c432      385                  196.7               14.8                      13.6
c499      340                  581.47              7.82                      6.71
c1908     470                  489.67              7.13                      6.68
c3540     500                  1146.5              3.44                      3.31
c74181    180                  111.1               9.45                      9.0
c74182    80                   31.1                11.3                      11.2
c74283    125                  66.71               10.0                      10.0
c74L85    120                  42.59               5.85                      5.8
* All benchmarks are synthesized in BPTM 70nm technology
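Negative Bias Temperature Instability
[Figure: PMOS cross-section under stress (VGS < 0V, VS = 0V, VD = 0V, VBODY = 0V; P+ source and drain in an n-substrate). Si-H bonds at the silicon/oxide interface break, hydrogen (H, H2) moves through the oxide toward the gate, and interface traps (x) are left behind. Inset: characteristic after NBTI degradation.]

• PMOS-specific aging effect
• Generation of (+) traps
• Reaction-Diffusion (RD) model*
• Time exponent ~ 1/6

$N_{IT}(t) = \sqrt{\frac{k_F N_0}{2 k_R}}\,(D_H t)^{1/6}$
$\Delta V_T = \frac{q\,\Delta N_{IT}}{C_{OX}}$
* M. A. Alam, IEDM '03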
NBTI in Digital Circuits
[Figure: left, a logic circuit with inputs i1 to i3, gates 1 to 8, and outputs o1 and o2; right, a 6T SRAM cell with WL, BL, BLB, and storage nodes VL and VR.]

Logic circuits:
• fMAX decreases ↓
• Timing failure with time
Memory circuits:
• Static Noise Margin (SNM) ↓
• Read & Write Stability
• Parametric Yield ↓

Temporal VTh increase in PMOS affects critical performance factors of digital VLSI circuits
NBTI: Random Logic Circuits
Delay degradation of standard cells:

Logic cell   Fan-in   Delay at t=0 (ps)   Delay at 3 years (ps)   Δ (%)
INV          1        13.77               16.77                   21.8
NAND         2        16.86               19.88                   17.9
NAND         3        19.57               22.45                   14.8
NOR          2        17.26               21.89                   26.8
NOR          3        23.80               30.19                   26.9

[Figure: fMAX reduction (%) versus time (10^0 to 10^8 s, log-log) for ISCAS'85 benchmarks c2670, c5315, c1908, c499, c3540, and c74181 in PTM 65nm; slope n ~ 1/6, with an 8% fMAX decrease at the 3-year lifetime.]

• ISCAS'85 Benchmark Circuits, PTM 65nm
• Gate delay: analytical delay model considering NBTI
• Circuit delay: NBTI-aware Static Timing Analysis (STA)
• Circuit fMAX time exponent n ~ 1/6
NBTI: 6T SRAM Cell
[Figure: left, Static Noise Margin (SNM): degradation in SNM (%) versus time (10^4 to 10^8 s, log-log) with a 1/6 trend line; PTM 60nm, 125°C, CR = 1.33, PR = 0.67. Right, distribution in write margin: CDF versus write margin (0.43 to 0.49 V) at t = 0, 10^6, 10^7, and 10^8; WM improves with time.]

• SNM degrades by more than 10% in 3 years
• % SNM degradation time exponent n ~ 1/6
• WM improves with time under NBTI
Design for Reliability under NBTI
[Figure: area (um, 550 to 800) versus delay (ps, 580 to 720) for c1908, showing the initial design (INIT), TR-based sizing, and cell-based sizing, with the area saving of TR-based over cell-based sizing marked.]
Simulation setup:
• Synthesized in PTM 65nm
• 1/6 VTh degradation model
• 125°C stress temperature
• 50% signal probability at PIs
Gate sizing applied to guarantee lifetime functionality of design
• 11.7% overhead for cell-based sizing
• 6.13% overhead for TR-based sizing
• 45% improvement in area overhead
• Runtime complexity for TR-based sizing is identical to that of cell-based sizing
IDDQ based NBTI Characterization
[Figure: layout microphotograph and schematic of the test circuit: a 1000-stage inverter chain between VDD and GND, driven by Vin, with IDDQ measured at GND.]

Technology: CMOS 130nm
Die size: 20 mm²
I/O pins: 209
Tox: 1.6 nm
VDD: 1.2 V

• Test circuit fabricated
• 1000-stage INV chain
• DC stress signal at Vin
• IDDQ measurement at GND
Correlation between IDDQ & fMAX
[Figure: left, % IDDQ degradation versus time (10^0 to 10^5 s, log-log) for IDDQ (Vin=0.0) and IDDQ (Vin=1.7), Vstress = 1.7V at 150°C, with segments S1 (n ~ 1/6) and S2 (n > 1/6); MAX - MIN < 1.2%. Right, IDDQ reduction (%) versus fMAX reduction (%) for ISCAS benchmarks c2670, c5315, c1908, c499, c3540, and c74181 at 75°C in 65nm PTM, showing a linear relationship.]

• DM < 3ms, Temp=125°C, Vstress=1.7V
• IDDQ degradation n ~ 1/6: clear signature of NBTI
• % IDDQ decrease and % fMAX reduction: n ~ 1/6
• Correlation between IDDQ and fMAX can be used to predict circuit performance degradation under NBTI

$R_{leak} = \frac{\Delta I_{leak}(t)}{I_{leak}(0)} \propto t^{1/6} \propto \frac{\Delta f_{MAX}(t)}{f_{MAX}(0)} = R_{freq}$
$R_{freq} = K \times R_{leak}$ (K: constant)
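The linear relation R_freq = K × R_leak suggests a simple calibrate-then-predict procedure. The sketch below fits K through the origin from paired samples and then estimates fMAX from an IDDQ reading; the sample values, units, and function names are hypothetical, not measurements from the test chip.

```python
def leak_ratio(i_leak_0, i_leak_t):
    """R_leak = |delta I_leak(t)| / I_leak(0)."""
    return abs(i_leak_t - i_leak_0) / i_leak_0

def calibrate_k(r_freq_samples, r_leak_samples):
    """Least-squares K for R_freq = K * R_leak (line through the origin)."""
    num = sum(rf * rl for rf, rl in zip(r_freq_samples, r_leak_samples))
    den = sum(rl * rl for rl in r_leak_samples)
    return num / den

def predict_fmax(f_max_0, k, r_leak):
    """fMAX after degradation, assuming fMAX drops by the fraction K * R_leak."""
    return f_max_0 * (1.0 - k * r_leak)

# Hypothetical calibration pairs, not data from the slides.
k = calibrate_k([0.010, 0.020, 0.030], [0.025, 0.050, 0.075])
f_pred = predict_fmax(1000.0, k, leak_ratio(2.0, 1.9))
```

Once K is known, each later IDDQ measurement yields an fMAX estimate without a separate speed test.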
IDDQ based Characterization Technique

Phases: design phase, post-silicon phase

Flow:
– Initial characterization
  • Compute Rleak, RfMAX
  • Compute KfMAX = RfMAX / Rleak
– Reliability extrapolation
  • For each IDDQ measurement sample, Rleak is computed
  • RfMAX = KfMAX × Rleak
  • Estimate fMAX degradation
– Lifetime projection
  • Project IDDQ using KfMAX
– Temperature dependency
– NBTI characterization report

Benefits:
• Circuit-level NBTI reliability characterization
• IDDQ test is used
• Expensive fMAX testing is avoided (or minimized)
• Accurate circuit-level performance degradation can be predicted
• IC-specific burn-in to qualify the target product
• Efficient way of field monitoring: dynamic local signature of product usage
• Possible usage for other reliability sources, e.g. HCI

Conclusions

Process variation and process tolerance are becoming important

There is a need to optimize designs considering power/performance/yield

A 45 NM Resilient Microprocessor Core For Dynamic Variation Tolerance
No ratings yet
A 45 NM Resilient Microprocessor Core For Dynamic Variation Tolerance
15 pages
A New Sensitivity-Driven Process Variation Aware Low Power Self-Restoring SRAM Design
No ratings yet
A New Sensitivity-Driven Process Variation Aware Low Power Self-Restoring SRAM Design
6 pages
Mohanty VLSI Integration 2012jan SRAM
No ratings yet
Mohanty VLSI Integration 2012jan SRAM
30 pages
Microelectronics Reliability: A. Islam, Mohd. Hasan
No ratings yet
Microelectronics Reliability: A. Islam, Mohd. Hasan
6 pages
4x4 SRAM Design in 90nm Technology
No ratings yet
4x4 SRAM Design in 90nm Technology
4 pages
Design and Technology Trends: R. Saleh Dept. of ECE University of British Columbia Res@ece - Ubc.ca
No ratings yet
Design and Technology Trends: R. Saleh Dept. of ECE University of British Columbia Res@ece - Ubc.ca
32 pages
Lec 35
No ratings yet
Lec 35
34 pages
Ijarcce 2022 11906
No ratings yet
Ijarcce 2022 11906
10 pages
VLSI Sol
No ratings yet
VLSI Sol
23 pages
Implementation of High SNM SRAM Cell and Testing in 45 NM CMOS Logic Process 222
No ratings yet
Implementation of High SNM SRAM Cell and Testing in 45 NM CMOS Logic Process 222
5 pages
SRAM 7T With Feedback New Reference
No ratings yet
SRAM 7T With Feedback New Reference
6 pages
VLSI Design Trends for Engineers
No ratings yet
VLSI Design Trends for Engineers
76 pages
To Edit
No ratings yet
To Edit
2 pages
1.signoff Semi Blog
No ratings yet
1.signoff Semi Blog
96 pages
Lavanya Paper
No ratings yet
Lavanya Paper
4 pages
Circuit Theory Apps - 2024 - Praveen - Low Power and Noise Immune 9 T Compute SRAM Cell Design Based On Differential
No ratings yet
Circuit Theory Apps - 2024 - Praveen - Low Power and Noise Immune 9 T Compute SRAM Cell Design Based On Differential
26 pages
PRIYANKA KUMARI (Low Power SRAM)
No ratings yet
PRIYANKA KUMARI (Low Power SRAM)
15 pages
Unit 3 Vlsidesign
No ratings yet
Unit 3 Vlsidesign
29 pages
Aug 08
No ratings yet
Aug 08
61 pages
FinFET-Based SRAM Cell Performance Analysis
No ratings yet
FinFET-Based SRAM Cell Performance Analysis
4 pages
SRAM Cache Performance-Reliability Analysis
No ratings yet
SRAM Cache Performance-Reliability Analysis
14 pages
ECE3040 Lecture 28 MOSFETs Small Signal 2022 r1
No ratings yet
ECE3040 Lecture 28 MOSFETs Small Signal 2022 r1
31 pages
Leakage Power Reduction Techniques
No ratings yet
Leakage Power Reduction Techniques
44 pages
Design of 10T SRAM Cell With Improved Read Perform
No ratings yet
Design of 10T SRAM Cell With Improved Read Perform
23 pages
Layout Lec 02 Var Rel v01
No ratings yet
Layout Lec 02 Var Rel v01
31 pages
Comparative Analysis of 6T 8T and 10T SRAM Using 180nm CMOS Technology For IoT Applications
No ratings yet
Comparative Analysis of 6T 8T and 10T SRAM Using 180nm CMOS Technology For IoT Applications
6 pages
PVTA-Aware Optimization for VLSI Circuits
No ratings yet
PVTA-Aware Optimization for VLSI Circuits
6 pages
Deep Sub-Micron Design Challenges
No ratings yet
Deep Sub-Micron Design Challenges
38 pages
Razor Thesis
No ratings yet
Razor Thesis
13 pages
Electrical Variability Due To Layout Dependent Effects: Analysis, Quantification, and Mitigation On 40 and 28nm SOC Designs
No ratings yet
Electrical Variability Due To Layout Dependent Effects: Analysis, Quantification, and Mitigation On 40 and 28nm SOC Designs
18 pages
FinFET-Based SRAM Design
No ratings yet
FinFET-Based SRAM Design
6 pages
SRAM Circuit Design and Operation: Prepared By: Mr. B. H. Nagpara
No ratings yet
SRAM Circuit Design and Operation: Prepared By: Mr. B. H. Nagpara
49 pages
Lecture 14
100% (1)
Lecture 14
23 pages
IJRAR1944228
No ratings yet
IJRAR1944228
11 pages