
Suresh 2014

The document presents a variation aware design of Post-Silicon Tunable (PST) clock buffers to mitigate the impact of process variation on chip performance by redistributing clock skew. It introduces non-linear delay intervals for PST buffers to optimize performance binning yield, reduce area and power overhead, and improve the overall efficiency of high-performance circuits. The proposed technique has shown to enhance performance binning yield by over 4% while achieving a 30% area reduction and a 20% leakage power reduction.


2014 IEEE Computer Society Annual Symposium on VLSI

Variation Aware Design of Post-Silicon Tunable Clock Buffer
Vikram B. Suresh and Wayne P. Burleson
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA
{vsuresh, burleson}@[Link]

Abstract: Process variation is a major limiting factor in designing high performance circuits in advanced CMOS technologies. Over-optimizing data paths to provide adequate design margin increases area and power overhead. In this work, we present a variation aware design of Post-Silicon Tunable (PST) clock buffers to redistribute clock skew and mitigate the impact of process variation on chip performance. Conventional PST buffers are designed for linear delay values. We estimate a set of non-linear delay intervals for the PST buffer based on the slack variation in each critical path. The configuration device sizes are mapped to the non-linear delay intervals using a set of equality conditions and solved using Linear Programming (LP). The variation aware device sizing provides many small delay intervals within one standard deviation of the slack distribution and fewer large delay intervals to compensate for larger slack variations. This helps in optimal tuning of PST buffers to fix hold time, as well as better skew distribution to achieve the maximum possible performance in a chip. The proposed PST buffer design technique was implemented for ISCAS’89 benchmark circuits. Non-linear delay PST buffers improve performance binning yield by more than 4%. They also provide optimum buffer circuits with an area reduction of 30% and leakage power reduction of 20%. The proposed technique can be implemented by designing dedicated buffers for each critical path or by using a smaller set of pre-designed buffers with non-linear delay values.

Keywords: Post-Si Tuning, Process Variation, Clock Buffer

I. INTRODUCTION
Advanced CMOS technologies have enabled high density designs at the cost of a complex fabrication process. Variation in oxide thickness and Random Dopant Fluctuation (RDF) lead to variation in transistor threshold voltage (Vth). The optical photo-lithography process for printing critical dimensions results in variation in transistor channel length and width. These variations in the fabrication process have increased uncertainty in yield, reliability and performance. Increasing chip performance with each technology node is a major challenge for high performance processors. Further, micro-processors need to operate in various voltage and frequency modes and tolerate variations in operating conditions. This has necessitated large guard bands during the design phase. The guard bands account for variation in supply voltage and operating temperature; multiple voltage and frequency modes; clock jitter and device ageing. These design guard bands lead to large area and power overhead. Accounting for process induced variation in path delays and clock skews further adds to the existing guard band, resulting in over compensation in chips with less process variation. One commonly used technique to reduce the guard band against process variation is the use of Post-Silicon Tunable (PST) clock buffers [1]. A PST buffer is a variable delay element used in the clock path [2-4]. Post-fabrication, PST buffers are configured to vary clock path latencies and hence the clock skew between clock groups. This helps in redistributing timing slack between different timing paths to achieve the best possible performance for a chip at a given process corner. It is common practice to bin fabricated chips based on maximum operating frequency. Optimal configuration of PST buffers increases performance binning yield. Depending on the position of PST buffers in a clock tree, skew may be varied between two flip-flops or two clock groups. A number of algorithms have been proposed in the literature to optimize insertion of PST buffers based on statistical timing analysis to maximize binning yield and minimize area/power overhead [5-8]. In [6], Khandelwal et al. also estimate the optimum tuning range based on the possible statistical variation in the delay of critical timing paths. In [6] and [9], gate sizing in the data path is also optimized along with clock buffer insertion to reduce over compensation in shorter timing paths. A design will have multiple PST buffers and hence offers a number of possible tuning configurations. Nagaraj et al. propose a test methodology using critical path tracing to determine the optimal configuration for maximum operating frequency [10]. Apart from clock tuning during test, PST buffers are also used to counter ageing related performance degradation [11] and for on-chip thermal management [12].

Considerable research effort has been directed towards clock tree synthesis, placement of PST buffers and statistical timing based on buffer placement. These techniques rely on analyzing and optimizing the data path and clock tree for a given clock network and PST buffer design. However, the design of the PST buffers also has a major impact on the performance gain and binning yield. The range and granularity of tunable delay values affect both the binning yield achieved and the area and power overhead due to PST buffers. In this work, we present a variation aware design of PST buffers to closely match buffer delay with the distribution of path delay and skew variations. We demonstrate the advantages of using non-linear tunable delay intervals on binning yield and area/power efficiency.

II. BACKGROUND AND MOTIVATION
Conventional PST buffer designs can be categorized into shunt capacitor based PST buffers [1] and current starved inverter based PST buffers [3], shown in Fig. 1. Different variants of the shunt capacitor based PST buffer may be designed depending on the implementation of the capacitive load. However, they do not scale well with technology. Similarly, configurable current mirrors may be used to vary delay instead of current starved inverters [2][4].

978-1-4799-3765-3/14 $31.00 © 2014 IEEE
DOI 10.1109/ISVLSI.2014.95
Figure 1: Commonly used PST buffer designs: (a) shunt capacitor, (b) current starved inverter
Figure 2: Delay distribution of 16-inverter delay chain (probability vs. delay in ps; μ = 73ps, σ = 4ps)

The range and precision of delay variation in the PST buffer depend on the size of the configurable PMOS/NMOS devices. Conventional sizing of these devices assumes a linear variation in configurable delays and sizes the devices accordingly [3]. A PST buffer can be used to increase clock latency to a flip-flop to fix a min path or hold timing violation. However, this reduces the slack on setup timing of paths originating at that flop, thus potentially degrading performance. Similarly, if the PST buffer is used to delay the clock at the end point flop of a timing path to meet setup time requirements, it could potentially violate hold timing requirements of paths ending in that flop. Hence, it is critical to choose an appropriate granularity of tunable delays in a PST buffer. Fig. 2 shows the distribution of delays of a 16-inverter delay chain with ∆Vth ~ N(0,10%) and 100,000 Monte-Carlo simulations in 32nm predictive technology. The delay variation has a Gaussian distribution with a mean of 73ps and a standard deviation of 4ps. Assuming the delay chain is part of a data path, delay values less than the mean correspond to chips with timing paths that could fail hold time requirements due to process variation. Similarly, delays larger than the mean correspond to chips with degraded data path delay and hence reduced performance. The Gaussian distribution signifies that a large number of fabricated chips have performance degradation close to the expected mean. Linear tunable delays over compensate in a large set of chips. This can lead to limited chip performance (when fixing hold timing violations) or decreased yield (when fixing setup timing violations). Hence, we propose a variation aware design of the PST buffer using non-linear tunable delay values to closely match the distribution of expected delay/slack variation. We estimate the potential variation in the delay of a timing path or group of timing paths driven by a PST buffer and choose the delay values correspondingly.

Fig. 3 shows the mean and standard deviation of delay values in a delay chain with different inverter depths. This shows that timing paths with different logic depths may have different variance in delay values. Shorter timing paths are candidates for critical min paths and longer timing paths are candidates for critical max paths. Similarly, clock skew variation can also be attributed to the number of buffer levels during clock tree synthesis. For optimal performance enhancement and binning yield, it is important to estimate the range and granularity of tunable delays in each PST buffer depending on the criticality of the timing path(s) it is driving. Further, each PST buffer comes with an overhead in area and leakage power. Non-linear delay intervals lead to optimized device sizing, thereby reducing area/power overhead.

Figure 3: Mean and standard deviation of delay variations

The main contributions of this work are:
1. Estimation of range and interval of tunable delays in a PST buffer using statistical timing.
2. Design of optimal PST buffers for estimated delay values using Linear Programming (LP).
3. Demonstration of the impact of non-linear delay PST buffers on performance binning yield, buffer area and leakage power.

III. VARIATION AWARE NON-LINEAR TUNABLE DELAY ESTIMATION
A. Statistical Variation in Timing Slack
The two most critical timing checks performed in digital logic design are setup time (max) and hold time (min) requirements. The setup time slack for a timing path is given by,

$$\mathrm{Slack}_{setup} = T_{period} - T_{setup} + T_{skew} - T_{data} \qquad (1)$$

where T_data is the sum of the clock to Q delay of the launch flop and the delay of the data path, T_period is the clock period, T_setup is the setup time requirement of the capture flop and T_skew is the difference in clock latency between the capture flop and the launch flop. Assuming the clock period and setup time
requirement to be constant, the two delay parameters varying due to process variation are T_data and T_skew. Both have a Gaussian distribution and are mutually independent in a worst case scenario. If,

$$T_{data} = N(\mu_{data}, \sigma_{data}), \qquad T_{skew} = N(\mu_{skew}, \sigma_{skew}),$$

then

$$\mathrm{Slack}_{setup} = N(\mu_{setup}, \sigma_{setup}) \qquad (2)$$

$$\text{where } \mu_{setup} = \mu_{skew} - \mu_{data}, \qquad \sigma_{setup} = \sqrt{\sigma_{data}^{2} + \sigma_{skew}^{2}}$$

Similarly, slack for the hold time check is given by,

$$\mathrm{Slack}_{hold} = T_{data} - (T_{hold} + T_{skew}) \qquad (3)$$

where T_hold is the hold time requirement of the capturing flip-flop. The variation in hold slack due to process variation is,

$$\mathrm{Slack}_{hold} = N(\mu_{hold}, \sigma_{hold}) \qquad (4)$$

$$\text{where } \mu_{hold} = \mu_{data} - \mu_{skew}, \qquad \sigma_{hold} = \sqrt{\sigma_{data}^{2} + \sigma_{skew}^{2}}$$

Setup time violations lead to performance degradation. Post-fabrication, if a chip has setup time violations, the clock frequency can be reduced to gracefully degrade the performance while still having a functioning chip. However, hold time violations can potentially lead to functional yield loss. Hence, it is critical to design PST buffers with a tunable delay range large enough to fix any hold violation in the corresponding clock group.

B. PST Buffer Tuning and Estimation of Non-linear Tunable Delay Values
Let us consider the design snippet shown in Fig. 4 and assume a maximum of 3σ variation in the delays of all paths across all fabricated chips. The slack on timing paths P1 and P3 determines the performance of each chip. Assuming a current starved inverter based PST buffer (Fig. 1b) with a 3-bit configuration and monotonically increasing delays from 000 to 111, the buffer can be used to fix setup violations on path P1 and hold violations on path P2. However, increasing the delay of the PST buffer also reduces the slack of the most critical max path starting at flop-1 (P3) and the most critical min path ending at flop-1 (P4). To fix a 3σ hold time violation on path P2 using a PST buffer at flop-1,

$$\mathrm{Slack}_{hold}(P4) \ge \left| 3\sigma_{\mathrm{Slack}_{hold}(P2)} \right| \qquad (5)$$

The clock tree synthesis/optimization step has to satisfy equation 5 by inserting a PST buffer at the start point of P4 or by having adequate hold slack to account for process variation. Any positive slack beyond this provides additional margin for tuning the max path P1 and has negligible impact on choosing the granularity of tunable delay values. Hence, we do not consider timing path P4 in the rest of this analysis.

Figure 4: Sample design snippet with critical timing paths (the PST buffer drives the clock of flop-1). P1: critical max path ending at flop-1; P2: critical min path starting at flop-1; P3: critical max path starting at flop-1; P4: critical min path ending at flop-1.

The PST buffer tuning steps are shown in Fig. 5. Since fixing hold time violations is more critical for the functionality of a chip, the PST buffer is first configured to meet timing on min path P2. At the end of hold fixing, the PST buffer is further tuned if path P1 does not meet the setup requirement for the maximum design frequency. The tuning process ends if the configuration bits b[2:0] = 111 or the best trade-off between the slacks of paths P1 and P3 is achieved. In practice, this flow may be implemented during test by configuring the PST buffer, sensitizing the critical min (P2) and max (P1, P3) timing paths and checking for hold time violations and maximum operating frequency.

Figure 5: Steps for tuning PST buffers (increment the configuration bits while Slack_hold(P2) < 0 to fix hold violations; then, while Slack_setup(P1) < 0, increment the configuration bits as long as the difference Slackdiff between Slack_setup(P1) and Slack_setup(P3) decreases, otherwise restore the previous configuration bits)
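The two-phase flow of Fig. 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' test program: the names `Slacks`, `apply_delay`, `slack_diff` and `tune` are hypothetical, the added clock delay is assumed to raise Slack_setup(P1) and Slack_hold(P2) and to lower Slack_setup(P3) by the same amount, and the delay lists are the ∆delay columns of Table 1 with 0 ps added for configuration 000.

```python
from dataclasses import dataclass

# Delta delays in ps for configurations 000..111 (Table 1; 000 adds no delay).
LINEAR = [0.0, 6.42, 12.85, 19.28, 25.71, 32.14, 38.57, 45.00]
NONLINEAR = [0.0, 3.75, 7.50, 11.25, 15.00, 22.50, 30.00, 45.00]


@dataclass
class Slacks:
    setup_p1: float  # critical max path ending at flop-1
    hold_p2: float   # critical min path starting at flop-1
    setup_p3: float  # critical max path starting at flop-1


def apply_delay(base: Slacks, delay: float) -> Slacks:
    """Assumed first-order effect of delaying the clock of flop-1."""
    return Slacks(base.setup_p1 + delay, base.hold_p2 + delay, base.setup_p3 - delay)


def slack_diff(s: Slacks) -> float:
    """Slackdiff of Fig. 5: gap between the setup slacks of P1 and P3."""
    return abs(s.setup_p1 - s.setup_p3)


def tune(base: Slacks, delays: list) -> int:
    """Return the configuration index selected by the two-phase flow."""
    cfg, last = 0, len(delays) - 1
    # Phase 1: increment until the hold violation on P2 is fixed (or 111 is reached).
    while cfg < last and apply_delay(base, delays[cfg]).hold_p2 < 0:
        cfg += 1
    # Phase 2: keep incrementing while P1 fails setup and Slackdiff still decreases.
    while cfg < last:
        now = apply_delay(base, delays[cfg])
        nxt = apply_delay(base, delays[cfg + 1])
        if now.setup_p1 >= 0 or slack_diff(nxt) >= slack_diff(now):
            break  # finished, or the next step would be restored anyway
        cfg += 1
    return cfg


# Hypothetical chip: 3 ps hold violation on P2, 5 ps of setup margin on P3.
chip = Slacks(setup_p1=10.0, hold_p2=-3.0, setup_p3=5.0)
lin = apply_delay(chip, LINEAR[tune(chip, LINEAR)])
non = apply_delay(chip, NONLINEAR[tune(chip, NONLINEAR)])
print(lin.setup_p3, non.setup_p3)
```

For this hypothetical chip both buffers stop at configuration 001, but the first linear step is large enough to push Slack_setup(P3) below zero, while the smaller non-linear step fixes the hold violation and leaves P3 with positive slack.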

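The slack statistics of equations (1) to (4) can be checked numerically. The sketch below is illustrative only: the function names and all numeric values are hypothetical, the constant terms T_period, T_setup and T_hold are kept in the means (equations (2) and (4) list only the varying terms), and T_data and T_skew are assumed to be independent Gaussians as in the text.

```python
import math
import random
import statistics


def slack_distributions(mu_data, sigma_data, mu_skew, sigma_skew,
                        t_period, t_setup, t_hold):
    """Return ((mu_setup, sigma_setup), (mu_hold, sigma_hold)), all in ps."""
    # For independent Gaussians the variances add in both slack expressions.
    sigma = math.hypot(sigma_data, sigma_skew)
    mu_setup = t_period - t_setup + mu_skew - mu_data  # mean of equation (1)
    mu_hold = mu_data - (t_hold + mu_skew)             # mean of equation (3)
    return (mu_setup, sigma), (mu_hold, sigma)


def monte_carlo_setup(mu_data, sigma_data, mu_skew, sigma_skew,
                      t_period, t_setup, samples=20000, seed=1):
    """Sample equation (1) directly and return the empirical mean and sigma."""
    rng = random.Random(seed)
    slacks = [
        t_period - t_setup + rng.gauss(mu_skew, sigma_skew) - rng.gauss(mu_data, sigma_data)
        for _ in range(samples)
    ]
    return statistics.fmean(slacks), statistics.pstdev(slacks)


# Hypothetical numbers chosen for round results; they are not from the paper.
(setup_mu, setup_sigma), (hold_mu, hold_sigma) = slack_distributions(
    mu_data=300.0, sigma_data=4.0, mu_skew=0.0, sigma_skew=3.0,
    t_period=330.0, t_setup=20.0, t_hold=10.0)
mc_mu, mc_sigma = monte_carlo_setup(300.0, 4.0, 0.0, 3.0, 330.0, 20.0)
print(setup_mu, setup_sigma, mc_mu, mc_sigma)
```

The sampled mean and standard deviation should land close to the closed-form values, which is the property the PST delay range is sized against: a buffer covering 3σ of this slack distribution covers nearly all fabricated chips.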
Tuning the PST buffer to fix path P2 results in a corresponding decrease in the slack of path P3, thereby reducing binning yield. In a conventional n-bit configuration linear delay PST buffer, the 3σ setup slack variation is divided into 2^n − 1 equal intervals. Since ~68% of chips violating min time have a negative slack in the range of 0 to σ_slack_hold, linear delay values force a large shift in hold slack in these chips. Similarly, ~68% of chips with positive slack on path P3 have slack in the range of 0 to σ_slack_setup. With an increase in Slack_hold(P2), path P3 in these chips observes a corresponding reduction in setup slack, which can potentially drop below zero. This results in parametric yield loss. Non-linear tunable delay values with smaller delay shifts within one standard deviation of hold slack lead to smaller degradation of Slack_setup(P3) and of the performance of the chip. Similarly, when tuning the PST buffer to increase Slack_setup(P1), there is a corresponding decrease in Slack_setup(P3). Smaller tunable delay values within one standard deviation of Slack_setup(P1) provide a higher probability of balancing slack between paths P1 and P3 to achieve the smallest possible Worst Negative Slack (WNS) in the chip. Larger tunable delay values can be used to tune slack values smaller than -2σ to cover the entire range.

Since critical max paths and critical min paths in a clock group have different means and variances of slack distribution, we estimate tunable delay values based on the distribution of max paths. Timing paths that have setup time violations have larger delays and hence a larger standard deviation in slack distribution. A PST buffer designed to target 3σ variation in setup slack is guaranteed to cover 3σ variation in hold slack. Tunable delay values may be estimated using only the setup slack distribution or the hold slack distribution, based on the criticality of the timing paths. Since the setup slack has a Gaussian distribution, the percentage of failing chips with slack in the range [0 to -1σ] is ~68.2%; in the range [-1σ to -2σ], ~27.2%; and in the range [-2σ to -3σ], ~4.2%. Accordingly, we divide the available 7 (001-111) configurations into groups of 4, 2 and 1. The delay values of the configurations in a group are equally spaced to cover one standard deviation of slack variation. Fig. 6 shows a sample distribution of setup slack and tunable PST buffer delay values with linear and non-linear intervals. Slack greater than zero has the default PST buffer configuration of 000 (0).

Figure 6: Linear and non-linear PST buffer delays (PST buffer configurations 000 to 111 placed along the slack axis from 0 to -3σ)

To further quantify the benefit of using non-linear delay tuning, let us assume the design in Fig. 4 operating at 3GHz with Slack_hold(P2) = N(0,5ps) and Slack_setup(P1,P3) = N(0,15ps). The tunable delay values and frequency binning using conventional linear delay and the proposed non-linear delay techniques are shown in Table 1 and Fig. 7 respectively. Using non-linear delays increases the number of chips in the highest frequency bin from 89.68% to 93.37%, thereby improving performance binning yield by ~4.11%. Apart from increased binning yield, designing the PST buffer for each clock group depending on the expected variance of timing slack provides optimized configurable device sizes. This reduces silicon area and decreases the leakage power overhead due to PST buffers.

Table 1: Delay values for different PST buffer configurations
Config    Linear ∆delay (ps)    Non-linear ∆delay (ps)
001       6.42                  3.75
010       12.85                 7.50
011       19.28                 11.25
100       25.71                 15.00
101       32.14                 22.50
110       38.57                 30.00
111       45.00                 45.00

Figure 7: Performance binning yield (percentage of chips per operating frequency bin)
Operating Frequency (GHz)    Linear Delay    Non-linear Delay
3                            89.68           93.37
2.75                         9.73            6.26
2.6                          0.57            0.36
2.4                          0.001           0.001

IV. PST BUFFER DESIGN FOR NON-LINEAR TUNABLE DELAY
Once the range and delay values of each PST buffer are estimated, device sizes for the configurable PMOS/NMOS devices in the current starved inverter PST buffer have to be calculated. We propose a Linear Programming (LP) based technique to estimate the width of the devices. The proposed PST buffer design is shown in Fig. 8. The buffer has three variable PMOS header devices and three variable NMOS footer devices. These are configured using the configuration bits b[2:0]. The number of configurable PMOS/NMOS devices can be varied for different tuning ranges and delay precisions. The output inverter improves the transition time of the clock edges. The non-linear tunable
delay technique and LP based device sizing can be extended to other PST buffer designs as well.

Figure 8: Proposed PST buffer design (inverter with PMOS header and NMOS footer devices configured by b[2], b[1], b[0]; annotated device widths: 0.21μ, 1.44μ, 0.9μ, 0.12μ)
Figure 9: Variation of buffer delay with Wvar

The first step of device sizing is generating a curve or equation for PST buffer delay vs. configurable device size. Fig. 9 shows the delay of the PST buffer for various effective sizes (Wvar) of the configurable transistors. The effective size Wvar is the sum of the widths of all configurable PMOS/NMOS devices that are ON for a given configuration of b[2:0]. The effective width is largest at b[2:0]=000, when all devices are ON. The buffer delay at Wvar = 0 corresponds to the configuration b[2:0]=111, when all configurable devices are OFF. This data is used to derive an equation for effective width as a function of PST buffer delay,

$$W_{var}(delay) = f(\text{PST buffer delay}) \qquad (6)$$

To calculate the required configuration device sizes, the estimated tunable delay values are mapped to effective widths Wpvar and Wnvar using equation 6. A set of seven equations, one for each ∆PST buffer delay value, is obtained starting with the target default buffer delay, as shown in Fig. 10. Linear Programming is used to estimate the minimum sizes of Wpvar[2:0] which satisfy the seven equality conditions. In most cases, LP does not converge to a feasible solution that satisfies all equalities. The solution with the least deviation from the equality conditions is chosen as the transistor sizes. This solution also results in the least deviation from the estimated PST buffer delay values. To further validate the results, delay distributions of timing paths are also modelled along with the LP solutions to estimate the binning yield and choose the most appropriate device sizes.

Algorithm: Generate Equations for Configurable Device Width
Require: DefaultBufferDelay, DeltaDelay[6:0]
Function: Wvar(delay) = fn(PST buffer delay)
1. for i = 1 to 7 do
2.     b[2:0] ← binary(i)
3.     (~b[2]*Wpvar2) + (~b[1]*Wpvar1) + (~b[0]*Wpvar0) = Wvar(DefaultBufferDelay + DeltaDelay[i-1])
4. end for
Figure 10: Algorithm for generating equality conditions

V. IMPLEMENTATION AND RESULTS
The proposed PST buffer design was implemented for a set of ISCAS’89 benchmark circuits. The benchmark circuits were implemented in an IBM 32nm SOI process using the Synopsys tool suite. Statistical timing analysis was performed using Synopsys PrimeTime and linear programming in MATLAB. HSPICE was used for circuit simulation and power estimation of the PST buffers.

Fig. 11 shows the μ/σ of critical paths in ISCAS’89 circuit s1423. The end path slack refers to the critical max path that needs to be fixed using the PST buffer (path P1 in Fig. 4). The start path slack refers to the most critical path starting from the flip-flop clocked by the PST buffer (path P3 in Fig. 4). The critical cases for device sizing of the PST buffer are critical end paths whose corresponding start paths have a mean slack close to zero and a large standard deviation. Fig. 12 shows the percentage of chips that can be tuned for eight critical paths to meet the target frequency of 3GHz using linear and non-linear delays. We also demonstrate a third scenario where a single buffer is designed for the most critical path (the path with the least difference between mean end path slack and mean start path slack). This single non-linear delay PST buffer is used to tune all the critical paths. Designing dedicated buffers for each path provides the best performance improvement. Using a single non-linear delay PST buffer does not provide the same level of performance, but involves significantly less design effort. This technique can be extended by pre-designing more than one standard non-linear delay PST buffer and choosing the appropriate one for each critical path depending on the delay statistics.

Similar experiments were performed on other ISCAS’89 benchmark circuits. The results are shown in Table 2. The relative area of the PST buffers is compared using the sum of the widths of the configurable transistors. Using dedicated PST buffers for each critical path with non-linear delay values improves performance binning yield by more than 4% compared to linear delay PST buffers. Parametric yield can be traded off for design time by using a single non-linear delay PST buffer for all critical paths. This technique increases binning yield by ~2.2% over traditional PST buffers. This can be further improved by designing a set of non-linear delay PST buffers and choosing the appropriate delay distribution for each critical path. Non-linear delay buffers also provide a better balance of hold-time slack. Since the range of both non-linear and linear delay buffers covers the entire range of the hold slack distribution, an improvement of ~1.4% in functional yield is seen
only in scenarios where both end point and start point slacks are close to zero. The improvement in functional and parametric yield due to non-linear buffer delay depends on the number of critical end-path/start-path pairs in the design. Non-linear sizing also optimizes buffer size and leakage power. Compared to conventional sizing, non-linear delay buffers reduce configurable transistor area by more than 30% and leakage power by up to 20%.

Table 2: Comparison of binning yield, area and leakage power for ISCAS'89 benchmark circuits
Performance Binning Yield: percentage of chips with target frequency of 3GHz
Effective Area: total width of configurable devices in all PST buffers, in μm
Leakage Power: in μW

ISCAS’89    Performance Binning Yield      Effective Area (μm)         Leakage Power (μW)
Circuit     PST-1     PST-2     PST-3      PST-1    PST-2    PST-3     PST-1   PST-2   PST-3
s1423       92.12%    89.96%    87.50%     50.20    56.89    79.10     1.38    1.40    1.55
s9234       91.46%    89.32%    88.18%     61.70    64.98    94.13     1.71    1.72    1.84
s15850      91.13%    88.94%    87.48%     274.03   310.57   397.48    6.83    7.53    8.04
s35932      89.28%    88.38%    86.53%     256.86   275.12   320.39    5.96    6.91    7.76
s38584      90.12%    89.76%    87.19%     181.81   218.58   271.70    5.03    5.12    6.74
*PST-1: Dedicated non-linear delay PST buffers; PST-2: Single non-linear delay PST buffer; PST-3: Linear delay PST buffer

Figure 11: Criticality of timing paths for post-Si tuning (μ/σ of start path vs. μ/σ of end path, with regions marked "Easy to fix by PST buffer tuning" and "Difficult to fix by PST buffer tuning")
Figure 12: Performance enhancement of critical paths in s1423 circuit (percentage of chips meeting target frequency for critical paths 1 to 8; series: dedicated non-linear delay PST buffer, single non-linear delay PST buffer, linear delay PST buffer)

VI. CONCLUSION AND FUTURE WORK
In this work, we present a variation aware device sizing technique for Post-Si Tunable (PST) clock buffers. Conventional PST buffers that are designed for linear configurable delay values do not account for the non-linear variation in data path delays. The large delay values close to the mean slack of timing paths over compensate when fixing timing violations. This reduces performance binning yield. We model the required configurable delay values for each critical timing path as a function of the slack distribution. The estimated non-linear delay values are converted into a set of equality conditions for the width of the configuration devices and solved using Linear Programming (LP). Experiments on ISCAS’89 circuits show an improvement of more than 4% in performance binning yield. Non-linear delay values also optimize the PST buffer circuit by improving area by ~30% and leakage power by ~20%. We also demonstrate a trade-off between parametric yield and design time by using a single non-linear delay PST buffer for all paths. Future work will include modeling PST buffers considering the effect of variation in other transistor parameters and algorithms for efficient placement of the proposed PST buffers.

REFERENCES
[1] S. Tam, et al., “Clock generation and distribution for the first IA-64 microprocessor,” IEEE JSSC, vol. 35, no. 11, pp. 1545–1552, 2000.
[2] M. Maymandi-Nejad and M. Sachdev, “A monotonic digitally controlled delay element,” IEEE JSSC, 2005.
[3] J. Mueller and R. Saleh, “A Tunable Clock Buffer for Intra-die PVT Compensation in Single-Edge Clock (SEC) Distribution Networks,” in 9th International Symposium on Quality Electronic Design, 2008.
[4] S. B. Kobenge and H. Yang, “A power efficient digitally programmable delay element for low power VLSI applications,” in Asia Symposium on Quality Electronic Design (ASQED), 2009, pp. 83–87.
[5] J.-L. Tsai, L. Zhang, and C. C. Chen, “Statistical timing analysis driven post-silicon-tunable clock-tree synthesis,” in ICCAD, 2005.
[6] V. Khandelwal and A. Srivastava, “Variability-Driven Formulation for Simultaneous Gate Sizing and Postsilicon Tunability Allocation,” IEEE TCAD, vol. 27, no. 4, pp. 610–620, 2008.
[7] D. Tadesse, J. Grodstein, and R. I. Bahar, “AutoRex: An automated post-silicon clock tuning tool,” in International Test Conference, 2009.
[8] M. R. Guthaus, G. Wilke, and R. Reis, “Non-uniform clock mesh optimization with linear programming buffer insertion,” in DAC, 2010.
[9] C. Zhuo, D. Blaauw, and D. Sylvester, “Variation-aware gate sizing and clustering for post-silicon optimized circuits,” in ISLPED, 2008.
[10] K. Nagaraj and S. Kundu, “An Automatic Post Silicon Clock Tuning System for Improving System Performance based on Tester Measurements,” in International Test Conference, 2008.
[11] Z. Lak and N. Nicolici, “On Using On-Chip Clock Tuning Elements to Address Delay Degradation Due to Circuit Aging,” IEEE TCAD, 2012.
[12] A. Chakraborty, et al., “Dynamic Thermal Clock Skew Compensation Using Tunable Delay Buffers,” IEEE Transactions on VLSI, 2008.
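As a supplementary check on the interval construction of Section III-B, the sketch below regenerates both ∆delay columns of Table 1 from the 15 ps setup slack sigma used in that example. It is a minimal sketch under stated assumptions: the function names are hypothetical, the 4/2/1 grouping is taken from the text, and the linear column of Table 1 is assumed to be truncated (not rounded) to two decimals.

```python
import math


def linear_delays(sigma, n_bits=3):
    """Conventional sizing: 2**n_bits - 1 equal steps spanning 0 to 3*sigma."""
    steps = 2 ** n_bits - 1
    return [3 * sigma * k / steps for k in range(1, steps + 1)]


def nonlinear_delays(sigma, groups=(4, 2, 1)):
    """Split each successive one-sigma band into groups[band] equal steps."""
    delays = []
    for band, count in enumerate(groups):
        # band 0 covers 0..sigma, band 1 covers sigma..2*sigma, and so on.
        delays += [band * sigma + sigma * k / count for k in range(1, count + 1)]
    return delays


def truncate(values, digits=2):
    """Drop digits beyond the given precision, as Table 1 appears to do."""
    scale = 10 ** digits
    return [math.floor(v * scale) / scale for v in values]


sigma_setup = 15.0  # ps, from Slack_setup(P1,P3) = N(0,15ps) in the example
print(truncate(linear_delays(sigma_setup)))
print(nonlinear_delays(sigma_setup))
```

Changing `groups` or `sigma` shows how the same rule adapts to a clock group with a different slack variance, which is the basis for sizing dedicated buffers per critical path.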
