3 Failure-Data Analysis
3-1 INTRODUCTION
The definition of reliability given in Chapter 2 states that it is the probability
of a device giving satisfactory performance for a specified period under
specified operating conditions. When a unit or system does not perform
satisfactorily, it is said to have failed. The pattern of failure can be obtained
from life-test results, i.e., by testing a fairly large number of models until
failure occurs, and observing the failure-rate characteristics as a function
of time. The first step, therefore, is to link reliability with experimental or
field-failure data. These data will also provide a basis for formulating or
constructing mathematically a failure model for general analysis. In this
chapter, we shall discuss in detail the information that is obtained from
the analysis of failure data.
3-2 FAILURE DATA
Consider a series of tests conducted under certain stipulated conditions on
1000 electronic components. The total duration of the tests is 19 hours.
The number of components that fail during each hourly interval is noted.
The results obtained are tabulated as shown in Table 3-1. This table lists
the total number of failed components at the end of 1 hr, 2 hr, 3 hr, and so on.
The time interval is generally denoted by Δt, the number of failures during
the interval is represented by f, and the cumulative failures to the end of
the interval by F.
Since the number of components failed during a particular interval is
noted only at the end of the interval (or at the beginning of the next interval),
the values of f are entered between two values of t, as shown in column (2).
During the first hour, 130 components fail, leaving 870 survivors. Between
the first and second hours, an additional 83 components fail, leaving 787
survivors. The total number of components failed from the
beginning of the test till a particular time is given in column (3). From the
table, we see that the total number of units failed is 130 till the end of the
first hour, 213 till the end of the second hour, 288 till the end of the third
hour, and so on.
Based on the failure data or survival-test results shown in Table 3-1,
we can now define failure density, failure rate, reliability, and probability
of failure.
TABLE 3-1

(1)     (2)        (3)          (4)         (5)        (6)       (7)
Time    No. of     Cumulative   No. of      Failure    Failure   Reliability
(hr)    failures   failures     survivors   density    rate
t       f          F            S           f_d        Z         R

0                  0            1000                             1.000
        130                                 0.130      0.139
1                  130          870                              0.870
        83                                  0.083      0.101
2                  213          787                              0.787
        75                                  0.075      0.100
3                  288          712                              0.712
        68                                  0.068      0.100
4                  356          644                              0.644
        62                                  0.062      0.101
5                  418          582                              0.582
        56                                  0.056      0.101
6                  474          526                              0.526
        51                                  0.051      0.101
7                  525          475                              0.475
        46                                  0.046      0.101
8                  571          429                              0.429
        41                                  0.041      0.100
9                  612          388                              0.388
        37                                  0.037      0.100
10                 649          351                              0.351
        34                                  0.034      0.101
11                 683          317                              0.317
        31                                  0.031      0.103
12                 714          286                              0.286
        28                                  0.028      0.103
13                 742          258                              0.258
        64                                  0.064      0.283
14                 806          194                              0.194
        76                                  0.076      0.486
15                 882          118                              0.118
        62                                  0.062      0.714
16                 944          56                               0.056
        40                                  0.040      1.110
17                 984          16                               0.016
        12                                  0.012      1.200
18                 996          4                                0.004
        4                                   0.004      2.000
19                 1000         0                                0.000

                                            sum = 1.00 mean = 0.376
Failure Density f_d
This is the ratio of the number of failures during a given unit interval of
time to the total number of items at the very beginning of the test. For the
example being considered, the total initial population at the beginning of
the test was 1000. During the first unit interval, the number of components
that fail is 130. Hence, the failure density f_d1 is 130/1000 = 0.130. During
the second unit interval, 83 more components fail. Thus, the value of the
failure density during the second unit interval is 83/1000 = 0.083. Similarly,
during the tenth unit interval, the failure density has a value 37/1000 = 0.037.
The values of f_d are given in column (5) of Table 3-1. Sometimes, failure
density is also called the ratio part-failure rate.
Let n_1 be the number of components that fail during the first unit
interval, n_2 be the number that fail during the second unit interval, and so
on. Let N be the total population. Then,
the failure density during the 1st unit interval $= f_{d_1} = n_1/N$,
the failure density during the 2nd unit interval $= f_{d_2} = n_2/N$,
the failure density during the i-th unit interval $= f_{d_i} = n_i/N$.
Let l be the last interval after which there are no survivors. Then,
$f_{d_l} = n_l/N$. If we add $f_{d_1}, f_{d_2}, f_{d_3}, \ldots, f_{d_l}$, we get
$$f_{d_1} + f_{d_2} + f_{d_3} + \cdots + f_{d_l} = \frac{n_1}{N} + \frac{n_2}{N} + \frac{n_3}{N} + \cdots + \frac{n_l}{N} = \frac{n_1 + n_2 + n_3 + \cdots + n_l}{N} = \frac{N}{N} = 1. \tag{3-2}$$
Hence, the sum of the values entered in column (5) will be one.
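As a quick numerical check of Eq. (3-2), the following Python sketch recomputes the failure densities from the hourly failure counts of Table 3-1. The names `failures`, `N`, and `failure_density` are illustrative choices, not notation from the text.

```python
# Failures per hourly interval, read from column (2) of Table 3-1.
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]
N = 1000  # initial population at the start of the test


def failure_density(failures, population):
    """Return n_i / N for each unit interval."""
    return [n / population for n in failures]


f_d = failure_density(failures, N)
print(round(f_d[0], 3), round(f_d[1], 3), round(f_d[9], 3))
print(round(sum(f_d), 6))  # the densities should add up to one
```

Because every component eventually fails, the failure counts add up to N, which is why the densities sum to one.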
Failure Rate Z
This is the ratio of the number of failures during a particular unit interval
to the average population during that interval. Referring to the data
given in Table 3-1, the failure rate Z during the first unit interval is
$$Z(1) = \frac{130}{(1000 + 870)/2} = \frac{130}{935} = 0.139.$$
The average population during any interval is the average of the populations
at the beginning and at the end of the interval. During the twelfth unit
interval, the failure rate Z(12) is
$$Z(12) = \frac{31}{(317 + 286)/2} = \frac{31}{301.5} = 0.103.$$
Sometimes, the population at the beginning of the interval is taken
instead of the average population. If this procedure is followed, the failure
rate during the first unit interval will be
Z(1) = 130/1000 = 0.130
and that during the twelfth unit interval will be
Z(12) = 31/317 = 0.098.
In our discussion, we shall follow the former procedure, using the average
population during the unit interval. The failure rates are entered in column
(6) of Table 3-1. The failure rate is also known as the hazard rate.
Sometimes, it is called the instantaneous failure rate.
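The two conventions for the failure rate can be compared side by side. The sketch below is only an illustration under the assumption that the hourly failure counts of Table 3-1 are available as a list; the function name `failure_rates` and the flag `use_average` are hypothetical.

```python
# Failures per hourly interval, read from column (2) of Table 3-1.
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]
N = 1000  # initial population


def failure_rates(failures, population, use_average=True):
    """Failure rate per interval.

    use_average=True  divides by the mean of the populations at the
                      start and end of the interval (the convention
                      followed in the text).
    use_average=False divides by the population at the start.
    """
    rates = []
    alive = population
    for n in failures:
        survivors = alive - n
        base = (alive + survivors) / 2 if use_average else alive
        rates.append(n / base)
        alive = survivors
    return rates


z_avg = failure_rates(failures, N)
z_start = failure_rates(failures, N, use_average=False)
print(round(z_avg[0], 3), round(z_avg[11], 3))      # intervals 1 and 12
print(round(z_start[0], 3), round(z_start[11], 3))  # same, start population
```

The average-population convention always gives the larger value, since the average population in an interval is never greater than the population at its start.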
Reliability R
This is the ratio of the survivors at any given time to the total initial
population. The reliability at the end of the first hour will be R(1) = 870/
1000 = 0.870. At the end of the second hour, R(2) = 787/1000 = 0.787.
Similarly, at the end of the twelfth hour, R(12) = 0.286. This calculation
of reliability conforms to our original definition. It will be recalled that
reliability is the probability of a device functioning satisfactorily for a
given period under stipulated operating conditions. For the series of
performance tests under consideration, we can appropriately modify this
definition to the extent that the device is required to function satisfactorily
for at least the given period. In Chapter 1, we defined probability as the
ratio of the number of successes to the number of trials. In the present
case, we started with 1000 items or components. At the end of the first
hour, the number of survivors was 870. This means that successful operation
was observed in 870 cases out of 1000. Hence, the reliability (i.e., the
probability of success) for the first hour is 0.870. At the end of the second
hour, the total number of components passing the test is 787. Hence, the
reliability for the second hour is 0.787. This is equivalent to saying that
the probability of the component functioning satisfactorily for at least two
hours is 0.787. Similarly, the reliability factor for the twelfth hour is
0.286.
These reliability factors are entered in column (7) of Table 3-1. As the
test proceeds, more components fail, with the result that the reliability
factor decreases progressively. Since all the components fail by the end of
the nineteenth hour, the corresponding reliability will be zero.
The reliability factors so obtained can also be called the probabilities
of survival for the first hour, second hour, third hour, and so on.
Probability of Failure
The concept of the probability of failure is similar to that of the probability
of survival. This is the ratio of the number of units failed (within a certain
time) to the total population. For example, the probability of failure during
the first hour would be 130/1000 = 0.130 since 130 units fail during the
first hour out of a total population of 1000. Similarly, the probability of
failure between t = 0 and t = 2 (i.e., the probability of a component failing
within two hours) is 213/1000 = 0.213 since 213 components fail during
the first two hours. The probability of failure between t = 0 and t = 5 is
(130 + 83 + 75 + 68 + 62)/1000 = 418/1000 = 0.418.
We have seen that "probability of survival" is another term for reliability
factor. We can similarly use the term unreliability factor for the probability
of failure. The sum of the reliability and unreliability factors will obviously
be equal to one. Survival and failure are, therefore, complementary events.
If the reliability factor between t = 0 and t = t_1 is R(t_1), the unreliability
factor for the same period will be 1 - R(t_1).
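The complementary relation between reliability and unreliability can be verified on the Table 3-1 data. This is a minimal sketch; the variable names are illustrative only.

```python
from itertools import accumulate

# Failures per hourly interval, read from column (2) of Table 3-1.
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]
N = 1000  # initial population

# Cumulative failures F at the end of hours 1, 2, ..., 19.
cumulative = list(accumulate(failures))

# Reliability = survivors / N; unreliability = cumulative failures / N.
reliability = [(N - F) / N for F in cumulative]
unreliability = [F / N for F in cumulative]

for hour in (1, 2, 5, 12):
    r = reliability[hour - 1]
    q = unreliability[hour - 1]
    print(hour, round(r, 3), round(q, 3), round(r + q, 3))
```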
3-3 MEAN FAILURE RATE h
The data in Table 3-1 show that the failure rate Z varies with time. In the
first hour, the failure rate is 0.139, in the second hour, it is 0.101, and so
on. It is also possible to calculate the mean failure rate for the entire test
cycle. This is the overall failure rate. We started with 1000 components
and it took 19 hours for all of them to fail. Hence, the overall rate at
which failure has taken place per component = (1/19) × (1000/1000) = 1/19.
This, of course, is a very rough parameter. Since Table 3-1 gives the
failure rate for every hour, we can get a much better estimate by taking
the mean of these values. If Z_1 is the failure rate for the first hour, Z_2 the
failure rate for the second hour, and Z_T the failure rate for the T-th hour,
the mean failure rate for T hours will be
$$h = \frac{Z_1 + Z_2 + \cdots + Z_T}{T}. \tag{3-3}$$
If the interval is made much smaller than one hour, we get a more accurate
value of the mean failure rate. This aspect will be discussed in Section
3-9.
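The two estimates discussed in this section, the rough overall rate and the mean of the hourly failure rates, can be computed as follows. This is a sketch under the assumption that the hourly rates use the average population in each interval; the variable names are illustrative.

```python
# Failures per hourly interval, read from column (2) of Table 3-1.
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]
N = 1000           # initial population
T = len(failures)  # 19 hourly intervals

# Rough overall rate: all N components fail within T hours.
overall_rate = (1 / T) * (N - 0) / N

# Hourly failure rates based on the average population in each interval.
hourly_rates = []
alive = N
for n in failures:
    survivors = alive - n
    hourly_rates.append(n / ((alive + survivors) / 2))
    alive = survivors

# Mean failure rate: arithmetic mean of the hourly rates.
mean_rate = sum(hourly_rates) / T
print(round(overall_rate, 4), round(mean_rate, 3))
```

The mean of the hourly rates is much larger than 1/19 because the last few intervals, where only a handful of components survive, have very high failure rates.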
3-4 MEAN TIME TO FAILURE (MTTF)
Consider the following example involving the life-testing of a new device.
Example 3-1 In the life-testing of ten specimens of a mini-mixer, the
time to failure for each specimen is recorded as given in Table 3-2.
Calculate the mean failure rate h for T = 900 hours, and the mean time to
failure for all ten specimens.

TABLE 3-2

Specimen   Time to failure   Specimen   Time to failure
number     (hours)           number     (hours)

1          805               6          832
2          810               7          842
3          815               8          856
4          820               9          875
5          825               10         900
In this example, since the number of samples tested is small, it is
possible to note the time to failure of each sample. When the number of
samples is large, we record the number of specimens failed in each interval
of time as shown in Table 3-1. The mean failure rate is obtained from the
formula
$$h(T) = \frac{1}{T}\,\frac{N(0) - N(T)}{N(0)}, \tag{3-4}$$
where h(T) is the mean failure rate for T hours, N(0) is the total population
at T = 0, and N(T) is the population remaining at time T. In other words,
N(0) - N(T) is the number of specimens failed in T hours. In the present
case, we have
h(900) = (1/900)[(10 - 0)/10] = 1/900.
As indicated by the data, the ten specimens do not all fail at the same time.
They have different times to failure. Hence, we can calculate the mean
time to failure for all ten specimens as
MTTF = (1/10)(805 + 810 + 815 + 820 + ... + 900)
= 8380/10 = 838 hours.
In general, if t_1 is the time to failure for the first specimen, t_2 the time
to failure for the second specimen, and t_N the time to failure for the N-th
specimen, the mean time to failure for N specimens will be
$$\text{MTTF} = \frac{t_1 + t_2 + \cdots + t_N}{N} = \frac{1}{N}\sum_{i=1}^{N} t_i. \tag{3-5}$$
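Example 3-1 can be reproduced with a few lines of Python. The sketch below applies Eqs. (3-4) and (3-5) to the data of Table 3-2; the function names `mean_failure_rate` and `mttf` are hypothetical helpers introduced only for this illustration.

```python
# Times to failure (hours) of the ten specimens in Table 3-2.
times = [805, 810, 815, 820, 825, 832, 842, 856, 875, 900]


def mean_failure_rate(times, T):
    """Eq. (3-4): (1/T) * [N(0) - N(T)] / N(0)."""
    n0 = len(times)
    remaining = sum(1 for t in times if t > T)  # N(T), still working at T
    return (n0 - remaining) / (n0 * T)


def mttf(times):
    """Eq. (3-5): arithmetic mean of the individual times to failure."""
    return sum(times) / len(times)


h_900 = mean_failure_rate(times, 900)
print(h_900, mttf(times))
```

Here a specimen that fails exactly at T is counted as failed, which matches the example, where all ten specimens have failed by 900 hours.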
As noted earlier, it is difficult to record the time to failure for each
component when the number of specimens tested is very large. Instead,
we can record the number which fail during specific intervals of time. For
example, the interval of time for the data given in Table 3-1 was chosen
as one hour, and the number of specimens that failed during each hour
was recorded. We assumed that all the specimens which failed during a
particular time interval took the same total time to failure. For example,
from Table 3-1, we see that 37 specimens failed during the tenth hour.
Although these 37 specimens might have failed at different instants during
that time interval, we assume that on the average all of them took ten
hours to fail. If n_1 is the number of specimens that failed during the first
hour, n_2 the number that failed in the second hour, and n_k the number that
failed during the k-th hour, then the mean time to failure for N specimens
will be
$$\text{MTTF} = \frac{n_1 + 2n_2 + 3n_3 + \cdots + k\,n_k}{N}. \tag{3-6}$$
If the time interval is Δt instead of one hour, the mean time to failure
becomes
$$\text{MTTF} = \frac{n_1\Delta t + 2n_2\Delta t + \cdots + k\,n_k\Delta t + \cdots + l\,n_l\Delta t}{N} = \frac{1}{N}\sum_{k=1}^{l} k\,n_k\,\Delta t, \tag{3-7}$$
where n_1 is the number of specimens that failed during the first interval,
n_2 the number of specimens that failed during the second interval, and so
on, and l is the last interval. This is illustrated in the next example.
Example 3-2 In the life-testing of 100 specimens of a particular device,
the number of failures during each time interval of twenty hours is shown
in Table 3-3. Estimate the MTTF for these specimens.

TABLE 3-3

Time interval        Number of failures
(hours)              during the interval

T ≤ 1000             0
1000 < T ≤ 1020      25
1020 < T ≤ 1040      40
1040 < T ≤ 1060      20
1060 < T ≤ 1080      10
1080 < T ≤ 1100      5

As the number of specimens tested is large, it is tedious to record the
time to failure for each specimen. Instead, we note the number of specimens
that fail during each 20-hour interval. Therefore, taking the end of each
interval as the time to failure of the specimens that fail in it, the mean
time to failure from Eq. (3-6) is
$$\text{MTTF} = \frac{1}{100}\,[25(1020) + 40(1040) + 20(1060) + 10(1080) + 5(1100)] = \frac{104{,}600}{100} = 1046 \text{ hours}.$$
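The grouped calculation of Example 3-2 can be sketched in Python as below. It assumes, as the worked solution does, that every specimen failing within an interval is assigned the time at the end of that interval; the name `grouped_mttf` is illustrative.

```python
# (end of interval in hours, number of failures in the interval), Table 3-3.
groups = [(1020, 25), (1040, 40), (1060, 20), (1080, 10), (1100, 5)]


def grouped_mttf(groups):
    """Weighted mean of interval end times, weighted by failure counts."""
    total = sum(n for _, n in groups)
    weighted = sum(t * n for t, n in groups)
    return weighted / total


result = grouped_mttf(groups)
print(result)  # mean time to failure in hours
```

Assigning the interval midpoint instead of the end point would lower the estimate by half an interval (10 hours here); the text uses the end point.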
3-5 MEAN TIME BETWEEN FAILURES (MTBF)
In many situations, a unit or system can be repaired immediately after
breakdown. In such cases, the mean time between failures refers to the
average time of breakdown until the device is beyond repair. This topic
will be taken up again in Chapter 9 while discussing the concepts of
availability and maintainability.
This aspect is illustrated by the next example.
Example 3-3 Table 3-4 gives the results of tests conducted under severe
adverse conditions on 1000 safety valves. Columns (3) and (4) illustrate
how failure density f_d(t) and hazard rate Z(t) are calculated when the time
interval is four hours instead of one hour. The cumulative failures and the
number of survivors are not given, but can be calculated easily.

TABLE 3-4

(1)        (2)         (3)                          (4)
Time       Number of   Failure density              Hazard rate
interval   failures    f_d(t)                       Z(t)
(hours)

0-4        267         267/(1000 × 4) = 0.0668      267/(867 × 4) = 0.0770
4-8        59          59/(1000 × 4) = 0.0148       59/(704 × 4) = 0.0210
8-12       36          36/(1000 × 4) = 0.0090       36/(656 × 4) = 0.0137
12-16      24          24/(1000 × 4) = 0.0060       24/(626 × 4) = 0.0096
16-20      23          23/(1000 × 4) = 0.0058       23/(603 × 4) = 0.0095
20-24      11          11/(1000 × 4) = 0.0028       11/(586 × 4) = 0.0047
Note that the hazard rate is entered in between the time intervals.
The average population during the first interval is [1000 + (1000 - 267)]/2
≈ 867. Hence, the hazard rate during the first 4-hour interval is
Z = 267/(867 × 4) = 0.07699 ≈ 0.0770.
The average population during the second 4-hour interval is (733 + 674)/2
≈ 704. Therefore, the hazard rate is
Z = 59/(704 × 4) = 0.02095 ≈ 0.0210.
The remaining values are obtained similarly.
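The remaining entries of Table 3-4 can be generated with a short loop. This sketch uses the exact average population (for example 866.5 rather than the rounded 867), so individual digits may differ slightly from hand calculations that round the population first; the variable names are illustrative.

```python
# Failures in successive 4-hour intervals, from Table 3-4.
failures = [267, 59, 36, 24, 23, 11]
N = 1000  # initial number of safety valves
dt = 4    # interval length in hours

densities = []  # failure density per hour, referred to N
hazards = []    # hazard rate per hour, referred to the average population
alive = N
for n in failures:
    survivors = alive - n
    average = (alive + survivors) / 2
    densities.append(n / (N * dt))
    hazards.append(n / (average * dt))
    alive = survivors

for i, (fd, z) in enumerate(zip(densities, hazards)):
    print(f"{i * dt}-{(i + 1) * dt} h: f_d = {fd:.4f}, Z = {z:.4f}")
```

Dividing by the interval length expresses both quantities per hour, so results obtained with different interval lengths remain comparable.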
Example 3-5 A hard plastic box designed to house a multimeter is tested
for its impact strength by dropping it from a fixed height and observing
for any damage. A total of 500 boxes were tested and the results are as
tabulated here:

Number of drops            10   12   13   15    17   20    21   23   25
Number of boxes damaged    30   50   30   110   90   130   17   35   8
In this example, the criterion is not time but the number of drops that a
given box can withstand without damage. The probability that a box
will withstand at least 9 drops without damage is unity since the first
damage occurred at the end of 10 drops. Also, the probability that a box
can withstand 25 drops or more is zero. Further, the probability that a box
can withstand at least 15 drops is obtained by counting the number of
boxes surviving more than 15 drops and dividing this number by 500.
Thus, the probability is
$$P(d \ge 15) = \frac{500 - (30 + 50 + 30 + 110)}{500} = \frac{280}{500} = 0.560.$$
The failure density for this case will be defined as the ratio of the
number of boxes failing per incremental drop after a given number of
drops, to the initial population. For instance, at the end of the 12th drop,
the number of boxes failing per incremental drop (i.e., for the 13th drop)
is 30, and the ratio of this number to the initial population of 500 gives
the failure density for the 13th drop. Thus,
f_d(13) = 30/500 = 0.060.
However, the tabulation shows that failures do not occur after every
incremental drop. The number of failures for the 13th drop is 30, but no
failures are reported for the 14th drop, i.e., for the next incremental drop after
the 13th drop. So, according to the definition, the failure density for the
14th drop is zero, and the failure density for the 15th drop would be
f_d(15) = 110/500 = 0.220.
Similarly, for the 16th, 18th, 19th, 22nd, and 24th drops, the failure densities
are zero as no failures are reported at the end of each of these drops. The
failure density for the 17th drop is
f_d(17) = 90/500 = 0.180.
The failure density values are given in column (3) of Table 3-8.
TABLE 3-8

(1)         (2)         (3)        (4)       (5)
Number of   Number of   Failure    Hazard    Reliability
drops       failures    density    rate
d           f           f_d        Z         R(d)

0           0           0.000      0.000     1.000
10          30          0.060      0.060     0.940
12          50          0.100      0.106     0.840
13          30          0.060      0.071     0.780
15          110         0.220      0.282     0.560
17          90          0.180      0.321     0.380
20          130         0.260      0.684     0.120
21          17          0.034      0.283     0.086
23          35          0.070      0.814     0.016
25          8           0.016      1.000     0.000

            sum = 500
For the present case, the hazard rate Z can be defined as the rate at
which failure occurs per incremental drop after a given number of impacts,
assuming that no failure has occurred prior to that incremental drop. Thus,
the hazard rate for the 13th drop (after the 12th drop) is the ratio of the
number of failures occurring for the 13th drop (which is given as 30 in
Table 3-8) to the population at the beginning of the 13th drop (which is
500 - 80 = 420). Hence,
Z_13 = 30/420 = 0.071.
As already noted, the failures do not occur after every incremental drop.
Accordingly,
Z_14 = Z_16 = Z_18 = Z_19 = Z_22 = Z_24 = 0,
and since the number of survivors at the end of the 14th drop is (500 -
110 =) 390,
Z_15 = 110/390 = 0.282.
Similarly, since at the end of the 18th and of the 19th drop the number of
failures is zero, the number of survivors at the end of the 19th drop is
(500 - 310 =) 190, and hence
Z_20 = 130/190 = 0.684.
The values of Z have been entered in column (4) of Table 3-8.
The values of reliability R(d) are given in column (5).
These are the values of the probability that the box will survive a
given number of drops. Thus, the probability that a box will survive a
minimum of 15 drops is
$$R(15) = 1 - \frac{\text{total number of failures till end of 15 drops}}{\text{initial population}} = 1 - \frac{30 + 50 + 30 + 110}{500} = 0.560.$$
As no failure is observed at the end of the 14th drop,
R(14) = R(13) = 1 - (30 + 50 + 30)/500 = 0.780.
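The per-drop quantities of Table 3-8 can be recomputed from the raw drop-test results. In this sketch the hazard rate is referred to the population at the beginning of each drop, as in the worked values above; the dictionary `table` and the other names are illustrative.

```python
# Drop-test results of Example 3-5: drop number and boxes damaged at it.
drops = [10, 12, 13, 15, 17, 20, 21, 23, 25]
damaged = [30, 50, 30, 110, 90, 130, 17, 35, 8]
N = 500  # boxes tested

table = {}  # drop number -> (failure density, hazard rate, reliability)
alive = N
for d, n in zip(drops, damaged):
    f_d = n / N    # referred to the initial population
    z = n / alive  # referred to the boxes still intact before this drop
    alive -= n
    table[d] = (f_d, z, alive / N)

for d in drops:
    f_d, z, r = table[d]
    print(f"d = {d:2d}: f_d = {f_d:.3f}, Z = {z:.3f}, R = {r:.3f}")
```

Drops that do not appear in `drops` (for example the 14th) have zero failure density and zero hazard rate, and the reliability stays at the value of the preceding listed drop.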
As Eqs. (3-15) and (3-20) show, the hazard rate or the failure rate is
well suited for continuous functions. In the case of discrete functions (like
the number of impacts, drops, etc.), if the test results show failures for
every incremental step (like uniform incremental loads, impacts, hours,
etc.), the calculation of failure rate is straightforward. This is also true for
the calculation of failure density. If the test results do not give non-zero
values for every incremental step, as in our example, then the calculations
of failure density and failure rate as per definitions do not yield satisfactory
curves. However, in such cases, a different approach can be adopted.
Thus, instead of tabulating the test results for every incremental drop, we
can choose a suitable class interval (like 2 consecutive drops or 3
consecutive drops) and tabulate the number of failures for each of these
class intervals. In our example, choosing a class interval of three drops,
we can calculate the number of failures, failure density, failure rate, and
reliability for each of these class intervals as shown in Table 3-9. In this
case, the failure density will be defined as the ratio of the number of
boxes failing per incremental class of drops after a given class of drops
to the initial population. For instance, after the class (12, 13, 14), the
incremental class of drops is (15, 16, 17) and for this class, the number of
failures is 200. The ratio of this to the initial population, i.e., 200/
500 = 0.400, is the failure density for the class (15, 16, 17). Similarly, the
hazard rate Z can be defined as the rate at which failure occurs per
incremental class of drops after a given class of drops (or impacts),
assuming no failure has occurred prior to that incremental class of drops.
Thus, after the class (12, 13, 14), the number of survivors is 500 - 110 =
390, and the number of failures in the next incremental class, i.e., (15, 16,
17), is 200. Hence,
Z(15, 16, 17) = 200/390 = 0.513.
On the same lines, the probability that the box will survive at least a given
number of classes of drops is the reliability R(d). Thus, the probability that a
box will survive till the end of class (15, 16, 17) is
R(15, 16, 17) = 1 - (30 + 80 + 200)/500
= 1 - 310/500 = 0.380.
The values of f_d, Z, and R(d) are given in columns (3), (4), and (5),
respectively, of Table 3-9.
TABLE 3-9

(1)           (2)         (3)        (4)       (5)
Number of     Number of   Failure    Hazard    Reliability
drops         failures    density    rate
d             n           f_d        Z         R(d)

9, 10, 11     30          0.060      0.060     0.940
12, 13, 14    80          0.160      0.170     0.780
15, 16, 17    200         0.400      0.513     0.380
18, 19, 20    130         0.260      0.684     0.120
21, 22, 23    52          0.104      0.867     0.016
24, 25, 26    8           0.016      1.000     0.000

              sum = 500
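The class-interval approach of Table 3-9 amounts to pooling the failures of three consecutive drops before computing the same ratios. The following sketch illustrates this for the data of Example 3-5; the names `failures_at_drop`, `classes`, and `rows` are hypothetical.

```python
# Boxes damaged at each drop number (Example 3-5); unlisted drops have none.
failures_at_drop = {10: 30, 12: 50, 13: 30, 15: 110, 17: 90,
                    20: 130, 21: 17, 23: 35, 25: 8}
N = 500  # boxes tested

# Class intervals of three consecutive drops, as in Table 3-9.
classes = [(9, 10, 11), (12, 13, 14), (15, 16, 17),
           (18, 19, 20), (21, 22, 23), (24, 25, 26)]

rows = []  # (class, failures, failure density, hazard rate, reliability)
alive = N
for cls in classes:
    n = sum(failures_at_drop.get(d, 0) for d in cls)
    hazard = n / alive  # referred to survivors entering this class
    alive -= n
    rows.append((cls, n, n / N, hazard, alive / N))

for cls, n, f_d, z, r in rows:
    print(cls, n, round(f_d, 3), round(z, 3), round(r, 3))
```

With a class width of three drops every class contains at least one failure, so the zero entries that appear in the per-drop calculation are avoided.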
PROBLEMS
3-1 In a printed page, it is observed that the frequency with which
different alphabets and spaces occur varies considerably. Consequently, it
becomes irrational for a small letterpress printer to stock up the same
number of all alphabets since some alphabets appear much more frequently
than the others. Pick a printed page at random and tabulate the frequency
with which each alphabet appears in it. Represent this graphically through
a histogram.
3-2 In a survival test conducted on 100 cardboard boxes for their strength
under impact loading, the following results were obtained:

Number of impacts         20   22   24   26   29   32   35   37   40
Number of boxes failed    7    10   15   4    15   13   13   8    5

For this case, how will you define failure density, failure rate, and
reliability? Tabulate these quantities and represent them graphically.
3-3 A series of tests were conducted to determine the