11-11-2021
• Reliability, Availability, and Maintainability Unit 4
• Reliability, maintainability, and availability (RAM) are three system
attributes that are of great interest to systems engineers,
logisticians, and users.
• The origins of contemporary reliability engineering can be traced
to World War II. The discipline’s first concerns were electronic
and mechanical components.
• However, current trends point to a dramatic rise in the number of
industrial, military, and consumer products with integrated
computing functions.
• Because of the rapidly increasing integration of computers into
products and systems used by consumers, industry, governments,
and the military, reliability must consider both hardware and
software.
• Basic Definitions Unit 4
Reliability
Defined as the probability of a system or system element performing
its intended function under stated conditions without failure for a
given period of time (ASQ 2011).
A precise definition must include a detailed description of the
function, the environment, the time scale, and what constitutes a
failure.
• Basic Definitions Unit 4
Maintainability
Defined as the probability that a system or system element can be
repaired in a defined environment within a specified period of time.
Maintainability is a characteristic of design, assembly, and installation:
the probability that failed equipment or systems are restored to their
normal operating state within a specific timeframe, using specified
repair techniques and procedures.
Maintainability is related to reliability because when a product or
system fails, there may be a process to restore the product or system
to operating condition.
• Basic Definitions Unit 4
Availability
Defined as the probability that a repairable system or system
element is operational at a given point in time under a given set of
environmental conditions.
Availability depends on reliability and maintainability.
A failure is the event, or inoperable state, in which any item or part
of an item does not, or would not, perform as specified.
The failure mechanism is the physical, chemical, electrical, thermal, or
other process that results in failure.
In computerized systems, a software defect or fault can be the cause of
a failure, which may have been preceded by an error internal to the item.
The failure mode is the way in which an item fails, i.e. the consequence
of the failure mechanism.
The severity of the failure mode is the magnitude of its impact.
• Reliability
Reliable is commonly used in the same sense as dependable.
When a statement is made to the effect that a particular
component is reliable, we mean that the component will behave in
the manner that is expected of it.
Reliability is the probability of a product or device performing its
purpose adequately for the period intended under the given
operating conditions.
This definition brings into focus four key elements (factors):
• the reliability of a device is expressed as a probability
• the device is required to give adequate performance
• the duration of adequate performance is specified, and
• the environmental or operating conditions are prescribed
• Reliability expressed as probability
Probability is the ratio of the number of times we can expect an
event to occur to the total number of trials undertaken.
The maximum value of this fraction (the probability) is one and the
minimum is zero.
A reliability factor of ONE would mean that the device performs
satisfactorily for the prescribed duration under the given
environmental conditions.
Similarly, a reliability of zero would mean that the equipment fails
to meet the required performance.
A reliability factor below ONE, however high, does not guarantee that
every unit of the device which is used or tested will perform satisfactorily.
• Reliability expressed as adequate performance
It describes what is expected of a device or system.
Example:
We may specify that the performance of a safety valve of a
high-pressure steam boiler is adequate if the valve releases
when the pressure reaches a limit of 350 kg/cm². It is also possible
to state that the valve operates between 350 and 351 kg/cm².
This means that a particular valve may not release every time
at exactly the specified pressure of 350 kg/cm². The valve may operate
once at 350.4, the second time at 350.6, the third time at 350.2, and
so on.
However, the upper limit for performance is 351 kg/cm² and the lower
limit is 350 kg/cm².
• Reliability expressed as adequate performance
Similarly, a 2 ampere fuse is expected to blow when the current
passing through it exceeds 2 amperes.
There may be instances of a system giving satisfactory performance
even when one or two of its components may not be functioning.
For example, consider the spark plugs of an 8 cylinder automobile.
If one plug is not functioning properly, the automobile may still
reach its destination in the prescribed time and thus meet the
requirement of satisfactory performance.
Hence a definite criterion must be established to clearly describe
or specify what is considered to be satisfactory or adequate
performance.
• Duration of adequate performance
Duration is vital to the quantitative description of reliability.
Duration or Time could mean hours, years, cycles, mileage, shots,
actuations, trips, etc.
It is whatever is associated with the aging of the product. For
example, saying that the reliability should be 90% would be
incomplete without specifying the time window.
The correct way would be to say that, for example, the reliability
should be 90% at 10,000 cycles.
• Operating conditions
The reliability specification must cover all aspects of the use
environment to which the item will be exposed and which can
influence the probability of failure.
The specification should establish in standard terminology the
“Operating Conditions” under which the item must provide the
required performance.
Operating conditions refer to all known operating / use conditions
under which the specified reliability is to be obtained.
These include temperature, pressure, humidity, light conditions,
weather conditions, operating parameters, etc.
• Reliability
The fundamental expectation from a customer’s point of view is for
the product to work as promised.
However, failure happens.
Anticipating and dealing with failure are cornerstones of a
successful reliability and maintainability (R&M) management
program.
• Probability Distributions used in Reliability Analysis
Reliability can be thought of as the probability of the survival of a
component until time t.
Its complement is the probability of failure before or at time t.
If we define a random variable T as the time to failure, then:
R(t) = P(T > t) = 1 − F(t)
where R(t) is the reliability and F(t) is the failure probability.
The failure probability is the cumulative distribution function
(CDF) of a mathematical probability distribution.
Continuous distributions used for this purpose include
exponential, Weibull, log-normal, and generalized gamma.
Discrete distributions such as the Bernoulli, Binomial, and
Poisson are used for calculating the expected number of failures
or for single probabilities of success.
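As a rough illustration of R(t) = 1 − F(t), the sketch below evaluates the reliability function for two of the continuous distributions named above. The function names, the failure rate `lam`, and the Weibull parameters `shape` and `scale` are illustrative assumptions, not values from this unit.

```python
import math


def exponential_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam (assumed value)."""
    return math.exp(-lam * t)


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull model."""
    return math.exp(-((t / scale) ** shape))


def failure_probability(reliability):
    """F(t) = 1 - R(t), the complement of the reliability."""
    return 1.0 - reliability


# Hypothetical numbers chosen only for illustration.
r_exp = exponential_reliability(t=100.0, lam=0.001)
r_wbl = weibull_reliability(t=100.0, shape=1.0, scale=1000.0)
print(round(r_exp, 4), round(failure_probability(r_exp), 4))
print(round(r_wbl, 4))
```

With a shape parameter of 1, the Weibull model reduces to the exponential model with failure rate 1/scale, so both calls return the same value here.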
• Probability Distributions used in Reliability Analysis
The same continuous distributions used for reliability can also be
used for maintainability although the interpretation is different
(i.e., probability that a failed component is restored to service prior
to time t).
However, predictions of maintainability may have to account for
processes such as administrative delays, travel time, sparing, and
staffing and can therefore be extremely complex.
The probability distributions used in reliability and maintainability
estimation are referred to as models because they only provide
estimates of the true failure and restoration behavior of the items
under evaluation.
Ideally, the values of the parameters used in these models would be
estimated from life testing or operating experience.
• Design for Reliability
System designs and design alternatives based on user requirements
can be formulated and evaluated.
Reliability engineering during this phase seeks to increase system
robustness through measures such as redundancy, diversity, built-in
testing, advanced diagnostics, and modularity to enable rapid
physical replacement.
In addition, it may be possible to reduce failure rates through
measures such as using higher-strength materials, increasing the
quality of components, moderating extreme environmental conditions,
or shortening maintenance, inspection, or overhaul intervals.
Design analyses may include mechanical stress, corrosion, and
radiation analyses for mechanical components, thermal analyses for
mechanical and electrical components, and Electromagnetic
Interference (EMI) analyses or measurements for electrical
components and subsystems.
Failure data analysis
Consider a series of tests conducted under certain stipulated
conditions on 1000 electronic components. The total duration of the
tests is 19 hours. The number of components that fail during each
hourly interval is noted. The results obtained are tabulated in the Table.
The table lists the total number of failed components at the end of
1 h, 2 h, 3 h, and so on. The time interval is generally denoted by ∆t, the
number of failures during the interval is represented by f, and the
cumulative failures to the end of the interval by F.
Since the number of components failed during a particular interval
only is noted at the end of the interval (or at the beginning of the
next interval), the values of f are entered between two values of t as
shown in column (2) of the Table.
Column (4) gives the number of survivors at the end of each time
interval.
Failure data analysis
The time is reckoned from the start of the test.
During the first hour, 130 components fail, leaving behind 870
survivors.
Between the first and second hours, an additional 83 components
fail, leaving 787 survivors.
The cumulative failures, i.e. the total number of components failed
from the beginning of the test up to a particular time, are given in
column (3).
From the table, we see that the total number of units failed is 130
till the end of the first hour, 213 till the end of the second hour, 288
till the end of the third hour, and so on.
Based on the failure data (survival test results) shown in the Table,
we can now define failure density, failure rate, reliability, and
probability of failure.
Failure Density fd
This is the ratio of the number of failures during a given unit interval
of time to the total number of items at the very beginning of the test.
Here, the total number of items at the beginning of the test was 1000.
This is known as the total initial population.
During the first unit interval, the number of components that fail is 130.
Hence, the failure density fd is 130 / 1000 = 0.130.
During the second unit interval, 83 more components fail. Thus, the
value of failure density during the second unit interval is 83 / 1000 =
0.083.
Similarly, during the tenth unit interval, the failure density has a value
of 37 / 1000 = 0.037.
Sometimes, failure density is also called the ratio part – failure rate.
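The failure density calculation can be sketched in a few lines. The function name is an assumption made for illustration, and only the hourly failure counts actually quoted in this unit (hours 1 to 5) are used, since the full table is not reproduced here.

```python
def failure_density(failures_per_interval, initial_population):
    """Return fd for each interval: failures in the interval / initial population."""
    return [f / initial_population for f in failures_per_interval]


# Failures in hours 1 to 5, as quoted in this unit; later hours are omitted.
failures = [130, 83, 75, 68, 62]
fd = failure_density(failures, initial_population=1000)
print(fd)
```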
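Failure Rate Z
This is the ratio of the number of failures during a unit interval to the
average population during that interval. For the first hour,
Z = 130 / [(1000 + 870)/2] = 130/935 = 0.139.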
Reliability R
This is the ratio of the survivors at any given time to the total initial
population.
The reliability at the end of the first hour will be
R(1) = 870/1000 = 0.870.
At the end of the second hour
R(2) = 787/1000 = 0.787.
Similarly, the reliability factor for the twelfth hour is 0.286 (refer to column 7).
This calculation of reliability conforms to the original definition. For the series of
performance tests under consideration, we can appropriately modify this
definition to the extent that the device is required to function satisfactorily for
at least the given period.
The reliability factor decreases progressively. Since all the components fail by
the end of the 19th hour, the corresponding reliability is zero.
The reliability factors so obtained can also be called the probabilities of
survival for the first hour, second hour, third hour, and so on.
Probability of Failure
The concept of the probability of failure is similar to that of the
probability of survival.
This is the ratio of the number of units failed (within certain time) to
the total population.
For example, the probability of failure during the first hour would be
130 / 1000 = 0.130, since 130 units fail during the first hour out of a
total population of 1000.
Similarly, the probability of failure between t = 0 and t = 2 (i.e. the
probability of a component failing within two hours) is 213/1000 =
0.213, since 213 components fail during the first two hours.
The probability of failure between t = 0 and t = 5 is
(130 + 83 + 75 + 68 + 62)/1000 = 418/1000 = 0.418.
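The reliability and probability-of-failure columns can be derived from the hourly failure counts, as in the minimal sketch below. The function name and the tuple layout are assumptions for illustration; only the counts for hours 1 to 5 quoted above are used.

```python
from itertools import accumulate


def survival_table(failures_per_interval, initial_population):
    """Return rows of (cumulative failures, survivors, reliability, probability of failure)."""
    rows = []
    for cumulative in accumulate(failures_per_interval):
        survivors = initial_population - cumulative
        rows.append(
            (
                cumulative,
                survivors,
                survivors / initial_population,   # reliability R at end of interval
                cumulative / initial_population,  # probability of failure by then
            )
        )
    return rows


failures = [130, 83, 75, 68, 62]  # hours 1 to 5, from the worked example
table = survival_table(failures, initial_population=1000)
for hour, (cum, surv, rel, pf) in enumerate(table, start=1):
    print(hour, cum, surv, round(rel, 3), round(pf, 3))
```

In every row the reliability and the probability of failure add up to one, which mirrors R(t) = 1 − F(t).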
Mean Failure Rate h
The data in the Table show that the failure rate Z varies with time.
In the first hour, the failure rate is 130/935 = 0.139; in the second hour, it is
83/828.5 = 0.100 (failures divided by the average population during the hour);
and so on. It is also possible to calculate the mean failure rate for the
entire test cycle. This is the overall failure rate.
We started with 1000 components and it took 19 hours for all of them
to fail.
Hence, the overall rate at which failure has taken place per component
= (1/19) × (1000/1000) = 1/19.
If Z1 is the failure rate for the first hour, Z2 is the failure rate for the
second hour, and ZT the failure rate for the T – th hour,
The mean failure rate for T hours will be
h = (Z1 + Z2 + ..........+ZT) / T.
If the interval is made much smaller than one hour, we get a more accurate
value of the mean failure rate.
Mean Time to Failure (MTTF)
Consider the following example involving the life testing of a new
device.
Example: In the life testing of ten specimens of a mini-mixer, the time
to failure for each specimen is recorded as given in the following Table.
Calculate the mean failure rate h for T = 900 hours, and the mean time
to failure for all ten specimens.
Mean Time to Failure (MTTF)
The mean failure rate is obtained from the formula
h(T) = (1/T) × [N(0) − N(T)] / N(0),
where h(T) is the mean failure rate for T hours,
N(0) is the total population at T = 0, and
N(T) is the population remaining at time T.
In other words, N(0) − N(T) is the number of specimens failed in T hours.
Here, we have
h(900) = (1/900) × [(10 − 0)/10] = 1/900.
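The mean failure rate can be checked with a short sketch. It assumes the form h(T) = (1/T) × [N(0) − N(T)] / N(0), which is what the worked value h(900) implies; the function name is hypothetical, and exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction


def mean_failure_rate(total_time, initial_population, remaining_population):
    """h(T) = (1 / T) * (N(0) - N(T)) / N(0), returned as an exact fraction."""
    failed = initial_population - remaining_population
    return Fraction(failed, initial_population) / total_time


# Mini-mixer example: ten specimens, none surviving at T = 900 hours.
h_900 = mean_failure_rate(900, 10, 0)
# Electronic-component test: 1000 components, all failed after 19 hours.
h_19 = mean_failure_rate(19, 1000, 0)
print(h_900, h_19)
```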
Mean Time to Failure (MTTF)
As indicated by the data, the ten specimens do not all fail at the same time.
They have different times to failure. Hence, we can calculate the mean
time to failure for all ten specimens as
MTTF = (1/10)(805 + 810 + 815 + 820 + ..... + 900) = 8380/10 = 838 hours
In general, if t1 is the time to failure for the first specimen,
t2 is the time to failure for the second specimen, and
tN is the time to failure for the N-th specimen,
the mean time to failure for N specimens will be
MTTF = (t1 + t2 + ... + tN) / N.
Mean Time to Failure (MTTF)
It is difficult to record the time to failure for each component when the
number of specimens tested is very large.
Instead, we can record the number which fail during intervals of time.
For example, the interval of time for the data given in the Table was chosen
as one hour, and the number of specimens that failed during each hour was
recorded.
It is assumed that all the specimens which failed during a particular
time interval took the same total time to failure (the time at the end of
that interval).
If n1 is the number of specimens that failed during the first hour,
n2 is the number of specimens that failed during the second hour,
nk is the number of specimens that failed during the k-th hour,
and so on,
the mean time to failure for N specimens will be
MTTF = (n1 + 2 n2 + 3 n3 + ... + k nk + ...) / N hours.
Mean Time to Failure (MTTF)
If the time interval is ∆t instead of one hour, the mean time to failure
becomes
MTTF = (n1 + 2 n2 + 3 n3 + ... + k nk + ...) ∆t / N,
where
n1 is the number of specimens that failed during the first interval,
n2 is the number of specimens that failed during the second interval,
and so on.
Mean Time to Failure (MTTF)
Example 2: In the life testing of 100 specimens of a particular device,
the number of failures during each time interval of twenty hours is
shown in the following Table. Estimate the MTTF for these specimens.
Mean Time to Failure (MTTF)
As the number of specimens tested is large, it is tedious to record the
time to failure for each specimen.
Instead, the number of specimens that fail during each 20-hour
interval was noted.
Therefore, the mean time to failure is
MTTF = (1/100) [25(1020) + 40(1040) + 20(1060) + 10(1080) + 5(1100)]
= 104600/100 = 1046 hours
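The grouped MTTF calculation for Example 2 can be reproduced with the sketch below. The function name is an assumption for illustration; the counts and interval end times are the ones used in the calculation above.

```python
def grouped_mttf(failure_counts, failure_times):
    """Mean time to failure when failures are recorded per interval.

    Every specimen that fails in an interval is assigned the same time to
    failure, here the time at the end of that interval.
    """
    total_specimens = sum(failure_counts)
    weighted_time = sum(n * t for n, t in zip(failure_counts, failure_times))
    return weighted_time / total_specimens


# Counts and interval end times (hours) from Example 2.
counts = [25, 40, 20, 10, 5]
end_times = [1020, 1040, 1060, 1080, 1100]
mttf = grouped_mttf(counts, end_times)
print(mttf)
```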
Mean Time Between Failures (MTBF)
In many situations, a unit or system can be repaired immediately after
breakdown. In such cases, the mean time between failures (MTBF) refers to
the average operating time between successive breakdowns, taken over the
life of the device until it is beyond repair.
SYSTEM RELIABILITY
It is very difficult to analyze the system in its entirety.
This is because the failure of the system as a whole can be
attributed to one or more of its components not functioning in the
stipulated manner.
In practice, the system is broken down to sub-systems and elements
whose individual reliability factors can be estimated or determined.
Depending on the manner in which these sub-systems and elements
are connected to constitute the given system, the combinatorial rules
of probability are applied to obtain the system reliability.
Reliability engineers often need to work with systems having
elements connected in parallel, series, or complex (combination of
series and parallel), and to calculate their reliability.
Series configuration
The simplest combination of units that form a system is a series
combination. This is also one of the most commonly used structures,
and is shown below:
[Figure: series configuration of n units, labelled from Cause to Effect]
Here, the system consists of n units which are connected in series.
Let the successful operations of these individual units be represented
by X1, X2, ..., Xn, and their respective probabilities by
P(X1), P(X2), ..., P(Xn).
For the successful operation of the system, it is necessary that all n
units function satisfactorily.
The probability of simultaneous successful operation of all the units
will be P(X1 and X2 and ... and Xn).
Series configuration
Assume first that these units are not independent of one another, i.e., the
successful operation of unit 1 might affect the successful operation of all
other units, and so on.
For example, the heat dissipated by unit 1, which may be a resistor, may
affect the performance characteristics of units 2, 3, ...
According to the multiplication rule, the system reliability is given by
P(S) = P(X1 and X2 and ... and Xn)
= P(X1) × P(X2/X1) × P(X3/X1 and X2) × ...
× P(Xn/X1 and X2 and ... and Xn-1).
In this expression, P(X2/X1) represents the probability of the successful
operation of unit 2 under the condition that unit 1 operates successfully.
Similarly, P(Xn/X1 and X2 and ... and Xn-1) represents the probability of
the successful operation of unit n under the condition that all the preceding
units 1, 2, ..., n − 1 are working successfully.
Series configuration
If the successful operation of each unit is independent of the
successful operation of the remaining units, then X1, X2, ..., Xn
are independent, and the equation becomes
P(S) = P(X1) × P(X2) × P(X3) × ... × P(Xn).
Example: A piece of electronic equipment is operated by four dry cells, each
giving 1.5 volts. The cells are connected in series. The probability of
the successful operation of each cell under the given operating
conditions is 0.90. Calculate the reliability of the power system.
Soln.: The cells are connected in series and it is assumed that the
successful operation of one cell does not affect the operation of
the other cells. Hence the events are independent and
P(S) = 0.9 × 0.9 × 0.9 × 0.9 = 0.6561 ≈ 0.656
Series configuration
If the system has n identical units in series, and if each unit has a
reliability factor p, then the system reliability factor, assuming that all
units function independently, is
P(S) = p × p × p × ... × p = p^n = (1 − q)^n,
where q is the probability of failure of each unit, such that p = 1 − q.
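For independent units in series, the product rule is easy to check numerically. The sketch below is a minimal illustration; the function name is an assumption, and the input is the four-cell example with a reliability of 0.90 per cell.

```python
import math


def series_reliability(unit_reliabilities):
    """Reliability of independent units in series: the product of the P(Xi)."""
    return math.prod(unit_reliabilities)


# Four dry cells in series, each with reliability 0.90.
p_system = series_reliability([0.9] * 4)
print(round(p_system, 4))
```

Because every factor is at most one, adding more units in series can only lower the system reliability.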
Parallel configuration
Several systems exist in which successful operation depends on the
satisfactory functioning of any one of their n sub-systems or
elements. These are said to be connected in parallel.
There may be another parallel system in which several signal paths
perform the same operation, and the satisfactory performance of any
one of these paths is sufficient to ensure the successful operation of
the system.
Let X1, X2, ..., Xn represent the successful operations of
units 1, 2, ..., n, respectively.
Similarly, let bar X1, bar X2, ..., bar Xn represent their
unsuccessful operations, i.e. the failures of the n units.
Parallel configuration
If P(X1) is the probability of successful operation of unit 1, then P(bar
X1) is the probability of its failure.
Further, P(bar X1) = 1 − P(X1).
For the complete failure of the system, all n units have to fail
simultaneously.
If P(S bar) is the probability of failure of the system, then
P(S bar) = P(bar X1 and bar X2 and ... and bar Xn)
= P(bar X1) × P(bar X2/bar X1) × P(bar X3/bar X1 and bar X2) × ...
where P(bar X3/bar X1 and bar X2) represents the probability of failure of
unit 3 under the condition that units 1 and 2 have failed.
Parallel configuration
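If the unit failures are independent of one another, then
P(S bar) = P(bar X1) × P(bar X2) × ... × P(bar Xn),
and the system reliability is P(S) = 1 − P(S bar).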
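For independent unit failures, the parallel system fails only if every unit fails, so its reliability is one minus the product of the unit failure probabilities. The sketch below illustrates this under that independence assumption; the function name and the three-unit input are hypothetical and not taken from this unit.

```python
import math


def parallel_reliability(unit_reliabilities):
    """Reliability of independent units in parallel: 1 - product of (1 - P(Xi))."""
    system_failure = math.prod(1.0 - p for p in unit_reliabilities)
    return 1.0 - system_failure


# Hypothetical example: three units in parallel, each with reliability 0.90.
p_parallel = parallel_reliability([0.9, 0.9, 0.9])
print(round(p_parallel, 4))
```

In contrast to the series case, each extra unit in parallel multiplies the system failure probability by a factor below one, so the system reliability rises.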
The components can have a very low reliability factor of 0.2581 and
still give the system a reliability factor as high as 0.95.
Mixed configuration