Reliability Basics
Why do we need reliability?
The answer is simple. If a system has poor reliability,
a mission may not get completed,
a customer may get dissatisfied,
a company may lose reputation.
Reliability Definition
The probability that a component or a system will consistently perform its intended or
required function for a given duration continuously, intermittently or on demand under
stated conditions without failure.
If quality is performance to specs at t = 0,
Then reliability is performance to specs at t > 0,
i.e. quality over a period of time
An item’s intrinsic reliability is defined by the design.
This brings to focus the following five aspects of the definition
1. Reliability by definition is a probability.
2. The intended or required function should be clearly specified along with the
performance criteria.
3. The acceptable degradation in performance criteria should be specified quantitatively,
which will be a part of the failure definition.
4. The duration or usage needs to be specified.
5. The operational environment under which the product or component is expected to
operate needs to be specified.
It includes all external and internal conditions (such as temperature, humidity,
radiation, magnetic and electric fields, shock, vibration, etc.), whether natural,
man-made, or self-induced, that influence the form, operational performance, or survival of
an item.
What is a failure ?
Failure is defined as the lack of ability of a component, equipment, sub
system, or system to perform its intended function at the required
performance level.
The functions of the engine valve would
indicate the primary failure mode as
incomplete closure of the valve resulting
in leakage around the poppet seat.
This leakage rate, at which the valve is
considered to have failed, will depend on
the application and to what extent
leakage can be tolerated.
Reliability Definition: Example (non repairable)
Probability that a bearing of type ABC will survive 10000
cycles of continuous operation under a load which is 10% of
its dynamic load rating without failure (Failure Criteria: noise
level > 70 dB and/or bearing temperature > 75 °C) with the
required lubricant viscosity maintained between 25 and 35
mm²/s, contaminant particle size less than 6 μm and
dissolved moisture content in lubricant less than 300 ppm.
Reliability Requirement: Example (repairable)
Probability that a weapon continuously fires 5000 rounds
when operated in the prescribed manner (as per operator’s
manual) in desert environment with planned stoppages of 30
min every 500 rounds is ≥ 0.95.
Note: The requirement is irrespective of the weapon’s age.
When the weapon can no longer deliver this performance
despite overhauls, it needs to be discarded.
Environment definition: Temperature, humidity, dust levels etc.
Reliability Comparison
There are two products of the same type but different brand.
For one product, R(100 hrs) = 0.95
For the other product, R (150 hrs) = 0.90
Which product is better ?
Generic Approach
[Figure: block diagram of the generic approach. Boxes: Data, Other Sources, Parameters, Probability (Reliability) Model, Decision Variables, Other Variables, Consequence Model, Consequence.]
A simple consequence model: Downtime cost
NF = expected number of failures during a period T
NPM = expected number of preventive maintenances during a period T
MDT = mean down time per repair
MTTR = mean active time to repair
MTTPM = mean active time to preventive maintenance
MLDTF = mean logistics delay time for a repair (failure)
MLDTPM = mean logistics delay time for a preventive maintenance
P = profit per hour
Cf = expected cost of executing a repair
Cp = expected cost of executing a preventive maintenance
Down time cost
= NF (MTTR + MLDTF) P + NF Cf + NPM (MTTPM + MLDTPM) P + NPM Cp
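The down time cost formula can be sketched as a small function. The input values below are hypothetical and only illustrate the arithmetic; the argument names mirror the symbols defined on the slide.

```python
def downtime_cost(nf, npm, mttr, mldtf, mttpm, mldtpm, profit_per_hour, cf, cp):
    """Down time cost over a period T, term by term as in the formula above."""
    # Lost profit while down for repairs, plus the cost of the repairs themselves.
    corrective = nf * (mttr + mldtf) * profit_per_hour + nf * cf
    # Lost profit while down for preventive maintenance, plus its execution cost.
    preventive = npm * (mttpm + mldtpm) * profit_per_hour + npm * cp
    return corrective + preventive


# Hypothetical inputs (not from the slides).
cost = downtime_cost(nf=4, npm=6, mttr=5.0, mldtf=3.0, mttpm=2.0, mldtpm=1.0,
                     profit_per_hour=1000.0, cf=2000.0, cp=500.0)
print(cost)  # 61000.0
```

Keeping the corrective and preventive terms separate makes it easy to see which one dominates when the maintenance interval is varied.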
Consequence model: Warranty Analytics
Warranty Type Description
FRW-R Free Replacement Warranty - Renewable
PRW-R Pro Rata Warranty - Renewable
FRW-NR Free Replacement Warranty – Non Renewable
PRW-NR Pro Rata Warranty – Non Renewable
FRW-NR:
Total Warranty Cost = TWC = Nw × Cw
Nw = number of failures during warranty period
Cw = warranty cost per failure, Tw = Warranty duration
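PRW-NR:
$$\mathrm{TWC} = \sum_{n=1}^{N_w} C_w \, \frac{T_w - \sum_{i=1}^{n} T_i}{T_w}, \qquad \text{while } \sum_{i=1}^{n} T_i \le T_w$$
Nw = number of failures during warranty period, Ti = ith Time To Failure (TTF)
Cw = warranty cost per failure, Tw = Warranty duration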
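A minimal sketch of the two non-renewable warranty cost models follows. It assumes FRW-NR charges Cw for each of the Nw failures (TWC = Nw × Cw), and that PRW-NR charges Cw scaled by the fraction of the warranty period still remaining at each failure, counting only failures whose cumulative time falls within Tw. The failure times and costs are hypothetical.

```python
def frw_nr_cost(n_failures, cost_per_failure):
    """Free replacement, non-renewable: every warranty failure costs Cw."""
    return n_failures * cost_per_failure


def prw_nr_cost(times_to_failure, cost_per_failure, warranty_duration):
    """Pro rata, non-renewable: cost shrinks with the warranty time already used."""
    total = 0.0
    elapsed = 0.0
    for ttf in times_to_failure:
        elapsed += ttf  # cumulative time of this failure since the sale
        if elapsed > warranty_duration:
            break  # failures after the warranty expires cost nothing
        total += cost_per_failure * (warranty_duration - elapsed) / warranty_duration
    return total


# Hypothetical case: Tw = 12 months, Cw = 1000, successive TTFs of 3, 4 and 6 months.
frw = frw_nr_cost(2, 1000.0)
prw = prw_nr_cost([3.0, 4.0, 6.0], 1000.0, 12.0)
print(frw, round(prw, 2))
```

Only the first two failures (at 3 and 7 months) fall inside the warranty here, so both models count two failures but the pro rata cost is lower.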
System Reliability
A system can have different configurations
Series Systems
[Figure: blocks 1, 2, ..., n connected in series]
RS = R1 R2 ... Rn
Parallel Systems
RS = 1 - (1 - R1) (1 - R2)... (1 - Rn)
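Series-Parallel Systems
[Figure: A and B in series, followed by two C blocks in parallel, followed by D; block reliabilities RA, RB, RC, RC, RD]
Convert to equivalent series system
[Figure: A, B, C’ and D in series; block reliabilities RA, RB, RC’, RD]
RC’ = 1 – (1 – RC)(1 – RC)
so that RS = RA RB RC’ RD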
Example
Consider a multi stage production line with multiple machines at each stage. Find the
reliability of this line if the reliability of each machine for the given period is known.
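[Figure: production line block diagram, R = ?. Machine reliabilities as printed: 0.99; 0.95 and 0.95 on each of two branches; 0.99; 0.90 on each of three branches; 0.99.]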
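The series and parallel formulas can be combined to evaluate this line. The sketch below assumes a layout that is only inferred from the figure: a 0.99 machine, then two parallel branches of two 0.95 machines in series, then a 0.99 machine, then three 0.90 machines in parallel, then a 0.99 machine. If the actual layout differs, only the last three assignments need to change.

```python
from math import prod


def series(reliabilities):
    """RS = R1 R2 ... Rn."""
    return prod(reliabilities)


def parallel(reliabilities):
    """RS = 1 - (1 - R1)(1 - R2)...(1 - Rn)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Assumed layout (see lead-in); reduce each parallel stage to one equivalent block.
stage_two = parallel([series([0.95, 0.95]), series([0.95, 0.95])])
stage_four = parallel([0.90, 0.90, 0.90])
line = series([0.99, stage_two, 0.99, stage_four, 0.99])
print(round(line, 4))  # 0.9601
```

Reducing each parallel stage to an equivalent block first, then multiplying along the line, is the same conversion to an equivalent series system shown for the A, B, C, D example.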
i-out of-m Systems
Consider five identical pieces of mining equipment at a site. To satisfy the
demand, the site requires three of the five to be in operation, i.e. this is
like a system in which 3 out of 5 subsystems are required to survive.
If the reliability of each piece of equipment is 0.95 for a 10-day
operating period, analyze the reliability of the group of equipment.
How the value 0.95 for reliability is obtained will be clear once we learn repairable systems analysis.
Binomial distribution
The binomial distribution is the discrete probability distribution of the number of successes in
a sequence of m independent yes/no experiments, each of which yields success with
probability p.
There are mCi = m! / (i! (m − i)!) different ways of distributing i successes in a sequence of m trials. In general, if
the random variable ‘i’ follows the binomial distribution with parameters m and p, the probability of
getting exactly ‘i’ successes in m trials is given by
$$P(m, i) = \frac{m!}{i!\,(m-i)!}\, p^{i} (1-p)^{m-i}$$
i-out of-m Systems: Generic formulation
Consider a parallel system with ‘m’ components where more than one component is required
to survive for system success.
If ‘p’ is the overall survival probability (reliability), the probability that exactly ‘i’ components
will survive is
$$P(m, i) = \frac{m!}{i!\,(m-i)!}\, p^{i} (1-p)^{m-i}$$
i-out of-m Systems
What will be the system reliability ?
$$R_S = \sum_{i=k}^{m} \frac{m!}{i!\,(m-i)!}\, R^{i} (1-R)^{m-i}$$
This is the probability of i=k or more components surviving
Let’s solve it
Recall the problem: five identical pieces of mining equipment, at least three of which must be in operation (a 3-out-of-5 system), each with reliability 0.95 for a 10-day operating period.
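The 3-out-of-5 problem can be worked by summing the binomial probabilities for 3, 4 and 5 survivors. The function name `k_out_of_m` is illustrative; the calculation assumes identical, independent units.

```python
from math import comb


def k_out_of_m(k, m, r):
    """Probability that at least k of m identical, independent units survive."""
    return sum(comb(m, i) * r**i * (1.0 - r) ** (m - i) for i in range(k, m + 1))


site = k_out_of_m(3, 5, 0.95)
print(round(site, 4))  # 0.9988
```

The three terms are 0.0214 (exactly 3 survive), 0.2036 (exactly 4) and 0.7738 (all 5), so the group is far more reliable than any single unit.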
Example 2
Consider three systems. To satisfy the mission requirement, two out of the three systems
need to be in operation for the next time interval ‘t’. The target mission reliability is 0.95.
Assume that there are 3 components (in series) in each system with reliabilities given below.
System 1          System 2          System 3
Comp   R(t)       Comp   R(t)       Comp   R(t)
A1     0.90       A2     0.70       A3     0.70
B1     0.80       B2     0.75       B3     0.90
C1     0.75       C2     0.90       C3     0.75
Is this achievable with the current state of the components (Reliability values given in the
above tables)?
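Because the three systems are not identical, the binomial formula does not apply directly. One way to check the target, sketched below, is to enumerate every up/down combination of the systems and add the probabilities of the combinations with at least two systems up. Independence between systems and between components is assumed.

```python
from itertools import product
from math import prod


def at_least_k(reliabilities, k):
    """Probability that at least k of the (non-identical) independent units survive."""
    total = 0.0
    for states in product([True, False], repeat=len(reliabilities)):
        if sum(states) >= k:
            total += prod(r if up else 1.0 - r for r, up in zip(reliabilities, states))
    return total


components = {
    "System 1": [0.90, 0.80, 0.75],
    "System 2": [0.70, 0.75, 0.90],
    "System 3": [0.70, 0.90, 0.75],
}
# Components are in series inside each system.
system_r = [prod(rs) for rs in components.values()]
mission = at_least_k(system_r, 2)
print([round(r, 4) for r in system_r], round(mission, 4))
```

The system reliabilities come out as 0.54, 0.4725 and 0.4725, and the 2-out-of-3 mission reliability is about 0.49, well short of the 0.95 target.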
Optimization: Homework problem
If no, which components need to be replaced?
Comp    R(t) after Replacement    Cost (Rs.)
A       0.99                      20000
B       0.99                      10000
C       0.99                      15000
Formulate an optimization problem for cost minimization and solve using Excel solver.
Work out a heuristic and compare its performance with the solution obtained from the
solver.
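As a cross-check for the solver and the heuristic, the problem is small enough to enumerate: nine components, each either kept or replaced, gives 512 plans. The sketch below is exhaustive search, not the Excel Solver formulation the exercise asks for, and it assumes a replaced component of type A, B or C has reliability 0.99 at the listed cost in whichever system it sits.

```python
from itertools import product
from math import prod

CURRENT_R = [[0.90, 0.80, 0.75],   # System 1: A1, B1, C1
             [0.70, 0.75, 0.90],   # System 2: A2, B2, C2
             [0.70, 0.90, 0.75]]   # System 3: A3, B3, C3
REPLACEMENT_COST = [20000, 10000, 15000]  # Rs., for types A, B, C
REPLACED_R = 0.99
TARGET = 0.95


def evaluate(plan):
    """plan holds 9 booleans (system by system, A/B/C order); True means replace."""
    cost = 0
    system_r = []
    for s in range(3):
        comps = []
        for c in range(3):
            replace = plan[3 * s + c]
            comps.append(REPLACED_R if replace else CURRENT_R[s][c])
            cost += REPLACEMENT_COST[c] if replace else 0
        system_r.append(prod(comps))  # components in series
    r1, r2, r3 = system_r
    # Probability that at least two of three independent systems survive.
    mission_r = r1 * r2 + r1 * r3 + r2 * r3 - 2 * r1 * r2 * r3
    return mission_r, cost


feasible = [(evaluate(p)[1], p) for p in product([False, True], repeat=9)
            if evaluate(p)[0] >= TARGET]
best_cost, best_plan = min(feasible)
best_r = evaluate(best_plan)[0]
print(best_cost, round(best_r, 4), best_plan)
```

Enumeration guarantees the cheapest feasible plan for this instance, which gives a reference point for judging how close a heuristic gets.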
What next?
These calculations give you an idea as to how system reliability can be determined.
But we have not yet learned how these reliability values are calculated.
Apart from this, real life systems may need more complex reliability models.
So, let’s understand reliability models in more detail.
Non Parametric approach to Reliability
Let ‘N’ be the number of products in the field.
If ‘n’ is the number of products failing up to time ‘t’, the reliability of the product is
R (t) = (N-n) / N at time ‘t’.
21 components were tested. The time to failure data till 6th failure is given below.
Calculate the reliability at the end of each failure time.
Failure number    Time to failure, hr
1                 125
2                 1260
3                 2080
4                 2825
5                 3550
6                 4670
After the nth failure, R = (21 – n) / 21:
Failure number    Time to failure, hr    Reliability
1                 125                    0.95
2                 1260                   0.90
3                 2080                   0.86
4                 2825                   0.81
5                 3550                   0.76
6                 4670                   0.71
[Figure: Reliability (0.60 to 1.00) plotted against Time to failure, hr (0 to 5000)]
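The same table can be produced with a few lines of code that apply R(t) = (N - n) / N at each failure time. The function name is illustrative.

```python
def empirical_reliability(n_units, failure_times):
    """Return (time, reliability) pairs, with R = (N - n) / N after the nth failure."""
    ordered = sorted(failure_times)
    return [(t, (n_units - n) / n_units) for n, t in enumerate(ordered, start=1)]


table = empirical_reliability(21, [125, 1260, 2080, 2825, 3550, 4670])
for time_hr, reliability in table:
    print(time_hr, round(reliability, 2))
```

The printed values reproduce the Reliability column: 0.95, 0.9, 0.86, 0.81, 0.76, 0.71.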
Grouped data Analysis
Total No. of units in field = 900
Time interval in months    No. of failures    No. of survivals    Reliability at the end of interval
0 to 10                    300                600                 600/900 = 0.67
10 to 20                   200                400                 400/900 = 0.44
20 to 30                   140                260                 260/900 = 0.29
30 to 40                   90                 170                 170/900 = 0.19
40 to 50                   60                 110                 110/900 = 0.12
50 to 60                   38                 72                  72/900 = 0.08
over 60                    72                 0
We are not using the "over 60" information here.
However, in parametric data analysis,
censored data is taken into account.
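For grouped data the calculation is a running count of survivors divided by the fleet size. A minimal sketch, using only the six closed intervals (the "over 60" group is left out, as in the table):

```python
def grouped_reliability(total_units, failures_per_interval):
    """Return (survivors, reliability) at the end of each interval."""
    survivors = total_units
    rows = []
    for failures in failures_per_interval:
        survivors -= failures
        rows.append((survivors, survivors / total_units))
    return rows


rows = grouped_reliability(900, [300, 200, 140, 90, 60, 38])
for survivors, reliability in rows:
    print(survivors, round(reliability, 2))
```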
Two hundred identical units of a product are tested for 50 hrs.
One unit fails before completing 12 hrs of operation.
Another two units fail before completing 20 hrs of operation.
Two more units fail before completing 50 hrs of operation.
What is the reliability estimate of the product at the end of 50 hrs ?
What is the reliability estimate at the end of each failure period ?
What is the reliability estimate for each failure period ?
Answer 1:
R(50 hrs) = (200 – 5) / 200 = 0.975
Answer 2:
R(0 to 12 hr) = (200 – 1) / 200 = 0.995
R(0 to 20 hr) = (200 – 3) / 200 = 0.985
R(0 to 50 hr) = (200 – 5) / 200 = 0.975
Answer 3 (conditional reliability):
R(0 to 12 hr) = (200 – 1) / 200 = 0.995
R(12 to 20 hr) = (199 – 2) / 199 = 0.9899
R(20 to 50 hr) = (197 – 2) / 197 = 0.9898
Revisit the production line problem.
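The conditional reliabilities for the 200-unit test divide the survivors of each period by the units that entered that period, and their product recovers the overall R(50 hrs). A short sketch, with an illustrative function name:

```python
def conditional_reliabilities(n_units, failures_per_period):
    """Reliability of each period, given survival to the start of that period."""
    at_start = n_units
    result = []
    for failures in failures_per_period:
        result.append((at_start - failures) / at_start)
        at_start -= failures
    return result


cond = conditional_reliabilities(200, [1, 2, 2])
overall = 1.0
for r in cond:
    overall *= r
print([round(r, 4) for r in cond], round(overall, 3))
```

Multiplying 0.995 × 0.9899 × 0.9898 gives 0.975, which matches Answer 1 and shows how period-by-period values chain into a mission reliability.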
Example
If the machines are not new, the reliability figures are actually to be determined from
the conditional reliability values of the components inside the machines.
[Figure: the same production line, R = ?, with conditional reliabilities R(t|T1) = 0.99; R(t|T2) = R(t|T3) = 0.95 and R(t|T4) = R(t|T5) = 0.95 on two branches; R(t|T6) = 0.99; R(t|T7) = R(t|T8) = R(t|T9) = 0.90 on three branches; R(t|T10) = 0.99.]
Non Parametric approach for calculating hazard rates
Hazard rate can be defined as the probability of failure per unit operational time, given that the
unit has been working up to that point in time.
Hazard rate = NF / (NB × ΔT)
NF = No. of units failed during the time period
NB = No. of units at the beginning of time period
Δ T = Time period
Hazard rate calculations
Total No. of units in field = 900
Time interval in months    No. of failures    No. of survivals    Hazard rate
0 to 10                    300                600                 0.0333
10 to 20                   200                400                 0.0333
20 to 30                   140                260                 0.0350
30 to 40                   90                 170                 0.0346
40 to 50                   60                 110                 0.0353
50 to 60                   38                 72                  0.0345
over 60                    72                 0                   -
Hazard rate calculations
Total No. of units in field = 900
Time interval in months    No. of failures    No. of survivals    Hazard rate
0 to 10                    40                 860                 0.0044
10 to 20                   210                650                 0.0244
20 to 30                   300                350                 0.0462
30 to 40                   250                100                 0.0714
40 to 50                   80                 20                  0.0800
50 to 60                   20                 0                   0.1000
What do these trends indicate ?
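Both hazard rate tables follow from NF / (NB × ΔT) applied interval by interval, with NB updated to the survivors of the previous interval. A minimal sketch, assuming 10-month intervals as in the tables:

```python
def hazard_rates(total_units, failures_per_interval, interval_length):
    """Hazard rate per unit time for each interval: NF / (NB * dT)."""
    at_start = total_units
    rates = []
    for failures in failures_per_interval:
        rates.append(failures / (at_start * interval_length))
        at_start -= failures  # survivors enter the next interval
    return rates


first = hazard_rates(900, [300, 200, 140, 90, 60, 38], 10)
second = hazard_rates(900, [40, 210, 300, 250, 80, 20], 10)
print([round(h, 4) for h in first])
print([round(h, 4) for h in second])
```

In these numbers the first data set stays between roughly 0.033 and 0.035 per month, while the second rises in every interval. A roughly constant hazard rate is commonly associated with random failures and a rising one with wear-out, which is one way to read the two trends.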
QA
Reliability Optimization: Design Stage
Reliability
Component        A      B      C
Alternative 1    RA1    RB1    RC1
Alternative 2    RA2    RB2    RC2
Alternative 3    RA3    RB3    RC3
Cost
Component        A      B      C
Alternative 1    CA1    CB1    CC1
Alternative 2    CA2    CB2    CC2
Alternative 3    CA3    CB3    CC3
Formulate an optimization problem